Re: [xslt] Pull request : xsltproc --fileparam
- From: Tobias Hoffmann <lprint-list thax hardliners org>
- To: The Gnome XSLT library mailing-list <xslt gnome org>
- Subject: Re: [xslt] Pull request : xsltproc --fileparam
- Date: Thu, 18 Jan 2018 22:44:11 +0100
On 17/01/18 18:11, Edouard Tisserant wrote:
> I'd like to get the parser working on libxslt too. Unfortunately, it
> relies on exslt:regex. I'll try implement exslt:regex on my repo... stay
> tuned ;)
> Any known previous attempt ?
Well, EXSLT specifies Javascript-flavoured regular expressions,
but most C/C++ libraries provide only Perl- or POSIX-flavoured semantics.
And while e.g. C++11 / Boost.Regex does support Javascript-like syntax,
it expects wchar_t instead of UTF-8 (and I faintly remember some
connection to the ICU library...).
One of the more difficult parts of Unicode support is
"case-insensitive" matching.
Extracting an existing regexp implementation from a browser/... does not
seem to be a good idea, esp. as V8 is even moving from an Irregexp-based
solution to something integrated with their JIT-compile framework [1].
Wrt. using libxml2's regexp implementation, it seems to implement only a
very basic dialect of regular expressions. AFAIK it doesn't even support
captures (which are probably the premier reason to use
regexp:match(...), and not just regexp:test() ...).
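
To illustrate why captures matter, here is a minimal Python sketch of the
difference between a test-style and a match-style call. The helper names
regexp_test and regexp_match are hypothetical and only loosely mirror the
EXSLT functions; they use Python's re module, not libxml2 or PCRE2.

```python
import re
from typing import List

DATE = r"(\d{4})-(\d{2})-(\d{2})"

def regexp_test(text: str, pattern: str) -> bool:
    """Only answers whether the pattern matches somewhere in text."""
    return re.search(pattern, text) is not None

def regexp_match(text: str, pattern: str) -> List[str]:
    """Returns the whole match followed by each captured group."""
    m = re.search(pattern, text)
    if m is None:
        return []
    return [m.group(0), *m.groups()]

text = "sent on 2018-01-18"
has_date = regexp_test(text, DATE)
date_parts = regexp_match(text, DATE)
```

Without capture support only the first kind of call is possible, so the
year, month and day could not be pulled out of the input.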
Therefore I decided to go with a pure C solution that is already
UTF-8 aware, but does not use Javascript semantics: the well-known PCRE2
library.
https://github.com/smilingthax/songtools/tree/master/xsltlib/regexp/
My implementation is (currently) incompatible in the following
additional ways:
1. It uses "http://exslt.org/regexp" as namespace instead of the correct
   "http://exslt.org/regular-expressions".
2. It does not implement
     string regexp:replace(string input, string regexp, string flags,
                           string replacement)
   but provides a new method,
     object regexp:split(string input, string regexp, string? flags)
   which splits up the input on regexp and returns a node set of text
   nodes separated by <match no="1">...</match> elements when there are
   captures present, or by a single empty <match/> otherwise.
   Rationale: regexp:replace has some stupid limits, e.g. it would
   make a lot of sense for the replacement to be a node/subtree, not just
   a string.
   Also, other languages like PHP and JS support splitting a whole
   string into parts, and they allow captures inside the split
   expression, which is a very helpful feature.
   Finally, regexp:replace can be implemented via regexp:split.
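
The last point can be sketched in a few lines of Python. This is only an
illustration of the idea, not the code from the repository above: the
names split_with_captures and replace_via_split are hypothetical, a tuple
of captured groups stands in for the <match> elements, and Python's re
module stands in for PCRE2.

```python
import re
from typing import Callable, List, Optional, Tuple, Union

# A part is either a plain text run or the tuple of captured groups
# of one separator match (an empty tuple if there are no captures).
Groups = Tuple[Optional[str], ...]
Part = Union[str, Groups]

def split_with_captures(text: str, pattern: str, flags: int = 0) -> List[Part]:
    """Split text on pattern, keeping the captures of every separator."""
    parts: List[Part] = []
    pos = 0
    for m in re.finditer(pattern, text, flags):
        parts.append(text[pos:m.start()])  # text before the separator
        parts.append(m.groups())           # captures of the separator
        pos = m.end()
    parts.append(text[pos:])               # trailing text
    return parts

def replace_via_split(text: str, pattern: str,
                      replacement: Callable[[Groups], str],
                      flags: int = 0) -> str:
    """Rebuild the string, substituting each separator via a callback."""
    out = []
    for part in split_with_captures(text, pattern, flags):
        out.append(part if isinstance(part, str) else replacement(part))
    return "".join(out)

parts = split_with_captures("a1b22c", r"(\d+)")
replaced = replace_via_split("a1b22c", r"(\d+)", lambda g: "<" + g[0] + ">")
```

Because the replacement is produced by a callback working on the split
result, it is not limited to a fixed string; in XSLT terms the caller
could just as well emit a node or subtree for each match.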
Feel free to fork my implementation into a separate/standalone
repository or even integrate it into lib(e)xslt...
Tobias
[1] https://v8project.blogspot.de/2017/01/speeding-up-v8-regular-expressions.html