Re: [xslt] Pull request : xsltproc --fileparam

On 17/01/18 18:11, Edouard Tisserant wrote:
> I'd like to get the parser working on libxslt too. Unfortunately, it
> relies on exslt:regex. I'll try implement exslt:regex on my repo... stay
> tuned ;)
>
> Any known previous attempt ?

Well, EXSLT specifies JavaScript-flavoured regular expressions, but most C/C++ libraries provide only Perl- or POSIX-flavoured semantics. And while e.g. C++11 / Boost.Regex does support JavaScript-like syntax, it expects wchar_t instead of UTF-8 (and I faintly remember some connection to the ICU library...). One of the more difficult parts of Unicode support is "case-insensitive" matching.

Extracting an existing regexp implementation from a browser/... does not seem to be a good idea, esp. as V8 is even moving from an Irregexp-based solution to something integrated with their JIT compilation framework [1].

As for libxml2's regexp implementation, it seems to implement only a very basic dialect of regular expressions. AFAIK it doesn't even support captures (which are probably the main reason to use regexp:match(...), and not just regexp:test() ...).

Therefore I decided to go with a pure C solution that is already UTF-8 aware but does not use JavaScript semantics: the well-known PCRE2 library.

My implementation is (currently) incompatible in the following additional ways:
  1. It uses ""; as namespace instead of the correct

  2. It does not implement
        string regexp:replace(string input, string regexp, string flags, string replacement)

      but provides a new method,
        object regexp:split(string input, string regexp, string? flags)
      which splits up the input on regexp and returns a node set of text
      nodes separated by <match no="1">...</match> elements, when there are
      captures present, or a single empty <match/> otherwise.

      Rationale: regexp:replace has some stupid limits, e.g. it would make a lot of sense for the replacement to be a node/subtree, not just a string. Also, other languages like PHP and JS support splitting a whole string into parts, including the very helpful feature of using captures inside the split expression.
      Finally, regexp:replace can be implemented via regexp:split.
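
To illustrate the split semantics described above, and how a replace can be layered on top of it, here is a minimal Python sketch. It is not the PCRE2-based C implementation: the names regexp_split and replace_via_split, the tuple stand-in for <match> elements, and the decision to skip empty matches are all assumptions made for illustration. Python's re dialect also differs from both PCRE2 and JavaScript, only the "i" flag is honoured, and every match is replaced.

```python
import re


def regexp_split(text, pattern, flags=""):
    """Hypothetical model of regexp:split.

    Returns a list in which a str stands for a text node and a tuple
    (no, value) stands for a <match no="..."> element. (None, "") stands
    for the single empty <match/> emitted when the pattern has no captures.
    """
    rx = re.compile(pattern, re.IGNORECASE if "i" in flags else 0)
    items = []
    pos = 0
    for m in rx.finditer(text):
        if m.start() == m.end():
            continue  # assumption: empty matches are ignored in this sketch
        items.append(text[pos:m.start()])  # text node before the match
        if rx.groups:
            # one <match no="N"> per capture group; unmatched groups are empty
            for no in range(1, rx.groups + 1):
                items.append((no, m.group(no) or ""))
        else:
            items.append((None, ""))  # single empty <match/>
        pos = m.end()
    items.append(text[pos:])  # trailing text node (possibly empty)
    return items


def replace_via_split(text, pattern, flags, replacement):
    """Replace every match by emitting `replacement` once per run of matches."""
    out = []
    in_match_run = False
    for item in regexp_split(text, pattern, flags):
        if isinstance(item, str):
            out.append(item)
            in_match_run = False
        elif not in_match_run:
            out.append(replacement)
            in_match_run = True
    return "".join(out)


print(regexp_split("a1b22c", r"(\d+)"))
# ['a', (1, '1'), 'b', (1, '22'), 'c']
print(replace_via_split("a1b22c", r"(\d+)", "", "#"))
# a#b#c
```

Because a text node (possibly empty) is always emitted between two matches, each run of <match> items corresponds to exactly one match, so a caller could equally well substitute a node or subtree instead of a string at that point.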

Feel free to fork my implementation into a separate/standalone repository or even integrate it into lib(e)xslt...


