Re: [xslt] Pull request : xsltproc --fileparam

From: Tobias Hoffmann <lprint-list thax hardliners org>
To: The Gnome XSLT library mailing-list <xslt gnome org>
Subject: Re: [xslt] Pull request : xsltproc --fileparam
Date: Thu, 18 Jan 2018 22:44:11 +0100

On 17/01/18 18:11, Edouard Tisserant wrote:

I'd like to get the parser working on libxslt too. Unfortunately, it
relies on exslt:regex. I'll try to implement exslt:regex on my repo...
stay tuned ;)

Any known previous attempt?


Well, EXSLT specifies JavaScript-flavoured regular expressions,
but most C/C++ libraries provide only Perl- or POSIX-flavoured semantics.

And while e.g. C++11 / Boost.Regex does support JavaScript-like syntax,
it expects wchar_t instead of UTF-8 (and I faintly remember some
connection to the ICU library...). One of the more difficult parts of
Unicode support is "case-insensitive" matching.

Extracting an existing regexp implementation from a browser/... does not
seem to be a good idea, esp. as V8 is even moving from an Irregexp-based
solution to something integrated with their JIT-compile framework [1].

Wrt. using libxml2's regexp implementation, it seems to implement only a
very basic dialect of regular expressions. AFAIK it doesn't even support
captures (which are probably the premier reason to use
regexp:match(...), and not just regexp:test() ...).
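
To illustrate why "case-insensitive" matching is awkward with Unicode,
here is a minimal sketch in Python (chosen only for brevity; it says
nothing about how PCRE2 or any JavaScript engine behaves). The helper
names simple_equal and folded_equal are made up for this example.

```python
import re


def simple_equal(a, b):
    """Compare two strings after lowercasing each of them."""
    return a.lower() == b.lower()


def folded_equal(a, b):
    """Compare two strings using full Unicode case folding."""
    return a.casefold() == b.casefold()


word_a, word_b = "STRASSE", "straße"

# Lowercasing keeps the sharp s as a single character, so the
# two spellings of the same word do not compare equal.
print(simple_equal(word_a, word_b))

# Full case folding expands the sharp s to "ss", so they do.
print(folded_equal(word_a, word_b))

# Python's own regex engine folds case one character at a time,
# so the one-to-two character expansion is not considered here.
print(bool(re.fullmatch("straße", "STRASSE", re.IGNORECASE)))
```

Which of these behaviours a regexp library should follow is exactly
the kind of decision that differs between flavours.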

Therefore I decided to go with a pure C solution that is already UTF-8
aware, but does not use JavaScript semantics: the well-known PCRE2
library.


  https://github.com/smilingthax/songtools/tree/master/xsltlib/regexp/

My implementation is (currently) incompatible in the following
additional ways:

  1. It uses "http://exslt.org/regexp" as namespace instead of the correct
      "http://exslt.org/regular-expressions",

  2. does not implement

        string regexp:replace(string input, string regexp, string flags,
                              string replacement)

      but provides a new method,

        object regexp:split(string input, string regexp, string? flags)

      Splits up the input on regexp and returns a node set of text nodes
      separated by <match no="1">...</match> elements, when there are
      captures present, or a single empty <match/> otherwise.

      Rationale: regexp:replace has some stupid limits, e.g. it would
      make a lot of sense for the replacement to be a node/subtree, not
      just a string. Also, other languages like PHP and JS support
      splitting a whole string into parts, which supports the very
      helpful feature to also use captures inside the split expression.

      Finally, regexp:replace can be implemented via regexp:split.
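
As a rough model of the regexp:split behaviour described above, and of
how a replace can be built on top of it, here is a sketch in Python. It
is not the C/PCRE2 implementation from the repository: it uses Python's
re module, represents nodes as tuples, assumes one <match no="N">
element per capture group, skips empty matches, and always replaces
globally. The function names and the tuple layout are assumptions made
for this example only.

```python
import re


def regexp_split(text, pattern, flags=""):
    """Split text on pattern and return a flat list of nodes.

    ("text", s)         models a text node
    ("match", no, s)    models <match no="no">s</match>
    ("match", None, "") models an empty <match/>
    """
    rx = re.compile(pattern, re.IGNORECASE if "i" in flags else 0)
    nodes = []
    pos = 0
    for m in rx.finditer(text):
        if m.start() == m.end():
            continue  # ignore empty matches to keep the sketch simple
        nodes.append(("text", text[pos:m.start()]))
        if rx.groups == 0:
            nodes.append(("match", None, ""))
        else:
            for no in range(1, rx.groups + 1):
                # a group that did not participate yields an empty string
                nodes.append(("match", no, m.group(no) or ""))
        pos = m.end()
    nodes.append(("text", text[pos:]))
    return nodes


def regexp_replace(text, pattern, flags, replacement):
    """Replace every match, built only on top of regexp_split."""
    out = []
    for node in regexp_split(text, pattern, flags):
        if node[0] == "text":
            out.append(node[1])
        elif node[1] in (None, 1):
            # emit the replacement once per separator, not once per capture
            out.append(replacement)
    return "".join(out)


print(regexp_split("a, b;c", r"\s*([,;])\s*"))
print(regexp_replace("a, b;c", r"\s*([,;])\s*", "", " | "))
```

In the XSLT setting the replacement would not have to be a string: a
template could iterate over the returned node set and emit an arbitrary
subtree in place of each match element.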

Feel free to fork my implementation into a separate/standalone
repository or even integrate it into lib(e)xslt...


  Tobias

[1] https://v8project.blogspot.de/2017/01/speeding-up-v8-regular-expressions.html
