[xml] caret in regexp character class


With libxml2-2.9.4, the regular expression [ab^cd] is equivalent to
[^cd], i.e. it matches all characters except 'c' and 'd'. However, my
reading of
https://www.w3.org/TR/2001/REC-xmlschema-2-20010502/#regexs is that (1) the
caret has no special meaning unless it is the first character of the
charGroup in a charClassExpr, i.e. the above regexp should match exactly
the characters 'a', 'b', '^', 'c', and 'd'.

A possible alternative interpretation, although I can't find any actual
support for it, is that (2) the caret must be escaped if used other than
as the first character of the charGroup in a charClassExpr - and the
regexp [ab\^cd] does indeed match all five characters listed above
('a', 'b', '^', 'c', and 'd'). But if there is such a requirement, surely
the [ab^cd] expression should give an error rather than the
non-intuitive result.

In short, I believe the current behavior is a bug, and that the correct
behavior is (1). I can provide a patch if needed, but would appreciate
getting a confirmation first - or a refutation, in which case I would
appreciate an explanation of the reasoning.


--Per Hedeland
