Re: [xml] Bug in the regular expression parser; character range with escaped characters not working



[Apparently my mail bounced off the list; sending again.]

On Thu, Jul 11, 2013 at 06:06:33PM +0200, Dominik Skanda wrote:
Hi,

I think there is a bug in the regular expression parser for character
ranges: a character class such as [\]-a], whose range starts with an
escaped character (here \]), is not recognized by the libxml2 regular
expression parser.

E.g., the following simple type does not work:

<xs:simpleType name="LimitedString">
        <xs:restriction base="xs:string">
                <xs:pattern value="[\]-a]*"/>
        </xs:restriction>
</xs:simpleType>

but this one does:

<xs:simpleType name="LimitedString">
        <xs:restriction base="xs:string">
                <xs:pattern value="[Z-a]*"/>
        </xs:restriction>
</xs:simpleType>

If one looks at the ASCII table:

 !"#$%&'()*+,-./0123456789:;<=>?
@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_
`abcdefghijklmnopqrstuvwxyz{|}~ 

one sees that Z is the last character before ] that does not have to be
escaped in a character range definition. The range Z-a is accepted and
already covers ], so a range starting at \] should be accepted as well.
This indicates that there is a bug.
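
To make the ASCII argument concrete, here is a minimal C sketch that treats
a range such as [Z-a] as an inclusive interval of code points. The helper
in_char_range is hypothetical (it is not part of libxml2), and ASCII
encoding is assumed:

```c
#include <stdbool.h>

/* Hypothetical helper, not libxml2 code: true if code point c lies in the
 * inclusive range lo-hi, which is how a range like [Z-a] is interpreted.
 * Assumes ASCII: 'Z'=0x5A, '['=0x5B, '\\'=0x5C, ']'=0x5D, 'a'=0x61. */
bool in_char_range(int lo, int hi, int c)
{
    return lo <= c && c <= hi;
}
```

With it, in_char_range('Z', 'a', ']') holds, and so does
in_char_range(']', 'a', '^'), so [\]-a] describes a valid, non-empty range.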

  I think it comes from the specification :-)

http://www.w3.org/TR/xmlschema-2/#nt-SingleCharEsc

  [24]      SingleCharEsc ::= '\' [nrt\|.?*+(){}#x2D#x5B#x5D#x5E]

If you look at the specification's productions, [ and ] are used
consistently to express a choice of characters, e.g.

  [21]      XmlChar    ::=      [^\#x2D#x5B#x5D]

*but* in production 24 they intended to add [ and ] to that set of
characters and forgot to put them in (escaped as \[ and \], of course).

The fact that they should be allowed is actually shown by the last two
examples in the table below production [24].

 Someone (Liam?) may be interested in checking whether there is an erratum
 for it :)

 And since the libxml2 code is driven directly by the productions from the
spec, well, I forgot to add them too.
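
For illustration, the set of characters that production [24] allows after a
backslash can be sketched as a small C predicate. This is only a sketch:
is_single_char_esc is a hypothetical name, it is not the libxml2
implementation, and it assumes ASCII, where #x2D, #x5B, #x5D and #x5E are
the code points of -, [, ] and ^:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical sketch of production [24] (SingleCharEsc), not libxml2 code:
 * true if c may follow a backslash in a single-character escape.
 * Assumes ASCII, where 0x2D is '-', 0x5B is '[', 0x5D is ']', 0x5E is '^'. */
bool is_single_char_esc(int c)
{
    /* n r t \ | . ? * + ( ) { } - [ ] ^ */
    static const char allowed[] = "nrt\\|.?*+(){}-[]^";

    /* Exclude 0 explicitly: strchr would otherwise match the terminator. */
    return c > 0 && c < 128 && strchr(allowed, c) != NULL;
}
```

Under these assumptions is_single_char_esc(']') and is_single_char_esc('[')
are both true, while an ordinary letter such as 'a' is rejected.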

I have prepared some example files to demonstrate the shortcoming:

test_not_validating.xsd is using the first simple type definition and
test_validating.xsd the second one, respectively.

You can try to validate test.xml with: 

xmllint --noout --schema test_not_validating.xsd ./test.xml

and

xmllint --noout --schema test_validating.xsd ./test.xml

respectively.

It would be nice if someone could confirm the bug and possibly fix it.

Regards,

 Could you test the attached patch?

   thanks,

Daniel

-- 
Daniel Veillard      | Open Source and Standards, Red Hat
veillard redhat com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | virtualization library  http://libvirt.org/

diff --git a/xmlregexp.c b/xmlregexp.c
index 1f9911c..2f6d155 100644
--- a/xmlregexp.c
+++ b/xmlregexp.c
@@ -4883,7 +4883,7 @@ xmlFAParseCharClassEsc(xmlRegParserCtxtPtr ctxt) {
        (cur == '|') || (cur == '.') || (cur == '?') || (cur == '*') ||
        (cur == '+') || (cur == '(') || (cur == ')') || (cur == '{') ||
        (cur == '}') || (cur == 0x2D) || (cur == 0x5B) || (cur == 0x5D) ||
-       (cur == 0x5E)) {
+       (cur == 0x5E) || (cur == '[') || (cur == ']')) {
        if (ctxt->atom == NULL) {
            ctxt->atom = xmlRegNewAtom(ctxt, XML_REGEXP_CHARVAL);
            if (ctxt->atom != NULL) {



