Re: [xml] Bug in the regular expression parser; character range with escaped characters not working
- From: Daniel Veillard <veillard redhat com>
- To: Dominik Skanda <Dominik Skanda WEB DE>
- Cc: xml gnome org
- Subject: Re: [xml] Bug in the regular expression parser; character range with escaped characters not working
- Date: Sun, 14 Jul 2013 10:25:31 +0800
On Sun, Jul 14, 2013 at 10:15:17AM +0800, Daniel Veillard wrote:
[apparently my mail bounced off the list, sending again !]
On Thu, Jul 11, 2013 at 06:06:33PM +0200, Dominik Skanda wrote:
Hi,
I think there is a bug in the regular expression parser for character
ranges, i.e. the character class [\]-a] with an escaped character (here
\]) is not recognized by the libXML regular expression parser.
E.g., the following simple type does not work:
<xs:simpleType name="LimitedString">
<xs:restriction base="xs:string">
<xs:pattern value="[\]-a]*"/>
</xs:restriction>
</xs:simpleType>
but this one does:
<xs:simpleType name="LimitedString">
<xs:restriction base="xs:string">
<xs:pattern value="[Z-a]*"/>
</xs:restriction>
</xs:simpleType>
If one looks at the ASCII table:
!"#$%&'()*+,-./0123456789:;<=>?
@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_
`abcdefghijklmnopqrstuvwxyz{|}~
one sees that Z is the rightmost character before ] that
does not have to be escaped in a character range definition. This
indicates that there is a bug.
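To make the table argument concrete, here is a small C sketch (assuming an ASCII execution character set; the helper names are made up for illustration and are not part of libxml2). It only checks the code point ordering, i.e. that ]-a is a well-formed range and a subset of Z-a:

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 if lo..hi is a well-formed (non-empty) code point range. */
int range_is_ordered(unsigned char lo, unsigned char hi) {
    return lo <= hi;
}

/* Returns 1 if c lies in the inclusive range lo..hi. */
int in_range(unsigned char c, unsigned char lo, unsigned char hi) {
    return lo <= c && c <= hi;
}

/* Prints the code points between Z and a, as in the table above. */
void print_code_points(void) {
    const char chars[] = { 'Z', '[', '\\', ']', '^', '_', '`', 'a' };
    size_t i;

    assert('Z' == 0x5A);  /* the sketch assumes ASCII */
    for (i = 0; i < sizeof chars; i++)
        printf("'%c' = 0x%02X\n", chars[i],
               (unsigned)(unsigned char)chars[i]);
}
```

Calling print_code_points() from a main() lists Z, [, \, ], ^, _, ` and a in increasing order, so nothing in the ordering itself should make [\]-a] invalid.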
I think it comes from the specification :-)
http://www.w3.org/TR/xmlschema-2/#nt-SingleCharEsc
[24] SingleCharEsc ::= '\' [nrt\|.?*+(){}#x2D#x5B#x5D#x5E]
If you look at the specification productions, [ and ] are used
consistently to express a choice of characters, e.g.
[21] XmlChar ::= [^\#x2D#x5B#x5D]
*but* in production 24 they intended to add [ and ] to that set of
characters and forgot to add them (escaped as \[ and \] of course).
The fact that they should be allowed is actually shown by the last two
examples in the table below production [24].
Someone (Liam ?) may be interested to check if there is an errata for
it :)
Whoops, I'm the one confused now. #x5B and #x5D do refer to [ and ],
and with the libxml2 code being directly driven by the productions from
the spec, well, I forgot to add them.
And the patch is clearly not fixing anything, because (cur == 0x5B) ||
(cur == 0x5D) are explicitly handled in the condition.
I will have to really investigate :-)
Daniel
I have prepared some example files to demonstrate the shortcoming:
test_not_validating.xsd is using the first simple type definition and
test_validating.xsd the second one, respectively.
You can try to validate test.xml with:
xmllint --noout --schema test_not_validating.xsd ./test.xml
and
xmllint --noout --schema test_validating.xsd ./test.xml
respectively.
It would be nice if anyone could confirm the BUG and possibly solve it.
Regards,
Could you test the patch provided as attachment ?
thanks,
Daniel
--
Daniel Veillard | Open Source and Standards, Red Hat
veillard redhat com | libxml Gnome XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | virtualization library http://libvirt.org/
diff --git a/xmlregexp.c b/xmlregexp.c
index 1f9911c..2f6d155 100644
--- a/xmlregexp.c
+++ b/xmlregexp.c
@@ -4883,7 +4883,7 @@ xmlFAParseCharClassEsc(xmlRegParserCtxtPtr ctxt) {
(cur == '|') || (cur == '.') || (cur == '?') || (cur == '*') ||
(cur == '+') || (cur == '(') || (cur == ')') || (cur == '{') ||
(cur == '}') || (cur == 0x2D) || (cur == 0x5B) || (cur == 0x5D) ||
- (cur == 0x5E)) {
+ (cur == 0x5E) || (cur == '[') || (cur == ']')) {
if (ctxt->atom == NULL) {
ctxt->atom = xmlRegNewAtom(ctxt, XML_REGEXP_CHARVAL);
if (ctxt->atom != NULL) {
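As a quick check of the remark that the patch changes nothing, the condition can be copied into two standalone functions and compared over all code points. This is only a sketch: the function names are hypothetical, the first line of the condition (n, r, t and backslash) is not visible in the hunk and is filled in from production [24] as an assumption, and an ASCII execution character set is assumed.

```c
#include <assert.h>

/* Hypothetical standalone copy of the condition before the patch.
 * The n/r/t/backslash tests are assumed from production [24]; the
 * rest is taken from the context lines of the hunk. */
int single_char_esc_before(int cur) {
    return (cur == 'n') || (cur == 'r') || (cur == 't') || (cur == '\\') ||
           (cur == '|') || (cur == '.') || (cur == '?') || (cur == '*') ||
           (cur == '+') || (cur == '(') || (cur == ')') || (cur == '{') ||
           (cur == '}') || (cur == 0x2D) || (cur == 0x5B) || (cur == 0x5D) ||
           (cur == 0x5E);
}

/* Same condition with the two comparisons added by the patch. */
int single_char_esc_after(int cur) {
    return (cur == 'n') || (cur == 'r') || (cur == 't') || (cur == '\\') ||
           (cur == '|') || (cur == '.') || (cur == '?') || (cur == '*') ||
           (cur == '+') || (cur == '(') || (cur == ')') || (cur == '{') ||
           (cur == '}') || (cur == 0x2D) || (cur == 0x5B) || (cur == 0x5D) ||
           (cur == 0x5E) || (cur == '[') || (cur == ']');
}

/* Counts the code points in 0..0x10FFFF on which the two versions differ. */
int count_disagreements(void) {
    int cur, n = 0;

    assert('[' == 0x5B && ']' == 0x5D);  /* holds in ASCII */
    for (cur = 0; cur <= 0x10FFFF; cur++)
        if (single_char_esc_before(cur) != single_char_esc_after(cur))
            n++;
    return n;
}
```

Since '[' is 0x5B and ']' is 0x5D, count_disagreements() finds no code point on which the two versions differ, which suggests the real cause has to be somewhere else in the character class parsing.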