Re: [xml] "double"s and schema validation



On Tue, Jul 20, 2010 at 3:40 AM, Dan Sommers  wrote:
> Given this schema file, t.xsd:
>
>    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
>      <xs:element name="t" type="xs:double"/>
>    </xs:schema>
>
> And this XML document, t.xml:
>
>    <t>e</t>
>
> I got this:
>
>    $ xmllint --schema t.xsd t.xml
>    <?xml version="1.0"?>
>    <t>e</t>
>    t.xml validates
>
> Note that <t>.</t> and <t>.e</t> also validate.
>
> I tracked it down to xmlschemastypes.c, around line 2465, where the
> code scans the input for something suitable for sscanf("%lf").
> Should that code contain an extra check that there is at least one digit
> somewhere?

I think you are right. This code:

    while ((*cur >= '0') && (*cur <= '9')) {
        cur++;
    }

accepts zero or more digits before the period; perhaps it should
require one or more digits instead:

--- xmlschemastypes2.c  2010-07-21 13:17:12.229467800 +0200
+++ xmlschemastypes.c   2010-07-21 13:20:14.737716800 +0200
@@ -2392,6 +2392,7 @@
         case XML_SCHEMAS_DOUBLE:{
                 const xmlChar *cur = value;
                 int neg = 0;
+                int digits = 0;

                if (normOnTheFly)
                    while IS_WSP_BLANK_CH(*cur) cur++;
@@ -2463,8 +2464,10 @@
                 if ((cur[0] == 0) || (cur[0] == '+') || (cur[0] == '-'))
                     goto return1;
                 while ((*cur >= '0') && (*cur <= '9')) {
-                    cur++;
+                    digits++; cur++;
                 }
+                if (digits == 0)
+                    goto return1;
                 if (*cur == '.') {
                     cur++;
                     while ((*cur >= '0') && (*cur <= '9'))


> I think it comes down to the definition of "decimal" in the
> spec; the lexical representation arguably allows for such degenerate
> forms, although the canonical representation does not.

The standard explicitly mentions that the fractional part may be
omitted, which suggests that the integral part should not be omitted.
Also, the empty string is not mentioned amongst the valid examples.

> So, is this a bug?  I couldn't find a bug report or any previous
> discussion one way or the other.  If it is a bug, is it in
> xmlschemastypes.c or in the underlying sscanf implementations?  I get
> the same results at work (OpenSolaris) and at home (Debian).

When xmllint runs on your example, sscanf is never called, because val
(an xmlSchemaValPtr) is NULL. So the only check is the one performed
by the scanning code in xmlschemastypes.c itself.

-- 
Life is complex, with real and imaginary parts.
"Ok, it boots. Which means it must be bug-free and perfect. " -- Linus Torvalds
"People disagree with me. I just ignore them." -- Linus Torvalds
