Re: [xml] [PATCH] Fix "token" datatype into account in RelaxNG patterns



On Fri, Nov 29, 2013 at 11:53:17PM +0100, Jan Pokorný wrote:
$ cat small.rng
<grammar datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"
         xmlns="http://relaxng.org/ns/structure/1.0">

<start>
    <element name="script">
          <attribute name="file">
              <data type="token">
                  <except>
                      <data type="token">
                          <param name="pattern">/etc/(rc\.d/)?init\.d/cman</param>
                      </data>
                  </except>
              </data>
          </attribute>
    </element>
</start>

</grammar>

---

$ cat testcase.xml
<script file=" /etc/rc.d/init.d/cman "/>

---

before (bug in question present):

$ xmllint --noout --relaxng small.rng testcase.xml
testcase.xml validates

desired (hopefully, this is not a false assumption, this is also
         the behavior of jing or xmllint when the attribute value
         is whitespace-normalized manually):

$ xmllint --noout --relaxng small.rng testcase.xml
testcase.xml fails to validate


  The best approach is to go back to the spec for a normative answer.
token is defined as:

http://www.w3.org/TR/xmlschema-2/#token

  [Definition:]   token represents tokenized strings. The ·value space·
  of token is the set of strings that do not contain the carriage return
  (#xD), line feed (#xA) nor tab (#x9) characters, that have no leading
  or trailing spaces (#x20) and that have no internal sequences of two
  or more spaces.

The definition is based on the value space, and indeed the file path
with the surrounding spaces stripped is in the value space of token.

Then let's check pattern:

[Definition:]   pattern is a constraint on the ·value space· of a
datatype which is achieved by constraining the ·lexical space· to
literals which match a specific pattern. The value of pattern  ·must· be
a ·regular expression·.

 Again, the regexp must be tested against the value space of the
datatype, i.e. the value with the extra whitespace trimmed.
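
To illustrate that reading, here is a minimal Python sketch (not libxml2
code). It assumes whitespace "collapse" semantics for token and models the
implicit anchoring of XML Schema patterns with re.fullmatch; the helper
names collapse_whitespace, excluded_by_pattern and validates are made up
for this example.

```python
import re

# Pattern taken from small.rng; XML Schema patterns match the whole value,
# which is modelled here with fullmatch().
PATTERN = re.compile(r"/etc/(rc\.d/)?init\.d/cman")


def collapse_whitespace(lexical: str) -> str:
    """Map tab/LF/CR to spaces, squeeze runs of spaces, strip both ends."""
    replaced = re.sub(r"[\t\n\r]", " ", lexical)
    return re.sub(r" +", " ", replaced).strip(" ")


def excluded_by_pattern(lexical: str) -> bool:
    """True if the normalized token value matches the <except> pattern."""
    return PATTERN.fullmatch(collapse_whitespace(lexical)) is not None


def validates(lexical: str) -> bool:
    """Model of <data type="token"><except>...</except></data>."""
    return not excluded_by_pattern(lexical)


if __name__ == "__main__":
    raw = " /etc/rc.d/init.d/cman "
    # Matching the raw attribute value misses the pattern (the reported bug).
    print(PATTERN.fullmatch(raw) is not None)
    # Matching the normalized value hits it, so the document is rejected.
    print(validates(raw))
```

Under these assumptions the attribute value from testcase.xml is rejected
once it is normalized, which matches the "desired" outcome described above.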

The patch fixes the issue, but I must admit it's more like the easiest
solution I was able to achieve, not necessarily a proper one (also
considering the various contexts the affected code can be run in).

Generally, it seems that some relevant parts of the code are affected
by some change trying to be backwards compatible;
from xmlSchemaValidateFacetWhtsp (the originally used function?):

Note that @value needs to be the *normalized* value if the facet
is of type "pattern".

Please let me know if I can help somehow to get the test case passing.
If agreed, I will also turn it to the proper part of the test suite.

And yes, test suite still passes.

  I think the patch is correct. I think we could improve it by using
the value space carried in val->value.str for all the types derived from
'string', including token.

 A revised, improved patch would make the enum values explicit in
   include/libxml/schemasInternals.h
i.e.
    XML_SCHEMAS_UNKNOWN = 0,
    XML_SCHEMAS_STRING = 1,
    XML_SCHEMAS_NORMSTRING = 2,
...

and then use a two-range comparison instead of
  val->type == XML_SCHEMAS_TOKEN
in the test added to xmlSchemaValidateFacetInternal.
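
As a rough model of why explicit enum values matter for such a check, here
is a small Python sketch (again not libxml2 code). Only the values 0, 1
and 2 come from the enum excerpt above; every other member, its value, and
the bounds of the second range are placeholders invented for illustration,
as is the helper name uses_value_space.

```python
from enum import IntEnum


class ValType(IntEnum):
    UNKNOWN = 0      # value quoted in the post
    STRING = 1       # value quoted in the post
    NORMSTRING = 2   # value quoted in the post
    DECIMAL = 3      # placeholder: stands in for a non-string type
    TOKEN = 16       # placeholder value, for illustration only
    LANGUAGE = 17    # placeholder: stands in for a token-derived type


def uses_value_space(val_type: ValType) -> bool:
    """Two range checks replacing a single `== TOKEN` comparison.

    This only works if the numeric values are pinned down explicitly,
    so that the string-derived types stay inside contiguous ranges.
    """
    in_first_range = ValType.STRING <= val_type <= ValType.NORMSTRING
    in_second_range = ValType.TOKEN <= val_type <= ValType.LANGUAGE
    return in_first_range or in_second_range


if __name__ == "__main__":
    for t in ValType:
        print(t.name, uses_value_space(t))
```

The point of the sketch is only that range tests depend on stable numeric
values; the actual bounds have to come from the real enum.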

  That, plus adding the test case, would be a good way to improve the
patch. I would be fine applying it as-is, but if you can build a better
version along the lines above, that would be welcome!

  thanks,

Daniel

-- 
Daniel Veillard      | Open Source and Standards, Red Hat
veillard redhat com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | virtualization library  http://libvirt.org/

