[xml] incorrect RelaxNG error reporting



Hi everyone,

In our project we've recently started using a RelaxNG schema to validate
our XML documents through the lxml Python bindings of libxml2. However,
the errors reported for invalid documents are sometimes very unhelpful,
and even we as developers get confused and have to spend a few minutes
looking for what's actually wrong. To demonstrate, I've appended a
simplified version of our schema, an invalid XML document and a simple
Python script to this email. The script isn't really needed, though;
running
xmllint --relaxng schema.rng test.xml
will produce the same results.

The error that libxml reports is:
test.xml:3:0:ERROR:RELAXNGV:RELAXNG_ERR_EXTRACONTENT: Element interfaces has extra content: eth

which is incorrect, since the actual error is that the eth element is
missing the mandatory id attribute.
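If you want to double-check this diagnosis without going through
libxml2 at all, a few lines of standard-library Python can list which
mandatory attributes each eth element lacks. This is only a hand-written
sanity check, not a RelaxNG validator: the list of required attributes
(id and label) is copied by hand from schema.rng, and
missing_eth_attributes is just a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Same content as test.xml below.
TEST_XML = """<host id="slave1">
    <interfaces>
        <eth label="A">
            <addresses>
                <address value="192.168.100.1/24"/>
            </addresses>
        </eth>
    </interfaces>
</host>"""

# Attributes that schema.rng declares as mandatory on <eth>.
# Copied by hand from the schema; NOT derived from it automatically.
REQUIRED_ETH_ATTRS = ("id", "label")


def missing_eth_attributes(xml_text):
    """Return a list of (eth index, [missing attribute names]) pairs."""
    root = ET.fromstring(xml_text)
    report = []
    for n, eth in enumerate(root.iter("eth")):
        missing = [a for a in REQUIRED_ETH_ATTRS if a not in eth.attrib]
        if missing:
            report.append((n, missing))
    return report


print(missing_eth_attributes(TEST_XML))  # [(0, ['id'])]
```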

What's also interesting is that if you completely remove the definition
and use of the "define" element from the schema (test.xml doesn't use
it, so it can stay the same), the error stack changes to:
test.xml:3:0:ERROR:RELAXNGV:RELAXNG_ERR_ATTRVALID: Element eth failed to validate attributes
test.xml:3:0:ERROR:RELAXNGV:RELAXNG_ERR_EXTRACONTENT: Element interfaces has extra content: eth

This is a reasonable error message, even though it would be a bit more
user-friendly if there were some information about which attributes
failed or are missing, but I can understand that...

There are a few more scenarios where similar problems occur. I can
describe them if needed, but to keep this email short I'll leave them
out for now. I've also found a few bug reports that describe similar
situations, but since they were last updated several years ago, I
wanted to write here first before reviving them.

So I've done some digging and found that all of these imprecise error
reports are related to <interleave>, <optional> and <choice>, i.e. rules
that can easily cause non-determinism. If the non-determinism is handled
with some kind of backtracking, these kinds of problems could arise. The
alternative is to build a finite automaton, which can always be
determinized, and that would solve the problem. I looked through the
libxml sources and found that a finite automaton is in fact created;
however, I didn't find anything related to its determinization, so I'm
assuming there isn't any. I apologize if I've missed something, but
it's a fairly long source file...
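To make the backtracking suspicion a bit more concrete, here is a toy
model in plain Python (not libxml2's code) of a matcher for
interfaces = zeroOrMore(eth). When eth fails its attribute check,
zeroOrMore falls back to "zero more repetitions", so the leftover eth is
reported as extra content of interfaces and the real reason is thrown
away. Keeping the failed attempt's errors gives a stack with the same
shape as the one above that appears without the define element, plus
the names of the missing attributes. Elem, match_eth and
validate_zero_or_more are hypothetical names; this only illustrates the
suspected mechanism and doesn't prove that libxml2 works this way.

```python
from dataclasses import dataclass, field


@dataclass
class Elem:
    """Minimal stand-in for an XML element (hypothetical, for illustration)."""
    name: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def match_eth(el, errors):
    """Toy pattern for <eth>: requires the id and label attributes."""
    if el.name != "eth":
        errors.append(f"Expecting element eth, got {el.name}")
        return False
    missing = [a for a in ("id", "label") if a not in el.attrs]
    if missing:
        errors.append("Element eth failed to validate attributes "
                      f"(missing: {', '.join(missing)})")
        return False
    return True


def validate_zero_or_more(parent, match, keep_failed_attempt):
    """Toy backtracking matcher for <zeroOrMore> as the parent's only content.

    Children are consumed while `match` succeeds. On the first failure the
    matcher falls back to "no more repetitions"; the failed attempt's errors
    are dropped unless keep_failed_attempt is True, and the first unconsumed
    child is then reported as extra content of the parent.
    """
    errors = []
    consumed = 0
    for child in parent.children:
        attempt = []
        if not match(child, attempt):
            if keep_failed_attempt:
                errors.extend(attempt)
            break
        consumed += 1
    if consumed < len(parent.children):
        extra = parent.children[consumed].name
        errors.append(f"Element {parent.name} has extra content: {extra}")
    return errors


# <interfaces><eth label="A"/></interfaces>, as in test.xml (id is missing).
interfaces = Elem("interfaces", children=[Elem("eth", {"label": "A"})])

print(validate_zero_or_more(interfaces, match_eth, keep_failed_attempt=False))
# ['Element interfaces has extra content: eth']
print(validate_zero_or_more(interfaces, match_eth, keep_failed_attempt=True))
# ['Element eth failed to validate attributes (missing: id)',
#  'Element interfaces has extra content: eth']
```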

I'd like to ask whether you consider this a bug worth fixing, or whether
the current behaviour is intended (since the related reports in the bug
tracker are 5+ years old).
If nobody else is planning to fix it, I might consider doing it myself,
but I would first like some comments on whether a determinization step
could be integrated with the way validation is currently handled.
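For reference, this is roughly what a determinization step looks like
in its most basic form: the textbook subset construction, sketched in
Python over a toy content model (a, b) | (a, c), which is
non-deterministic on a. It assumes an automaton without epsilon
transitions and plain strings as element names; it is not based on
libxml2's automaton code, and determinize and first_error are
hypothetical names. Once the automaton is deterministic there is exactly
one current state at every step, so the validator can report what was
actually expected at the point of failure instead of an error from
whichever alternative happened to be tried last.

```python
from collections import deque


def determinize(nfa_delta, start, accepting, alphabet):
    """Subset construction for an NFA without epsilon transitions.

    nfa_delta maps (state, symbol) -> set of successor states.
    DFA states are frozensets of NFA states.
    """
    dfa_start = frozenset([start])
    dfa_delta = {}
    dfa_accepting = set()
    seen = {dfa_start}
    queue = deque([dfa_start])
    while queue:
        current = queue.popleft()
        if current & accepting:          # accepting if any member NFA state is
            dfa_accepting.add(current)
        for sym in alphabet:
            nxt = frozenset(s for q in current for s in nfa_delta.get((q, sym), ()))
            if not nxt:
                continue                 # no transition: sym is invalid here
            dfa_delta[(current, sym)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return dfa_delta, dfa_start, dfa_accepting


def first_error(dfa, word):
    """Run the DFA over a sequence of child names.

    Returns None if the sequence is accepted, else (position, message).
    """
    delta, state, accepting = dfa

    def expected(st):
        return sorted(sym for (src, sym) in delta if src == st)

    for i, sym in enumerate(word):
        nxt = delta.get((state, sym))
        if nxt is None:
            return i, f"unexpected {sym!r}, expected one of {expected(state)}"
        state = nxt
    if state not in accepting:
        return len(word), f"content incomplete, expected one of {expected(state)}"
    return None


# Toy content model (a, b) | (a, c): both alternatives start with "a".
NFA = {(0, "a"): {1, 2}, (1, "b"): {3}, (2, "c"): {3}}
dfa = determinize(NFA, start=0, accepting={3}, alphabet={"a", "b", "c"})

print(first_error(dfa, ["a", "c"]))  # None
print(first_error(dfa, ["a", "d"]))  # (1, "unexpected 'd', expected one of ['b', 'c']")
```

One caveat: in the worst case subset construction produces a number of
states exponential in the size of the original automaton, and
<interleave> can already produce large automata, so memory use would
have to be weighed against the better error messages.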

Thanks for your reply!

Best regards,
Ondrej Lichtner

--------------------------------------
schema.rng:
--------------------------------------
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
    datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
    <start>
        <element name="host">
            <attribute name="id"/>

            <interleave>
                <zeroOrMore>
                    <ref name="params"/>
                </zeroOrMore>

                <element name="interfaces">
                    <zeroOrMore>
                        <ref name="eth"/>
                    </zeroOrMore>
                </element>
            </interleave>
        </element>
    </start>

    <define name="define">
        <element name="define">
            <oneOrMore>
                <element name="alias">
                    <attribute name="name"/>
                    <choice>
                        <attribute name="value"/>
                        <text/>
                    </choice>
                </element>
            </oneOrMore>
        </element>
    </define>

    <define name="eth">
        <element name="eth">
            <attribute name="id"/>
            <attribute name="label"/>
            <interleave>
                <optional>
                    <ref name="define"/>
                </optional>

                <zeroOrMore>
                    <ref name="params"/>
                </zeroOrMore>

                <optional>
                    <ref name="addresses"/>
                </optional>
            </interleave>
        </element>
    </define>

    <define name="addresses">
        <element name="addresses">
            <interleave>
                <optional>
                    <ref name="define"/>
                </optional>

                <zeroOrMore>
                    <element name="address">
                        <choice>
                            <attribute name="value"/>
                            <text/>
                        </choice>
                    </element>
                </zeroOrMore>
            </interleave>
        </element>
    </define>

    <define name="params">
        <element name="params">
            <interleave>
                <optional>
                    <ref name="define"/>
                </optional>

                <zeroOrMore>
                    <element name="param">
                        <attribute name="name"/>
                        <choice>
                            <attribute name="value"/>
                            <text/>
                        </choice>
                    </element>
                </zeroOrMore>
            </interleave>
        </element>
    </define>
</grammar>

--------------------------------------
test.xml:
--------------------------------------
<host id="slave1">
    <interfaces>
        <eth label="A">
            <addresses>
                <address value="192.168.100.1/24"/>
            </addresses>
        </eth>
    </interfaces>
</host>
--------------------------------------
test.py:
--------------------------------------
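#!/usr/bin/env python
from lxml import etree

# Compile the RelaxNG schema.
relaxng_doc = etree.parse("schema.rng")
schema = etree.RelaxNG(relaxng_doc)

# Validate the test document; validate() returns False on failure and
# records the errors in schema.error_log.
doc = etree.parse("test.xml")
if not schema.validate(doc):
    # Print the collected errors, one entry per line.
    for error in schema.error_log:
        print(error)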

