Re: [xml] xmllint reports non-determinist content model for schema



On Thu, Aug 23, 2012 at 3:03 PM, Daniel Veillard <veillard redhat com> wrote:
On Wed, Aug 22, 2012 at 10:40:20PM +0200, Johan Corveleyn wrote:
On Fri, Aug 17, 2012 at 10:32 PM, Johan Corveleyn <jcorvel gmail com> wrote:
On Fri, Feb 24, 2012 at 2:14 PM, Johan Corveleyn <jcorvel gmail com> wrote:
...
I'm running into a "non-determinist" error with a schema even though I
don't think it's non-determinist (and both XSV and Xerces agree with
me; they too have no problem with the schema). I have reduced my
test-case to a simple example, see below.

On Tue, Feb 28, 2012 at 6:00 AM, Daniel Veillard <veillard redhat com> wrote:
...
 Well in that case it seems the underlying code is generating a
wrong automata so it should be a matter of running that minimal
test case under gdb with the breakpoint appropriately set to find out
what wrong transition got added ...

I'd like to take a stab at this issue. I've been able to get the
latest 2.9.0 snapshot (from
ftp://xmlsoft.org/libxml2/libxml2-git-snapshot.tar.gz) to build in
Visual Studio 2010. I can still reproduce the issue.

But I need some guidance. What part of the code should I be looking
at? Where do I have to set the "breakpoint appropriately to find out
the wrong transition that got added"?

Alternatively, if you or anyone else on this list can fix the issue
already, that would be even better :-).

To summarize, the issue is this:
Given the following schema:

        <?xml version="1.0" encoding="UTF-8"?>
        <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema";>
                <xs:element name="rules">
                        <xs:complexType>
                                <xs:sequence>
                                        <xs:element name="rule" minOccurs="0" maxOccurs="unbounded"/>
                                        <xs:sequence minOccurs="0" maxOccurs="1">
                                                <xs:element name="specialRule" minOccurs="1" 
maxOccurs="1"/>
                                                <xs:element name="rule" minOccurs="0" 
maxOccurs="unbounded"/>
                                        </xs:sequence>
                                </xs:sequence>
                        </xs:complexType>
                </xs:element>
        </xs:schema>

xmllint complains with "Schemas parser error : local
complex type: The content model is not determinist."
But it isn't.

See also https://bugzilla.gnome.org/show_bug.cgi?id=670865.

Hi,

I could really use some guidance here. I've been able to set up
everything in Visual Studio 2010, and reproduced the issue from within
the debugger. So I can set a breakpoint etc. But I mostly lack some
basic knowledge about how this all works.

Is there some (preferable concise) reading about basic concepts like
how xml schemas automata work, how these are built, etc? What should
the automata in this case look like (in terms of the data structures
inside the code)? How can I see that the automata is non-determinist,
and what would a determinist automata look like, ... ?

    Johan,

  sorry for the slow answer,

I will look at this, promise, but all that code is rooted in automata
theory, there is quite some litterature about it, and well unless you
go though the process of learning that code and possibly the theory
behind it, it's not something that can be hacked trivially. Basically
I run this in debug mode, look at the automata generated, there is pro
bably an error there and then this need fixing and a lot of checking.
To be honnest that one of the hardest part of the library to change
with the core of the parser.
  Theory is usually learnt though the 'Dragon Book', by Aho and Al.
and the code is actually more complex since using extended automata
operations not found in basic automata ...

    If you have time looking at this may be interesting, but
it's really not trivial at least more complex than other parts of the
code :-)

Thanks for your answer, and for pointing out that it's not so simple
:-). It seems that this is a bit over my head then (I have little clue
about automata theory, so it would probably take me weeks or months to
understand what's going on ...). I hope you can find some time to dig
in further in the not-too-distant future :-). I'm always willing to
help out, test, ... if that would be useful (though my usefulness
might be limited because of my non-understanding of things ...).

Thanks for all your efforts,
-- 
Johan



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]