[xml] bug in xmllint parsing of content model



I have tried searching the FAQ, list archives, and list of open bugs,
and have not found this one, to my surprise. I don't think it's
anything I'm doing wrong, but I could be wrong (about that :-).

xmllint seems to drop parentheses from within content models at
times, which leads it to reject a document that other parsers accept.
The following input file is valid according to nsgmls, xmlparse, and
rxp; a friend also checked a more complicated version of the same
problem in XMetaL, and reported no errors. xmllint generates an error
(complete output attached later):

| validity error: Content model of box is not determinist: ((box ,
|                 filler*)+ | ((gift , filler*)+ , (box , filler*)*)) 

(whitespace added). The content model shown is, as far as I can tell,
deterministic (or "non-ambiguous" for the SGMLers). However, looking
at the output of xmllint, it seems that the content model it is
actually choking on is not the one shown in the error message (which
matches the one in the input file, except for whitespace).

Output:
| <!ELEMENT box ((box , filler*)+ | ((gift , filler*)+ , box , filler**))>

Compare that to the input (with whitespace altered in the same way)

Input (and error message):
| <!ELEMENT box ((box , filler*)+ | ((gift , filler*)+ ,(box , filler*)*))>

and you can see that the parentheses around the last name token group
have disappeared, leaving that group's asterisk doubled up on filler.
The resulting content model (without parentheses) is, I believe, in
error, but I think it's actually still deterministic (or would be
without the double asterisk, at least).
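For anyone who wants to check the determinism claim mechanically, here is a minimal sketch in Python of the usual position-based (first/follow set) test for deterministic content models. It is not how xmllint does it; the helper names (sym, seq, alt, star, plus, analyze, conflicts) are made up for illustration, and the content models are transcribed by hand from the declarations quoted above. A model is treated as deterministic when no two distinct occurrences of the same element name compete at the start or after any given occurrence.

```python
from collections import defaultdict

# Tiny AST builders for content-model particles (hypothetical names).
def sym(name): return ("sym", name)
def seq(*items): return ("seq", items)
def alt(*items): return ("alt", items)
def star(item): return ("star", item)
def plus(item): return ("plus", item)

def analyze(node, labels, follow):
    """Return (nullable, first, last) for node; record positions and follow sets."""
    kind = node[0]
    if kind == "sym":
        pos = len(labels)          # each occurrence gets its own position
        labels.append(node[1])
        return False, {pos}, {pos}
    if kind == "seq":
        nullable, first, last = True, set(), set()
        for child in node[1]:
            c_null, c_first, c_last = analyze(child, labels, follow)
            for p in last:         # whatever could end the prefix may be followed by child
                follow[p] |= c_first
            if nullable:           # prefix can be empty, so child can start the sequence
                first |= c_first
            last = (last | c_last) if c_null else set(c_last)
            nullable = nullable and c_null
        return nullable, first, last
    if kind == "alt":
        nullable, first, last = False, set(), set()
        for child in node[1]:
            c_null, c_first, c_last = analyze(child, labels, follow)
            nullable = nullable or c_null
            first |= c_first
            last |= c_last
        return nullable, first, last
    if kind in ("star", "plus"):
        c_null, c_first, c_last = analyze(node[1], labels, follow)
        for p in c_last:           # repetition: the end of the body loops back to its start
            follow[p] |= c_first
        return (kind == "star" or c_null), c_first, c_last
    raise ValueError("unknown particle kind: %r" % (kind,))

def conflicts(model):
    """List (context, name) pairs where two occurrences of name compete."""
    labels, follow = [], defaultdict(set)
    _, first, _ = analyze(model, labels, follow)
    contexts = [("start", first)]
    contexts += [("after " + labels[p], follow[p]) for p in range(len(labels))]
    found = []
    for where, positions in contexts:
        seen = set()
        for p in sorted(positions):
            if labels[p] in seen:
                found.append((where, labels[p]))
            seen.add(labels[p])
    return found

# The model as written in the input file (and echoed in the error message).
input_model = alt(
    plus(seq(sym("box"), star(sym("filler")))),
    seq(plus(seq(sym("gift"), star(sym("filler")))),
        star(seq(sym("box"), star(sym("filler"))))),
)

# The model as xmllint prints it back: inner parentheses gone, "filler**".
printed_model = alt(
    plus(seq(sym("box"), star(sym("filler")))),
    seq(plus(seq(sym("gift"), star(sym("filler")))),
        sym("box"), star(star(sym("filler")))),
)

# A textbook non-deterministic model, (a, b) | (a, c), for comparison.
ambiguous_model = alt(seq(sym("a"), sym("b")), seq(sym("a"), sym("c")))

for name, model in [("input", input_model), ("printed", printed_model),
                    ("ambiguous", ambiguous_model)]:
    print(name, conflicts(model))
```

Under this sketch the input model produces no conflicts, which agrees with what nsgmls, xmlparse, and rxp report. The printed model also produces none, but only because the sketch treats a doubled asterisk as a plain repetition; whether xmllint's internal structure really corresponds to that printed form is something I can't tell from the outside.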

The full input file comes first, followed by the copied-and-pasted
command line and resulting output, including xmllint version
information. (For those interested in the choice of element names:
I've noticed that boxes of, say, Valentine's gifts are often mostly
filler, and there are often multiple layers of nested packaging to
get through before you get to the goodies.)

[ATTACHMENT ~/temp/test_xmllint.xml, text/xml]
[ATTACHMENT ~/temp/shell-output, text/plain]


