Re: [xml] DTD validation issue

On Mon, Feb 25, 2008 at 09:45:13PM +0100, Petr Pajas wrote:
Hi Daniel, All, 

the following inconsistency in DTD validation, reproducible with xmllint, was 
reported to me by a user of XSH2, Jakub Neburka.

He takes two files: decl.dtd and decl.xml and does basically the following:

1) xmllint --valid decl.xml
   xmllint --postvalid decl.xml

both succeed.

2) xmllint --shell decl.xml
/> validate

this, however, fails with

decl.xml:5: element root: validity error : Element root was declared EMPTY 
this one has content

(Probably because the library calls are alike, XSH2 behaves similarly: 
parse-time validation is fine, validating the in-memory tree fails).

The test cases follow.

<!ENTITY % cond "IGNORE">
<!ENTITY % content "ANY">
<!ENTITY % content "EMPTY">
<!ELEMENT root %content;>

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root SYSTEM "decl.dtd" [
<!ENTITY % cond "INCLUDE">

Can you confirm this is a bug? Shall I bugzilla it?

  Not a bug. When you do things like post validation, you give it a
preparsed DTD. In that case the DTD was parsed without the context
of the document, whereas the internal subset changes the behaviour.
Basically xmlValidateDtd() or any validation using a DTD parsed out
of the context of the document can't exactly match the behaviour of
XML-1.0 validation, because it allows the document to modify the
  Actually, validation which depends only on the DTD/schemas, and
where the document can't modify the rules set by the receiver, is in
a lot of cases a good thing, if you consider that a DTD/schema is
a contract between a producer and a consumer of documents.
  If you want exactly the DTD validation semantics described in the
XML-1.0 spec, reparsing the document is, I think, the only guaranteed
correct option.
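  To illustrate why the parsing context matters, here is a toy model
in Python. It is not libxml2 code and only understands the three
declaration forms used in the test case. It relies on two XML-1.0
rules: the internal subset is read before the external subset, and
the first declaration of an entity is binding. The conditional section
around the "ANY" declaration is an assumption, since the archived test
case shows only the entity declarations; the function names are
hypothetical as well.

```python
import re

# Assumed reconstruction of decl.dtd: the <![%cond;[ ... ]]> wrapper is
# NOT shown in the archived test case and is a guess for illustration.
EXTERNAL_DTD = """
<!ENTITY % cond "IGNORE">
<![%cond;[
<!ENTITY % content "ANY">
]]>
<!ENTITY % content "EMPTY">
<!ELEMENT root %content;>
"""

# Internal subset taken from the DOCTYPE of decl.xml.
INTERNAL_SUBSET = '<!ENTITY % cond "INCLUDE">'

# Toy tokenizer: parameter entity, conditional section, element declaration.
TOKEN_RE = re.compile(
    r'<!ENTITY\s+%\s+(?P<ent>\w+)\s+"(?P<val>[^"]*)"\s*>'
    r'|<!\[\s*%(?P<cond>\w+);\s*\[(?P<body>.*?)\]\]>'
    r'|<!ELEMENT\s+(?P<elem>\w+)\s+%(?P<ref>\w+);\s*>',
    re.S,
)


def read_subset(text, entities, elements):
    """Process declarations in document order; the first declaration wins."""
    for m in TOKEN_RE.finditer(text):
        if m.group("ent"):
            # setdefault keeps an earlier declaration of the same entity.
            entities.setdefault(m.group("ent"), m.group("val"))
        elif m.group("cond"):
            # The section is read only if the entity expands to INCLUDE.
            if entities.get(m.group("cond")) == "INCLUDE":
                read_subset(m.group("body"), entities, elements)
        elif m.group("elem"):
            elements.setdefault(m.group("elem"), entities[m.group("ref")])


def root_content_model(external_dtd, internal_subset=""):
    """Return the content model that ends up declared for <root>."""
    entities, elements = {}, {}
    read_subset(internal_subset, entities, elements)  # internal subset first
    read_subset(external_dtd, entities, elements)
    return elements["root"]


# DTD parsed on its own, as with a preparsed DTD:
print(root_content_model(EXTERNAL_DTD))                   # EMPTY
# DTD parsed in the context of the document:
print(root_content_model(EXTERNAL_DTD, INTERNAL_SUBSET))  # ANY
```

  Under these assumptions the same decl.dtd yields EMPTY when parsed
alone and ANY when parsed after the internal subset, which matches
the two results reported above.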
  Also note that this mismatch is documented in the comment for the
libxml2 call:
 * xmlValidateDtd:
 * @ctxt:  the validation context
 * @doc:  a document instance
 * @dtd:  a dtd instance
 * Try to validate the document against the dtd instance
 * Basically it does check all the definitions in the DtD.
 * Note that the internal subset (if present) is de-coupled
 * (i.e. not used), which could give problems if ID or IDREF
 * is present.
 * returns 1 if valid or 0 otherwise


Red Hat Virtualization group
Daniel Veillard      | virtualization library
veillard redhat com  | libxml GNOME XML XSLT toolkit | Rpmfind RPM search engine
