Re: [xml] DTD validation issue



On Mon, Feb 25, 2008 at 09:45:13PM +0100, Petr Pajas wrote:
> Hi Daniel, All,
>
> the following inconsistency in DTD validation, reproducible with xmllint, was
> reported to me by a user of XSH2, Jakub Neburka.
>
> He takes two files: decl.dtd and decl.xml and does basically the following:
>
> 1) xmllint --valid decl.xml
>    xmllint --postvalid decl.xml
>
> both succeed.
>
> 2) xmllint --shell decl.xml
> /> validate
>
> this, however, fails with
>
> decl.xml:5: element root: validity error : Element root was declared EMPTY
> this one has content
>
> (Probably because the library calls are alike, XSH2 behaves similarly:
> parse-time validation is fine, validating the in-memory tree fails).
>
> The test cases follow.
>
> __decl.dtd__
> <!ENTITY % cond "IGNORE">
> <![%cond;[
> <!ENTITY % content "ANY">
> ]]>
> <!ENTITY % content "EMPTY">
> <!ELEMENT root %content;>
> __CUT__
>
> __decl.xml__
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE root SYSTEM "decl.dtd" [
> <!ENTITY % cond "INCLUDE">
> ]>
> <root>content</root>
> __CUT__
>
> Can you confirm this is a bug? Shall I bugzilla it?

  Not a bug. When you do things like post validation, you give it a
preparsed DTD. In that case the DTD was parsed without the context
of the document, whereas the document's internal subset changes how
the DTD behaves. Basically xmlValidateDtd(), or any validation using
a DTD parsed outside the context of the document, can't exactly match
the behaviour of XML-1.0 validation, because XML-1.0 allows the
document to modify the DTD.
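  To make the difference concrete, here is a toy model in Python of how
the two test files interact. It is only a sketch, not libxml2 code: the
names declare() and root_content_model() are made up for illustration,
and it relies on two XML-1.0 rules, namely that the internal subset is
read before the external one and that the first declaration of an
entity is the binding one.

```python
def declare(entities, name, value):
    """Bind a parameter entity; a later redeclaration is ignored."""
    entities.setdefault(name, value)


def root_content_model(internal_subset):
    """Return the content model that <!ELEMENT root %content;> ends up with.

    internal_subset is a list of (name, value) parameter entity
    declarations taken from the document's DOCTYPE, or an empty list
    when decl.dtd is parsed on its own.
    """
    entities = {}

    # The internal subset is processed first.
    for name, value in internal_subset:
        declare(entities, name, value)

    # Hand-written stand-in for the declarations in decl.dtd.
    declare(entities, "cond", "IGNORE")
    if entities["cond"] == "INCLUDE":       # <![%cond;[ ... ]]>
        declare(entities, "content", "ANY")
    declare(entities, "content", "EMPTY")

    return entities["content"]


# Parsed as part of decl.xml: the internal subset sets cond to INCLUDE.
in_document = root_content_model([("cond", "INCLUDE")])

# Parsed on its own, as a preparsed DTD would be.
standalone = root_content_model([])

print(in_document, standalone)
```

With the internal subset present the model yields ANY for root, and
without it EMPTY, which is the mismatch seen in the report.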
  Actually, validation that depends only on the DTD or schema, where
the document can't modify the rules set by the receiver, is in a lot
of cases a good thing, if you consider a DTD or schema to be a
contract between a producer and a consumer of documents.
  If you want 100% of the DTD validation semantics described in the
XML-1.0 spec, reparsing the document is, I think, the only option
guaranteed to be correct.
  Also note that the mismatch is documented in the comment for the
libxml2 call:
/**
 * xmlValidateDtd:
 * @ctxt:  the validation context
 * @doc:  a document instance
 * @dtd:  a dtd instance
 *
 * Try to validate the document against the dtd instance
 *
 * Basically it does check all the definitions in the DTD.
 * Note that the internal subset (if present) is de-coupled
 * (i.e. not used), which could give problems if ID or IDREF
 * is present.
 *
 * returns 1 if valid or 0 otherwise
 */

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/


