Re: [xml] DTD validation issue
- From: Daniel Veillard <veillard redhat com>
- To: Petr Pajas <pajas ufal ms mff cuni cz>
- Cc: libxml2 <xml gnome org>
- Subject: Re: [xml] DTD validation issue
- Date: Tue, 26 Feb 2008 04:15:28 -0500
On Mon, Feb 25, 2008 at 09:45:13PM +0100, Petr Pajas wrote:
> Hi Daniel, All,
>
> the following inconsistency in DTD validation, reproducible with xmllint, was
> reported to me by a user of XSH2, Jakub Neburka.
>
> He takes two files: decl.dtd and decl.xml and does basically the following:
>
> 1) xmllint --valid decl.xml
>    xmllint --postvalid decl.xml
>
>    both succeed.
>
> 2) xmllint --shell decl.xml
>    /> validate
>
>    this, however, fails with
>
>    decl.xml:5: element root: validity error : Element root was declared EMPTY
>    this one has content
>
> (Probably because the library calls are alike, XSH2 behaves similarly:
> parse-time validation is fine, validating the in-memory tree fails).
>
> The test cases follow.
>
> __decl.dtd__
> <!ENTITY % cond "IGNORE">
> <![%cond;[
> <!ENTITY % content "ANY">
> ]]>
> <!ENTITY % content "EMPTY">
> <!ELEMENT root %content;>
> __CUT__
>
> __decl.xml__
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE root SYSTEM "decl.dtd" [
> <!ENTITY % cond "INCLUDE">
> ]>
> <root>content</root>
> __CUT__
>
> Can you confirm this is a bug? Shall I bugzilla it?

Not a bug. When you do things like post-validation, you give it a
preparsed DTD. In that case the DTD was parsed without the context
of the document, whereas here the internal subset changes its behaviour.
Basically xmlValidateDtd(), or any validation using a DTD parsed out
of the context of the document, can't exactly match the behaviour of
XML-1.0 validation, because XML-1.0 allows the document to modify the
DTD.
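
To make the difference concrete, here is a toy model in Python of how
decl.dtd resolves in the two situations. It is only a sketch, not libxml2
code and not a real DTD parser: the function name resolve_content_model
and its argument are made up for illustration, and the body hard-codes
the declarations of decl.dtd. It relies on two XML-1.0 rules: the
internal subset is processed before the external subset, and the first
declaration of an entity is the binding one.

```python
def resolve_content_model(internal_subset_decls):
    """Toy model of reading decl.dtd; returns the content model of <root>.

    internal_subset_decls is a hypothetical list of (name, value) pairs
    standing in for the parameter entities declared in the document's
    internal subset. Pass an empty list to model a DTD parsed on its own.
    """
    entities = {}

    def declare(name, value):
        # XML-1.0: if an entity is declared more than once,
        # the first declaration encountered is binding.
        entities.setdefault(name, value)

    # The internal subset is processed before the external subset.
    for name, value in internal_subset_decls:
        declare(name, value)

    # decl.dtd, declaration by declaration:
    declare("cond", "IGNORE")            # <!ENTITY % cond "IGNORE">
    if entities["cond"] == "INCLUDE":    # <![%cond;[ ... ]]>
        declare("content", "ANY")        # <!ENTITY % content "ANY">
    declare("content", "EMPTY")          # <!ENTITY % content "EMPTY">
    return entities["content"]           # <!ELEMENT root %content;>


# DTD read in the context of decl.xml (internal subset sets cond to INCLUDE).
with_doc = resolve_content_model([("cond", "INCLUDE")])

# DTD parsed out of the context of the document.
standalone = resolve_content_model([])

print("in document context:", with_doc)
print("parsed standalone:  ", standalone)
```

In the first case root ends up declared ANY, so <root>content</root> is
valid; in the second it is declared EMPTY, which corresponds to the
error reported above.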
Actually, having a validation which depends only on the DTD/schemas,
and where the document can't modify the set of rules set by the receiver,
is in a lot of cases a good thing, if you consider that a DTD/schema is
a contract between a producer and a consumer of documents.
If you want exactly the DTD validation semantics described in the
XML-1.0 spec, reparsing the document is, I think, the only option
guaranteed to be correct.
Also note that the mismatch is documented in the libxml2 comment for
xmlValidateDtd():
/**
 * xmlValidateDtd:
 * @ctxt: the validation context
 * @doc: a document instance
 * @dtd: a dtd instance
 *
 * Try to validate the document against the dtd instance
 *
 * Basically it does check all the definitions in the DtD.
 * Note that the internal subset (if present) is de-coupled
 * (i.e. not used), which could give problems if ID or IDREF
 * is present.
 *
 * returns 1 if valid or 0 otherwise
 */
Daniel
--
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard | virtualization library http://libvirt.org/
veillard redhat com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/