Re: [xml] xmllint as minimal non-validating parser?



On Mon, Sep 17, 2007 at 09:24:42AM -0500, Chuck Bearden wrote:
On 9/17/07, Daniel Veillard <veillard redhat com> wrote:
On Thu, Sep 13, 2007 at 04:33:37PM -0500, Chuck Bearden wrote:
My reading of the XML Recommendation:
The well-formedness constraint "Entity Declared" [1] does not apply
to an XML document with an external DTD subset and which does not
have a standalone declaration of 'no', since non-validating processors
are not required to read external DTD subsets.  Such a document may
contain internal general entity references that aren't defined in an
internal DTD subset and nonetheless be well-formed.

The attached well-formed, valid document contains a reference to an
entity defined in the external DTD subset.  However, I can't find a
way to make xmllint treat it as well-formed:

  $ xmllint --noout --noent Briantest.xml
  Briantest.xml:20: parser error : Entity 'plus' not defined
      <m:mo>&plus;</m:mo>
                  ^
  $ xmllint --noout Briantest.xml
  Briantest.xml:20: parser error : Entity 'plus' not defined
      <m:mo>&plus;</m:mo>
                  ^
  $

Is there a way to make xmllint do no more than check documents
against the well-formedness constraints, to emulate a minimal
non-validating processor?

  You ask for --noent, hence requesting entity substitution, hence
loading the DTD. Without --noent, the default behaviour is the one
you are requesting.

I asked for --noent in the first example above, but not in the second.
As noted, 'xmllint --noout <filename>' without --noent gives the same
behavior.  I probably should have made the two examples more distinct
visually.

Is the above behavior (without --noent) a bug?  If so, I'll gladly
file a bug report.

  Not a bug. What you get is an error as reported. Remove the --noout and
you will see you actually get the XML data up to the end. See 
  http://www.w3.org/TR/REC-xml/#sec-terminology

and notice the difference between error and fatal error. A well-formedness
error is a fatal error. A missing entity definition when the external subset
is not fetched is an error, not a fatal error. Of course libxml2 reports it,
as it MAY do so, and it is important for the application layer to know that
some information is missing and the flow may be incomplete (some applications
may want to abort processing in such a case).
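
The error versus fatal error distinction can be observed with any
non-validating parser that skips the external subset. The sketch below is
only an illustration and uses Python's standard-library expat bindings, not
libxml2, so the reporting differs from what xmllint prints. The helper name
check, the two sample documents and the file name hypothetical.dtd are
assumptions made up for this example; the DTD file is never opened.

```python
from xml.parsers import expat


def check(document):
    """Parse a document without reading any external DTD subset.

    Returns (fatal_error_message_or_None, names_of_skipped_entities).
    """
    skipped = []
    parser = expat.ParserCreate()
    # Called for a reference to an entity whose declaration may live in
    # an external subset that was not read (an error, not a fatal error).
    parser.SkippedEntityHandler = lambda name, is_param: skipped.append(name)
    try:
        parser.Parse(document, True)
    except expat.ExpatError as exc:
        # A violated well-formedness constraint is fatal: parsing stops.
        return str(exc), skipped
    return None, skipped


# Hypothetical sample: an external subset is declared but never fetched.
WITH_EXTERNAL_SUBSET = (
    '<!DOCTYPE doc SYSTEM "hypothetical.dtd">\n'
    '<doc>&plus;</doc>'
)

# Hypothetical sample: no DTD at all, so "Entity Declared" applies.
WITHOUT_DTD = '<doc>&plus;</doc>'

if __name__ == "__main__":
    for label, text in (("external subset", WITH_EXTERNAL_SUBSET),
                        ("no DTD", WITHOUT_DTD)):
        fatal, skipped = check(text)
        print(label, "| fatal:", fatal, "| skipped:", skipped)
```

In the first case the parser should reach the end of the document and only
note that 'plus' was skipped, while in the second case the same reference is
a fatal error. An application that cannot tolerate missing information can
still choose to abort whenever the skipped list is not empty.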

If the --loaddtd option is given, the entity is found and no error
is reported, as it should be.

  So all seems fine from a libxml2 POV, I don't see any bug there.

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/


