Re: [xml] loading concatenated documents

On Mon, Mar 29, 2010 at 09:21:56PM -0400, Ethan Tira-Thompson wrote:
> Thanks for all the information, I'll try to collate things :)
>> Failure to do so would just make the parser non-conformant to the XML-1.0 specification.
>
> Are you sure about this?  Like I said, I'm not aware of anything in the
> specification saying that it must be an error if more data follows the document.
> The spec does define that this extra data is not part of the document,
> but AFAIK not what you should do with/about it.  It would better serve
> interoperability to simply ignore it and let the user decide whether it's an
> issue, probably issuing a warning by default.  But I'm no expert on the
> spec; it would be educational if you could point me to the section.

  You have things backward; read the spec:

"Each XML document has both a logical and a physical structure. 
Physically, the document is composed of units called entities. An
entity may refer to other entities to cause their inclusion in the
document."

An entity is basically a file. In your case there is only one entity,
since you are not loading any external entities.

Now comes the definition of Well-Formed XML Documents:

[Definition: A textual object is a well-formed XML document if:] 

1. Taken as a whole, it matches the production labeled document.

2. It meets all the well-formedness constraints given in
this specification.

3. Each of the parsed entities which is
referenced directly or indirectly within the
document is well-formed.

[1]     document       ::=       prolog  element  Misc*

so the definition is based on this model:

 you give the processor a textual object and it tells you whether it
 is well-formed.

In your case you feed it the entity content, and the processor parses
it. If it finds a second root element you get a fatal error, and the
*whole* input is not a well-formed document.
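To see the production and the fatal error side by side, here is a minimal sketch in Python. It uses the standard library's expat-based `xml.etree.ElementTree` as a stand-in for libxml2 (an assumption: they are different parsers, but both reject content after the root element); the helper `is_well_formed` is hypothetical. Comments, processing instructions and whitespace after the root match `Misc*`; a second element or stray text does not.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text`, taken as a whole, parses as one XML document.

    Hypothetical helper; ElementTree (expat) stands in for libxml2 here.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# document ::= prolog element Misc*
# Misc (comments, processing instructions, whitespace) may follow the root...
print(is_well_formed('<?xml version="1.0"?><a/>\n<!-- trailing -->\n<?pi data?>\n'))  # True
# ...but a second root element, or stray text, makes the whole input fail.
print(is_well_formed("<a/><b/>"))           # False
print(is_well_formed("<a/>trailing text"))  # False
```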

You can come up with all kinds of theories about how the processor could
just ignore things or stop at a given point, but that is simply not how the
spec says an XML processor must be implemented. Note the

  "taken as a whole"

which clearly indicates that it is absolutely forbidden to stop applying
the rules at some point.
You feed the XML parser the content of the entity (or entities) and it
gives you a result back. An error in the middle or at the end
invalidates the whole document.
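The "taken as a whole" rule still holds when the input arrives in pieces: a streaming parser may already have reported the first document's elements before it reaches the extra content, but the input as a whole still ends in a fatal error. A minimal sketch, again assuming Python's `xml.etree.ElementTree.XMLPullParser` as a stand-in for libxml2's push parser; `parse_incrementally` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def parse_incrementally(chunks):
    """Feed chunks to a pull parser (hypothetical helper).

    Returns (tags of elements whose end was reported, ParseError or None).
    """
    parser = ET.XMLPullParser(events=("end",))
    seen = []
    try:
        for chunk in chunks:
            parser.feed(chunk)
            # Collect the events available so far; a queued error is raised here.
            for _event, elem in parser.read_events():
                seen.append(elem.tag)
        parser.close()
        # Drain any events produced while finishing the stream.
        for _event, elem in parser.read_events():
            seen.append(elem.tag)
    except ET.ParseError as err:
        return seen, err
    return seen, None


# The first document is reported element by element before the parser
# reaches the second root element...
tags, err = parse_incrementally(["<a><x/></a>", "<b/>"])
print(tags)  # ['x', 'a']
# ...yet the input as a whole is still rejected with a fatal error.
print(err)
```

Partial results seen before the error do not make the input well-formed; what the application does with them is outside the spec's definition.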


Daniel Veillard      | libxml Gnome XML XSLT toolkit
daniel veillard com  | Rpmfind RPM search engine | virtualization library
