Re: [xml] xmlreader error detection



On Sun, Nov 26, 2006 at 09:33:09AM -0500, Elliotte Harold wrote:
What happens when libxml, invoked via xmlreader (itself invoked via 
PHP's XmlReader) detects a well-formedness error? How is the error 
reported to the client application?

  Either through the global default error handler or through one installed with 
    http://xmlsoft.org/html/libxml-xmlreader.html#xmlTextReaderSetErrorHandler

In my experiments it seems that the read method merely returns false.

  No, libxml2 always raises an error.

If 
that's true, is there a way to distinguish between this case and the 
simple end of the document?

  End of document should not be reported as -1: xmlTextReaderRead returns 0
at the end of the document and -1 on error.

A related question: Theoretically, the parser could report data up to 
the first error it finds. In my experiments with small documents, 
however, it actually errors out immediately.

  I think a lot of what you are seeing is specific to PHP, on which
unfortunately I can't comment.

I suspect the underlying 
parser is preparsing a large chunk of the document, caching it, and then 
doling it out a piece at a time. Thus it tends to detect errors 
prematurely. Is this accurate?

  That's how libxml2 operates underneath.

If so, is there a limit to how much it will preparse? I assume it's not 
loading the whole document into a DOM first, and then iterating through 
that.

  No, unless you ask for it. The amount buffered depends on a number of factors,
mostly the document, and potentially other things like RNG validation.

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/


