Re: [xml] loading concatenated documents



On Mon, Mar 29, 2010 at 10:25:30AM -0400, Ethan Tira-Thompson wrote:
Hi Daniel, thanks for the feedback.

[1]     document       ::=       prolog  element  Misc*
...
 NEVER STACK XML DOCUMENTS

This is an unfortunate design decision.  I'm not going to close and reopen a network connection for each of
a series of short and frequently sent XML documents just so the parser can verify the EOF; that's pedantic.
The parser knows the document (or at least the root node) has ended, and by default it makes sense to
complain if there are extra characters afterward, but there should be a way to tell libxml to ignore them.  I'm
not trying to claim my entire stream is a valid XML document; I only claim each of the documents in the
stream is valid, and it would be nice to have better support for the situation.

The direct workaround, making the IO read callback duplicate the parser's work of looking for the
close tag of the root node, is error-prone.  libxml is already doing this; I shouldn't have to reimplement
this functionality myself.

If nothing else, since you already have the save-as-fragment functionality, it's odd you don't also have
load-as-fragment... this situation also arises if you know you have some XML embedded in something else
(maybe more XML, maybe not) and you want to parse just that chunk for efficiency.

There are functions to parse well-balanced fragments:

  http://xmlsoft.org/html/libxml-parser.html#xmlParseBalancedChunkMemory

  or 

  http://xmlsoft.org/html/libxml-parser.html#xmlParseInNodeContext

But you have to provide the boundaries.
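
One way to provide those boundaries without scanning for the root close tag is to have the sender frame each document. The sketch below is only an illustration and is not something proposed in this thread: it assumes a hypothetical wire format in which every document is preceded by a 4-byte big-endian length, and the names frame_write and frame_next are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed (hypothetical) framing: 4-byte big-endian length, then the document. */
enum { FRAME_HEADER_SIZE = 4 };

/* Append one framed document to out. Returns the number of bytes written,
 * or 0 if the document is too large or out does not have enough room. */
size_t frame_write(unsigned char *out, size_t out_cap,
                   const char *doc, size_t doc_len)
{
    assert(out != NULL && doc != NULL);
    if (doc_len > UINT32_MAX || out_cap < FRAME_HEADER_SIZE ||
        out_cap - FRAME_HEADER_SIZE < doc_len)
        return 0;
    out[0] = (unsigned char)((doc_len >> 24) & 0xFF);
    out[1] = (unsigned char)((doc_len >> 16) & 0xFF);
    out[2] = (unsigned char)((doc_len >> 8) & 0xFF);
    out[3] = (unsigned char)(doc_len & 0xFF);
    memcpy(out + FRAME_HEADER_SIZE, doc, doc_len);
    return FRAME_HEADER_SIZE + doc_len;
}

/* Look for one complete frame at the start of buf[0..len).
 * On success, point *doc at the document bytes (not NUL-terminated),
 * store its size in *doc_len, and return the bytes consumed (header + body).
 * Return 0 if the frame is still incomplete and more input is needed. */
size_t frame_next(const unsigned char *buf, size_t len,
                  const char **doc, size_t *doc_len)
{
    assert(buf != NULL && doc != NULL && doc_len != NULL);
    if (len < FRAME_HEADER_SIZE)
        return 0;
    uint32_t n = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
                 ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
    if (len - FRAME_HEADER_SIZE < n)
        return 0;
    *doc = (const char *)(buf + FRAME_HEADER_SIZE);
    *doc_len = n;
    return FRAME_HEADER_SIZE + (size_t)n;
}
```

The receiver would call frame_next on whatever bytes it has accumulated from the connection, and each (doc, doc_len) pair it yields is exactly one document that could then be handed to a memory-based parser entry point such as xmlReadMemory. The cost is that both ends have to agree on the framing, so this does not help when the sender's format cannot be changed.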

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel veillard com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/


