Re: [xml] loading concatenated documents
- From: Daniel Veillard <veillard redhat com>
- To: Ethan Tira-Thompson <ejt cmu edu>
- Cc: xml gnome org
- Subject: Re: [xml] loading concatenated documents
- Date: Mon, 29 Mar 2010 17:43:15 +0200
On Mon, Mar 29, 2010 at 10:25:30AM -0400, Ethan Tira-Thompson wrote:
> Hi Daniel, thanks for the feedback.
> > [1] document ::= prolog element Misc*
> > ...
> > NEVER STACK XML DOCUMENTS
>
> This is an unfortunate design decision. I'm not going to close and reopen a network connection for each of
> a series of short and frequently-sent XML documents just so the parser can verify the EOF; it's pedantic.
> The parser knows the document (or at least the root node) has ended, and by default it makes sense to
> complain if there are extra characters afterward, but there should be a way to tell libxml to ignore them. I'm
> not trying to claim my entire stream is a valid XML document; I only claim each of the documents in the
> stream is valid, and it would be nice to have better support for the situation.
>
> The direct workaround, to make the IO read callback duplicate the parsing functionality of looking for the
> close tag for the root node, is error prone. libxml is already doing this; I shouldn't have to reimplement
> this functionality myself.
>
> If nothing else, since you already have the save-as-fragment functionality, it's odd you don't also have
> load-as-fragment. This situation also arises if you know you have some XML embedded in something else
> (maybe more XML, maybe not) and you just want to parse that chunk for efficiency.
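If the sender can be changed, the stream case above does not require scanning for the root element's end tag at all: the sender can mark the boundaries itself. The sketch below assumes a hypothetical convention (not something libxml2 or this thread defines) in which each UTF-8 encoded document is followed by a single `'\0'` byte. U+0000 is not a legal XML character, so that byte cannot occur inside a well-formed UTF-8 document. The helper name `next_document()` is made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical framing convention (an assumption, not a libxml2 feature):
 * the sender writes each UTF-8 XML document followed by one '\0' byte.
 *
 * next_document() looks for the next complete document in buf[*pos..len).
 * On success it stores the document's start and length, advances *pos past
 * the '\0' terminator and returns 1. It returns 0 (leaving *pos unchanged)
 * when no terminator is buffered yet, i.e. the caller should read more bytes
 * from the connection and try again.
 */
static int next_document(const char *buf, size_t len, size_t *pos,
                         const char **doc, size_t *doc_len)
{
    if (*pos >= len)
        return 0;

    const char *start = buf + *pos;
    const char *end = memchr(start, '\0', len - *pos);
    if (end == NULL)
        return 0;               /* partial document: wait for more data */

    *doc = start;
    *doc_len = (size_t)(end - start);
    *pos += *doc_len + 1;       /* skip the document and its terminator */
    return 1;
}
```

Because the terminator stays in the buffer, each slice is also a NUL-terminated string, so it can be handed to `xmlReadMemory(doc, (int) doc_len, NULL, NULL, 0)` or to the fragment functions below without copying. The trick does not carry over to UTF-16, where zero bytes are common; a length prefix in front of each document would be the usual alternative there.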
There are functions to parse well-balanced fragments:
http://xmlsoft.org/html/libxml-parser.html#xmlParseBalancedChunkMemory
or
http://xmlsoft.org/html/libxml-parser.html#xmlParseInNodeContext
But you have to provide the boundaries.
Daniel
--
Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/
daniel veillard com | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library http://libvirt.org/