Re: [xml] loading concatenated documents



On Mon, Mar 29, 2010 at 10:25:30AM -0400, Ethan Tira-Thompson wrote:
Hi Daniel, thanks for the feedback.

[1]     document       ::=       prolog  element  Misc*
...
 NEVER STACK XML DOCUMENTS

This is an unfortunate design decision.  I'm not going to close
and reopen a network connection for each of a series of short and
frequently-sent XML documents just so the parser can verify the eof,

  It's always a bit problematic to assume one is the first one to
ever design a protocol. I assume you've heard of XMPP, aka Jabber;
they solved this 10 years ago. Send everything as 1 document,
chunk by chunk, and close the top element when closing the connection.
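
As a rough illustration of the "one document, chunk by chunk" approach, here is a minimal sketch using the pull parser from Python's standard library rather than libxml2. The function name iter_stanzas, the <stream> wrapper and the <msg> elements are made up for the example; only the general pattern (one long-lived root element, each direct child treated as a message) is what the XMPP suggestion implies.

```python
import xml.etree.ElementTree as ET


def iter_stanzas(chunks):
    """Yield each direct child of the root element as soon as it is complete.

    `chunks` is any iterable of str/bytes pieces, e.g. successive socket reads.
    """
    parser = ET.XMLPullParser(events=("start", "end"))
    depth = 0  # current element nesting depth; the root element is depth 1

    def drain():
        nonlocal depth
        for event, elem in parser.read_events():
            if event == "start":
                depth += 1
            else:
                depth -= 1
                if depth == 1:
                    # A child of the root just closed: one whole message.
                    yield elem

    for chunk in chunks:
        parser.feed(chunk)
        yield from drain()

    # End of connection: the sender has closed the top element.
    parser.close()
    yield from drain()


# Chunk boundaries deliberately fall in the middle of tags and text.
chunks = ["<stream><msg id='1'>he", "llo</msg><msg id", "='2'/>", "</stream>"]
stanzas = [(e.get("id"), e.text) for e in iter_stanzas(chunks)]
```

The connection stays open for as long as the root element stays open, so nothing has to be reopened between messages.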

it's pedantic.  The parser knows the document (or at least the root
node) has ended,

  It knows the top element has ended, but various things may still
follow it, such as comments or PIs.
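
A small sketch of why the end of the root element is not the end of the document, again using Python's standard-library parser as a stand-in for libxml2 (the helper name is_single_document is hypothetical). Per production [1], comments and PIs (Misc*) may follow the root element, but a second element may not.

```python
import xml.etree.ElementTree as ET


def is_single_document(text):
    """Return True if `text` parses as exactly one well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Misc* after the root element is allowed by production [1].
trailing_misc = is_single_document("<a/><!-- late comment --><?pi data?>")

# A second root element is not: this is two stacked documents.
stacked = is_single_document("<a/><b/>")

print(trailing_misc)  # True
print(stacked)        # False
```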

and by default it makes sense to complain if there's
extra characters afterward, but there should be a way to tell libxml
to ignore it.  I'm not trying to claim my entire stream is a valid XML
document,

  Stop using the wrong term: in an XML context, valid means that it
passes DTD validation. You mean well-formed here ...

I only claim each of the documents in the stream is valid,
and it would be nice to have better support for the situation.

  Since you're in research, I would suggest you read the 2 specifications
governing this:
    XML-1.0 spec http://www.w3.org/TR/REC-xml/
    XMPP  http://tools.ietf.org/html/rfc3920 especially section 4

The direct workaround, to make the IO read callback duplicate the
parsing functionality of looking for the close tag for the root node,
is error prone.  libxml is already doing this, I shouldn't have to
reimplement this functionality myself.

  That's wrong; that would mean mixing layers. Either carry the length
of each document as part of your protocol, or provide a marker which is
not a legal XML character, or do it the Jabber way. But you will have
to tell the parser where the XML document(s) end.
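
The first two options can be sketched as follows. This is only an illustration in Python, not part of libxml2: the 4-byte big-endian length header, the choice of NUL as the marker (a byte that cannot appear in an XML 1.0 document), and the function names frame, split_length_prefixed and split_on_nul are all assumptions made for the example.

```python
import struct
import xml.etree.ElementTree as ET


def frame(doc: bytes) -> bytes:
    """Prefix one serialized document with its length (4 bytes, network order)."""
    return struct.pack("!I", len(doc)) + doc


def split_length_prefixed(buf: bytes):
    """Return (complete_documents, unconsumed_tail) from a receive buffer."""
    docs = []
    offset = 0
    while offset + 4 <= len(buf):
        (size,) = struct.unpack_from("!I", buf, offset)
        end = offset + 4 + size
        if end > len(buf):
            break  # body not fully received yet; wait for more data
        docs.append(buf[offset + 4:end])
        offset = end
    return docs, buf[offset:]


def split_on_nul(buf: bytes):
    """Return (complete_documents, unconsumed_tail) using NUL as the marker."""
    *complete, tail = buf.split(b"\x00")
    return complete, tail


# Two whole documents followed by a partially received third one.
partial = frame(b"<c>x</c>")[:6]
wire = frame(b"<a>1</a>") + frame(b"<b/>") + partial
docs, rest = split_length_prefixed(wire)
tags = [ET.fromstring(d).tag for d in docs]
```

Either way the framing lives in the transport layer, and the parser is only ever handed one complete document at a time.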

If nothing else, since you already have the save-as-fragment
functionality, it's odd you don't also have the load-as-fragment... this
situation also arises if you know you have some XML embedded in something
else (maybe more XML, maybe not) and you just want to parse just that
chunk for efficiency.

  See other mail, libxml2 does provide such routines.

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel veillard com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/


