Re: [xml] xml reader interface and sockets

On Fri, Jan 10, 2003 at 06:04:26PM -0500, Sean Middleditch wrote:


First, I suggest you join the list: what you're asking raises discussion,
and I don't want to have to manually approve your posts to be able to

 From what I've read of the new xml reader interface, it only lets you
pull in the parsed data.  It doesn't let you push in the raw xml.  I'm

  Right, that's the current, limited interface it offers.

working on a project that involves reading large amounts of XML from a
socket, and I thought it might be a lot more efficient to be able to
hand chunks to the parser.  Something like:

  While (Data Available)
    Push XML Chunk
    While (Nodes to Parse)
      Handle Node
  Close XML Input
  Handle Remaining Nodes

  This is actually similar to the internal workings of the XmlTextReader
interface. The document I wrote only detailed the API, not the internals.
Internally it's very close to what you're looking for:
 1/ there is an input buffer
 2/ that input buffer is fed progressively with data from the file or
    the I/O Input
 3/ as Read() calls are done, blocks of data are extracted from the
    input buffer and pushed to a progressive parser interface generating
    a tree
 4/ as the tree is processed, old nodes are discarded.

I'm currently using Expat and its SAX-ish interface; however, that's
turning out to be complete hell (since I need to parse the data into a
tree form, but one that doesn't mirror the XML format, so I have tons
and tons of state to track and such; doesn't help that Expat doesn't
validate or allow error codes in the callbacks...)

  SAX has advantages, but in general it's too complex.

Anyways, the point of this babbling, will it be possible to feed chunks
to the XML Reader interface?

  That could relatively easily be extended. BUT (of course there is a
negative side) it makes the interface harder to use. The current interface
is simple because it's a completely consumer-controlled processing model.
The SAX model is completely producer-controlled. The existing interface
does a full conversion of control from the producer to the consumer, but it
will block on I/O input. One of the key points of the current interface is
that it should not hold too much data locally: if processing is too slow and
the input is fed from a socket, the transfer rate will drop, but the
processing side won't buffer data.

If not, does libxml2 offer another
interface besides SAX that might fit?

  Basically there is already a progressive I/O interface:
     int xmlParseChunk(xmlParserCtxtPtr ctxt, const char *chunk, int size,
                       int terminate);
which is the core of the XmlTextReader process. I suggest you look at
xmlTextReaderPushData() and xmlTextReaderRead() in xmlreader.c, which
are the core routines controlling the XmlTextReader. Based on this
you may be able to suggest extensions to the existing interface:
    - are there nodes/input data to process ?
    - push more available data
    - complete the parsing of a given subtree

As I said, the current interface should allow really interesting extensions
to the processing model. Mixing Push and Pull, data-controlled but
tree-based APIs should be doable too.


Daniel Veillard      | Red Hat Network
veillard redhat com  | libxml GNOME XML XSLT toolkit | Rpmfind RPM search engine
