Re: [xml] xmlTextReader and parseChunk



On Fri, Dec 14, 2012 at 11:22:37AM +0100, Alexandre Bique wrote:
Hi,

I would like to know if it is possible to use xmlTextReader, but with
a parseChunk interface?

  Well, the two are somewhat at odds:
   - the reader will internally try to pull more data while parsing,
     assuming a synchronous (blocking) input
   - the chunk interface assumes the parser will just stop and
     give execution control back to the caller once it needs
     more data.

Actually I do:

// I removed the checks to simplify the code
buffer = xmlAllocParserInputBuffer(XML_CHAR_ENCODING_NONE);
reader = xmlNewTextReader(buffer, url);

void data_received(const char *data, size_t len)
{
        xmlParserInputBufferPush(buffer, len, data);
        while (xmlTextReaderRead(reader) == 1)

   xmlTextReaderRead() may return 0 or an error code here
if there is not enough data to finish parsing.

              parse_node(reader);
}

This works, but I noticed that the last chunk may not be parsed.
How can I make the reader consume all the remaining data?

  Honestly, I don't know how to solve that simply. The natural way
to do this would be to parse in a separate thread, create a
reader for custom I/O and have the I/O read routine block if there
is no more data to be read, then the main thread would unblock it
when new data is available. This requires specialized I/O routines,
threading and synchronization, so not simple.
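
  As a rough illustration of the blocking I/O routine described above,
here is a minimal sketch in C using POSIX threads. All the feed_* names
and FEED_CAP are hypothetical, invented for this example; they are not
part of libxml2. The only assumption about libxml2 is that feed_read()
and feed_close() have the same shape as its xmlInputReadCallback and
xmlInputCloseCallback types, so they could presumably be handed to
xmlReaderForIO() from the parsing thread. Error checking of the pthread
calls is omitted for brevity.

```c
#include <pthread.h>
#include <string.h>

#define FEED_CAP 65536              /* hypothetical fixed capacity */

/* Hypothetical byte queue shared by the main thread and the parser thread. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  changed;        /* signalled on new data or end of input */
    char            buf[FEED_CAP];
    size_t          start;          /* offset of the first unread byte */
    size_t          len;            /* number of unread bytes */
    int             eof;            /* set once no more data will arrive */
} feed_t;

void feed_init(feed_t *f)
{
    pthread_mutex_init(&f->lock, NULL);
    pthread_cond_init(&f->changed, NULL);
    f->start = 0;
    f->len = 0;
    f->eof = 0;
}

/* Main thread: append a chunk. Returns 0 on success, -1 if the chunk
 * does not fit or the feed was already finished. Never blocks on data. */
int feed_push(feed_t *f, const char *data, size_t n)
{
    int ret = -1;

    pthread_mutex_lock(&f->lock);
    if (f->start > 0) {             /* compact: move unread bytes to the front */
        memmove(f->buf, f->buf + f->start, f->len);
        f->start = 0;
    }
    if (!f->eof && n <= FEED_CAP - f->len) {
        memcpy(f->buf + f->len, data, n);
        f->len += n;
        ret = 0;
        pthread_cond_signal(&f->changed);   /* wake a blocked reader */
    }
    pthread_mutex_unlock(&f->lock);
    return ret;
}

/* Main thread: declare that no more data will arrive. */
void feed_finish(feed_t *f)
{
    pthread_mutex_lock(&f->lock);
    f->eof = 1;
    pthread_cond_signal(&f->changed);
    pthread_mutex_unlock(&f->lock);
}

/* Parser thread: same shape as xmlInputReadCallback. Blocks until data
 * is available, then returns the number of bytes copied; returns 0 only
 * at end of input, and -1 on a bad argument. */
int feed_read(void *context, char *buffer, int len)
{
    feed_t *f = context;
    size_t n;

    if (len < 0)
        return -1;
    pthread_mutex_lock(&f->lock);
    while (f->len == 0 && !f->eof)
        pthread_cond_wait(&f->changed, &f->lock);
    n = f->len < (size_t)len ? f->len : (size_t)len;
    memcpy(buffer, f->buf + f->start, n);
    f->start += n;
    f->len -= n;
    pthread_mutex_unlock(&f->lock);
    return (int)n;
}

/* Same shape as xmlInputCloseCallback; nothing to release in this sketch. */
int feed_close(void *context)
{
    (void)context;
    return 0;
}
```

  With something like this, data_received() would only call feed_push(),
and the connection-closed handler would call feed_finish(). The parser
thread would create the reader with
xmlReaderForIO(feed_read, feed_close, &feed, url, NULL, 0) and run the
usual xmlTextReaderRead() loop, which then sees a plain blocking input
and a real end of file. The fixed-size buffer is just the simplest
choice for a sketch; a real program would want a growable queue or some
back-pressure when feed_push() returns -1.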

  The core problem is that xmlTextReaderRead() can only return
1 for success, 0 if parsing is finished, or -1 in case of error.
There is no provision in the API to say "I need more data", so
missing data would basically be reported as a parsing error
(with missing closing tags, for example).

  The programming model of the reader is way simpler, but it assumes
a synchronous input.

Daniel

-- 
Daniel Veillard      | Open Source and Standards, Red Hat
veillard redhat com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | virtualization library  http://libvirt.org/


