Re: [xml] Parsing big xml data received by chunks from libcurl



On Tue, Dec 01, 2015 at 05:11:52PM +0000, David Boucher wrote:
Hi list,

I use libcurl to download a large amount of XML data.

In the CURLOPT_WRITEFUNCTION callback, I receive a memory buffer holding part
of the XML data.

The first time this callback is executed, we call xmlReaderNewMemory().
Then we call xmlTextReaderRead() while the result is 1.

Because the XML is split across several buffers, the loop eventually fails:
the reader needs the data that follows...

Using xmlTextReaderByteConsumed(), we can tell how much of the buffer has
already been consumed, and therefore which part has not been read yet.

The next time the callback is called, we build a new buffer containing:
* the data not yet read during the previous call
* the new data from the current call.

This is where I'm stuck. I'm looking for a function that switches the reader
to the new buffer so it can continue parsing the XML data. I have tried
xmlReaderNewMemory(), but it fails...

Maybe such a function does not exist, and maybe reading a single XML document
from several different buffers is a bad idea.

Is there a better way? What would you advise?

  Err, you want to use the push parser when you don't have all the data
available at parser creation time.

  Get the xmllint.c program and look at the code in parseAndPrintFile,
which handles the push testing case; it does something like

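    /* assumed declarations: f is an open FILE *, and filename,
     * options and doc (an xmlDocPtr) are set up elsewhere */
    int res, ret, size = 1024;
    char chars[1024];
    xmlParserCtxtPtr ctxt;

    res = fread(chars, 1, 4, f);
    if (res > 0) {
        /* the first bytes let the parser detect the encoding */
        ctxt = xmlCreatePushParserCtxt(NULL, NULL,
                    chars, res, filename);
        xmlCtxtUseOptions(ctxt, options);
        /* push every following chunk with terminate = 0 */
        while ((res = fread(chars, 1, size, f)) > 0) {
            xmlParseChunk(ctxt, chars, res, 0);
        }
        /* terminate = 1 tells the parser the input has ended */
        xmlParseChunk(ctxt, chars, 0, 1);
        doc = ctxt->myDoc;
        ret = ctxt->wellFormed;
        xmlFreeParserCtxt(ctxt);
        if (!ret) {
            /* not well-formed: discard the partial tree */
            xmlFreeDoc(doc);
            doc = NULL;
        }
    }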

  You create the parser context with the first 4 bytes of your stream,
define which options you want to use, then call xmlParseChunk(..., 0)
for each part until you reach the end, where you call xmlParseChunk(..., 1).

Daniel

Thanks a lot.
Regards.
David.

_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml gnome org
https://mail.gnome.org/mailman/listinfo/xml


-- 
Daniel Veillard      | Open Source and Standards, Red Hat
veillard redhat com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | virtualization library  http://libvirt.org/

