Re: [xml] libxml2 very slow on big data dump



Alexandre Macard wrote:
Alexandre Macard wrote:
Stefan Behnel wrote:
Alexandre Macard wrote:
Stefan Behnel wrote:
Alexandre Macard wrote:
I am trying to dump a node from a large XML document (about 7 MB), and
libxml2 is very slow to respond.

I tried to trace the problem, and it seems to spend all its time in the
function xmlOutputBufferWriteEscape.
I do not need the data to be escaped because it is base64-encoded.
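
The point about base64 can be checked with a small sketch. The helper below is hypothetical (it is not part of libxml2) and assumes that the characters a serialiser escapes in element content are '&', '<' and '>'. The base64 alphabet (A-Z, a-z, 0-9, '+', '/', '=') contains none of them, so the payloads in this thread never need escaping, although a generic serialiser still has to inspect each character to find that out.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not a libxml2 function.
 * Returns true if the text contains a character that is usually
 * escaped when written as XML element content: '&', '<' or '>'.
 */
bool needs_xml_text_escaping(const char *text)
{
    for (const char *p = text; *p != '\0'; p++) {
        if (*p == '&' || *p == '<' || *p == '>')
            return true;
    }
    return false;
}
```

Calling needs_xml_text_escaping("MTIzMDkxMDAyNQ==") returns false, while a string such as "a < b" returns true.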

        
          
            
You didn't write which version of libxml2 you are using, but there was a
bug in an older version that could lead to horrible performance when
serialising character entities.

Try upgrading your library.
      
        
          
Sorry, I forgot to mention that. I am using the latest version,
2.7.2.
    
      
        
So maybe it's a similar bug, but for a different encoding (I think it was
related to the ASCII encoding at the time).

Could you provide the code snippet that you use for serialisation? I.e.
what parameters you pass into what function?

Stefan


  
    
      
This small test program takes 15 seconds to exit.
The journal.xml file is 7.1 MB.

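#include <stdio.h>
#include <libxml/parser.h>
#include <libxml/tree.h>

int main(void) {
    xmlDocPtr doc;
    xmlNodePtr cur;
    xmlBufferPtr buf;

    doc = xmlParseFile("./journal.xml");

    if (doc == NULL) {
        fprintf(stderr, "Document not parsed successfully.\n");
        return 1;
    }
    cur = xmlDocGetRootElement(doc);

    if (cur == NULL) {
        fprintf(stderr, "empty document\n");
        xmlFreeDoc(doc);
        return 1;
    }

    buf = xmlBufferCreate();

    /* Serialise the root node into the buffer (level 1, formatted). */
    xmlNodeDump(buf, doc, cur, 1, 1);

    /* An xmlBuffer is released with xmlBufferFree, not xmlFree. */
    xmlBufferFree(buf);
    xmlFreeDoc(doc);

    return 0;
}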

I will try to post a script later that generates a similar XML file.

Thanks.
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml gnome org
http://mail.gnome.org/mailman/listinfo/xml

  
    
I forgot to mention that all of the time is spent in the function xmlNodeDump.
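
One way to confirm where the time goes is to time the suspect call on its own. The following is only a sketch: cpu_seconds_for_call and sum_to_n are hypothetical names, and sum_to_n is a stand-in workload so that the example compiles without libxml2.

```c
#include <time.h>

/*
 * Hypothetical helper: runs fn(arg) once and returns the CPU time
 * it consumed, in seconds, as measured by clock().
 */
double cpu_seconds_for_call(void (*fn)(void *), void *arg)
{
    clock_t start = clock();
    fn(arg);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Stand-in workload: adds up the integers below *arg. */
void sum_to_n(void *arg)
{
    volatile unsigned long total = 0;
    unsigned long n = *(unsigned long *)arg;

    for (unsigned long i = 0; i < n; i++)
        total += i;
}
```

In the test program, the xmlNodeDump call could be moved into a small function with the same void (*)(void *) signature and passed to the helper in place of sum_to_n.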

Below is a script that generates a similar XML file. I used its output
for testing, and I had to wait 22 seconds for my program to exit.

usage: script.sh > journal.xml


#!/bin/bash

#Header
echo -n '<SOAP-ENV:Envelope
xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/1999/XMLSchema"
SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
<SOAP-ENV:Header/> <SOAP-ENV:Body>  <m:arkws_methodResponse
xmlns:m="urn:arkeia">'

echo -n '<m:list0 xsi:type="xsd:list"><m:last
xsi:type="xsd:integer">1</m:last><m:param0
xsi:type="xsd:integer">0</m:param0><m:base64_param1
xsi:type="xsd:string">MjAwOC8xMi8xNiAxNjo0NzoxMyBJMDAxMTAwMDAgMDFUUF9MSVNUX0FMTDogWW91IGhhdmUgc3VjY2Vzc2Z1bGx5IGxvYWRlZCB0aGUgbGlzdCBvZiB0YXBlcyE=</m:base64_param1><m:param2
xsi:type="xsd:list">'

i=0
while [ $i -lt 15000 ] ; do
    echo -n '<m:item xsi:type="xsd:list"><m:base64_RDATE
xsi:type="xsd:string">MTIzMDkxMDAyNQ==</m:base64_RDATE><m:base64_NUM
xsi:type="xsd:string">MDAwMDE=</m:base64_NUM><m:base64_OWNER
xsi:type="xsd:string">cm9vdA==</m:base64_OWNER><m:base64_THREAD
xsi:type="xsd:string">MDAx</m:base64_THREAD><m:base64_PLID
xsi:type="xsd:string">NDczODVhMWY=</m:base64_PLID><m:base64_CID
xsi:type="xsd:string">NDkzNjlmZjA=</m:base64_CID><m:base64_TPID
xsi:type="xsd:string">NDc1NThlZjM=</m:base64_TPID><m:base64_VOLTAG
xsi:type="xsd:string">L2JhY2t1cHMvZmlsZQ==</m:base64_VOLTAG><m:base64_NAME
xsi:type="xsd:string">dGFwZV9maWxl</m:base64_NAME></m:item>'
    i=`expr $i + 1`
done

echo -n '</m:param2></m:list0>'

#Footer
echo '</m:arkws_methodResponse> </SOAP-ENV:Body></SOAP-ENV:Envelope>'


  
Hi,

I tried wrapping the content of all my string-typed elements in a CDATA
section: <![CDATA[something_encoded_into_base64]]>. For the same XML, it
took 23 seconds to get a reply without CDATA. With CDATA it takes only
1 second!

I really think that all my troubles come from the escaping functions.

Thanks.
Regards.
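
The CDATA workaround described in the last message can be sketched as follows. wrap_in_cdata is a hypothetical helper for code that generates the XML text itself. It rejects payloads containing "]]>", which would end the section early; base64 text never contains that sequence.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: writes "<![CDATA[" + payload + "]]>" into out.
 * Returns the number of characters written (without the terminating
 * NUL), or -1 if the payload contains "]]>" or out is too small.
 */
int wrap_in_cdata(const char *payload, char *out, size_t out_size)
{
    if (strstr(payload, "]]>") != NULL)
        return -1;

    int n = snprintf(out, out_size, "<![CDATA[%s]]>", payload);
    if (n < 0 || (size_t)n >= out_size)
        return -1;

    return n;
}
```

When the document is built with the libxml2 tree API rather than as text, xmlNewCDataBlock creates the equivalent CDATA node.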


