Re: [xml] Performance gets bad when parsing xml with namespaces



On Fri, Sep 24, 2010 at 01:47:15AM +0200, Max Kisselew wrote:
[...]
I wanted to extract all the content from the <token> elements. In the xml
file without the namespace definitions that takes just a moment (less
than 30 seconds).
But when I tried to perform the same on the new file with namespaces, it
took much longer, more than 30 minutes (!). The xml file was about 7 MB.

Since the same problem occurs when one tries to parse the xml file
with the LibXML2 binding for Perl, I guess the problem comes from
LibXML2 itself.

It is also strange that the performance problem seems to grow with the
number of <token> tags to be parsed. The first 10 000 tags need only about
a second, but parsing the first 20 000 tags takes 21 seconds! Do you have
any idea about the cause of this problem and how it could be solved?
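
The figures quoted above (about 1 second for 10 000 tags, 21 seconds for 20 000) suggest that the cost grows much faster than linearly. A minimal sketch of how such scaling could be measured is given below. It is only an illustration: it uses Python's standard-library ElementTree (which is backed by expat, not libxml2), and the namespace URI, the <corpus> root element and the function names are assumptions, not taken from the original file. The same doubling test would have to be repeated with the actual libxml2-based tool to say anything about libxml2 itself.

```python
import io
import time
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the real file's namespaces are not known.
NS = "http://example.org/corpus"


def make_doc(n_tokens):
    """Build an in-memory document with n_tokens namespaced <token> elements."""
    parts = ['<corpus xmlns="%s">' % NS]
    parts.extend("<token>w%d</token>" % i for i in range(n_tokens))
    parts.append("</corpus>")
    return "".join(parts).encode("utf-8")


def extract_tokens(data):
    """Stream through the document and collect the text of every <token>."""
    tag = "{%s}token" % NS  # ElementTree spells namespaced tags as {uri}local
    texts = []
    for _event, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
        if elem.tag == tag:
            texts.append(elem.text)
            elem.clear()  # drop the element's content once it has been read
    return texts


def time_extraction(n_tokens):
    """Return (number of tokens found, seconds spent extracting them)."""
    data = make_doc(n_tokens)
    start = time.perf_counter()
    count = len(extract_tokens(data))
    return count, time.perf_counter() - start


if __name__ == "__main__":
    # Doubling the input should roughly double the time if parsing is linear.
    for n in (10000, 20000, 40000):
        count, seconds = time_extraction(n)
        print("%6d tokens: %.3f s" % (count, seconds))
```

If the time roughly doubles each time the token count doubles, the extraction is linear; a jump like the one reported here would point to a step whose cost depends on how much of the document has already been processed.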

Please try first with libxml2 directly, and please make sure you have a
recent version:

  xmllint --noout your.xml
  xmllint --version

Second, make sure that you're not exhausting your available memory: if
the system begins to swap, there is no way performance is going to be
linear. 7 MB is unlikely to result in swapping, but ...

If xmllint --noout really takes too long, then I will investigate;
put a gzipped version of the file on some server for me to have a look.
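
The two checks above (how long xmllint --noout takes, and whether memory is being exhausted) could be combined in a small wrapper such as the sketch below. It assumes a Unix-like system, since it relies on Python's resource module, and the helper name run_and_measure is made up for illustration. It reports the peak resident memory of child processes, which hints at whether swapping is plausible but does not detect swapping itself.

```python
import resource  # Unix-only module
import shutil
import subprocess
import sys
import time


def run_and_measure(cmd):
    """Run cmd and return (exit code, wall-clock seconds, peak child RSS)."""
    start = time.perf_counter()
    proc = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    elapsed = time.perf_counter() - start
    # ru_maxrss is the largest peak among all child processes waited for so
    # far; it is reported in kilobytes on Linux and in bytes on macOS.
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return proc.returncode, elapsed, peak


if __name__ == "__main__":
    # "your.xml" is the placeholder file name used in the message above.
    path = sys.argv[1] if len(sys.argv) > 1 else "your.xml"
    if shutil.which("xmllint") is None:
        print("xmllint not found on PATH")
    else:
        code, seconds, peak = run_and_measure(["xmllint", "--noout", path])
        print("exit=%d time=%.2fs peak_rss=%d" % (code, seconds, peak))
```

A peak that stays far below the machine's physical memory would make swapping an unlikely explanation for the slowdown.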

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel veillard com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/


