Re: [xml] Performance xmlXPathEvalExpression

Thank you for your comments. I will check my code to find other factors that could be causing this 
high processing time. One question: how can I set the NOBLANKS and COMPACT options?


-----Original Message-----
From: Stefan Behnel [mailto:stefan_ml behnel de] 
Sent: Tuesday, 14 August 2007 08:41
To: Shirazi Babak
Cc: xml gnome org
Subject: Re: [xml] Performance xmlXPathEvalExpression

Shirazi Babak wrote:
I have a machine with 1 GB memory. And there are customers with 
machines with 512 KB memory. So I could not change the machines to run 
a special parser. The better way, I think, is to change the parser 
(the less expensive way). So if nobody has an idea how to solve the problem, I 
think I will test it with other parsers like MSXML 4.0.

Well, the thing is, you have to analyse your problem in order to solve it. You started off telling the list 
that you have an XPath performance problem. But a simple check of your CPU usage made it clear that XPath has 
nothing to do with it.

I can only say that your application is not CPU bound. So the most likely assumption is that it is I/O bound, 
which for an I/O-free XPath application usually means that it is swapping. You will have to verify that on 
your side using the appropriate tools (listening to your hard drive might be a good first step). A memory 
profiler might already tell you if it's really the in-memory tree or if there are other parts in your 
application that take away too much memory for themselves.
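
One rough way to check the CPU-bound vs. I/O-bound question from inside the program is to compare CPU time with wall-clock time for the suspicious step. The following is only a minimal sketch in Python using the standard library; `cpu_vs_wall` and `busy_work` are hypothetical names, and `busy_work` merely stands in for the real parse-and-query code:

```python
import time


def cpu_vs_wall(func, *args, **kwargs):
    """Run func and return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = func(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, wall, cpu


def busy_work(n):
    # Hypothetical stand-in for the step under investigation.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    _, wall, cpu = cpu_vs_wall(busy_work, 200_000)
    ratio = cpu / max(wall, 1e-9)
    print(f"wall={wall:.3f}s cpu={cpu:.3f}s cpu/wall={ratio:.2f}")
```

A ratio close to 1 suggests the step is CPU bound. A much lower ratio means the process spent most of its time waiting (disk, swap, network), which is the case described above.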

If you really find that it is a memory issue due to libxml2, you can try to reduce the memory footprint, e.g. 
with the NOBLANKS and COMPACT parser options (you should really try those two for SOAP files), by using a 
shared dict for all parsers, by making sure you do not have any memory leaks, etc. Or you can try to use a 
different tool. libxml2 is plenty fast for both parsing and XPath, so you shouldn't consider the last step 
without really good reason.
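
As for setting the two options: in the C API they are the `XML_PARSE_NOBLANKS` and `XML_PARSE_COMPACT` flags, OR-ed together and passed as the `options` argument of the `xmlRead*` functions, e.g. `xmlReadFile(path, NULL, XML_PARSE_NOBLANKS | XML_PARSE_COMPACT)`. In lxml the corresponding parser keywords are `remove_blank_text` and `compact` (to my knowledge `compact` is already on by default in current lxml versions). A small sketch, assuming lxml is installed; the SOAP-like sample document is made up for illustration:

```python
from lxml import etree

# Made-up sample; indentation produces whitespace-only text nodes.
SOAP_SAMPLE = b"""<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <item>one</item>
    <item>two</item>
  </soap:Body>
</soap:Envelope>
"""

# Default parser: whitespace between tags is kept as text nodes.
plain_root = etree.fromstring(SOAP_SAMPLE)

# remove_blank_text maps to XML_PARSE_NOBLANKS, compact to XML_PARSE_COMPACT.
parser = etree.XMLParser(remove_blank_text=True, compact=True)
root = etree.fromstring(SOAP_SAMPLE, parser)
body = root[0]

print("default parser, root.text:", repr(plain_root.text))
print("NOBLANKS parser, root.text:", repr(root.text))
print("children of Body:", len(body))
```

Without a DTD, libxml2 decides by heuristic which whitespace is ignorable, so it is worth checking the result on real documents before relying on it.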

On my machine, for example, I can parse a 3MB XML file into a Python test application (using lxml.etree) and 
run a simple complete-scan XPath expression on it in about 0.18 seconds (including interpreter start-up 
time), using a peak of 18 MB RAM for the whole interpreter. Using NOBLANKS gets me down by another 2MB. If 
your XML file is, say, four times as big, I wouldn't expect the memory usage to go any higher than 80MB - 
obviously depending on the structure. So I can't quite see how libxml2 should become the bottleneck in your 
app. But since your app is not CPU bound, you really need to profile it to see where the actual problem is.
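
A measurement like the one above can be reproduced roughly as follows. This is a sketch under assumptions, not the test application mentioned in the mail: the document is synthetic, `build_document` and `parse_and_scan` are hypothetical helpers, and the XPath expression is just one example of a full-tree scan. Timings will differ from machine to machine.

```python
import time

from lxml import etree


def build_document(n_items):
    """Build a synthetic XML byte string with n_items <item> elements."""
    rows = "".join(
        f"<item id='{i}'><name>n{i}</name></item>\n" for i in range(n_items)
    )
    return f"<root>\n{rows}</root>".encode("utf-8")


def parse_and_scan(data):
    """Parse data, scan the whole tree, return (matches, parse_s, xpath_s)."""
    parser = etree.XMLParser(remove_blank_text=True)
    t0 = time.perf_counter()
    root = etree.fromstring(data, parser)
    t1 = time.perf_counter()
    matches = root.xpath("//item[name]")  # visits every element in the tree
    t2 = time.perf_counter()
    return len(matches), t1 - t0, t2 - t1


if __name__ == "__main__":
    data = build_document(50_000)
    count, parse_s, xpath_s = parse_and_scan(data)
    size_mb = len(data) / 1e6
    print(f"{size_mb:.1f} MB, {count} matches, "
          f"parse {parse_s:.3f}s, xpath {xpath_s:.3f}s")
```

Timing the parse and the XPath evaluation separately shows which of the two, if either, accounts for the delay.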

