Re: [xml] Performance xmlXPathEvalExpression




Shirazi Babak wrote:
I have a machine with 1 GB of memory, and there are customers with machines
that have 512 KB of memory. So I cannot change the machines just to run a
special parser. The better (less expensive) way, I think, is to change the
parser. So if nobody has an idea how to solve the problem, I think I will
test it with other parsers like MSXML 4.0.

Well, the thing is, you have to analyse your problem in order to solve it. You
started off telling the list that you have an XPath performance problem. But a
simple check of your CPU usage made it clear that XPath has nothing to do with it.

I can only say that your application is not CPU bound. So the most likely
assumption is that it is I/O bound, which for an I/O-free XPath application
usually means that it is swapping. You will have to verify that on your side
using the appropriate tools (listening to your hard drive might be a good
first step). A memory profiler might already tell you if it's really the
in-memory tree or if there are other parts in your application that take away
too much memory for themselves.
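
A quick way to check the "not CPU bound" part from inside a program is to
compare CPU time with wall-clock time for the same piece of work. The sketch
below uses only the Python standard library; cpu_utilisation, busy and waiting
are illustrative names, not part of any library. A fraction close to 1 means
the work is CPU bound, a fraction close to 0 means the process is mostly
waiting (for I/O, swap, sleep, ...).

```python
import time


def cpu_utilisation(work):
    """Run work() and return (wall_seconds, cpu_seconds, cpu_fraction)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    # Fraction of the elapsed time that the process actually spent on the CPU.
    fraction = cpu / wall if wall > 0 else 0.0
    return wall, cpu, fraction


def busy():
    # Pure computation: should be reported as mostly CPU time.
    sum(i * i for i in range(200_000))


def waiting():
    # Stand-in for a process that is blocked on I/O or swapping.
    time.sleep(0.2)


if __name__ == "__main__":
    for name, work in (("busy", busy), ("waiting", waiting)):
        wall, cpu, fraction = cpu_utilisation(work)
        print(f"{name}: wall={wall:.3f}s cpu={cpu:.3f}s fraction={fraction:.2f}")
```

In a real application, work would be the call that parses the document and
evaluates the XPath expression.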

If you really find that it is a memory issue due to libxml2, you can try to
reduce the memory footprint, e.g. by using the NOBLANKS and COMPACT parser
options (you should really try those two for SOAP files), by using a shared
dict for all parsers, or by making sure you do not have any memory leaks.
Alternatively, you can try a different tool. But libxml2 is plenty fast for
both parsing and XPath, so you shouldn't consider that last step without a
really good reason.
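
As a minimal sketch of the two parser options from Python, assuming lxml is
installed: lxml.etree.XMLParser exposes them as remove_blank_text (NOBLANKS)
and compact (COMPACT, which lxml already enables by default as far as I know).
The SOAP-like sample document and the count_nodes helper are made up for
illustration.

```python
from io import BytesIO

from lxml import etree

# Made-up, indented SOAP-style document; the indentation becomes
# whitespace-only text nodes in the default tree.
SAMPLE = b"""<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <item>one</item>
    <item>two</item>
  </soap:Body>
</soap:Envelope>"""


def count_nodes(tree):
    """Count all element, text and other nodes below the document root."""
    return int(tree.xpath("count(//node())"))


# Default parser: keeps the whitespace-only text nodes.
default_tree = etree.parse(BytesIO(SAMPLE))

# NOBLANKS + COMPACT: drop ignorable whitespace, store short text compactly.
parser = etree.XMLParser(remove_blank_text=True, compact=True)
lean_tree = etree.parse(BytesIO(SAMPLE), parser)

if __name__ == "__main__":
    print("nodes with default parser:", count_nodes(default_tree))
    print("nodes with NOBLANKS:      ", count_nodes(lean_tree))
```

The saving depends on how much of the document is indentation, so it is worth
measuring on the real files.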

On my machine, for example, I can parse a 3MB XML file into a Python test
application (using lxml.etree) and run a simple complete-scan XPath expression
on it in about 0.18 seconds (including interpreter start-up time), using a
peak of 18 MB RAM for the whole interpreter. Using NOBLANKS saves me
another 2 MB. If your XML file is, say, four times as big, I wouldn't expect
the memory usage to go any higher than 80MB - obviously depending on the
structure. So I can't quite see how libxml2 should become the bottleneck in
your app. But since your app is not CPU bound, you really need to profile it
to see where the actual problem is.
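
To get a comparable "peak RAM for the whole interpreter" figure on your side,
something like the following could be a starting point. It is a sketch that
assumes a Unix-like system (the resource module is not available on Windows);
peak_rss_mb is an illustrative helper name, and the large bytes object only
stands in for the parsed tree.

```python
import resource
import sys


def peak_rss_mb():
    """Return the peak resident set size of this process in megabytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


before = peak_rss_mb()
# Stand-in for the real work, e.g. parsing the document and running XPath.
payload = b"x" * (20 * 1024 * 1024)
after = peak_rss_mb()

if __name__ == "__main__":
    print(f"peak RSS before: {before:.1f} MB, after: {after:.1f} MB")
```

Because this measures the whole process, it includes the memory libxml2
allocates for the tree, which a Python-level profiler would not see.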

Stefan


