Re: [xml] xmlReadFile/xmlReadMemory - Performance or Concurrency problem



Daniel Veillard wrote:
On Tue, Nov 04, 2008 at 10:14:48AM +0100, Martin Trappel wrote:
Hi there.

I could use a few wild guesses, because I've quite run out of them:
* libxml2 2.6.27
* Windows XPsp2

I have a process running approx. 30 threads, where one of these threads does some calculations and network communication with a hardware device. It is the only thread that spends a measurable amount of time doing anything (it takes about 10% CPU).
So far, no libxml2 involved.

Now, in an additional thread, I start up a libxml parser to parse a 4MB xml file. (When tested in isolation, parsing this file takes approx. 200-300ms.) In this process, the parsing (an xmlReadFile call, or an xmlReadMemory call with the file read into memory) takes between 2 and 12 seconds. That ain't the problem, and of course I expected it to take longer due to the heavy load. The problem is that the xmlRead* call takes 99% of the CPU, which slows the other thread down so much that it fails due to a fixed timeout we have for msg processing.

What is really interesting is that when I add some artificial CPU load before or after xmlReadFile (some dummy calculations in a loop for 10 seconds), it takes up 99% CPU as well, but the msg processing in the other thread is not aborted.

Could this be due to many heap-allocations from xmlReadFile/xmlReadMemory? Some other process global resource that could be the cause?

any guesses welcome!

  No idea. The Windows memory allocator gave us serious problems in the
past in the face of realloc() use. Thread concurrency may be a problem too.


As for realloc: As far as I could see no realloc calls are done during build-up of a tree (with xmlNewDoc, xmlNewDocNode, xmlAddChild, xmlAddProp, ...) and I also had these problems if my test-thread-code did just that.


I have finally found a solution for our performance problem.
It turned out that the problem was really rooted in Heap concurrency problems (in what manner exactly, I don't really dare say.)

I ended up giving libxml2 its own Win32 heap (HeapCreate(..)) via xmlMemSetup + xmlGcMemSetup, plus simple forwarder functions to the Win32 Heap*(..) functions, and now everything runs just fine.

To put the whole thing in context:
* With its own heap, parsing a 6MB XML file takes <500ms in this specific environment.
* As it was, with only the process heap for everything, it took 2-5 seconds on the first run and then increased up to 11 seconds for subsequent runs of the same parsing code (xmlReadFile) in this process.
* We're running heavily multithreaded, where a few threads use significant (10% or more) processor time.
* I was utterly unable to reproduce the behavior outside our application, even with a few dummy threads thrown in.

Note that I now use a low fragmentation heap for libxml2, and that gives a slight additional advantage. (I see 5%+ faster parsing/tree building in a standalone test app.)
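
For anyone wanting to try the same thing, here is a minimal sketch of the private-heap setup in C. It is not the exact code from our application. The xml_heap_* names and the HAVE_LIBXML2 build macro are made up for illustration, and the non-Windows branch simply forwards to malloc so the sketch also compiles elsewhere. It uses only xmlMemSetup for the registration, and it assumes the install function is called before any other libxml2 function, so that no block handed out by the default allocator is later freed through the private heap.

```c
#include <stddef.h>
#include <string.h>

#ifdef HAVE_LIBXML2            /* hypothetical build macro, not a libxml2 one */
#include <libxml/xmlmemory.h>
#endif

#ifdef _WIN32
#include <windows.h>

static HANDLE xml_heap = NULL; /* private heap used only by libxml2 */

/* Create the private heap once; returns 0 on success, -1 on failure. */
static int xml_heap_create(void)
{
    ULONG lfh = 2; /* 2 requests the low fragmentation heap */

    if (xml_heap != NULL)
        return 0;
    xml_heap = HeapCreate(0, 0, 0); /* serialized, growable heap */
    if (xml_heap == NULL)
        return -1;
    /* Best effort: the heap still works if this request is refused. */
    HeapSetInformation(xml_heap, HeapCompatibilityInformation,
                       &lfh, sizeof(lfh));
    return 0;
}

#define RAW_ALLOC(n)      HeapAlloc(xml_heap, 0, (n))
#define RAW_REALLOC(p, n) HeapReAlloc(xml_heap, 0, (p), (n))
#define RAW_FREE(p)       HeapFree(xml_heap, 0, (p))

#else /* fallback so the sketch builds off Windows; no private heap here */
#include <stdlib.h>

static int xml_heap_create(void) { return 0; }

#define RAW_ALLOC(n)      malloc(n)
#define RAW_REALLOC(p, n) realloc((p), (n))
#define RAW_FREE(p)       free(p)
#endif

/* Forwarders with the shapes libxml2 expects for its memory hooks. */
static void *xml_heap_malloc(size_t size)
{
    return RAW_ALLOC(size);
}

static void xml_heap_free(void *mem)
{
    if (mem != NULL)
        RAW_FREE(mem);
}

static void *xml_heap_realloc(void *mem, size_t size)
{
    /* HeapReAlloc does not accept NULL, unlike realloc(). */
    if (mem == NULL)
        return RAW_ALLOC(size);
    return RAW_REALLOC(mem, size);
}

static char *xml_heap_strdup(const char *str)
{
    size_t len = strlen(str) + 1;
    char *copy = (char *)RAW_ALLOC(len);

    if (copy != NULL)
        memcpy(copy, str, len);
    return copy;
}

#ifdef HAVE_LIBXML2
/* Call once at startup, before any other libxml2 function. */
static int xml_heap_install(void)
{
    if (xml_heap_create() != 0)
        return -1;
    return xmlMemSetup(xml_heap_free, xml_heap_malloc,
                       xml_heap_realloc, xml_heap_strdup);
}
#endif
```

With something like this in place, the application would call xml_heap_install() once during startup and then use xmlReadFile/xmlReadMemory as before. The heap is created without HEAP_NO_SERIALIZE, so it stays safe to use from several threads.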

So to sum up: There seem to be certain situations (under win32) where giving libxml2 a separate heap has a significant performance advantage.

cheers,
Martin


