Re: [xml] xmlDocDumpMemory() is VERY slow on Win32
- From: Daniel Veillard <veillard redhat com>
- To: Steve Hay <steve hay uk radan com>
- Cc: xml gnome org
- Subject: Re: [xml] xmlDocDumpMemory() is VERY slow on Win32
- Date: Tue, 13 Jul 2004 09:58:33 -0400
On Tue, Jul 13, 2004 at 02:48:11PM +0100, Steve Hay wrote:
[Daniel: I have spoken to our sysadmin guy here about removing, or,
more likely, changing that pesky email disclaimer. He needs to talk to
his boss about it, and hopefully something will change. In the
meantime, I hope you will bear with us... The disclaimer is certainly
not intended to mean that you can't archive/index/publish these messages
in the usual way.]
well the wording sounds it would prevent it. You're sending to the list
and people watching the archives or using the search facility may not be in
that group.
Here's the top of the "function timing" profile (just using the standard
ftime.bat batch file) from my test program modified to only do the
parse/dump once with the XML file having no encoding declaration:
Func Func+Child Hit
Time % Time % Count Function
---------------------------------------------------------
1046.482 37.6 1046.482 37.6 10200004 _xmlCurrentChar
(parserinternals.obj)
842.231 30.3 2051.063 73.7 1 _xmlParseCDSect (parser.obj)
523.994 18.8 2783.338 100.0 1 _main (foo.obj)
138.504 5.0 138.504 5.0 22 _xmlBufferResize (tree.obj)
83.636 3.0 83.636 3.0 16 _xmlStrndup (xmlstring.obj)
52.874 1.9 52.874 1.9 2602 _xmlFileRead (xmlio.obj)
44.236 1.6 117.350 4.2 650 _xmlBufferAdd (tree.obj)
37.194 1.3 162.210 5.8 2
_xmlNodeDumpOutputInternal (xmlsave.obj)
looks normal except _xmlCurrentChar is especially high, and _xmlBufferResize
should not be taht prevalent.
Func Func+Child Hit
Time % Time % Count Function
---------------------------------------------------------
14108.740 86.7 14108.740 86.7 637 _xmlBufferGrow (tree.obj)
788.004 4.8 1497.033 9.2 1 _xmlParseCDSect (parser.obj)
547.481 3.4 547.481 3.4 10200004 _xmlCurrentChar
(parserinternals.obj)
539.857 3.3 16278.202 100.0 1 _main (foo.obj)
(everything else was <1%).
Would this indicate that the bottleneck is in xmlBufferGrow()?
Sounds a bit like one of Daniel's suggestions in response to my original
bug report (http://bugzilla.gnome.org/show_bug.cgi?id=146697), namely:
"Other potential issues might be a defect in the buffer allocation
strategy (not doubling and hence copying all the time block is
reallocated)."
yeah buffer handling looks toast in your environment...
I therefore decided to turn on memory debugging (cscript configure.js
that's useless. You need to take teh debugger and see what's happening
in xmlBufferGrow(). It should not be called each time a packet is converted.
Anyway, I hacked xmlversion.h to #define DEBUG_MEMORY as well as
DEBUG_MEMORY_LOCATION. Then when I rebuilt I got a compiler error from
the start of xmlInitMemory() because a function call to
xmlGenericError() now appears *before* the char *breakpoint declaration,
which VC++ doesn't like. I swapped them around and now it builds OK.
Would you like a patch for that, or is it obvious enough from what I've
said?
send contextual patches for analysis
Attached is a bzip2 of the output running my test program with just
encoding="utf-8". There is a sequence of Realloced() lines in which the
buffer is being doubled:
Realloced(100 to 200) Ok
Realloced(200 to 400) Ok
Realloced(400 to 800) Ok
Realloced(800 to 1600) Ok
Realloced(1600 to 3200) Ok
Realloced(3200 to 6400) Ok
Realloced(6400 to 12800) Ok
Realloced(12800 to 25600) Ok
Realloced(25600 to 51200) Ok
Realloced(51200 to 102400) Ok
Realloced(102400 to 204800) Ok
Realloced(204800 to 409600) Ok
Realloced(409600 to 819200) Ok
Realloced(819200 to 1638400) Ok
Realloced(1638400 to 3276800) Ok
Realloced(3276800 to 6553600) Ok
Realloced(6553600 to 13107200) Ok
but then a very long sequence in which this doesn't happen:
Realloced(4002 to 32208) Ok
Realloced(32208 to 48154) Ok
Realloced(48154 to 64154) Ok
Realloced(64154 to 80154) Ok
Realloced(80154 to 96154) Ok
Realloced(96154 to 112154) Ok
Realloced(112154 to 128154) Ok
Realloced(128154 to 144154) Ok
Realloced(144154 to 160154) Ok
...
So is that likely to be the cause of the problem?
yes definitely. There is something wrong w.r.t. buffer allocation
in conjunction with saving to a file descriptor. It should never need
a buffer of the size of teh output document, this should be streamed out
and works on unices...
Daniel
--
Daniel Veillard | Red Hat Desktop team http://redhat.com/
veillard redhat com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
[
Date Prev][
Date Next] [
Thread Prev][
Thread Next]
[
Thread Index]
[
Date Index]
[
Author Index]