Re: [xml] xmlDocDumpMemory() is VERY slow on Win32



[Daniel:  I have spoken to our sysadmin guy here about removing, or, 
more likely, changing that pesky email disclaimer.  He needs to talk to 
his boss about it, and hopefully something will change.  In the 
meantime, I hope you will bear with us...  The disclaimer is certainly 
not intended to mean that you can't archive/index/publish these messages 
in the usual way.]

Igor Zlatkovic wrote:

On Tue, Jul 13, 2004 at 10:00:46AM +0100, Steve Hay wrote:

 

I am using VC6, but I don't have any profiling software available.  I'll 
try sticking some printf() calls in around the functions that you 
mentioned and see if I can find the bottleneck.
     


Profiler was a part of VC 6 last time I looked at it. Check whether the bin 
directory of your VC 6 installation contains files called profile.exe,
profile.dll and profiler.ini. 

It does!  And I never knew it.  Thanks for pointing that out.


 

Well, putting a printf() call at the start of xmlCharEncOutFunc() shows 
that it gets called hundreds of times over (all evenly spread) when 
there is an encoding declaration in the XML file, but not even once when 
there isn't.  (I tried both with and without iconv support enabled in 
the libxml2 build.)

It seems that out->encoder is NULL in xmlOutputBufferWrite() [xmlIO.c] 
when there is no encoding declaration.

Is that what was expected?
   


That is expected. That is why I think that precisely this function is the 
bottleneck.

Your 10 MB CDATA section is being serialised in chunks of about 16000 bytes. 
That means that the function will be called aproximately 10M/16k times. The
calls are 'evenly spread', whatever you meay mean by it, because the
function is called in a loop.

(I meant merely that the times that I was outputting were evenly spread 
through the 15sec or so that it takes, as opposed to, say, one or two 
coming out, then a long delay, then the rest.)


Printfing the code won't give you smiles. You need a profiler. And with it,
in my role as a part-time prophet, I foretell: Your soul is lost. You will 
find the real reason for this in the Windows code, not in libxml. I really
have a bad, bad feeling about this matter.

I'll try to profile this later today.

I've just had a go at it now myself...

Here's the top of the "function timing" profile (just using the standard 
ftime.bat batch file) from my test program modified to only do the 
parse/dump once with the XML file having no encoding declaration:

        Func          Func+Child           Hit
        Time   %         Time      %      Count  Function
---------------------------------------------------------
    1046.482  37.6     1046.482  37.6 10200004 _xmlCurrentChar 
(parserinternals.obj)
     842.231  30.3     2051.063  73.7        1 _xmlParseCDSect (parser.obj)
     523.994  18.8     2783.338 100.0        1 _main (foo.obj)
     138.504   5.0      138.504   5.0       22 _xmlBufferResize (tree.obj)
      83.636   3.0       83.636   3.0       16 _xmlStrndup (xmlstring.obj)
      52.874   1.9       52.874   1.9     2602 _xmlFileRead (xmlio.obj)
      44.236   1.6      117.350   4.2      650 _xmlBufferAdd (tree.obj)
      37.194   1.3      162.210   5.8        2 
_xmlNodeDumpOutputInternal (xmlsave.obj)

(everything else was < 1%).

And here's the same output with the program modified to now only process 
the XML file having the encoding="utf-8" declaration:

        Func          Func+Child           Hit
        Time   %         Time      %      Count  Function
---------------------------------------------------------
   14108.740  86.7    14108.740  86.7      637 _xmlBufferGrow (tree.obj)
     788.004   4.8     1497.033   9.2        1 _xmlParseCDSect (parser.obj)
     547.481   3.4      547.481   3.4 10200004 _xmlCurrentChar 
(parserinternals.obj)
     539.857   3.3    16278.202 100.0        1 _main (foo.obj)

(everything else was <1%).

Would this indicate that the bottleneck is in xmlBufferGrow()?

Sounds a bit like one of Daniel's suggestions in response to my original 
bug report (http://bugzilla.gnome.org/show_bug.cgi?id=146697), namely:
"Other potential issues might be a defect in the buffer allocation 
strategy (not doubling and hence copying all the time block is 
reallocated)."

I therefore decided to turn on memory debugging (cscript configure.js 
prefix=C:\libxml2 debug=1 mem_debug=1).  This didn't produce any memory 
debugging output.  It seems that mem_debug=1 enables withMemDebug in 
configure.js, which writes WITH_MEM_DEBUG=1 into config.msvc, which 
#define's DEBUG_MEMORY_LOCATION in xmlversion.h.  However, xmlmemory.c 
contains various lines saying "#ifdef DEBUG_MEMORY" rather than "#ifdef 
DEBUG_MEMORY_LOCATION".  Is this intentional?  If so, how is one 
supposed to enable the DEBUG_MEMORY lines?

Anyway, I hacked xmlversion.h to #define DEBUG_MEMORY as well as 
DEBUG_MEMORY_LOCATION.  Then when I rebuilt I got a compiler error from 
the start of xmlInitMemory() because a function call to 
xmlGenericError() now appears *before* the char *breakpoint declaration, 
which VC++ doesn't like.  I swapped them around and now it builds OK.  
Would you like a patch for that, or is it obvious enough from what I've 
said?

Attached is a bzip2 of the output running my test program with just 
encoding="utf-8".  There is a sequence of Realloced() lines in which the 
buffer is being doubled:

Realloced(100 to 200) Ok
Realloced(200 to 400) Ok
Realloced(400 to 800) Ok
Realloced(800 to 1600) Ok
Realloced(1600 to 3200) Ok
Realloced(3200 to 6400) Ok
Realloced(6400 to 12800) Ok
Realloced(12800 to 25600) Ok
Realloced(25600 to 51200) Ok
Realloced(51200 to 102400) Ok
Realloced(102400 to 204800) Ok
Realloced(204800 to 409600) Ok
Realloced(409600 to 819200) Ok
Realloced(819200 to 1638400) Ok
Realloced(1638400 to 3276800) Ok
Realloced(3276800 to 6553600) Ok
Realloced(6553600 to 13107200) Ok

but then a very long sequence in which this doesn't happen:

Realloced(4002 to 32208) Ok
Realloced(32208 to 48154) Ok
Realloced(48154 to 64154) Ok
Realloced(64154 to 80154) Ok
Realloced(80154 to 96154) Ok
Realloced(96154 to 112154) Ok
Realloced(112154 to 128154) Ok
Realloced(128154 to 144154) Ok
Realloced(144154 to 160154) Ok
...

So is that likely to be the cause of the problem?

- Steve


------------------------------------------------
Radan Computational Ltd.

The information contained in this message and any files transmitted with it are confidential and intended for 
the addressee(s) only.  If you have received this message in error or there are any problems, please notify 
the sender immediately.  The unauthorized use, disclosure, copying or alteration of this message is strictly 
forbidden.  Note that any views or opinions presented in this email are solely those of the author and do not 
necessarily represent those of Radan Computational Ltd.  The recipient(s) of this message should check it and 
any attached files for viruses: Radan Computational will accept no liability for any damage caused by any 
virus transmitted by this email.

Attachment: memdebug.txt.bz2
Description: BZip2 compressed data



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]