Re: [xml] xmlDocDumpMemory() is VERY slow on Win32



Daniel Veillard wrote:

On Tue, Jul 13, 2004 at 09:58:33AM -0400, Daniel Veillard wrote:
 

Attached is a bzip2 of the output running my test program with just 
encoding="utf-8".  There is a sequence of Realloced() lines in which the 
buffer is being doubled:

Realloced(100 to 200) Ok
Realloced(200 to 400) Ok
Realloced(400 to 800) Ok
Realloced(800 to 1600) Ok
Realloced(1600 to 3200) Ok
Realloced(3200 to 6400) Ok
Realloced(6400 to 12800) Ok
Realloced(12800 to 25600) Ok
Realloced(25600 to 51200) Ok
Realloced(51200 to 102400) Ok
Realloced(102400 to 204800) Ok
Realloced(204800 to 409600) Ok
Realloced(409600 to 819200) Ok
Realloced(819200 to 1638400) Ok
Realloced(1638400 to 3276800) Ok
Realloced(3276800 to 6553600) Ok
Realloced(6553600 to 13107200) Ok

but then a very long sequence in which this doesn't happen:

Realloced(4002 to 32208) Ok
Realloced(32208 to 48154) Ok
Realloced(48154 to 64154) Ok
Realloced(64154 to 80154) Ok
Realloced(80154 to 96154) Ok
Realloced(96154 to 112154) Ok
Realloced(112154 to 128154) Ok
Realloced(128154 to 144154) Ok
Realloced(144154 to 160154) Ok
...

So is that likely to be the cause of the problem?
     

 yes definitely. There is something wrong w.r.t. buffer allocation
in conjunction with saving to a file descriptor. It should never need
a buffer of the size of the output document, this should be streamed out
and works on unices...
   


 Hum, scratch that. Since you're saving to memory a 10MB document,
then this allocation/reallocation is normal.

OK.  So the first sequence in which the buffer is being doubled is the 
xmlReadFile() call, reading in the 10MB file.  The second sequence in 
which the buffer is grown by 16000 bytes at a time is what Igor referred 
to -- and it is indeed approx 640 calls (10MB / 16000 bytes) as he predicted.

I now share Igor's bad feeling about this in the light of the following 
program:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Grow a buffer in 16000-byte steps, 640 times, and time the whole loop. */
int main(void) {
  char *buf, *tmp;
  int i, size;
  time_t start, end;
  size = 16000;
  if ((buf = malloc(size)) == NULL) {
    printf("malloc failed\n");
    exit(3);
  }
  else {
    printf("malloc OK (%d)\n", size);
  }
  time(&start);
  for (i = 0; i < 640; i++, size += 16000) {
    /* Keep the old pointer until realloc succeeds so it can still be freed. */
    if ((tmp = realloc(buf, size + 16000)) == NULL) {
      printf("%03d: realloc failed\n", i);
      free(buf);
      exit(4);
    }
    else {
      buf = tmp;
      printf("%03d: realloc OK (%d to %d)\n", i, size, size + 16000);
    }
  }
  time(&end);
  free(buf);
  /* time_t is not necessarily an int, so print the difference via difftime(). */
  printf("%.0f seconds\n", difftime(end, start));
  return 0;
}

Could somebody please try this on Linux (well, anything other than Win32 
I guess) and tell me how long it takes to run?

On my P4 2GHz Win32 box it takes 15 seconds -- exactly the same time as 
the XML parsing/dumping earlier!  (And it's just as slow without the 
printf() calls in it.)

I dare say the time will be 0 or 1 second on Linux.

My soul is indeed lost :(  Where do I go from here?  (Don't say Linux...)

- Steve






