Re: [xml] xmlDocDumpMemory() is VERY slow on Win32

On Mon, Jul 12, 2004 at 02:35:46PM +0100, Steve Hay wrote:
I've just discovered something else very strange going on, which enables 
me to present a small self-contained test case.

Previously, I could only reproduce the slowness with a particular XML 
file that I was working with.  Whenever I tried to have the test program 
create an XML file itself for the purposes of testing, the speed 
difference did not show up.

The XML file that I was working with declared the encoding to be 
"iso-8859-1", but I was omitting the encoding declaration in the test 
XML files that I was creating.  It turns out that this also affects things!

The following program:

#include <stdio.h>
#include <time.h>
#include <libxml/parser.h>
#include <libxml/tree.h>
int main(void) {
  char *file = "foo.xml";
  char *line = "..................................................\n";
  char *encodings[] = { "encoding=\"iso-8859-1\"", "encoding=\"utf-8\"", "" };
  FILE *f;
  time_t start, end;
  xmlDocPtr doc;
  xmlChar *result;
  int i, j, len = 0;
  for (i = 0; i < 3; i++) {
    /* Write a test document with the i-th encoding declaration. */
    if ((f = fopen(file, "w")) == NULL) {
      printf("can't write test file!\n");
      return 3;
    }
    fprintf(f, "<?xml version=\"1.0\" %s?>\n", encodings[i]);
    fprintf(f, "<test><![CDATA[\n");
    for (j = 0; j < 200000; j++) {
      fputs(line, f);
    }
    fprintf(f, "]]></test>\n");
    fclose(f);
    if ((doc = xmlReadFile(file, NULL, 0)) == NULL) {
      printf("can't read test file!\n");
      return 4;
    }
    /* Time only the serialization step. */
    time(&start);
    xmlDocDumpMemory(doc, &result, &len);
    time(&end);
    if (result == NULL) {
      printf("can't dump test file!\n");
      return 5;
    }
    printf("%s: %.0f seconds\n", encodings[i], difftime(end, start));
    xmlFree(result);
    xmlFreeDoc(doc);
  }
  return 0;
}

generates these bizarre results for me on Win32:

encoding="iso-8859-1": 15 seconds
encoding="utf-8": 15 seconds
: 0 seconds

Why is xmlDocDumpMemory() so much slower when an encoding is declared?

This is using libxml2-2.6.11 built with MSVC++ 6.0 the same way as 
described in my original posting.

What results does the program produce on Linux?

On this low-end machine, 2xPII @ 350MHz, all three output lines report 1
second.

Does this help anyone help me with this problem?

I don't know. I'm stuck on Linux at the moment and can't boot Windows until
tomorrow. Glancing at the libxml code, there is one thing I can say for sure:
the bottleneck is either in xmlCharEncOutFunc in encoding.c, or further down
the road in UTF8Toisolat1 and UTF8ToUTF8.

If you use VC6, you can compile a profile-enabled libxml and see where it
spends most of its time. Somehow I have a bad feeling that this is a problem
in the Windows memory manager and won't be fixed easily.
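
Short of a full profiling build, a finer-grained timer around the suspect
call may already narrow things down, since time() only resolves whole
seconds. What follows is a minimal sketch, not code from libxml or from the
test program: time_call_ms() and spin() are hypothetical names, and spin() is
only a stand-in workload so that the sketch compiles without libxml2. In the
real test the callback would wrap the xmlDocDumpMemory() call.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: run fn(arg) once and return the time it took in
 * milliseconds, measured with the standard clock() function.
 * Standard C defines clock() as processor time; the Microsoft C runtime
 * reports elapsed wall-clock time instead, so compare numbers only
 * within one platform. */
static double time_call_ms(void (*fn)(void *), void *arg)
{
    clock_t begin, finish;

    begin = clock();
    fn(arg);
    finish = clock();
    return 1000.0 * (double)(finish - begin) / CLOCKS_PER_SEC;
}

/* Stand-in workload (an assumption for illustration only): sums the
 * integers below *arg. Replace with a wrapper around the call to be
 * measured. */
static void spin(void *arg)
{
    volatile unsigned long sum = 0;
    unsigned long i, n = *(unsigned long *)arg;

    for (i = 0; i < n; i++)
        sum += i;
}

/* Usage:
 *   unsigned long n = 50000000UL;
 *   printf("%.1f ms\n", time_call_ms(spin, &n));
 */
```

Running the same wrapper once per encoding declaration would show whether
the three cases really differ on Linux, which whole-second timestamps
cannot tell.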


