[xml] how to dump a node in utf8



Hi Daniel, All,

in libxml2-2.4.x, calling xmlNodeDump produced a UTF-8 encoded
string. This seems to have changed in libxml2-2.6.x (or probably
during 2.5.x). Instead of UTF-8, it now uses character entities to
encode all non-ASCII characters.
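
To make the difference concrete, here is a minimal standalone sketch of the transformation I mean. It is not libxml2 code: the helper name escape_non_ascii is made up for illustration, and I am assuming the hexadecimal &#x...; form for the references. It does not reject overlong UTF-8 forms or surrogates.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of libxml2: copy a UTF-8 string to `out`,
 * replacing every non-ASCII character with a hexadecimal character
 * reference such as &#xE9;. Returns 0 on success, -1 if the input is
 * malformed or `out` is too small. */
int escape_non_ascii(const char *utf8, char *out, size_t outsz)
{
    const unsigned char *p = (const unsigned char *) utf8;
    size_t used = 0;

    while (*p != 0) {
        unsigned long cp;
        int extra, n;

        /* Decode the lead byte and note how many continuation bytes follow. */
        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return -1;               /* stray continuation or invalid lead byte */
        p++;

        while (extra-- > 0) {
            if ((*p & 0xC0) != 0x80) return -1;   /* truncated sequence */
            cp = (cp << 6) | (*p & 0x3F);
            p++;
        }

        if (cp < 0x80) {
            /* ASCII passes through; keep one byte free for the terminator. */
            if (used + 1 >= outsz) return -1;
            out[used++] = (char) cp;
        } else {
            /* snprintf returns the length it wanted to write. */
            n = snprintf(out + used, outsz - used, "&#x%lX;", cp);
            if (n < 0 || (size_t) n >= outsz - used) return -1;
            used += (size_t) n;
        }
    }
    if (used >= outsz) return -1;
    out[used] = '\0';
    return 0;
}
```

Given the UTF-8 bytes for "café" (that is, "caf\xC3\xA9"), this writes caf&#xE9; into the output, which is the kind of string I now get back, whereas 2.4.x returned the original bytes unchanged.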

I must say that this change is quite annoying. Why isn't the new
xmlNodeDump called xmlNodeDumpASCII or something at least for the sake
of backward compatibility?

Anyway, the milk was spilt, I guess I have to learn to live with
that. But I'd like at least to make XML::LibXML's $node->toString()
behavior consistent, so I have to find a suitable work-around.

Since xmlNodeDump doesn't provide any parameter for setting the
requested encoding (which would always be UTF-8 in our case), I
explored xmlsave.c and came up with the following code, which is
rather long and low-level (especially the memset):

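/* needs <stdio.h>, <string.h>, <libxml/tree.h>, <libxml/xmlIO.h> */
xmlBufferPtr buffer;
xmlOutputBufferPtr outbuf;

buffer = xmlBufferCreate();

/* Hand-build an output buffer with no encoder that writes into `buffer`. */
outbuf = (xmlOutputBufferPtr) xmlMalloc(sizeof(xmlOutputBuffer));

if (buffer != NULL && outbuf != NULL) {
   memset(outbuf, 0, sizeof(xmlOutputBuffer));
   outbuf->buffer = buffer;
   xmlNodeDumpOutput(outbuf, doc, root_element, 0, 0, "UTF-8");
   if ( xmlBufferLength(buffer) > 0 ) {
      /* The serialized node is the buffer's content. */
      printf("%s\n", (const char *) xmlBufferContent(buffer));
   }
}

/* Free only the wrapper struct, then the buffer we own. */
if (outbuf != NULL)
   xmlFree(outbuf);
if (buffer != NULL)
   xmlBufferFree(buffer);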

I wonder, is there any shortcut for that?

Also, while this works, I was surprised that I got a UTF-8 encoded
result even when I changed the encoding parameter of xmlNodeDumpOutput
to "iso-8859-2" (Linux, iconv is compiled in). I won't do that in
XML::LibXML, but still... :-/

Thanks,
-- Petr



