Re: [xml] how to dump a node in utf8



On Thu, Nov 25, 2004 at 07:54:32PM +0100, Petr Pajas wrote:
Hi Daniel, All,

in libxml2-2.4.x, when one called xmlNodeDump, the result was a UTF-8
encoded string. This seems to have changed in libxml2-2.6.x (or
probably during 2.5.x). Instead of UTF8 it now uses character entities
to encode all non-ascii.
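
To make the difference concrete, here is a minimal sketch in plain C of the two output styles being discussed: raw UTF-8 bytes versus numeric character references for everything outside ASCII. The helper name escape_non_ascii is hypothetical and is not part of libxml2; it only imitates the style of output described above (for example &#xE9; for U+00E9) and does not claim to reproduce libxml2's escaping rules for markup characters such as < or &.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not a libxml2 function): copy the UTF-8 string `in`
 * to `out`, replacing every non-ASCII character with a hexadecimal numeric
 * character reference. Returns the number of bytes written (excluding the
 * terminating NUL), or -1 if the input is malformed or `out` is too small.
 */
static int escape_non_ascii(const char *in, char *out, size_t outsz)
{
    const unsigned char *p = (const unsigned char *) in;
    size_t used = 0;

    assert(in != NULL && out != NULL);
    while (*p != 0) {
        unsigned long cp;
        int extra, n;

        /* Decode the lead byte: how many continuation bytes follow? */
        if (*p < 0x80) { cp = *p; extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return -1;                 /* stray continuation or invalid byte */
        p++;

        /* Fold the continuation bytes into the code point. */
        while (extra-- > 0) {
            if ((*p & 0xC0) != 0x80) return -1;
            cp = (cp << 6) | (*p & 0x3F);
            p++;
        }

        if (cp < 0x80) {
            /* ASCII passes through unchanged; keep room for the NUL. */
            if (used + 1 >= outsz) return -1;
            out[used++] = (char) cp;
        } else {
            /* Everything else becomes a character reference. */
            n = snprintf(out + used, outsz - used, "&#x%lX;", cp);
            if (n < 0 || (size_t) n >= outsz - used) return -1;
            used += (size_t) n;
        }
    }
    if (used >= outsz) return -1;
    out[used] = '\0';
    return (int) used;
}
```

Given the UTF-8 bytes for "café", the helper produces "caf&#xE9;". Both forms describe the same characters to an XML parser, but only the first is a UTF-8 string in the sense the 2.4.x behaviour provided.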

  It probably depends on whether your node has a doc, and whether that doc has an encoding.

I must say that this change is quite annoying. Why isn't the new
xmlNodeDump called xmlNodeDumpASCII or something at least for the sake
of backward compatibility?

  "It doesn't work for me, it's broken" ... Well, there are a number
of parameters which can change the behaviour. This didn't show up as a
break in the python regression tests. So you will have to provide more
information and at least open a bug report.
If something changed between 2.4.x and 2.6.x, nobody has complained about
it in the 2 years since. So I have a bit of a hard time accepting
your rant at this point, in all honesty.

Anyway, the milk was spilt, I guess I have to learn to live with
that. But I'd like at least to make XML::LibXML's $node->toString()
behavior consistent, so I have to find a suitable work-around.

Since xmlNodeDump doesn't provide any parameter for setting the
requested encoding (which would always be UTF8 in our case), I
explored xmlsave.c and came up with the following code, which is
rather longish and seems rather low-level (esp. the memset). 

xmlBufferPtr buffer;
xmlOutputBufferPtr outbuf;

/* target buffer that will receive the serialized node */
buffer = xmlBufferCreate();

/* hand-built output buffer with no encoder and no I/O callbacks */
outbuf = (xmlOutputBufferPtr) xmlMalloc(sizeof(xmlOutputBuffer));

if (outbuf != NULL) {
   memset(outbuf, 0, (size_t) sizeof(xmlOutputBuffer));
   outbuf->buffer = buffer;
   xmlNodeDumpOutput(outbuf, doc, root_element, 0, 0, "UTF-8");
   xmlFree(outbuf);
   if ( xmlBufferLength(buffer) > 0 ) {
      printf("%s\n", (const char *) xmlBufferContent(buffer));
   }
}
xmlBufferFree(buffer);

I wonder, is there any shortcut for that?

  xmlsave.c is a first step: minimal APIs to provide backward compatibility.
I'm not sure this is fine.
  Moreover, you're doing this because you don't like the default escaping.
Well, a replacement for that default escaping *can* now be provided in the
new API with xmlSaveSetEscape().

Also, while this works, I was surprised that I got a UTF8-encoded
result even when I changed the parameter for xmlNodeDumpOutput to
"iso-8859-2" (Linux, iconv is compiled in). I won't do that in
XML::LibXML, but still... :-/

check struct _xmlOutputBuffer {
    ....
    xmlBufferPtr buffer;    /* Local buffer encoded in UTF-8 or ISOLatin */
    xmlBufferPtr conv;      /* if encoder != NULL buffer for output */

  There are *two* buffers.
  So precisely this may not work. And if you try with a larger buffer
you may only get a chunk of the UTF-8, as everything else would possibly
already be converted. You can hack as you want, but since you want to get
control over the way things are escaped, it sounds to me like you should
rather use the new APIs, precisely to control the output.
  So in a nutshell, I rewrote the API for speed and more flexibility.
It may not be finalized yet, but ranting about the fact that it changed
really isn't the right way to get me to invest more time finishing it :-(

Daniel

-- 
Daniel Veillard      | Red Hat Desktop team http://redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


