Daniel Veillard <veillard redhat com> writes:
On Thu, Nov 25, 2004 at 07:54:32PM +0100, Petr Pajas wrote:

Hi Daniel, All,
in libxml2-2.4.x, when one called xmlNodeDump, the result was a UTF-8 encoded string. This seems to have changed in libxml2-2.6.x (or probably during 2.5.x). Instead of UTF-8 it now uses character entities to encode all non-ASCII characters.

probably depends on whether your node has a doc, whether that doc has an encoding,
Yes, I'm calling that on the root element of

    $ cat para0.xml
    <?xml version='1.0' encoding='utf-8'?>
    <para>ěščřžřýáíůúťňď</para>

(I don't know what mail will do with these UTF-8 characters, but it is a valid UTF-8 string.)
I must say that this change is quite annoying. Why isn't the new xmlNodeDump called xmlNodeDumpASCII or something, at least for the sake of backward compatibility?

"It doesn't work for me, it's broken" ... Well, there are a number of parameters which can change the behaviour. This didn't break the python regression tests. So you will have to provide more information and at least open a bug report. If something changed between 2.4.x and 2.6.x, nobody has complained about it in the 2 years since.
Oops, the information about versions I gave last night was rather misleading. Now that I have tried again in a clean environment, it seems that the escaping started somewhere between 2.6.8 and 2.6.15 (./parseprint.c is attached):

    $ LD_PRELOAD=/net/su/h/local2-rh8/lib/libxml2.so.2.6.8 ./parseprint para0.xml
    <para>ěščřžřýáíůúťňď</para>
    $ LD_PRELOAD=/net/su/h/local2-rh8/lib/libxml2.so.2.6.15 ./parseprint para0.xml
    <para>&#x11B;&#x161;&#x10D;&#x159;&#x17E;&#x159;&#xFD;&#xE1;&#xED;&#x16F;&#xFA;&#x165;&#x148;&#x10F;</para>

Is my test case, parseprint.c, correct? Is this a sign of a bug? Should I open a bug report for this?
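To make the difference between the two outputs concrete, here is a minimal sketch in plain C (no libxml2) of what "escaping non-ASCII" means: each UTF-8 character outside ASCII is replaced by a hexadecimal character reference. The function names `decode_utf8` and `escape_non_ascii` are made up for this illustration, and the code is not taken from libxml2; it only mimics the visible effect on a string and does not touch markup characters such as `<` or `&`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Decode one UTF-8 sequence at s (at most len bytes) into *cp.
 * Returns the number of bytes consumed, or 0 if the input is malformed. */
static size_t decode_utf8(const unsigned char *s, size_t len, unsigned long *cp)
{
    size_t n, i;

    if (len == 0) return 0;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { *cp = s[0] & 0x1F; n = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { *cp = s[0] & 0x0F; n = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { *cp = s[0] & 0x07; n = 4; }
    else return 0;

    if (len < n) return 0;
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;   /* not a continuation byte */
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    return n;
}

/* Copy the UTF-8 string `in` to `out`, writing every non-ASCII character
 * as "&#xHEX;". Returns 0 on success, -1 on malformed input or if `out`
 * (of size `outsize`) is too small. */
static int escape_non_ascii(const char *in, char *out, size_t outsize)
{
    const unsigned char *p = (const unsigned char *) in;
    size_t left, used = 0;

    assert(in != NULL && out != NULL);
    left = strlen(in);

    while (left > 0) {
        unsigned long cp;
        size_t n = decode_utf8(p, left, &cp);
        int w;

        if (n == 0) return -1;
        if (cp < 0x80)
            w = snprintf(out + used, outsize - used, "%c", (int) cp);
        else
            w = snprintf(out + used, outsize - used, "&#x%lX;", cp);
        if (w < 0 || (size_t) w >= outsize - used) return -1;

        used += (size_t) w;
        p += n;
        left -= n;
    }
    if (used >= outsize) return -1;
    out[used] = '\0';
    return 0;
}
```

Both forms denote the same characters to an XML parser; they differ only in the bytes written, which matters to callers (such as language bindings) that compare or post-process the serialized string.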
So I have a bit of a hard time accepting your rant at this point, in all honesty.
Accept my apologies, I didn't mean to "rant" and really didn't intend to cause offense. My wording should have been more pragmatic (I was really tired last night).
    xmlBufferPtr buffer;
    buffer = xmlBufferCreate();
    xmlOutputBufferPtr outbuf;
    outbuf = (xmlOutputBufferPtr) xmlMalloc(sizeof(xmlOutputBuffer));
    if (outbuf != NULL) {
        memset(outbuf, 0, (size_t) sizeof(xmlOutputBuffer));
        outbuf->buffer = buffer;
        xmlNodeDumpOutput(outbuf, doc, root_element, 0, 0, "UTF-8");
        xmlFree(outbuf);
        if ( xmlBufferLength(buffer) > 0 ) {
            /* print the serialized node collected in the buffer */
            printf("%s\n", (const char *) xmlBufferContent(buffer));
        }
    }

I wonder, is there any shortcut for that?

xmlsave.c is a first step. minimal APIs to provide backward compatibility. I'm not sure this is fine. Moreover you're doing this because you don't like the default escaping. Well, that default escaping *can* be provided now in the new API with xmlSaveSetEscape().
Yes, I didn't like the default escaping, since it appears that escaping only became the default somewhere between 2.6.8 and 2.6.15. Now, taking it pragmatically, it's either a bug or not (which should be your decision as the designer of the API). In the first case I can file a bug report if needed; in the latter case, I'd like to find a correct way to avoid escaping.
check

    struct _xmlOutputBuffer {
        ....
        xmlBufferPtr buffer; /* Local buffer encoded in UTF-8 or ISOLatin */
        xmlBufferPtr conv;   /* if encoder != NULL buffer for output */

there are *two* buffers.
Precisely this may not work. And if you try with a larger buffer you may only get a chunk of the UTF-8 as everything else would possibly be converted. You can hack as you want but since you want to get control over the way things are escaped, it sounds to me you should rather use the new APIs precisely to control the output.
Ok, so in the new API, if I want UTF8 encoded node-dumps (regardless of what encoding was declared in the xml file), should I use output->buffer or output->conv provided I use xmlNodeDumpOutput(outbuf, output->doc, root_element, 0, 0, "UTF-8")?
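The two-buffer arrangement behind this question can be sketched with a toy model in plain C. Everything here (`struct toy_output`, `toy_write`, `toy_result`, `toy_utf8_to_latin1`) is hypothetical and is not libxml2 code; it only models what the struct comments quoted above say, namely that `conv` holds the output when an encoder is set, and that otherwise the data stays in `buffer`. As a simplification, the toy converts directly into `conv` instead of staging data in `buffer` first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_CAP 256

/* Hypothetical encoder: converts UTF-8 input to another encoding.
 * Returns the number of bytes written to out, or -1 on error. */
typedef int (*toy_encoder)(const unsigned char *in, size_t inlen,
                           unsigned char *out, size_t outcap);

/* Toy stand-in for an output buffer with two stores. */
struct toy_output {
    unsigned char buffer[TOY_CAP];  /* UTF-8 as produced by the serializer */
    size_t buffer_len;
    unsigned char conv[TOY_CAP];    /* encoder output, used only if encoder != NULL */
    size_t conv_len;
    toy_encoder encoder;
};

static void toy_init(struct toy_output *o, toy_encoder enc)
{
    memset(o, 0, sizeof *o);
    o->encoder = enc;
}

/* Append UTF-8 text. Without an encoder the bytes stay in `buffer`;
 * with one, the converted bytes land in `conv`. Returns 0 or -1. */
static int toy_write(struct toy_output *o, const char *utf8)
{
    size_t len;
    int n;

    assert(o != NULL && utf8 != NULL);
    len = strlen(utf8);
    if (o->encoder == NULL) {
        if (len > TOY_CAP - o->buffer_len) return -1;
        memcpy(o->buffer + o->buffer_len, utf8, len);
        o->buffer_len += len;
        return 0;
    }
    n = o->encoder((const unsigned char *) utf8, len,
                   o->conv + o->conv_len, TOY_CAP - o->conv_len);
    if (n < 0) return -1;
    o->conv_len += (size_t) n;
    return 0;
}

/* Return the store that holds the final output, and its length. */
static const unsigned char *toy_result(const struct toy_output *o, size_t *len)
{
    if (o->encoder != NULL) { *len = o->conv_len; return o->conv; }
    *len = o->buffer_len;
    return o->buffer;
}

/* Toy UTF-8 to Latin-1 encoder: handles ASCII and U+0080..U+00FF only. */
static int toy_utf8_to_latin1(const unsigned char *in, size_t inlen,
                              unsigned char *out, size_t outcap)
{
    size_t i = 0, w = 0;

    while (i < inlen) {
        unsigned char c;
        if (in[i] < 0x80) {
            c = in[i];
            i += 1;
        } else if ((in[i] == 0xC2 || in[i] == 0xC3) && i + 1 < inlen &&
                   (in[i + 1] & 0xC0) == 0x80) {
            c = (unsigned char) (((in[i] & 0x03) << 6) | (in[i + 1] & 0x3F));
            i += 2;
        } else {
            return -1;   /* malformed or not representable in Latin-1 */
        }
        if (w >= outcap) return -1;
        out[w++] = c;
    }
    return (int) w;
}
```

Under this model, which store to read depends only on whether an encoder is attached to the output buffer. An output buffer that was zero-filled by hand, as in the snippet earlier in this thread, has no encoder, so in the toy model the bytes would be found in `buffer`.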
So in a nutshell, I rewrote the API for speed, and more flexibility,
I do understand that.
they may not be finalized yet, but ranting about the fact it changed really isn't the right way to get me to invest more time finishing it :-(
Again, I'm sorry. I'm not ranting. I'm just seeking advice on how to retain the old behavior with the new API, so that the Perl bindings behave consistently.

Thanks,
-- Petr
Attachment:
parseprint.c
Description: Binary data