[xml] Possible bug with Byte Order Marks



When I use xmlDocFormatDump to write a UTF-16 encoded document to a file,
the BOM is initially created in the output buffer but is then overwritten
by the start of the document. I believe this is caused by a bug in the
function xmlCharEncOutFunc in encoding.c, in the case where the input buffer
pointer is NULL. The output handler returns the number of bytes written
(maybe not the case for all output handlers?), so the line that reads:

            if (ret == 0) { /* Gennady: check return value */

should read:

            if (ret >= 0) { /* Gennady: check return value */

I notice that the iconv code just below doesn't have this check at all - I
wonder if it should. I'm not using iconv at the moment so I can't easily
test it.
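
To illustrate the overwrite, here is a simplified, self-contained model of
what I think is happening. It is not the libxml2 source: ModelBuf,
model_utf16le_out, model_enc_out and model_bom_survives are hypothetical
names, and the handler is assumed to return the byte count on the
initialization call, as described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for an output buffer: content plus bytes in use. */
typedef struct {
    unsigned char content[64];
    size_t use;
} ModelBuf;

/* Hypothetical UTF-16LE output handler. A NULL `in` is the initialization
 * call: it writes the BOM and (assumption) returns the number of bytes
 * written rather than 0. On entry *outlen is the space available; on
 * return it is the number of bytes produced. */
static int model_utf16le_out(unsigned char *out, int *outlen,
                             const unsigned char *in, int *inlen)
{
    int i, n = 0;

    if (in == NULL) {
        if (*outlen < 2)
            return -1;
        out[0] = 0xFF;
        out[1] = 0xFE;
        *outlen = 2;
        *inlen = 0;
        return 2;
    }
    /* ASCII-only conversion, enough for the demonstration. */
    for (i = 0; i < *inlen && n + 2 <= *outlen; i++) {
        out[n++] = in[i];
        out[n++] = 0x00;
    }
    *inlen = i;
    *outlen = n;
    return n;
}

/* Model of the conversion step. `accept_positive` selects between the
 * current check (ret == 0) and the proposed one (ret >= 0). */
static void model_enc_out(ModelBuf *buf, const char *text, int accept_positive)
{
    int written, toconv, ret;

    assert(buf != NULL);
    written = (int)(sizeof buf->content - buf->use);
    toconv = text ? (int)strlen(text) : 0;
    ret = model_utf16le_out(buf->content + buf->use, &written,
                            (const unsigned char *)text, &toconv);
    if (text == NULL) {
        /* Initialization call: this is the check in question. */
        if (accept_positive ? (ret >= 0) : (ret == 0))
            buf->use += (size_t)written;
        return;
    }
    if (ret >= 0)
        buf->use += (size_t)written;
}

/* Returns 1 if the buffer still starts with the UTF-16LE BOM after the
 * initialization call followed by some document text. */
static int model_bom_survives(int accept_positive)
{
    ModelBuf buf;

    memset(&buf, 0, sizeof buf);
    model_enc_out(&buf, NULL, accept_positive);   /* writes the BOM */
    model_enc_out(&buf, "<a/>", accept_positive); /* writes the document */
    return buf.content[0] == 0xFF && buf.content[1] == 0xFE;
}
```

In this model, model_bom_survives(0) returns 0 because the buffer's use
count is never advanced past the BOM, so the document text lands on top of
it. model_bom_survives(1) returns 1.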

I also have a question about documents that have a BOM and whose encoding
declaration specifies UTF-16 (not UTF-16LE or UTF-16BE). Reading such a
document works without problems, but xmlDocDumpMemory fails, I think because
it can't decide which encoding handler to use. xmlDocFormatDump doesn't fail,
but it outputs the document in UTF-8.

Should an encoding declaration of UTF-16 work?
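
For what it's worth, the byte order of such a document is recoverable from
the BOM itself. A minimal sketch of that mapping follows; the helper name
model_utf16_from_bom is hypothetical and this is not how libxml2 is known
to choose a handler, only an illustration of the idea.

```c
#include <stddef.h>

/* Hypothetical helper: map a leading UTF-16 BOM to a concrete encoding
 * name. Returns NULL when the data does not start with a UTF-16 BOM.
 * Note: FF FE 00 00 is also the UTF-32LE BOM; this sketch does not try
 * to tell the two apart. */
static const char *model_utf16_from_bom(const unsigned char *data, size_t len)
{
    if (data == NULL || len < 2)
        return NULL;
    if (data[0] == 0xFF && data[1] == 0xFE)
        return "UTF-16LE";   /* little-endian byte order mark */
    if (data[0] == 0xFE && data[1] == 0xFF)
        return "UTF-16BE";   /* big-endian byte order mark */
    return NULL;
}
```

A caller could use the returned name to pick a handler when the declaration
says only "UTF-16", and fall back to its existing behaviour on NULL.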

I'm using version 2.5.7 on Windows, Solaris and OpenVMS.

Thanks,

Mark



