
Re: [xml] Possible bug with Byte Order Marks



On Mon, Jun 02, 2003 at 10:29:58AM +0100, Mark Itzcovitz wrote:
> Using xmlDocFormatDump to output a document that is encoded in UTF-16 to a
> file, the BOM is initially created in the output buffer but then overwritten
> by the start of the document. I believe this is because of a bug in the
> function xmlCharEncOutFunc in encoding.c when the input buffer pointer is
> NULL. The output handler returns the number of bytes written (maybe not the
> case for all output handlers?) so the line that reads:
> 
> 	    if (ret == 0) { /* Gennady: check return value */
> 
> should read:
> 
> 	    if (ret >= 0) { /* Gennady: check return value */

  Hum, this looks right, will do.
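
  For readers of the archive, here is a minimal sketch of why the `ret == 0`
  test loses the BOM. It is a simplified model, not the actual encoding.c
  source: SketchBuffer, sketch_utf16le_out, sketch_init and sketch_write are
  hypothetical names, and the handler only accepts ASCII input. The one
  behaviour taken from the report is that the handler returns the number of
  bytes written, so the initialization call (input pointer NULL) returns 2
  rather than 0.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an output buffer; not libxml2's own type. */
typedef struct {
    unsigned char content[64];
    int use;                      /* number of valid bytes in content */
} SketchBuffer;

/*
 * Hypothetical UTF-16LE output handler (ASCII input only).
 * With in == NULL it emits just the BOM. On success it stores the byte
 * count in *outlen and also returns it; on error it returns -1.
 */
static int sketch_utf16le_out(unsigned char *out, int *outlen,
                              const unsigned char *in, const int *inlen)
{
    int written = 0;
    int i;

    if (in == NULL) {             /* initialization call: BOM only */
        if (*outlen < 2)
            return -1;
        out[0] = 0xFF;
        out[1] = 0xFE;
        *outlen = 2;
        return 2;
    }
    for (i = 0; i < *inlen; i++) {
        if (in[i] > 0x7F || written + 2 > *outlen)
            return -1;            /* non-ASCII or no room: give up */
        out[written++] = in[i];   /* low byte first (little-endian) */
        out[written++] = 0x00;
    }
    *outlen = written;
    return written;
}

/* Initialization call. accept_positive selects the proposed `ret >= 0`
 * test; otherwise the original `ret == 0` test is used. */
static int sketch_init(SketchBuffer *buf, int accept_positive)
{
    int avail, ret, ok;

    assert(buf != NULL);
    avail = (int) sizeof(buf->content) - buf->use;
    ret = sketch_utf16le_out(buf->content + buf->use, &avail, NULL, NULL);
    ok = accept_positive ? (ret >= 0) : (ret == 0);
    if (ok)
        buf->use += avail;        /* only now is the BOM kept */
    return ret;
}

/* Append converted text at the current end of the buffer. */
static int sketch_write(SketchBuffer *buf, const char *text)
{
    int avail, inlen, ret;

    assert(buf != NULL && text != NULL);
    avail = (int) sizeof(buf->content) - buf->use;
    inlen = (int) strlen(text);
    ret = sketch_utf16le_out(buf->content + buf->use, &avail,
                             (const unsigned char *) text, &inlen);
    if (ret >= 0)
        buf->use += avail;
    return ret;
}
```

  With the `ret == 0` test the BOM bytes are written but the buffer's use
  count is not advanced, so the next write starts at offset 0 and overwrites
  them. With `ret >= 0` the count moves past the BOM first.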

> I notice that the iconv code just below doesn't have this check at all - I
> wonder if it should. I'm not using iconv at the moment so I can't easily
> test it.

  Hum, let's assume it's not broken :-)

> I also have a query for documents that have a BOM and where the encoding
> declaration specifies UTF-16 (not UTF-16LE or UTF-16BE). There is no problem
> reading in the document, but xmlDocDumpMemory fails, I think because it
> can't decide what encoding handler to use. xmlDocFormatDump doesn't fail but
> outputs the document in UTF-8.
> 
> Should an encoding declaration of UTF-16 work?

  Yes, and it should default to Windows endianness (little-endian), since
Windows users are the main users of UTF-16. I take patches!
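
  A rough sketch of that policy, under the assumption that a bare "UTF-16"
  declaration is simply mapped to little-endian output with a BOM. None of
  these names exist in libxml2; SketchByteOrder, sketch_utf16_order and
  sketch_write_bom are hypothetical and only illustrate the idea.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical result type for the byte-order decision. */
typedef enum {
    SKETCH_UTF16_LE,
    SKETCH_UTF16_BE,
    SKETCH_NOT_UTF16
} SketchByteOrder;

/* Case-insensitive comparison of two encoding names. */
static int sketch_name_equals(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;              /* equal only if both ended together */
}

/* Map an encoding declaration to a byte order. A bare "UTF-16" falls
 * back to little-endian, which is the assumption being illustrated. */
static SketchByteOrder sketch_utf16_order(const char *name)
{
    assert(name != NULL);
    if (sketch_name_equals(name, "UTF-16BE"))
        return SKETCH_UTF16_BE;
    if (sketch_name_equals(name, "UTF-16LE") ||
        sketch_name_equals(name, "UTF-16"))
        return SKETCH_UTF16_LE;
    return SKETCH_NOT_UTF16;
}

/* Write the BOM for the chosen order into out[0..1].
 * Returns the number of bytes written (2, or 0 if not UTF-16). */
static int sketch_write_bom(SketchByteOrder order, unsigned char out[2])
{
    switch (order) {
    case SKETCH_UTF16_LE:
        out[0] = 0xFF; out[1] = 0xFE;
        return 2;
    case SKETCH_UTF16_BE:
        out[0] = 0xFE; out[1] = 0xFF;
        return 2;
    default:
        return 0;
    }
}
```

  Since the output carries a BOM, a reader can still detect the byte order
  whichever default is picked.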

> I'm using version 2.5.7 on Windows (and Solaris (and OpenVMS)).

  okay, thanks !

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


