[xml] UTF-16, UTF-16LE and UTF-16BE handling



Hi List,

I have made a change to the handling of the UTF-16 encoding in
libxml2, and want to make sure list members are aware of both "what"
I did and "why" I did it.

Before my change, if the output encoding was specified as "UTF-16",
the libxml2 library would check whether iconv was installed and, if
so, would try to use iconv to convert the internal UTF-8.  If iconv
was not present, or had no routine for this conversion, an internal
routine which converted UTF-8 to UTF-16LE was used instead.

As a result of this algorithm, the output could differ from one
system to another (for example, a "big-endian" system might, by
default, produce a file with UTF-16BE encoding).  Furthermore, at
least one bug report arose because the user's iconv was "buggy" and
didn't produce the BOM which is required for UTF-16.

A second problem, also present before my change, arose when UTF-16LE
or UTF-16BE output encoding was specified.  The internal libxml2
routines which do the conversion from UTF-8 always produced a BOM.
According to the specifications, this was wrong (a UTF-16 file
*must* have a BOM, but a UTF-16LE or UTF-16BE file *must not* have a
BOM).
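
To illustrate why the BOM matters for plain "UTF-16", here is a
minimal sketch of how a reader might use the first two bytes of a
file to decide the byte order.  This is not libxml2 code; the names
utf16_order and detect_utf16_bom are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper for illustration only; not part of libxml2. */
enum utf16_order { UTF16_ORDER_UNKNOWN, UTF16_ORDER_LE, UTF16_ORDER_BE };

/*
 * Look for a UTF-16 BOM (U+FEFF) in the first two bytes of buf.
 * Note: a UTF-32LE BOM also begins with FF FE; this sketch ignores
 * that case and only considers UTF-16.
 */
static enum utf16_order detect_utf16_bom(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);

    if (len < 2)
        return UTF16_ORDER_UNKNOWN;
    if (buf[0] == 0xFF && buf[1] == 0xFE)
        return UTF16_ORDER_LE;      /* U+FEFF stored low byte first */
    if (buf[0] == 0xFE && buf[1] == 0xFF)
        return UTF16_ORDER_BE;      /* U+FEFF stored high byte first */
    return UTF16_ORDER_UNKNOWN;     /* no BOM: order must be known otherwise */
}
```

With "UTF-16LE" or "UTF-16BE" the byte order is already given by the
encoding name, so a reader has no need for this check.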

My changes fix both of these problems.  Now by default, when UTF-16
output encoding is specified, an internal routine is used.  This
routine *always* produces a proper BOM and *always* encodes the file
using UTF-16LE.  Additionally, when UTF-16LE or UTF-16BE is
specified, *no* BOM is produced.
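
The rule described above can be sketched roughly as follows.  This is
only an illustration and not the actual libxml2 routine: the names
utf16_kind, put_unit and encode_utf16 are hypothetical, and to keep
it short the sketch takes already-decoded Unicode code points as
input rather than UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration only; not libxml2 API. */
enum utf16_kind { ENC_UTF16, ENC_UTF16LE, ENC_UTF16BE };

/* Store one 16-bit code unit in the requested byte order. */
static size_t put_unit(unsigned char *out, uint16_t u, int big_endian)
{
    if (big_endian) {
        out[0] = (unsigned char)(u >> 8);
        out[1] = (unsigned char)(u & 0xFF);
    } else {
        out[0] = (unsigned char)(u & 0xFF);
        out[1] = (unsigned char)(u >> 8);
    }
    return 2;
}

/*
 * Encode n code points into out (capacity cap bytes).
 *   ENC_UTF16   -> BOM (FF FE) followed by little-endian units
 *   ENC_UTF16LE -> little-endian units, no BOM
 *   ENC_UTF16BE -> big-endian units, no BOM
 * Returns 0 on success and sets *written; returns -1 if out is too
 * small or a code point is not a valid Unicode scalar value.
 */
static int encode_utf16(enum utf16_kind kind, const uint32_t *cps, size_t n,
                        unsigned char *out, size_t cap, size_t *written)
{
    int be = (kind == ENC_UTF16BE);
    size_t pos = 0;
    size_t i;

    assert(cps != NULL && out != NULL && written != NULL);

    if (kind == ENC_UTF16) {
        if (cap < 2)
            return -1;
        pos += put_unit(out + pos, 0xFEFF, 0);   /* BOM, always LE */
    }
    for (i = 0; i < n; i++) {
        uint32_t c = cps[i];

        if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
            return -1;                           /* not a scalar value */
        if (c < 0x10000) {
            if (cap - pos < 2)
                return -1;
            pos += put_unit(out + pos, (uint16_t)c, be);
        } else {
            if (cap - pos < 4)
                return -1;
            c -= 0x10000;                        /* split into surrogates */
            pos += put_unit(out + pos, (uint16_t)(0xD800 | (c >> 10)), be);
            pos += put_unit(out + pos, (uint16_t)(0xDC00 | (c & 0x3FF)), be);
        }
    }
    *written = pos;
    return 0;
}
```

Encoding the single character "<" (U+003C) this way gives FF FE 3C 00
for "UTF-16", 3C 00 for "UTF-16LE" and 00 3C for "UTF-16BE".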

If any problems are encountered, please let me know.

Bill


