[xml] UTF-16, UTF-16LE and UTF-16BE handling
- From: "William M. Brack" <wbrack mmm com hk>
- To: xml gnome org
- Subject: [xml] UTF-16, UTF-16LE and UTF-16BE handling
- Date: Fri, 28 Nov 2003 18:07:25 +0800 (HKT)
Hi List,
I have made a change to the handling of the UTF-16 encoding in
libxml2, and want to make sure list members are aware of both "what"
I did and "why" I did it.
Before my change, if the output encoding was specified as "UTF-16",
libxml2 would check whether iconv was installed and, if so, would
try to use iconv to convert the internal UTF-8 data. If iconv was
not present, or it had no routine for doing this conversion, an
internal routine which converted UTF-8 to UTF-16LE was used.
As a result of this algorithm, the output could differ from one
system to another (for example, a "big-endian" system might, by
default, produce a file with UTF-16BE encoding). Furthermore, there
was at least one bug report which arose because the user's iconv was
"buggy" and didn't produce the BOM which is required for UTF-16.
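
To illustrate why the BOM matters when byte order can vary between
systems, here is a minimal sketch of how a reader might inspect the
first two bytes of a UTF-16 stream. The names bom_kind and
detect_utf16_bom are made up for this example and are not libxml2
identifiers; this is not the code libxml2 uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type for this sketch. */
typedef enum { BOM_NONE, BOM_UTF16LE, BOM_UTF16BE } bom_kind;

/*
 * Inspect the first two bytes of a buffer for a UTF-16 BOM (U+FEFF).
 * Serialized little-endian the BOM is FF FE; big-endian it is FE FF.
 * Caveat: FF FE followed by 00 00 is also the UTF-32LE BOM, which
 * this sketch does not try to distinguish.
 */
bom_kind detect_utf16_bom(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    if (len >= 2) {
        if (buf[0] == 0xFF && buf[1] == 0xFE)
            return BOM_UTF16LE;
        if (buf[0] == 0xFE && buf[1] == 0xFF)
            return BOM_UTF16BE;
    }
    return BOM_NONE;   /* no BOM: byte order must come from elsewhere */
}
```

Without a BOM, a stream labelled only "UTF-16" gives the reader
nothing in the data itself to tell the two byte orders apart.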
A second problem, present before my change, arose when UTF-16LE or
UTF-16BE output encoding was specified. The internal libxml2
routines which do the conversion from UTF-8 always produced a
BOM. According to the specifications, this was wrong (a UTF-16 file
*must* have a BOM, but a UTF-16LE or UTF-16BE file *must not* have a
BOM).
My changes fix both of these problems. Now by default, when UTF-16
output encoding is specified, an internal routine is used. This
routine *always* produces a proper BOM and *always* encodes the file
using UTF-16LE. Additionally, when UTF-16LE or UTF-16BE is
specified, *no* BOM is produced.
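
The rules above can be sketched roughly as follows. This is only an
illustration of the behaviour described in this mail, not the actual
libxml2 routine: the names utf16_label, put_unit and encode_utf16 are
invented for the example, and for brevity it takes already-decoded
code points rather than UTF-8 input.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for this sketch; not libxml2 identifiers. */
typedef enum { ENC_UTF16, ENC_UTF16LE, ENC_UTF16BE } utf16_label;

/* Write one 16-bit code unit in the requested byte order. */
static size_t put_unit(unsigned char *out, unsigned unit, int big_endian)
{
    unsigned char hi = (unsigned char)((unit >> 8) & 0xFF);
    unsigned char lo = (unsigned char)(unit & 0xFF);

    out[0] = big_endian ? hi : lo;
    out[1] = big_endian ? lo : hi;
    return 2;
}

/*
 * Encode n code points into out (capacity cap bytes).
 *   "UTF-16"   -> BOM (FF FE), then little-endian code units
 *   "UTF-16LE" -> no BOM, little-endian code units
 *   "UTF-16BE" -> no BOM, big-endian code units
 * Returns the number of bytes written, or 0 on an invalid code point
 * or insufficient space (0 is also returned for empty LE/BE input).
 */
size_t encode_utf16(utf16_label label, const unsigned long *cps, size_t n,
                    unsigned char *out, size_t cap)
{
    int big_endian = (label == ENC_UTF16BE);
    size_t pos = 0, i;

    assert(cps != NULL && out != NULL);

    if (label == ENC_UTF16) {
        if (cap < 2)
            return 0;
        pos += put_unit(out + pos, 0xFEFF, 0);   /* BOM as FF FE */
    }

    for (i = 0; i < n; i++) {
        unsigned long cp = cps[i];

        /* Reject values outside Unicode and lone surrogates. */
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;

        if (cp < 0x10000) {
            if (cap - pos < 2)
                return 0;
            pos += put_unit(out + pos, (unsigned)cp, big_endian);
        } else {
            /* Split into a high/low surrogate pair. */
            if (cap - pos < 4)
                return 0;
            cp -= 0x10000;
            pos += put_unit(out + pos, 0xD800 | (unsigned)(cp >> 10),
                            big_endian);
            pos += put_unit(out + pos, 0xDC00 | (unsigned)(cp & 0x3FF),
                            big_endian);
        }
    }
    return pos;
}
```

With this sketch, the same input yields output two bytes longer for
ENC_UTF16 than for ENC_UTF16LE, the only difference being the leading
FF FE.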
If any problems are encountered, please let me know.
Bill