Re: [xml] endianness problem in regression suite
- From: "William M. Brack" <wbrack mmm com hk>
- To: xml gnome org
- Cc: "Nicolai Langfeldt" <janl linpro no>
- Subject: Re: [xml] endianness problem in regression suite
- Date: Thu, 20 Nov 2003 19:15:50 +0800 (HKT)
Daniel Veillard said:
On Thu, Nov 20, 2003 at 10:17:19AM +0100, Nicolai Langfeldt wrote:
Building libxml2 on Solaris/Sparc. In the tests I see a problem in the XML regression tests. From Makefile.am:
[...]
Testing utf16bom.xml
Binary files ./result/utf16bom.xml and result.utf16bom.xml differ
I've run xmllint by hand on result/utf16bom.xml. In emacs hexl-mode I see this in the original file:
00000000: fffe 3c00 3f00 7800 6d00 6c00 2000 7600  ..<.?.x.m.l. .v.
00000010: 6500 7200 7300 6900 6f00 6e00 3d00 2200  e.r.s.i.o.n.=.".
And this is the xmllint output:
00000000: feff 003c 003f 0078 006d 006c 0020 0076  ...<.?.x.m.l. .v
00000010: 0065 0072 0073 0069 006f 006e 003d 0022  .e.r.s.i.o.n.=."
I'm told that feff == byte-order mark and that fffe == undefined character.
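Both hex dumps above actually begin with a byte-order mark, U+FEFF; only the byte order differs. ff fe is the little-endian serialization of U+FEFF (it only looks like the undefined character U+FFFE if the bytes are read as big-endian), while fe ff is the big-endian serialization. A minimal stand-alone C sketch, not libxml2 code, that reproduces the two byte sequences:

    #include <stdio.h>

    /* Sketch only: serialize U+FEFF (BOM) and '<' (U+003C) in the two
     * UTF-16 byte orders.  Little-endian starts with ff fe (as in
     * result/utf16bom.xml), big-endian with fe ff (as in the xmllint
     * output on Sparc). */
    int main(void)
    {
        unsigned short units[2] = { 0xFEFF, 0x003C };
        unsigned char le[4], be[4];
        int i;

        for (i = 0; i < 2; i++) {
            le[2 * i]     = units[i] & 0xFF;   /* low byte first  */
            le[2 * i + 1] = units[i] >> 8;
            be[2 * i]     = units[i] >> 8;     /* high byte first */
            be[2 * i + 1] = units[i] & 0xFF;
        }
        printf("UTF-16LE:");
        for (i = 0; i < 4; i++) printf(" %02x", le[i]);
        printf("\nUTF-16BE:");
        for (i = 0; i < 4; i++) printf(" %02x", be[i]);
        printf("\n");
        return 0;
    }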
Well the file is flagged as UTF-16, which unfortunately has 2 variants, one little-endian and the other big-endian. Seems the libxml2 UTF-16 serialization code uses the platform native endianness instead of always using little-endian. The files are still well formed, but maybe this needs to be fixed; xmlInitCharEncodingHandlers() already tests the endianness of the architecture and it's used for the UTF-8 to UTF-16 input conversion.
This could also be the behaviour of your iconv() library taking over libxml2's default UTF-16 routines; that would need to be checked under a debugger.
Daniel
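If the goal is simply a byte-for-byte reproducible result, one workaround on the application side is to ask for an explicit byte order instead of plain "UTF-16" when saving. A minimal sketch, assuming a libxml2 (or iconv) build that provides a "UTF-16LE" output handler and using xmlSaveFileEnc() as the save call:

    #include <stdio.h>
    #include <libxml/parser.h>
    #include <libxml/tree.h>

    /* Sketch only: re-save a document with an explicit UTF-16 byte order,
     * so the bytes written no longer depend on the platform's native
     * endianness.  Assumes the libxml2 in use has (or can get via iconv)
     * a "UTF-16LE" output handler. */
    int main(int argc, char **argv)
    {
        xmlDocPtr doc;

        if (argc < 3) {
            fprintf(stderr, "usage: %s input.xml output.xml\n", argv[0]);
            return 1;
        }
        doc = xmlParseFile(argv[1]);
        if (doc == NULL) {
            fprintf(stderr, "failed to parse %s\n", argv[1]);
            return 1;
        }
        /* "UTF-16LE" rather than "UTF-16": the byte order is explicit. */
        if (xmlSaveFileEnc(argv[2], doc, "UTF-16LE") < 0)
            fprintf(stderr, "failed to save %s\n", argv[2]);
        xmlFreeDoc(doc);
        xmlCleanupParser();
        return 0;
    }

The same idea from the command line should be xmllint --encode UTF-16LE rather than --encode UTF-16.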
There's actually a bug already opened concerning this problem - see my comments at the end of http://bugzilla.gnome.org/show_bug.cgi?id=122619
Bill
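For completeness, Daniel's suggestion to check under a debugger whether iconv has taken over can also be approximated with a few lines against libxml2's public encoding API. This is only a sketch against the xmlCharEncodingHandler fields declared in <libxml/encoding.h>; which fields exist depends on the build (iconv_out is only present when LIBXML_ICONV_ENABLED is defined):

    #include <stdio.h>
    #include <libxml/encoding.h>

    /* Sketch: inspect the handler libxml2 will use for "UTF-16".
     * If the built-in output callback is NULL while a handler exists,
     * the conversion is being delegated (e.g. to iconv) rather than
     * done by libxml2's own UTF-16 routines. */
    int main(void)
    {
        xmlCharEncodingHandlerPtr h;

        xmlInitCharEncodingHandlers();
        h = xmlFindCharEncodingHandler("UTF-16");
        if (h == NULL) {
            printf("no handler registered for UTF-16\n");
            return 1;
        }
        printf("handler name: %s\n", h->name);
        printf("built-in output callback: %s\n",
               h->output != NULL ? "present" : "absent");
    #ifdef LIBXML_ICONV_ENABLED
        printf("iconv output descriptor: %s\n",
               h->iconv_out != NULL ? "present" : "absent");
    #endif
        return 0;
    }

If the built-in output callback is absent, the UTF-16 serialization is going through iconv, which is the case Daniel mentions.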