RE : RE : [xml] Problems to parse UTF-16 encoded xml with libxml implementation of xmlReader



On Tue, Aug 19, 2003 at 04:08:39PM +0100, GARNIER Pierre wrote:
Sorry,
This is now functional (with 2.5.10)!
Only little-endian and big-endian UTF-16 with no byte order mark
(3C003F00 and 003C003F) are not recognized ... but a missing byte order
mark should be considered an error.

  Hmm, no, that needs to be fixed: UTF-16 without a BOM is fine for XML
(see the spec). Send me a sample and a way to reproduce it; this needs fixing,

  thanks!
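
For readers following the thread, here is a minimal sketch of the kind of check under discussion: guessing the UTF-16 flavour from the first bytes of a document, including the two BOM-less patterns quoted above (3C003F00 and 003C003F, i.e. "<?" in each byte order). This is only an illustration in Python, not libxml2's actual detection code; the function name sniff_utf16 is hypothetical, and the sketch deliberately ignores UTF-32 and other encodings.

```python
import codecs


def sniff_utf16(head: bytes):
    """Guess the UTF-16 flavour from the first bytes of an XML document.

    Returns (encoding_name, has_bom); encoding_name is None if the bytes
    do not look like UTF-16. Simplified: UTF-32 is not considered here.
    """
    if head.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return "UTF-16LE", True
    if head.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return "UTF-16BE", True
    if head.startswith(b"\x3c\x00\x3f\x00"):   # "<?" little-endian, no BOM
        return "UTF-16LE", False
    if head.startswith(b"\x00\x3c\x00\x3f"):   # "<?" big-endian, no BOM
        return "UTF-16BE", False
    return None, False
```

With such a check, a file starting with 3C003F00 would be treated as little-endian UTF-16 even though it carries no byte order mark.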

I am attaching sample UTF-16 encoded XML files that cover the combinations
of the following criteria: big/little endian, with/without BOM, with/without
encoding attribute in the XML declaration.
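
As a rough way to reproduce such a test set without the attachment, the sketch below builds the eight combinations in memory. Everything specific in it is an assumption: the function name make_samples, the document body, the encoding="UTF-16" declaration, and the file-naming scheme (only loosely modelled on the names in the error messages) are hypothetical and may not match the attached files.

```python
import codecs
from itertools import product


def make_samples(body="<doc>text</doc>"):
    """Return {file_name: bytes} for every endian/BOM/encoding-attribute mix."""
    endians = [
        ("le", "utf-16-le", codecs.BOM_UTF16_LE),
        ("be", "utf-16-be", codecs.BOM_UTF16_BE),
    ]
    samples = {}
    for (tag, codec, bom), with_bom, with_enc in product(
        endians, [True, False], [True, False]
    ):
        # Hypothetical declaration; the real attachment may differ.
        if with_enc:
            decl = '<?xml version="1.0" encoding="UTF-16"?>'
        else:
            decl = '<?xml version="1.0"?>'
        # The -le/-be codecs never emit a BOM, so it is added by hand.
        data = (decl + "\n" + body + "\n").encode(codec)
        if with_bom:
            data = bom + data
        name = "UTF16{}-{}-{}.xml".format(
            tag,
            "BOM" if with_bom else "noBOM",
            "ENC" if with_enc else "noENC",
        )
        samples[name] = data
    return samples
```

Each entry can then be written to disk with open(name, "wb") and fed to the parser under test.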

Parsing them with the testReader program from libxml2-2.5.10 produces two
errors:
 1- "UTF16le-noBOM-noENC.xml:1: error: xmlParseStartTag: invalid element name"
 2- "UTF16be-noBOM.xml:1: error: Document is empty"
The errors correlate with the absence of a BOM: the first occurs on
little-endian files without a BOM, and the second on big-endian files
without a BOM.

Pierre.

Attachment: xmlUTF16.zip
Description: Binary data


