[xml] Problems to parse UTF-16 encoded xml with libxml implementation of xmlReader
- From: GARNIER Pierre <pgarnier mega com>
- To: "'xml gnome org'" <xml gnome org>
- Subject: [xml] Problems to parse UTF-16 encoded xml with libxml implementation of xmlReader
- Date: Fri, 20 Jun 2003 17:31:23 +0100
Hello,
Here is a description of the problem I encountered.
*Context :
I work under Windows XP with version 2.5.6 of libxml, taken from the
"libxml2-2.5.6.win32.zip" archive provided by Igor Zlatkovic.
*Problem :
When I use xmlReader to parse an XML document encoded in UTF-16, the
parser fails to read any nodes.
It seems that the document is not recognized as UTF-16 encoded.
The document is in UTF-16 little endian.
I first used the xmlNewTextReaderFilename function to create the parser.
The error messages are the following:
- if the file begins with the \xFF \xFE bytes: "Start tag expected, '<'
not found"
- if the file begins with the \x3C \x00 bytes: "xmlParseStartTag:
invalid element name"
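The two byte sequences above are the usual signatures of UTF-16 little
endian: \xFF \xFE is the byte order mark, and \x3C \x00 is '<' stored as
one 16-bit unit with no mark in front of it. A minimal sketch of such a
check is given below. It is not libxml code: the encoding_guess type and
the guess_encoding function are hypothetical names made up for this
illustration, and UTF-32 and other encodings are deliberately ignored.

```c
#include <stddef.h>

/* Hypothetical labels for this sketch; these are NOT libxml's
 * xmlCharEncoding values. */
typedef enum {
    GUESS_UNKNOWN,
    GUESS_UTF8,
    GUESS_UTF16LE,
    GUESS_UTF16BE
} encoding_guess;

/* Guess the encoding of an XML document from its first bytes.
 * Simplification: UTF-32 and legacy encodings are not considered. */
static encoding_guess guess_encoding(const unsigned char *buf, size_t len)
{
    if (buf == NULL || len < 2)
        return GUESS_UNKNOWN;

    /* Byte order marks. */
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return GUESS_UTF8;
    if (buf[0] == 0xFF && buf[1] == 0xFE)
        return GUESS_UTF16LE;
    if (buf[0] == 0xFE && buf[1] == 0xFF)
        return GUESS_UTF16BE;

    /* No mark: look at how the leading '<' (0x3C) is stored. */
    if (buf[0] == 0x3C && buf[1] == 0x00)
        return GUESS_UTF16LE;
    if (buf[0] == 0x00 && buf[1] == 0x3C)
        return GUESS_UTF16BE;
    if (buf[0] == 0x3C)
        return GUESS_UTF8;

    return GUESS_UNKNOWN;
}
```

With this sketch, both variants of the file described above (with and
without the leading \xFF \xFE) would be reported as GUESS_UTF16LE.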
Second, I used xmlAllocParserInputBuffer(XML_CHAR_ENCODING_UTF16LE) to
get the xmlParserInputBufferPtr that I passed to the xmlNewTextReader
function.
The error message is then the following: "Extra content at the end of the
document"
*Resolution?
I resolved my problem by converting the document from UTF-16 to UTF-8
myself before parsing it.
Is this the only solution? Is it a bad solution in terms of
performance? Is xmlReader supposed to parse only UTF-8 encoded XML?
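
For reference, the kind of manual conversion described above can be
sketched roughly as follows. This is only an illustration and not the
code actually used: the utf16le_to_utf8 name, its signature and its
error convention are assumptions made for this sketch, and it relies on
nothing from libxml.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert UTF-16LE bytes to UTF-8.
 * A leading byte order mark (FF FE) is skipped.
 * Returns the number of bytes written to out, or (size_t)-1 if the
 * input is malformed or the output buffer is too small. */
static size_t utf16le_to_utf8(const unsigned char *in, size_t in_len,
                              unsigned char *out, size_t out_cap)
{
    size_t i = 0, o = 0;

    if (in_len % 2 != 0)
        return (size_t)-1;              /* UTF-16 uses 2-byte units */
    if (in_len >= 2 && in[0] == 0xFF && in[1] == 0xFE)
        i = 2;                          /* skip the byte order mark */

    while (i < in_len) {
        /* Read one little-endian 16-bit unit. */
        uint32_t cp = (uint32_t)in[i] | ((uint32_t)in[i + 1] << 8);
        i += 2;

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: a low surrogate must follow. */
            uint32_t lo;
            if (i >= in_len)
                return (size_t)-1;
            lo = (uint32_t)in[i] | ((uint32_t)in[i + 1] << 8);
            if (lo < 0xDC00 || lo > 0xDFFF)
                return (size_t)-1;
            i += 2;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return (size_t)-1;          /* stray low surrogate */
        }

        /* Emit the code point as 1 to 4 UTF-8 bytes. */
        if (cp < 0x80) {
            if (out_cap - o < 1) return (size_t)-1;
            out[o++] = (unsigned char)cp;
        } else if (cp < 0x800) {
            if (out_cap - o < 2) return (size_t)-1;
            out[o++] = (unsigned char)(0xC0 | (cp >> 6));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            if (out_cap - o < 3) return (size_t)-1;
            out[o++] = (unsigned char)(0xE0 | (cp >> 12));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            if (out_cap - o < 4) return (size_t)-1;
            out[o++] = (unsigned char)(0xF0 | (cp >> 18));
            out[o++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    return o;
}
```

An output buffer of 3 bytes for every 2 input bytes is always large
enough. The pass is linear in the size of the document. Note that if the
document carries an XML declaration with encoding="UTF-16", that
declaration probably has to be adjusted as well once the bytes are
UTF-8.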
Thank you,
Pierre.