RE: [xml] Problems to parse UTF-16 encoded xml with libxml implementation of xmlReader



I first examined the document with a binary editor. It seems to be
correctly encoded.
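
A check of this kind can also be scripted. What follows is only a minimal
sketch in Python, not the check that was actually performed: it assumes the
file starts with a byte-order mark, and the helper name sniff_utf16 is
hypothetical. It reports which UTF-16 byte order the BOM announces and
confirms that the remaining bytes decode cleanly in that encoding.

```python
import codecs

# Byte-order marks that identify UTF-16, taken from the standard codecs module.
_BOMS = {
    codecs.BOM_UTF16_LE: "UTF-16LE",
    codecs.BOM_UTF16_BE: "UTF-16BE",
}


def sniff_utf16(data: bytes):
    """Return 'UTF-16LE' or 'UTF-16BE' if data starts with a UTF-16 BOM
    and the rest decodes cleanly in that byte order, otherwise None.

    Hypothetical helper for illustration. It does not try to tell
    UTF-16LE apart from UTF-32LE, whose BOM begins with the same bytes.
    """
    for bom, name in _BOMS.items():
        if data.startswith(bom):
            try:
                data[len(bom):].decode(name)
            except UnicodeDecodeError:
                return None
            return name
    return None
```

For a file on disk, the bytes would be read in binary mode, for example
with open("testXmlReader.xml", "rb"), before being passed to the helper.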

The tests you recommended give the following results:
1) "xmllint" : OK
2) "xmllint --stream" returns "failed to parse"

I did not try the library on another platform (Linux).

Attached to this mail you will find the following documents (both encoded
as UTF-16):
  - shell.txt          : a copy of the shell session
  - testXmlReader.xml  : the document parsed

Pierre

-----Original Message-----
From: Daniel Veillard [mailto:veillard redhat com]
Sent: Sunday, June 22, 2003 22:20
To: GARNIER Pierre
Cc: 'xml gnome org'
Subject: Re: [xml] Problems to parse UTF-16 encoded xml with libxml
implementation of xmlReader


On Fri, Jun 20, 2003 at 05:31:23PM +0100, GARNIER Pierre wrote:
>   When using xmlReader in order to parse an xml document encoded in
> UTF-16 the parser fails to read nodes.
>   It seems that the document is not recognized as UTF-16 encoded
> document.

  The instance must have a problem; have you checked it with xmllint?

>   The document is in UTF-16 little endian

  That should work; there are tests in the regression suite for UTF-16.

>   I resolved my problem by converting the document from UTF-16 encoding
> to UTF-8 encoding myself before parsing it.

  That should not be needed
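
For readers who still want to see what the workaround quoted above amounts
to, here is a minimal sketch in Python. It is not the poster's actual code:
the function name utf16_to_utf8 is hypothetical, the input is assumed to
start with a byte-order mark, and rewriting the encoding declaration is an
added assumption so that the declaration matches the new bytes.

```python
import re

# Matches the encoding pseudo-attribute of a leading XML declaration.
_DECL_ENCODING = re.compile(r'^(<\?xml[^>]*?encoding=)(["\'])[^"\']*\2')


def utf16_to_utf8(data: bytes) -> bytes:
    """Transcode a BOM-prefixed UTF-16 XML document to UTF-8 bytes.

    Hypothetical helper for illustration only.
    """
    # The "utf-16" codec uses the BOM to pick the byte order and drops it.
    text = data.decode("utf-16")
    # Keep the XML declaration consistent with the new byte encoding.
    text = _DECL_ENCODING.sub(r"\1\2UTF-8\2", text, count=1)
    return text.encode("utf-8")
```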

>   Is this the only solution? Is this a bad solution regarding the
> performance? Is xmlReader supposed to parse only UTF-8 encoded xml?

  No, No (libxml2 will convert internally), and No.
Make 100% sure your XML file is correct: check it with xmllint. Then
check it with xmllint --stream to exercise the reader interface. If one
fails and not the other, send a copy. Otherwise you probably have a problem
with the document.
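
The same two-step comparison (a whole-document parse, then a streaming
parse of the same bytes) can be illustrated outside libxml2. The sketch
below uses Python's standard xml.etree.ElementTree module, which is built
on expat and is therefore only an analogue of xmllint and xmllint --stream,
not a test of the xmlReader code. The function name parse_both_ways and the
sample document are assumptions made for the illustration.

```python
import io
import xml.etree.ElementTree as ET


def parse_both_ways(data: bytes):
    """Parse data as a whole tree and as a stream; return both tag lists."""
    # Whole-document parse: the complete tree is built before we look at it.
    tree_tags = [el.tag for el in ET.fromstring(data).iter()]
    # Streaming parse: elements are reported as their start tags are read.
    stream_tags = [
        el.tag
        for _event, el in ET.iterparse(io.BytesIO(data), events=("start",))
    ]
    return tree_tags, stream_tags


# A small UTF-16 document; Python's "utf-16" codec prepends a BOM.
doc = (
    '<?xml version="1.0" encoding="UTF-16"?>\n'
    "<root>\n  <node>texte accentu\u00e9</node>\n</root>\n"
)
data = doc.encode("utf-16")

tree_tags, stream_tags = parse_both_ways(data)
# If the two lists differ, or only one call raises, the two code paths disagree.
print(tree_tags == stream_tags)
```

When both paths agree on a document, a failure seen in only one of them
with the real tools points at that interface rather than at the file.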

Daniel

--
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

C:\libxml2-2.5.6.win32\util>xmllint testXmlReader.xml
%<?xml version="1.0" encoding="UTF-16"?>

<root>
  <node>
    texte accentuÚ
  </node>
</root>

C:\libxml2-2.5.6.win32\util>xmllint --stream testXmlReader.xml
testXmlReader.xml:1: error: Start tag expect, '<' not found
´W%%<?xml version="1.0" encoding="UTF-16"?>
^
testXmlReader.xml : failed to parse

Attachment: testXmlReader.xml
Description: Binary data


