Re: [xml] How to determine document encoding
- From: Daniel Veillard <veillard redhat com>
- To: "Erik F. Andersen" <ea ascott dk>
- Cc: xml gnome org
- Subject: Re: [xml] How to determine document encoding
- Date: Mon, 24 Jan 2005 08:54:47 -0500
On Mon, Jan 24, 2005 at 02:17:17PM +0100, Erik F. Andersen wrote:
> I have a SOAP document that contains another SOAP document
> as a node value. When I extract the embedded SOAP document
> (xmlnode->children->content) it will always be in UTF-8, because that's
> how libxml2 encodes content internally.
All strings returned from the API will be in UTF-8, yes definitely.
> My problem is now how to decode the contents so that I can load it
Use the xmlReadxxx APIs (xmlReadDoc(), xmlReadMemory(), ...) and provide
the encoding. In general, use the newer xmlReadxxx APIs instead of the
xmlParsexxx ones.
> In other words, how can I read the encoding attribute in <?xml...>
> prior to actually loading the document?
You should not do this; it is a very flawed design.
> I tried loading the UTF-8 encoded document and this can lead to some
> strange results because the document is actually ISO-8859-1 encoded
> in the first place. Of course I can just decode the document by calling
> UTF8Toisolat1 directly, but this is not a very generic solution to my problem.
Drop the encoding declaration from the first line: the string you read
from the libxml2 API will be in UTF-8.
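One way to follow this advice is to cut the encoding pseudo-attribute out of the declaration before the text is parsed again; with no encoding declaration and no byte-order mark, an XML parser must treat the input as UTF-8. The helper below, drop_encoding_decl, is a hypothetical sketch, not a libxml2 function, and its scanning is deliberately simple: it assumes the first occurrence of "encoding" inside the declaration is the pseudo-attribute.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: remove encoding="..." (or '...') from a leading
 * <?xml ...?> declaration, in place. Returns 1 if it removed something. */
static int drop_encoding_decl(char *xml)
{
    assert(xml != NULL);
    /* The XML declaration, if present, must be the very first thing. */
    if (strncmp(xml, "<?xml", 5) != 0 || !isspace((unsigned char) xml[5]))
        return 0;
    char *end = strstr(xml, "?>");
    if (end == NULL)
        return 0;
    char *enc = strstr(xml, "encoding");
    if (enc == NULL || enc > end)
        return 0;

    /* Also remove the whitespace that precedes "encoding". */
    char *start = enc;
    while (start > xml + 5 && isspace((unsigned char) start[-1]))
        start--;

    /* Skip over: encoding S? = S? followed by a quoted value. */
    char *p = enc + strlen("encoding");
    while (isspace((unsigned char) *p))
        p++;
    if (*p != '=')
        return 0;
    p++;
    while (isspace((unsigned char) *p))
        p++;
    if (*p != '"' && *p != '\'')
        return 0;
    char quote = *p++;
    char *close = strchr(p, quote);
    if (close == NULL || close > end)
        return 0;

    /* Shift the rest of the string, including its NUL, over the cut. */
    memmove(start, close + 1, strlen(close + 1) + 1);
    return 1;
}
```

For example, `<?xml version="1.0" encoding="ISO-8859-1"?>` becomes `<?xml version="1.0"?>`, which can then be handed to the parser as UTF-8 without any explicit encoding argument.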
Daniel Veillard | Red Hat Desktop team http://redhat.com/
veillard redhat com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/