Re: [xml] How to determine document encoding

From: BJ Chippindale <bchippindale networkadvantage biz>
To: xml gnome org
Subject: Re: [xml] How to determine document encoding
Date: Wed, 26 Jan 2005 03:29:48 +1300

Doesn't this limit the efficacy and universality of XML? You can't count on
the sender actually putting the encoding where it belongs, or even including
one at all.

So it is not possible to have a "generic" reader that accepts any of UTF-8,
UTF-16 or any other common encodings? I actually had the temerity to code up a
fragment that attempts to read in characters and use xmlDetectCharEncoding in a
temporary buffer. It's crude, but for my limited purpose it worked. Appending
it on the start of my reader is clear-to-the-bone ugly, but it did work.
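
For anyone following along, here is a rough idea of what such a sniffing
fragment can look at. This is only a minimal sketch in plain C, not my actual
code and not what libxml2 does internally: the enum and the function name
sniff_xml_encoding are made up for illustration, and it only covers UTF-8 and
UTF-16 (no UTF-32, no EBCDIC). It checks for a byte order mark first, then
falls back to the byte layout of the leading "<?".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type for this sketch (libxml2 has its own enum). */
typedef enum {
    SNIFF_UNKNOWN = 0,
    SNIFF_UTF8,          /* UTF-8 byte order mark seen */
    SNIFF_UTF16LE,
    SNIFF_UTF16BE,
    SNIFF_ASCII_COMPAT   /* starts with "<?xm": UTF-8, ISO-8859-x, ... */
} sniff_encoding;

/* Guess the encoding family from the first bytes of a document. */
sniff_encoding sniff_xml_encoding(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    /* Byte order marks come first. */
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return SNIFF_UTF8;
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return SNIFF_UTF16LE;
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return SNIFF_UTF16BE;

    /* No BOM: see how the characters '<' and '?' are laid out. */
    if (len >= 4) {
        if (buf[0] == 0x3C && buf[1] == 0x00 &&
            buf[2] == 0x3F && buf[3] == 0x00)
            return SNIFF_UTF16LE;
        if (buf[0] == 0x00 && buf[1] == 0x3C &&
            buf[2] == 0x00 && buf[3] == 0x3F)
            return SNIFF_UTF16BE;
        if (buf[0] == 0x3C && buf[1] == 0x3F &&
            buf[2] == 0x78 && buf[3] == 0x6D)
            return SNIFF_ASCII_COMPAT;
    }
    return SNIFF_UNKNOWN;
}
```

Note that SNIFF_ASCII_COMPAT only narrows things down to a family: telling
UTF-8 from ISO-8859-1 still depends on the encoding declaration, which is
exactly the part the sender may have left out.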

When you say "Drop the encoding in the first line" do you mean the sender has
to do this?

respectfully
BJ

Daniel Veillard wrote:

On Mon, Jan 24, 2005 at 02:17:17PM +0100, Erik F. Andersen wrote:

I have a SOAP document that contains another SOAP document
as a node value. When I extract the embedded SOAP document
(xmlnode->children->contents) this will always be in UTF-8 because that's
how xmllib encodes contents internally.


 All strings returned from the API will be in UTF-8, yes definitely.

My problem is now how to decode the contents so that I can load it
via xmlParseDoc?


 Use xmlReadxxx APIs and provide the encoding. In general use the new
APIs based on xmlReadxxx instead of the xmlParsexxx ones.

In other words, how can I read the encoding attribute in <?xml...>
prior to actually loading the document?


 You should not do this, this is a very flawed design.

I tried loading the UTF-8 encoded document and this can lead to some
strange results because the document is actually ISO-8859-1 encoded
in the first place. Of course I can just decode the document by calling
UTF8Toisolat1 directly but this is not a very generic solution to my
problem...

Drop the encoding in the first line; it will be UTF-8 in the string you read
from the libxml2 API.


Daniel
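
One possible reading of "drop the encoding in the first line", sketched in
plain C. This is an illustration only, not code from the thread or from
libxml2: drop_encoding_decl is a hypothetical helper that removes the
encoding pseudo-attribute from a leading XML declaration in place, assuming
the buffer is a NUL-terminated, writable string that is already UTF-8 (as
strings returned from the libxml2 API are). It does naive string matching
and is not a full XML declaration parser.

```c
#include <assert.h>
#include <string.h>

static int is_xml_space(char c)
{
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

/* Remove encoding="..." from a leading <?xml ...?> declaration, in place.
 * Returns 1 if an encoding declaration was removed, 0 otherwise. */
int drop_encoding_decl(char *xml)
{
    char *end, *enc, *p, *start;
    char quote;

    assert(xml != NULL);

    /* Only touch a real XML declaration at the very start. */
    if (strncmp(xml, "<?xml", 5) != 0 || !is_xml_space(xml[5]))
        return 0;
    end = strstr(xml, "?>");
    if (end == NULL)
        return 0;
    enc = strstr(xml, "encoding");
    if (enc == NULL || enc >= end)
        return 0;

    /* Walk over: encoding, optional spaces, '=', optional spaces, quote. */
    p = enc + strlen("encoding");
    while (is_xml_space(*p))
        p++;
    if (*p != '=')
        return 0;
    p++;
    while (is_xml_space(*p))
        p++;
    quote = *p;
    if (quote != '"' && quote != '\'')
        return 0;
    p++;
    while (p < end && *p != quote)
        p++;
    if (p >= end)
        return 0;
    p++; /* step past the closing quote */

    /* Also swallow the whitespace that preceded "encoding". */
    start = enc;
    while (start > xml + 5 && is_xml_space(start[-1]))
        start--;

    /* Close the gap, including the terminating NUL. */
    memmove(start, p, strlen(p) + 1);
    return 1;
}
```

After this, a declaration such as <?xml version="1.0" encoding="ISO-8859-1"?>
becomes <?xml version="1.0"?>, and the string can then presumably be handed
to one of the xmlReadxxx functions (for instance xmlReadDoc) with "UTF-8"
given as the encoding argument.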

Follow-Ups:
- Re: [xml] How to determine document encoding
  - From: Daniel Veillard
- Re: [xml] How to determine document encoding
  - From: Rich Salz

References:
- [xml] How to determine document encoding
  - From: Erik F. Andersen
- Re: [xml] How to determine document encoding
  - From: Daniel Veillard
