Re: [xml] How to determine document encoding

Doesn't this limit the efficacy and universality of XML? You can't count on the sender actually putting the encoding where it belongs, or even including one at all.

So it is not possible to have a "generic" reader that accepts UTF-8, UTF-16, or any other common encoding? I actually had the temerity to code up a fragment that reads some characters into a temporary buffer and runs xmlDetectCharEncoding on it.

It's crude, but for my limited purpose it worked. Bolting it onto the front of my reader is clear-to-the-bone ugly, but it did work.
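
A rough sketch of that kind of sniffing step, in plain C, might look like the following. It does not call libxml2 and is not the fragment described above; sniff_xml_encoding and starts_with are hypothetical names, and the byte patterns are the ones listed in Appendix F of the XML 1.0 specification. UCS-4 and EBCDIC are left out to keep it short, so treat it as an illustration of the idea rather than a replacement for xmlDetectCharEncoding.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: does buf begin with the given byte signature? */
static int starts_with(const unsigned char *buf, size_t len,
                       const unsigned char *sig, size_t sig_len)
{
    return len >= sig_len && memcmp(buf, sig, sig_len) == 0;
}

/* Hypothetical function, not part of libxml2: guess the encoding family
 * from the first bytes of a document. Returns a static string, or NULL
 * when none of the known patterns match. UCS-4 and EBCDIC are omitted. */
const char *sniff_xml_encoding(const unsigned char *buf, size_t len)
{
    static const unsigned char bom_utf8[]     = {0xEF, 0xBB, 0xBF};
    static const unsigned char bom_utf16be[]  = {0xFE, 0xFF};
    static const unsigned char bom_utf16le[]  = {0xFF, 0xFE};
    static const unsigned char decl_utf16be[] = {0x00, 0x3C, 0x00, 0x3F};
    static const unsigned char decl_utf16le[] = {0x3C, 0x00, 0x3F, 0x00};
    static const unsigned char decl_ascii[]   = {0x3C, 0x3F, 0x78, 0x6D};

    /* Byte order marks come first: they are unambiguous. */
    if (starts_with(buf, len, bom_utf8, sizeof bom_utf8))
        return "UTF-8";
    if (starts_with(buf, len, bom_utf16be, sizeof bom_utf16be))
        return "UTF-16BE";
    if (starts_with(buf, len, bom_utf16le, sizeof bom_utf16le))
        return "UTF-16LE";

    /* No BOM: look at how the leading "<?" was encoded. */
    if (starts_with(buf, len, decl_utf16be, sizeof decl_utf16be))
        return "UTF-16BE";
    if (starts_with(buf, len, decl_utf16le, sizeof decl_utf16le))
        return "UTF-16LE";

    /* "<?xm" in single bytes: UTF-8, ISO-8859-1, or another
     * ASCII-compatible encoding; only the declaration itself can say. */
    if (starts_with(buf, len, decl_ascii, sizeof decl_ascii))
        return "ASCII-compatible";

    return NULL;
}
```

Note that the "ASCII-compatible" result is exactly the case in this thread: the bytes alone cannot tell UTF-8 from ISO-8859-1, so the declaration (or the sender) still has to be trusted for that.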

When you say "Drop the encoding in the first line" do you mean the sender has to do this?


Daniel Veillard wrote:
In other words, how can I read the encoding attribute in <?xml...>
prior to actually loading the document?

  You should not do this, this is a very flawed design.

I tried loading the UTF-8 encoded document and this can lead to some
strange results because the document is actually ISO-8859-1 encoded
in the first place. Of course I can just decode the document by calling
UTF8Toisolat1 directly but this is not a very generic solution to my

  Drop the encoding in the first line; it will be UTF-8 in the string you
read from the libxml2 API.


