Re: [xml] French character encoding problem



On Thu, Sep 15, 2005 at 12:51:40PM -0400, Fred Fung wrote:
Daniel,

Thanks for the prompt reply.

I already tried "ISO-8859-1" (and just tried again after reading your reply)
and I still get the same result.

  Yes, that's normal. Whichever encoding you declare, you will get the same
result.

Already read the encoding.html page a few times. According to this page,
does that mean that by specifying encoding to be ISO-8859-1, one can put
"Ã" in the xml file ?

  What is "Ã"? What byte sequence? Corresponding
to what Unicode code point(s)?

What about if they choose to generate Ç instead of the character ?
I actually just tried putting "Ã" in the xml file with encoding ISO-8859-1.
xmlNodeGetContent() still returns "Ãî" instead.

  It returned the 2 bytes corresponding to that code point in the UTF-8
encoding. The fact that all strings are encoded in UTF-8 internally is
written on that page. 
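
To make the "2 bytes" concrete, here is a minimal standalone sketch of how a code point below U+0800 is laid out in UTF-8. The function name utf8_encode_small is hypothetical, and U+00C7 (Ç) is used only as an illustrative code point; the thread does not say which character was actually in the file.

```c
#include <stddef.h>

/* Encode one code point below U+0800 as UTF-8.
 * Returns the number of bytes written (1 or 2), or 0 if the
 * code point is outside the range this sketch handles.
 * (Hypothetical helper, not part of libxml2.) */
static size_t utf8_encode_small(unsigned int cp, unsigned char out[2])
{
    if (cp < 0x80) {
        /* ASCII: one byte, unchanged. */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        /* Two bytes: 110xxxxx 10xxxxxx. */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    return 0;
}
```

For U+00C7 this yields the bytes 0xC3 0x87. A terminal or editor that interprets those bytes as ISO-8859-1 shows 0xC3 as "Ã" followed by a second character, which is consistent with the output reported above.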

Also, if xmllint is able to return the proper character, what am I missing
that's causing xmlNodeGetContent() not to?

  That all internal representations are kept in UTF-8.
It is clear you did not understand that page. Make sure you understand it.

  "One of the core decisions was to force all documents to be converted to
   a default internal encoding, and that encoding to be UTF-8"
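
In practice this means the application has to convert the UTF-8 string itself if it wants ISO-8859-1 bytes for display. The following is a minimal standalone sketch of that conversion; utf8_to_latin1 is a hypothetical helper written for illustration. libxml2's encoding module also declares UTF8Toisolat1() for this purpose, which this sketch deliberately does not depend on.

```c
#include <stddef.h>

/* Convert NUL-terminated UTF-8 text to ISO-8859-1.
 * Returns the number of bytes written (excluding the NUL), or -1 if
 * the input is malformed, contains a code point above U+00FF, or
 * does not fit in the output buffer.
 * (Hypothetical helper, not part of libxml2.) */
static int utf8_to_latin1(const unsigned char *in,
                          unsigned char *out, size_t outsize)
{
    size_t n = 0;

    while (*in != 0) {
        unsigned int cp;

        if (*in < 0x80) {
            /* ASCII byte maps to itself. */
            cp = *in++;
        } else if ((*in == 0xC2 || *in == 0xC3) &&
                   (in[1] & 0xC0) == 0x80) {
            /* Lead bytes 0xC2 and 0xC3 cover U+0080..U+00FF. */
            cp = ((unsigned int)(*in & 0x1F) << 6) | (in[1] & 0x3F);
            in += 2;
        } else {
            /* Malformed, or not representable in ISO-8859-1. */
            return -1;
        }

        /* Keep one byte free for the terminating NUL. */
        if (n + 1 >= outsize)
            return -1;
        out[n++] = (unsigned char)cp;
    }

    if (outsize == 0)
        return -1;
    out[n] = 0;
    return (int)n;
}
```

Passing the bytes 'F', 0xC3, 0x87 produces 'F', 0xC7, that is, "FÇ" in ISO-8859-1. The same idea applies to a string obtained from xmlNodeGetContent(), assuming the caller casts it to unsigned char and frees it with xmlFree() afterwards.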

  There are a few pointers at the beginning of that page explaining more
about encodings, code points and Unicode and how they relate. Until you
are familiar with those, you will continue to have trouble, I'm
afraid.

Daniel

-- 
Daniel Veillard      | Red Hat Desktop team http://redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


