Re: [xml] Possible bug with iconv-less UTF8 to ISO-8859-15 conversion

On Wed, Sep 08, 2004 at 10:28:59PM +0200, Peter Jacobi wrote:
Hi Mark, All,

I wrote this conversion routine some 13 months ago and your criticism looks
valid, but OTOH the code did test O.K. at that time. Perhaps I made some
last-minute changes that broke the code.

I think the problem is about 32 lines down in UTF8ToISO8859x in encoding.c.
The line that reads

           if ((c & 0xC0) != 0xC0) {

should read

           if ((c & 0xC0) != 0x80) {

since the second byte of a UTF-8 sequence must be of the form 10bbbbbb. If I
make this change then my xmllint outputs the expected characters rather than
the values - that is, apart from the euro symbol, which I will look into.
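
To illustrate the difference between the two masks, here is a minimal
sketch. The helper names are hypothetical and are not taken from
encoding.c; they only isolate the bit test discussed above.

```c
/* Hypothetical helper, not part of libxml2: returns 1 when c has the
 * bit pattern 10bbbbbb required of a UTF-8 continuation byte. */
static int is_utf8_continuation(unsigned char c) {
    return (c & 0xC0) == 0x80;
}

/* The comparison as originally written: only bytes of the form
 * 11bbbbbb pass, so every valid continuation byte is rejected. */
static int passes_original_test(unsigned char c) {
    return (c & 0xC0) == 0xC0;
}
```

For example, the byte 0xA9 (10101001) is a valid continuation byte: it
satisfies the first test but fails the second, which is why the unpatched
check treats well-formed input as an error.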

There are also two lines of code further down, for three-byte sequences,
which I think need changing in the same way. They are:

            if ((c1 & 0xC0) != 0xC0) {


            if ((c2 & 0xC0) != 0xC0) {
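
As a rough sketch of how the three-byte case would look with both
comparisons changed to 0x80, consider the following. The function name and
return convention are assumptions made for illustration only; this is not
the code from encoding.c, and it does not reject overlong forms or
surrogates.

```c
/* Hypothetical decoder for illustration; not the code in encoding.c.
 * Decodes one three-byte UTF-8 sequence at in[0..2] into *out.
 * Returns 0 on success, -1 if the bytes are not well-formed. */
static int decode_utf8_3byte(const unsigned char *in, unsigned int *out) {
    unsigned char c = in[0], c1 = in[1], c2 = in[2];

    if ((c & 0xF0) != 0xE0)      /* lead byte must be 1110bbbb */
        return -1;
    if ((c1 & 0xC0) != 0x80)     /* first continuation byte: 10bbbbbb */
        return -1;
    if ((c2 & 0xC0) != 0x80)     /* second continuation byte: 10bbbbbb */
        return -1;

    /* Combine 4 + 6 + 6 payload bits into one code point. */
    *out = ((unsigned int)(c & 0x0F) << 12) |
           ((unsigned int)(c1 & 0x3F) << 6) |
            (unsigned int)(c2 & 0x3F);
    return 0;
}
```

With this sketch, the euro sign encoded as E2 82 AC decodes to U+20AC,
whereas a sequence such as E2 C2 AC is rejected because its second byte
is not of the form 10bbbbbb.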

Hopefully someone else can verify that I am on the right lines.

Please also change these lines and test again. It should work then.


It seems you're confirming the bug. If this is the case, can you provide
a small patch so there is no doubt about what needs to be changed and how?
If not, Mark should open a bug report so that we can double-check what
is going on.

  thanks !


Daniel Veillard      | Red Hat Desktop team
veillard redhat com  | libxml GNOME XML XSLT toolkit | Rpmfind RPM search engine
