Re: [xml] Possible bug with iconv-less UTF8 to ISO-8859-15 conversion



Hi Mark, All,

I wrote this conversion routine some 13 months ago, and your criticism looks
valid, but OTOH the code tested O.K. at the time. Perhaps I made some
last-minute changes that broke the code.


I think the problem is about 32 lines down in UTF8ToISO8859x in encoding.c.
The line that reads

           if ((c & 0xC0) != 0xC0) {

should read

           if ((c & 0xC0) != 0x80) {

since the second byte of a UTF-8 sequence must be of the form 10bbbbbb. If I
make this change, my xmllint outputs the expected characters rather than
the values, apart from the euro symbol, which I will look into
tomorrow.
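To illustrate why the mask comparison matters, here is a minimal standalone sketch, not the actual encoding.c code; the function names are hypothetical. Masking with 0xC0 keeps only the top two bits, so a continuation byte (10bbbbbb) always yields 0x80, while a lead byte of a multi-byte sequence (11bbbbbb) yields 0xC0. For example, "é" is encoded as C3 A9, and 0xA9 & 0xC0 == 0x80.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: true if c has the continuation pattern 10bbbbbb.
 * Masking with 0xC0 keeps the top two bits, which must be exactly 10. */
bool is_continuation(unsigned char c)
{
    return (c & 0xC0) == 0x80;
}

/* Hypothetical helper reproducing the original test: the code reported an
 * error when (c & 0xC0) != 0xC0, i.e. it only accepted bytes whose top two
 * bits are 11. That accepts lead bytes and rejects every real continuation
 * byte. */
bool buggy_accepts(unsigned char c)
{
    return (c & 0xC0) == 0xC0;
}
```

With these, is_continuation(0xA9) is true, while buggy_accepts(0xA9) is false and buggy_accepts(0xC3) is true, which is the inverted behaviour described above.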

There are also two lines of code further down, for three-byte sequences,
which I think need changing in the same way. They are:

            if ((c1 & 0xC0) != 0xC0) {

and

            if ((c2 & 0xC0) != 0xC0) {
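The same correction applies to both continuation bytes of a three-byte sequence. The sketch below is a hypothetical, simplified decoder (not the libxml2 function) that applies the corrected test to c1 and c2 and then assembles the code point from 1110xxxx 10yyyyyy 10zzzzzz. It deliberately omits checks for overlong forms and surrogates.

```c
#include <assert.h>

/* Hypothetical helper: decode a three-byte UTF-8 sequence into a code point.
 * Returns -1 if the lead byte or either continuation byte is malformed.
 * Overlong encodings and surrogates are not rejected in this sketch. */
long decode_utf8_3(unsigned char c, unsigned char c1, unsigned char c2)
{
    if ((c & 0xF0) != 0xE0)   /* lead byte must be 1110xxxx */
        return -1;
    if ((c1 & 0xC0) != 0x80)  /* corrected test (was != 0xC0) */
        return -1;
    if ((c2 & 0xC0) != 0x80)  /* corrected test (was != 0xC0) */
        return -1;
    /* 4 bits from the lead byte, 6 from each continuation byte */
    return ((long)(c & 0x0F) << 12) | ((long)(c1 & 0x3F) << 6) | (long)(c2 & 0x3F);
}
```

For instance, the euro sign is encoded as E2 82 AC, which this sketch decodes to U+20AC; in ISO-8859-15 that character is byte 0xA4.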

Hopefully someone else can verify that I am on the right lines.

Please also change these lines and test again. It should work then.

Sorry for all that,
Peter



