Re: [xml] Possible bug with iconv-less UTF8 to ISO-8859-15 conversion



On Wed, Sep 08, 2004 at 10:28:59PM +0200, Peter Jacobi wrote:
Hi Mark, All,

I wrote this conversion routine some 13 months ago and your criticism looks
valid, but OTOH the code did test O.K. at that time. Perhaps I made some
last-minute changes that broke the code.


I think the problem is about 32 lines down in UTF8ToISO8859x in encoding.c.
The line that reads

           if ((c & 0xC0) != 0xC0) {

should read

           if ((c & 0xC0) != 0x80) {

since the second byte of a UTF-8 sequence must be of the form 10bbbbbb. If I
make this change then my xmllint outputs the expected characters rather than
the values - that is, apart from the euro symbol, which I will look into
tomorrow.
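
To illustrate the point about 10bbbbbb, here is a minimal sketch of a two-byte decoder. It is not the code from encoding.c; the helper names is_utf8_continuation and decode_utf8_2 are made up for this example, and overlong forms are deliberately not checked.

```c
/* Hypothetical helper (not a libxml2 function): returns 1 when c has
 * the bit pattern 10bbbbbb, i.e. it is a UTF-8 continuation byte. */
static int is_utf8_continuation(unsigned char c) {
    return (c & 0xC0) == 0x80;
}

/* Hypothetical helper: decodes a two-byte sequence 110xxxxx 10yyyyyy.
 * Returns the code point, or -1 if either byte has the wrong form.
 * Overlong encodings (lead bytes 0xC0, 0xC1) are not rejected here. */
static int decode_utf8_2(unsigned char lead, unsigned char trail) {
    if ((lead & 0xE0) != 0xC0)
        return -1;                      /* not a two-byte lead byte */
    if (!is_utf8_continuation(trail))
        return -1;                      /* second byte must be 10bbbbbb */
    return ((lead & 0x1F) << 6) | (trail & 0x3F);
}
```

With this test, the bytes 0xC3 0xA9 decode to 0xE9. A test written as (c & 0xC0) != 0xC0 would reject 0xA9, because 0xA9 & 0xC0 is 0x80.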

There are also two lines of code further down, for three-byte sequences,
which I think need changing in the same way. They are:

            if ((c1 & 0xC0) != 0xC0) {

and

            if ((c2 & 0xC0) != 0xC0) {

Hopefully someone else can verify that I am on the right lines.
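
For the three-byte case, a sketch along the same lines might look like the following. Again this is only an illustration under my own assumptions, not the encoding.c source: decode_utf8_3 is a made-up name, and the variables c, c1 and c2 simply mirror the ones quoted above.

```c
/* Hypothetical helper: decodes a three-byte sequence
 * 1110xxxx 10yyyyyy 10zzzzzz. Returns the code point, or -1 if any
 * byte has the wrong form. Overlong forms and surrogates are not
 * rejected in this sketch. */
static long decode_utf8_3(unsigned char c, unsigned char c1,
                          unsigned char c2) {
    if ((c & 0xF0) != 0xE0)
        return -1;                      /* not a three-byte lead byte */
    if ((c1 & 0xC0) != 0x80)
        return -1;                      /* second byte must be 10bbbbbb */
    if ((c2 & 0xC0) != 0x80)
        return -1;                      /* third byte must be 10bbbbbb */
    return ((long)(c & 0x0F) << 12) |
           ((long)(c1 & 0x3F) << 6) |
           (long)(c2 & 0x3F);
}
```

The euro sign is a useful test case here: its UTF-8 encoding is 0xE2 0x82 0xAC, which should decode to U+20AC, the character that ISO-8859-15 places at 0xA4.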

Please also change these lines and test again. It should work then.

  Peter,

it seems you're confirming the bug. If this is the case, can you provide
a small patch so there is no doubt about what needs to be changed and how?
If not, Mark should open a bug report so that we can double-check what
is going on.

  thanks !

Daniel

-- 
Daniel Veillard      | Red Hat Desktop team http://redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


