Re: [xml] Possible bug with iconv-less UTF8 to ISO-8859-15 conversion
- From: Daniel Veillard <veillard redhat com>
- To: Peter Jacobi <pj walter-graphtek com>
- Cc: xml gnome org, Mark Itzcovitz <mark itzcovitz ntlworld com>
- Date: Wed, 8 Sep 2004 17:31:07 -0400
On Wed, Sep 08, 2004 at 10:28:59PM +0200, Peter Jacobi wrote:
Hi Mark, All,
I wrote this conversion routine some 13 months ago and your criticism looks
valid, but OTOH the code did test O.K. at that time. Perhaps I made some
last-minute changes that broke the code.
I think the problem is about 32 lines down in UTF8ToISO8859x in encoding.c.
The line that reads
if ((c & 0xC0) != 0xC0) {
should read
if ((c & 0xC0) != 0x80) {
since the second byte of a UTF-8 sequence must be of the form 10bbbbbb. If I
make this change then my xmllint outputs the expected characters rather than
the values - that is, apart from the euro symbol, which I will look into
tomorrow.
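To make the reasoning above concrete, here is a minimal sketch of the continuation-byte test. The helper name is_utf8_continuation is hypothetical and is not part of encoding.c; it only isolates the mask comparison under discussion.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not taken from libxml2: returns true when c has
 * the bit pattern 10bbbbbb, i.e. it is a UTF-8 continuation byte.
 * Masking with 0xC0 keeps the top two bits; they must equal binary 10,
 * which is 0x80. Comparing against 0xC0 would instead match 11bbbbbb,
 * which is the pattern of a lead byte.
 */
static bool is_utf8_continuation(unsigned char c)
{
    return (c & 0xC0) == 0x80;
}
```

With this helper, a check such as if ((c & 0xC0) != 0x80) is the same as if (!is_utf8_continuation(c)).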
There are also two lines of code further down, for three-byte sequences,
which I think need changing in the same way. They are:
if ((c1 & 0xC0) != 0xC0) {
and
if ((c2 & 0xC0) != 0xC0) {
Hopefully someone else can verify that I am on the right lines.
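As an illustration of the three-byte case, the sketch below applies the corrected test to both trailing bytes. The function decode_utf8_3byte is a hypothetical stand-alone example, not the code in UTF8ToISO8859x, and it deliberately skips the overlong-form and surrogate checks a full decoder would need.

```c
#include <stddef.h>

/*
 * Hypothetical sketch, not taken from libxml2: decode one three-byte
 * UTF-8 sequence (1110bbbb 10bbbbbb 10bbbbbb) starting at in[0].
 * Returns the code point, or -1 if the input is too short or any byte
 * has the wrong bit pattern.
 */
static long decode_utf8_3byte(const unsigned char *in, size_t len)
{
    unsigned char c, c1, c2;

    if (len < 3)
        return -1;
    c = in[0];
    c1 = in[1];
    c2 = in[2];

    if ((c & 0xF0) != 0xE0)     /* lead byte must be 1110bbbb */
        return -1;
    if ((c1 & 0xC0) != 0x80)    /* second byte must be 10bbbbbb */
        return -1;
    if ((c2 & 0xC0) != 0x80)    /* third byte must be 10bbbbbb */
        return -1;

    /* Combine 4 + 6 + 6 payload bits into one code point. */
    return ((long)(c & 0x0F) << 12) |
           ((long)(c1 & 0x3F) << 6) |
            (long)(c2 & 0x3F);
}
```

For the euro sign, encoded as the bytes E2 82 AC, this returns 0x20AC. With the 0xC0 comparison in place of 0x80, the same valid sequence would be rejected.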
Please also change these lines and test again. It should work then.
Peter,
it seems you're confirming the bug. If this is the case, can you provide
a small patch so there is no doubt about what needs to be changed and how?
If not, Mark should open a bug report so that we can double-check what
is going on.
thanks!
Daniel
--
Daniel Veillard | Red Hat Desktop team http://redhat.com/
veillard redhat com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/