Re: [xml] xmlCheckUTF8-problem (bugfix) [signed]
- From: "William M. Brack" <wbrack mmm com hk>
- To: "Julius Mittenzwei [c]" <julius muenchen-sued de>
- Cc: xml gnome org
- Subject: Re: [xml] xmlCheckUTF8-problem (bugfix) [signed]
- Date: Sat, 28 Aug 2004 09:24:23 +0800 (HKT)
Julius Mittenzwei [c] said:
> Hi again,
>
> I tried to trace the problem a bit.
> A valid 2-byte UTF-8 character must look like this:
>   110xxxxx 10xxxxxx (http://de.wikipedia.org/wiki/UTF8)
> I would suggest changing this line in xmlstring.c:
>   if ((c & 0xc0) != 0x80 || (utf[ix + 1] & 0xc0) != 0x80)
> to
>   if ((c & 0xe0) != 0xc0 || (utf[ix + 1] & 0xc0) != 0x80)
> This ANDs "c" with 11100000=0xe0 to isolate the first 3 bits.
> If the result is exactly 11000000=0xc0, you can be sure that the byte
> starts with "110".
>
> Regards
> /Julius

Hmmm... I'm afraid I can't agree with that. Remember that a UTF-8 character
is a "string" which can be 1, 2, 3 or even 4 bytes long (RFC 3629). So, for a
3-byte string, whose first byte is 1110xxxx, the masked value "0xe0" is
equally valid :-(.
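
To make the point concrete, here is a small C sketch. It is not code from libxml2: both function names are made up for illustration, and each one simply wraps one of the two 'if' conditions quoted above, returning 1 when that condition would fire (i.e. report an error) for the two bytes at p.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper around the line currently in xmlstring.c.
 * Returns 1 when the condition fires for the byte pair at p. */
int current_test_fires(const unsigned char *p)
{
    assert(p != NULL);
    return (p[0] & 0xc0) != 0x80 || (p[1] & 0xc0) != 0x80;
}

/* Hypothetical wrapper around the proposed replacement line.
 * Returns 1 when the condition fires for the byte pair at p. */
int proposed_test_fires(const unsigned char *p)
{
    assert(p != NULL);
    return (p[0] & 0xe0) != 0xc0 || (p[1] & 0xc0) != 0x80;
}
```

For the valid 2-byte sequence C3 A9, the current test fires and the proposed one does not, which matches the report. For the valid 3-byte sequence E2 82 AC, the lead byte masks to 0xe0 rather than 0xc0, so the proposed test fires as well.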
Despite this minor disagreement, I totally agree with you that there is a
problem, and it needs to be fixed. I did a little "history checking" and
found that this particular line of code was recently changed, and the change
was because of http://bugzilla.gnome.org/show_bug.cgi?id=148115. Very
unfortunately, as you have pointed out, our fix for that bug was not totally
satisfactory :-\.
I have re-examined that area of the code, and have (hopefully) enhanced it to a
state where it should handle all of the different cases correctly
(basically I changed the first half of the above 'if' to check for equality
with 0xc0). I also added several comments along the way to show what I (think)
I'm doing :-). Could you check out the revised routine from CVS and see if it
solves your case satisfactorily? Thanks for the report, and for your help!
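
For readers without a CVS checkout, a check that handles each sequence length separately might look roughly like the sketch below. This is an illustration only, not the revised routine in CVS: the name check_utf8_sketch is made up, and the code only verifies the bit patterns from RFC 3629 (lead byte followed by the right number of 10xxxxxx bytes). It does not reject overlong encodings, surrogates, or values above U+10FFFF.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical function, not the libxml2 routine.
 * Returns 1 if the 0-terminated string has valid UTF-8 structure, else 0. */
int check_utf8_sketch(const unsigned char *utf)
{
    size_t ix = 0;
    unsigned char c;

    assert(utf != NULL);            /* a NULL pointer is a caller error here */

    while ((c = utf[ix]) != 0) {    /* string is 0-terminated */
        size_t len, k;

        if ((c & 0x80) == 0x00)
            len = 1;                /* 0xxxxxxx */
        else if ((c & 0xe0) == 0xc0)
            len = 2;                /* 110xxxxx */
        else if ((c & 0xf0) == 0xe0)
            len = 3;                /* 1110xxxx */
        else if ((c & 0xf8) == 0xf0)
            len = 4;                /* 11110xxx */
        else
            return 0;               /* 10xxxxxx or 11111xxx cannot start a sequence */

        /* Every following byte must be 10xxxxxx. A terminating 0 fails this
         * test, so a truncated sequence is rejected without reading past it. */
        for (k = 1; k < len; k++) {
            if ((utf[ix + k] & 0xc0) != 0x80)
                return 0;
        }
        ix += len;
    }
    return 1;
}
```

The loop decides the expected length from the lead byte first and only then looks at the continuation bytes, so 2-, 3- and 4-byte sequences all go through the same check.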
Bill