Re: [Tracker] Tracker to do list
- From: Laurent Aguerreche <laurent aguerreche free fr>
- To: Jamie McCracken <jamiemcc blueyonder co uk>
- Cc: Tracker List <tracker-list gnome org>
- Subject: Re: [Tracker] Tracker to do list
- Date: Thu, 07 Sep 2006 20:17:35 +0200
On Thursday 07 September 2006 at 17:05 +0100, Jamie McCracken wrote:
Jamie McCracken wrote:
Laurent Aguerreche wrote:
I wonder whether using strlen() on UTF-8 is correct; I suspect it
is not. If I remember correctly, Unicode text can be stored in arrays
laid out like this:
'\0' 'H' '\0' 'E' '\0' 'L' '\0' 'L' '\0' 'O'      ("HELLO")
where each '\0' can be replaced by another value to store characters on 2 bytes.
But I don't remember whether that happens with UTF-8. I'll have to check what
happens with strlen() and funky characters.
UTF-8 is not the 2-byte Unicode layout shown above.
ASCII text in UTF-8 is always 1 byte per character and is indistinguishable
from plain ASCII text.
Non-ASCII characters always take 2-4 bytes each (mostly 2 bytes though).
Also, a multibyte sequence can never contain an ASCII byte: every byte of a
multibyte UTF-8 character has its most significant bit set to 1, whereas
ASCII values are always less than 128 and so have an MSB of 0. In
particular, no '\0' byte ever appears inside a character.
For reference: http://en.wikipedia.org/wiki/UTF-8
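
The byte-level properties described above can be checked with a small sketch. The helper names utf8_char_count() and is_all_ascii() are made up for illustration and are not part of Tracker or GLib; the code assumes its input is valid, nul-terminated UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count UTF-8 characters (code points) in a nul-terminated string.
 * Assumes valid UTF-8. Continuation bytes have the bit pattern
 * 10xxxxxx, so every other byte starts a new character. */
size_t utf8_char_count(const char *s)
{
    size_t count = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}

/* Return 1 if every byte is plain ASCII (most significant bit 0),
 * otherwise return 0. */
int is_all_ascii(const char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if ((unsigned char)*s & 0x80)
            return 0;
    }
    return 1;
}
```

For "HELLO", strlen() and utf8_char_count() both give 5. For the six bytes E6 97 A5 E6 9C AC (two Japanese characters), strlen() gives 6 while utf8_char_count() gives 2. So strlen() is safe on UTF-8 as a byte count, but it does not count characters.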
Ok, thank you.
So I introduced a bug in tracker-utils.c during my work on UTF-8. :-)
In is_text_file(), I wrote:
        if (data_read) {
                char *s;
                s = g_locale_to_utf8 (buffer, 65565, NULL, NULL, NULL);
I propose this replacement:
        if (data_read) {
                char *s;
                s = g_locale_to_utf8 (buffer, -1, NULL, NULL, NULL);
That's all... Otherwise some Japanese files are not detected as
text/plain.
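
With a length of -1, g_locale_to_utf8() treats its input as nul-terminated, so the replacement relies on the buffer ending in '\0' right after the data that was read. The following is a minimal sketch of one way to guarantee that, assuming the data is read with fread(). The helper read_chunk_terminated() is hypothetical and is not taken from tracker-utils.c.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: read up to cap - 1 bytes from f into buf and
 * write a '\0' after the last byte read, so that a later call taking
 * a length of -1 stops at the end of the real data.
 * Returns the number of bytes read. */
size_t read_chunk_terminated(FILE *f, char *buf, size_t cap)
{
    size_t n;

    if (f == NULL || buf == NULL || cap == 0)
        return 0;

    n = fread(buf, 1, cap - 1, f);  /* leave room for the terminator */
    buf[n] = '\0';
    return n;
}
```

If the file itself contains nul bytes, a length of -1 stops at the first one, so passing the returned byte count explicitly may be the better choice in that case.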
Laurent.