On Sunday 27 August 2006 at 12:06 +0100, Jamie McCracken wrote:
> Laurent Aguerreche wrote:
> > Hello,
> >
> > in my tests I saw some issues with locale support in clients. For instance, if I search for something I can see something like /home/laurent/Musique/T?t? instead of /home/laurent/Musique/Tété
> >
> > It seems that g_print() tries to be smart about locale handling but needs some sort of initialization first. To test, I called g_get_charset(), which replied ANSI_X3.4-1968 (that is, ASCII) instead of UTF-8...
>
> okay this is a bug. All strings need to be utf-8. Can you let me know where this bug occurs, and does it affect all the tracker tools including nautilus?
Nautilus isn't affected. If I use dbus-send, it works correctly since it uses printf() to print the result. printf() just sends data to STDOUT...

All tools that print results to STDOUT via g_print() are affected if what they have to print is not ASCII. It is a 'problem' with g_print(), which tries to be smart. You should take a look at the g_print() code in glib/gmessages.c:

    if (local_glib_print_func)
      local_glib_print_func (string);
    else
      {
        const gchar *charset;

        if (g_get_charset (&charset))
          fputs (string, stdout); /* charset is UTF-8 already */
        else
          {
            gchar *lstring = strdup_convert (string, charset);

            fputs (lstring, stdout);
            g_free (lstring);
          }
        fflush (stdout);
      }

local_glib_print_func can only be set by g_set_print_handler().

And in glib/gutf8.c, g_get_charset() calls _g_locale_charset_raw():

    const char *
    _g_locale_charset_raw (void)
    {
      const char *codeset;

    #if !(defined WIN32 || defined OS2)
    # if HAVE_LANGINFO_CODESET
      /* Most systems support nl_langinfo (CODESET) nowadays. */
      codeset = nl_langinfo (CODESET);
    # else
    ...

So the charset is found by nl_langinfo(CODESET). I have attached a small program to show how the test works. My box is set to UTF-8, but if I comment out 'setlocale (LC_ALL, "");', I get ANSI_X3.4-1968 as the charset!
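The attached test-charset.c is not reproduced in this message. What follows is only a minimal sketch of that kind of check, assuming nothing beyond POSIX setlocale() and nl_langinfo(); the helper names codeset_seen and report_codeset are hypothetical and are not taken from the attachment or from GLib.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: return the codeset name nl_langinfo() reports.
 * init_locale != 0: adopt the locale from the environment (LANG, LC_*).
 * init_locale == 0: stay in the "C" locale, which is also the state a
 * program starts in before any setlocale() call. */
const char *codeset_seen(int init_locale)
{
    setlocale(LC_ALL, init_locale ? "" : "C");
    return nl_langinfo(CODESET);
}

/* Hypothetical helper: print the result for one of the two cases. */
void report_codeset(int init_locale)
{
    printf("%s setlocale(LC_ALL, \"\"): %s\n",
           init_locale ? "with" : "without",
           codeset_seen(init_locale));
}
```

Calling report_codeset(0) and then report_codeset(1) from main() prints both cases. On a box configured for UTF-8, the first would be expected to name an ASCII codeset (ANSI_X3.4-1968 with glibc) and the second UTF-8, which suggests that a client calling setlocale (LC_ALL, "") before its first g_print() would let g_get_charset() see the real charset.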
> The right fix would be to make sure all strings are encoded as utf-8. If locale specific strings are used it could crash tracker!
>
> So we should be using g_locale_to_utf8() on anything locale specific such as indexing filenames or their metadata and their text contents.
So on anything that goes into Tracker? And to get correct output, the conversion then has to be reversed.

I wonder whether it would be worth asking some GNOME (or at least GLib) hackers for their point of view. Perhaps you could use your blog to ask, and at the same time talk about how Tracker keeps getting faster and faster! :-)

Laurent.
Attachment:
test-charset.c
Description: Text Data