Re: [Tracker] [Patch] Correct locales support



On Sunday, 27 August 2006 at 12:06 +0100, Jamie McCracken wrote:
Jamie McCracken wrote:
Laurent Aguerreche wrote:
Hello,

In my tests I saw some issues with locale support in the clients. For
instance, if I search for something, I see something like

/home/laurent/Musique/T?t?

instead of

/home/laurent/Musique/Tété


It seems that g_print() tries to be smart about locale handling but
needs some kind of initialization first.
To test this, I called g_get_charset(), which reported ANSI_X3.4-1968
(i.e. ASCII) instead of UTF-8...


Okay, this is a bug. All strings need to be UTF-8.

Can you let me know where this bug occurs, and whether it affects all
the Tracker tools, including Nautilus?

Nautilus isn't affected.

If I use dbus-send, it works correctly since it prints the result with
printf(), and printf() just writes the bytes to stdout unchanged.

All the tools that print their results to stdout through g_print() are
affected whenever the text is not pure ASCII.

It is a 'problem' with g_print(), which tries to be smart.


Take a look at the g_print() code in glib/gmessages.c:

  if (local_glib_print_func)
    local_glib_print_func (string);
  else
    {
      const gchar *charset;

      if (g_get_charset (&charset))
        fputs (string, stdout); /* charset is UTF-8 already */
      else
        {
          gchar *lstring = strdup_convert (string, charset);

          fputs (lstring, stdout);
          g_free (lstring);
        }
      fflush (stdout);
    }
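
For the T?t? output above: when the charset is not UTF-8, g_print() passes the string through strdup_convert(), and the result suggests that each character the target charset cannot represent becomes "?". The following is a minimal plain-C sketch of that effect for an ASCII target, assuming the input is already valid UTF-8. The helper name utf8_to_ascii_fallback is hypothetical; this is an illustration, not GLib's code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: replace every non-ASCII code point of a valid
 * UTF-8 string with '?', roughly what a UTF-8 -> ASCII conversion with a
 * "?" fallback produces. Returns a malloc'd string (NULL if allocation
 * fails); the caller frees it. */
char *utf8_to_ascii_fallback(const char *utf8)
{
    size_t len = strlen(utf8);
    char *out = malloc(len + 1);        /* output is never longer than input */
    char *p = out;

    if (out == NULL)
        return NULL;

    for (const unsigned char *s = (const unsigned char *)utf8; *s; s++) {
        if (*s < 0x80)
            *p++ = (char)*s;            /* plain ASCII: copied unchanged */
        else if ((*s & 0xC0) == 0xC0)
            *p++ = '?';                 /* lead byte of a multi-byte character */
        /* else: continuation byte (10xxxxxx), already replaced by its '?' */
    }
    *p = '\0';
    return out;
}
```

Applied to "/home/laurent/Musique/T\xC3\xA9t\xC3\xA9" (the UTF-8 bytes of .../Tété), it returns "/home/laurent/Musique/T?t?": each two-byte é collapses to a single "?".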

local_glib_print_func can only be set with g_set_print_handler(). In
glib/gutf8.c, g_get_charset() in turn calls _g_locale_charset_raw():

  const char *
  _g_locale_charset_raw (void)
  {
    const char *codeset;

  #if !(defined WIN32 || defined OS2)

  # if HAVE_LANGINFO_CODESET

    /* Most systems support nl_langinfo (CODESET) nowadays.  */
    codeset = nl_langinfo (CODESET);

  # else

 ...

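So the charset comes from nl_langinfo(CODESET), which reports the
codeset of the current LC_CTYPE locale. Every C program starts in the
"C" locale and stays there until it calls setlocale(), so without that
call the answer is the "C" locale's codeset, ANSI_X3.4-1968 (ASCII) on
glibc.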

I have attached a small program to show how the test works. My box is
set to UTF-8, but if I comment out 'setlocale (LC_ALL, "");', I get
ANSI_X3.4-1968 as the charset!
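
The attachment itself is not reproduced here. As a rough stand-in (a hypothetical sketch, not test-charset.c), the same experiment can be run without GLib by querying nl_langinfo(CODESET) directly, before and after setlocale (LC_ALL, ""). It assumes a POSIX system with <langinfo.h>; the exact names reported depend on the C library and on the LC_ALL, LC_CTYPE and LANG environment variables.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

/* Copy the codeset name of the current LC_CTYPE locale into buf.
 * The string returned by nl_langinfo() may be overwritten by a later
 * setlocale() call, hence the copy. */
static void current_codeset(char *buf, size_t size)
{
    snprintf(buf, size, "%s", nl_langinfo(CODESET));
}

/* Hypothetical helper: record the codeset seen before and after adopting
 * the user's locale from the environment. */
void codeset_before_after(char *before, char *after, size_t size)
{
    current_codeset(before, size);   /* "C" locale unless setlocale() ran earlier */
    setlocale(LC_ALL, "");           /* the call commented out in the test */
    current_codeset(after, size);
}
```

With a UTF-8 locale configured and installed, glibc would be expected to leave "ANSI_X3.4-1968" in before and "UTF-8" in after, which matches the behaviour described above.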


The right fix would be to make sure all strings are encoded as UTF-8. If
locale-specific strings are used, they could crash Tracker!
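
One way to keep locale-encoded bytes out is to check that a string is valid UTF-8 before storing it; in GLib code that is what g_utf8_validate() is for. A minimal plain-C sketch of such a check follows, with a hypothetical helper name; it illustrates the UTF-8 well-formedness rules and is not a description of what Tracker does.

```c
#include <stdbool.h>

/* Hypothetical helper: returns true if the NUL-terminated string s is
 * well-formed UTF-8 (no stray or missing continuation bytes, no overlong
 * forms, no UTF-16 surrogates, nothing above U+10FFFF). */
bool is_valid_utf8(const char *s)
{
    /* Smallest code point allowed for n continuation bytes; anything
     * lower is an overlong encoding. Index 0 is unused. */
    static const unsigned long min_cp[] = { 0x0, 0x80, 0x800, 0x10000 };
    const unsigned char *p = (const unsigned char *)s;

    while (*p) {
        unsigned long cp;
        int extra;                          /* number of continuation bytes */

        if (*p < 0x80) {                    /* 0xxxxxxx: ASCII */
            p++;
            continue;
        } else if ((*p & 0xE0) == 0xC0) {   /* 110xxxxx: 2-byte sequence */
            cp = *p & 0x1F;
            extra = 1;
        } else if ((*p & 0xF0) == 0xE0) {   /* 1110xxxx: 3-byte sequence */
            cp = *p & 0x0F;
            extra = 2;
        } else if ((*p & 0xF8) == 0xF0) {   /* 11110xxx: 4-byte sequence */
            cp = *p & 0x07;
            extra = 3;
        } else {                            /* stray 10xxxxxx or 0xF8..0xFF */
            return false;
        }
        p++;                                /* move past the lead byte */

        for (int i = 0; i < extra; i++, p++) {
            if ((*p & 0xC0) != 0x80)        /* also stops at a premature NUL */
                return false;
            cp = (cp << 6) | (*p & 0x3F);
        }

        if (cp < min_cp[extra] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return false;
    }
    return true;
}
```

The UTF-8 filename "T\xC3\xA9t\xC3\xA9" passes, while the same name in Latin-1, "T\xE9t\xE9", is rejected because 0xE9 announces a three-byte sequence that never arrives.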


So we should be using g_locale_to_utf8() on anything locale-specific,
such as the filenames we index, their metadata and their text contents.
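
Note that g_locale_to_utf8() converts from the charset reported by g_get_charset(), so it depends on the same setlocale() call; for filenames GLib also offers g_filename_to_utf8(), which follows G_FILENAME_ENCODING. The conversion underneath can be sketched in plain C with POSIX iconv(3). In this sketch the source charset is passed in explicitly (in a real program it would typically be nl_langinfo(CODESET) after setlocale (LC_ALL, "")), and the helper name to_utf8 is hypothetical; it is not GLib's implementation.

```c
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert the NUL-terminated string `in`, encoded in
 * `from_charset`, to a newly allocated UTF-8 string. Returns NULL if the
 * charset is unknown, the input is invalid or incomplete in that charset,
 * or memory runs out. The caller frees the result. */
char *to_utf8(const char *in, const char *from_charset)
{
    iconv_t cd = iconv_open("UTF-8", from_charset);
    if (cd == (iconv_t)-1)
        return NULL;                            /* unsupported charset */

    char *in_p = (char *)in;                    /* iconv() wants char ** */
    size_t in_left = strlen(in);
    size_t size = in_left * 2 + 16;             /* first guess, doubled on E2BIG */
    size_t used = 0;
    int flushing = 0;                           /* 1 once all input is consumed */
    char *out = malloc(size);

    while (out != NULL) {
        char *out_p = out + used;
        size_t out_left = size - used - 1;      /* keep room for the NUL */
        size_t r = flushing
            ? iconv(cd, NULL, NULL, &out_p, &out_left)  /* emit final shift state */
            : iconv(cd, &in_p, &in_left, &out_p, &out_left);
        used = (size_t)(out_p - out);

        if (r != (size_t)-1) {
            if (flushing) {                     /* done: terminate and return */
                out[used] = '\0';
                iconv_close(cd);
                return out;
            }
            flushing = 1;
        } else if (errno == E2BIG) {            /* output buffer full: grow it */
            char *bigger = realloc(out, size * 2);
            if (bigger == NULL)
                free(out);
            out = bigger;
            size *= 2;
        } else {                                /* EILSEQ or EINVAL: bad input */
            free(out);
            out = NULL;
        }
    }

    iconv_close(cd);
    return NULL;
}
```

For example, to_utf8 ("T\xE9t\xE9", "ISO-8859-1") returns the UTF-8 string "T\xC3\xA9t\xC3\xA9". The output buffer is doubled on E2BIG rather than sized from a fixed worst case, since the expansion ratio depends on the source charset.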

So on anything that goes into Tracker? And then, to get correct output,
it has to be converted back to the locale's charset.

I wonder whether it would be worth asking some GNOME (or at least GLib)
hackers for their point of view. Perhaps you could use your blog to ask,
and at the same time talk about how Tracker keeps getting faster!  :-)


Laurent.

Attachment: test-charset.c
Description: Text Data


