Re: [Tracker] [Patch] Correct locales support



On Sunday 27 August 2006 at 14:29 +0100, Jamie McCracken wrote:
Laurent Aguerreche wrote:


Nautilus isn't affected.

If I use dbus-send, it works correctly since it uses printf() to print
the result. printf() just sends the data to STDOUT...

All tools that print results via STDOUT are affected if what they have
to print is not in ASCII.

It is a 'problem' with g_print() which tries to be smart.

Okay, so what is the best solution?

(It's not a problem with the data.)

Not a problem with the data, but Tracker behaves strangely on ISO-8859-1
directories/files with non-ASCII characters in their names.

I set a terminal to ISO-8859-1 in French:
- export LANG=fr_FR
- export G_FILENAME_ENCODING=ISO-8859-1   # otherwise GLib assumes that
file names are in UTF-8...

I also added calls to setlocale() to get correct results from
g_get_charset() in trackerd and the other tools...
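
For reference, a minimal sketch of why the setlocale() call matters, using only POSIX calls rather than GLib. A C program starts in the "C" locale, so any charset lookup reports that locale's charset until setlocale() is called. The helper name locale_charset() is hypothetical, and nl_langinfo(CODESET) is used here as a stand-in for what g_get_charset() would report:

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper (not part of Tracker): adopt the locale described by
 * the environment (LANG, LC_ALL, ...) and return the name of its charset. */
static const char *locale_charset(void)
{
    /* Until this call the process is in the "C" locale, whatever LANG says.
     * If the requested locale is not installed, setlocale() returns NULL
     * and the process simply stays in the "C" locale. */
    (void)setlocale(LC_ALL, "");

    /* POSIX query for the character set of the current locale. */
    const char *codeset = nl_langinfo(CODESET);
    assert(codeset != NULL);
    return codeset;
}
```

With LANG=fr_FR and that locale installed, this would be expected to report an ISO-8859-1 charset name; the exact spelling of the name depends on the C library.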

Then I created this directory tree:
/home/laurent/TESTS-CHARSETS-TRACKER/
|- Haha/  Héhé1/  Héhé2/

In Haha, there are only files:
tests-Haha1  tests-Haha2  tests-Haha3-HÉHÉ

In Héhé1, there are only files:
tests-Héhé1  tests-Héhé2  tests-Héhé3-HÉHÉ

And the same in Héhé2:
tests-Héhé1-2  tests-Héhé2-2  tests-Héhé3-HÉHÉ-2


If I run trackerd on the TESTS-CHARSETS-TRACKER directory, it only
indexes:
- TESTS-CHARSETS-TRACKER
- TESTS-CHARSETS-TRACKER/Haha
- TESTS-CHARSETS-TRACKER/Haha/tests-Haha1
- TESTS-CHARSETS-TRACKER/Haha/tests-Haha2
and doesn't see the non-ASCII names.


I looked in the Tracker database (the display is corrupted because this
terminal was in UTF-8):

mysql> select * from FilePending;
+----+--------+--------+----------------+------------------------------------------------------------+----------+-------+
| ID | FileID | Action | PendingDate    | FileUri                                                    | MimeType | IsDir |
+----+--------+--------+----------------+------------------------------------------------------------+----------+-------+
|  3 |     -1 |      1 | 20060828235343 | /home/laurent/TESTS-CHARSETS-TRACKER/Haha/tests-Haha1      | unknown  |     0 |
|  4 |     -1 |      1 | 20060828235416 | /home/laurent/TESTS-CHARSETS-TRACKER/Haha/tests-Haha3-HïHï | unknown  |     0 |
|  5 |     -1 |      1 | 20060828235421 | /home/laurent/TESTS-CHARSETS-TRACKER/Haha/tests-Haha2      | unknown  |     0 |
|  6 |     -1 |      1 | 20060828235423 | /home/laurent/TESTS-CHARSETS-TRACKER/Haha                  | unknown  |     0 |
|  7 |     -1 |      1 | 20060828235431 | /home/laurent/TESTS-CHARSETS-TRACKER/Hïhï1                 | unknown  |     0 |
|  8 |     -1 |      1 | 20060829001014 | /home/laurent/TESTS-CHARSETS-TRACKER/Hïhï2                 | unknown  |     0 |
+----+--------+--------+----------------+------------------------------------------------------------+----------+-------+

So the files are correctly found and are recorded in ISO-8859-1, but some
of them will never be processed!

They correctly appear with 'g_async_queue_try_pop
(tracker->file_process_queue)' during processing but are not indexed since:
- info->mtime == info->indextime == 0...
- action is TRACKER_ACTION_FILE_CHECK
so index_file() is not called.




There is also a problem with tracker-search (and, I suppose, other
tools): it doesn't support non-ASCII characters in ISO-8859-1.

I set up the environment as before, then ran tracker-search in gdb. If I
search for 'ha' it terminates fine.
If I search for "hé" it exits with code 01, and with dbus-monitor I see
that my query is not sent to trackerd!

Is D-Bus known to have bugs in non-UTF-8 environments?
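
One point that may be relevant here: the D-Bus specification requires string values to be valid UTF-8, so a query carrying raw ISO-8859-1 bytes could be rejected before it ever reaches trackerd. The sketch below is a plain C check (the name is_valid_utf8() is hypothetical, and this is not the code libdbus itself uses) showing that a byte such as 0xE9, which is 'é' in ISO-8859-1, is not valid UTF-8 on its own:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: return true if the len bytes at s form well-formed
 * UTF-8 (no overlong forms, no surrogates, nothing above U+10FFFF). */
static bool is_valid_utf8(const unsigned char *s, size_t len)
{
    assert(s != NULL || len == 0);
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t extra; /* number of continuation bytes expected after c */

        if (c < 0x80)                    extra = 0; /* plain ASCII */
        else if (c >= 0xC2 && c <= 0xDF) extra = 1;
        else if (c >= 0xE0 && c <= 0xEF) extra = 2;
        else if (c >= 0xF0 && c <= 0xF4) extra = 3;
        else return false; /* stray continuation byte or invalid lead byte */

        if (extra > len - i - 1)
            return false; /* sequence is cut off by the end of the input */

        for (size_t k = 1; k <= extra; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return false; /* not a continuation byte */

        /* Reject overlong forms, surrogates and values above U+10FFFF. */
        if (c == 0xE0 && s[i + 1] < 0xA0)  return false;
        if (c == 0xED && s[i + 1] >= 0xA0) return false;
        if (c == 0xF0 && s[i + 1] < 0x90)  return false;
        if (c == 0xF4 && s[i + 1] >= 0x90) return false;

        i += extra + 1;
    }
    return true;
}
```

Here the two bytes 'h', 0xE9 fail the check because 0xE9 announces a three-byte sequence that never arrives, while the UTF-8 spelling 'h', 0xC3, 0xA9 passes.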

Should we use g_locale_from_utf8 before calling printf?

(We mustn't send text that is not in the locale encoding to an ordinary
xterm, as it could not handle it.)


The right fix would be to make sure all strings are encoded as UTF-8. If
locale-specific strings are used, it could crash Tracker!

So we should be using g_locale_to_utf8() on anything locale-specific,
such as filenames being indexed, their metadata and their text contents.
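
To illustrate what such a conversion does at the byte level, here is a minimal sketch restricted to ISO-8859-1 input. It is a hand-written stand-in, not GLib code: g_locale_to_utf8() handles whatever the current locale's charset is, whereas the hypothetical latin1_to_utf8() below assumes every input byte is ISO-8859-1:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for g_locale_to_utf8(), valid only when the locale
 * charset is ISO-8859-1. Returns a newly allocated NUL-terminated UTF-8
 * string, or NULL if allocation fails. The caller must free() the result. */
static char *latin1_to_utf8(const char *in)
{
    assert(in != NULL);
    size_t len = strlen(in);

    /* Each ISO-8859-1 byte becomes at most two UTF-8 bytes. */
    char *out = malloc(2 * len + 1);
    if (out == NULL)
        return NULL;

    char *p = out;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)in[i];
        if (c < 0x80) {
            *p++ = (char)c;                   /* ASCII is unchanged */
        } else {
            *p++ = (char)(0xC0 | (c >> 6));   /* lead byte: 0xC2 or 0xC3 */
            *p++ = (char)(0x80 | (c & 0x3F)); /* continuation byte */
        }
    }
    *p = '\0';
    return out;
}
```

For example, the ISO-8859-1 byte 0xE9 ('é') becomes the two UTF-8 bytes 0xC3 0xA9, so a directory name would be stored in the same form whatever the locale of the machine that indexed it.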

So on anything that goes into Tracker? And to get correct output, it
then has to be converted back.

I have standardised everything on UTF-8, so we do not convert back to
the locale encoding by default. I will leave it to clients to convert as
necessary.
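
A client on an ISO-8859-1 terminal would then need the reverse step before printing. The sketch below shows one way this could look. It is an assumption-laden stand-in for g_locale_from_utf8(): it only targets ISO-8859-1, it assumes the input is valid UTF-8, and it substitutes '?' for characters the target charset cannot represent, whereas the GLib call reports an error in that case. The name utf8_to_latin1() is hypothetical:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical client-side helper: convert UTF-8 to ISO-8859-1 for display.
 * Characters above U+00FF become '?'. Returns a newly allocated string, or
 * NULL if allocation fails. The caller must free() the result. */
static char *utf8_to_latin1(const char *in)
{
    assert(in != NULL);
    size_t len = strlen(in);

    /* The output is never longer than the input. */
    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;

    char *p = out;
    size_t i = 0;
    while (i < len) {
        unsigned char c = (unsigned char)in[i];
        if (c < 0x80) {
            *p++ = (char)c; /* ASCII is unchanged */
            i += 1;
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < len &&
                   ((unsigned char)in[i + 1] & 0xC0) == 0x80) {
            /* U+0080..U+00FF: rebuild the single ISO-8859-1 byte. */
            unsigned char next = (unsigned char)in[i + 1];
            *p++ = (char)(((c & 0x03) << 6) | (next & 0x3F));
            i += 2;
        } else {
            /* Not representable: emit '?' and skip the whole sequence. */
            *p++ = '?';
            i += 1;
            while (i < len && ((unsigned char)in[i] & 0xC0) == 0x80)
                i++;
        }
    }
    *p = '\0';
    return out;
}
```

Substituting '?' keeps the output printable on the terminal at the cost of losing information; a client that needs round-tripping would have to treat that case as an error instead.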



I wonder whether it would be worth asking some GNOME (or at least GLib)
hackers for their point of view. Perhaps you could use your blog to ask,
and at the same time talk about how Tracker keeps getting faster!  :-)


Will do once the next version is complete.



