Re: GNOME 3.6 Blocker Report (T-16d)

From: Martyn Russell <martyn lanedo com>
To: Mikkel Kamstrup Erlandsen <mikkel kamstrup gmail com>
Cc: Jürg Billeter <juerg billeter codethink co uk>, Frederic Peters <fpeters gnome org>, desktop-devel-list gnome org, Aleksander Morgado <aleksander lanedo com>
Subject: Re: GNOME 3.6 Blocker Report (T-16d)
Date: Tue, 11 Sep 2012 09:40:34 +0100

On 11/09/12 08:38, Mikkel Kamstrup Erlandsen wrote:

On 10 September 2012 17:33, Martyn Russell <martyn lanedo com> wrote:

On 08/09/12 09:20, Frederic Peters wrote:


Hello all,



Hello,

TRACKER
=======

   - Empty window in LANG=ko_KR.UTF-8
     https://bugzilla.gnome.org/show_bug.cgi?id=666749
     Martyn Russell explained it's difficult in the previous blocker bugs
     report; no progress.


Hmmm, one workaround/feature here is to do full ascii transliteration
of all strings used in full text search, no?

I briefly checked with Aleksander Morgado (who did a lot of the work
here for Tracker). He confirms my suspicion that this approach is
orthogonal to the problem.

We store the data in UTF-8 in the database and we do the sorting there.
So we would have to have 2 copies of all data we wanted to use this
approach with. That's not really an option. Another alternative is to do
the sorting OUT of the database, but that's really not something Tracker
would be involved in either.

Zeitgeist FTS and the Unity lenses use this powerful feature from
libicu via some helpers in libdee. You can check the libdee test cases
here to get the idea:
http://bazaar.launchpad.net/~unity-team/dee/trunk/view/head:/tests/test-icu.c#L45

The executive summary being; "øöô" transliterates to "ooo",
"Θεοδωράτου, Ελένη" to "Theodoratou, Elene", "たけだ, まさゆき" to "takeda,
masayuki". Of course queries need to be processed this way as well,
which might be unnatural in sparql though..?

This has as an awesome side effect that I can find my Greek friend
Ελένη Θεοδωράτου by searching for "ele".
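
[Editor's sketch] The idea of folding both indexed strings and queries
to a plain form before matching can be illustrated with the Python
standard library alone. This is NOT the libicu/libdee transliteration
described above: it only strips combining marks after Unicode NFKD
decomposition, so it turns "öô" into "oo" but leaves "ø" untouched and
does not romanise Greek or Japanese. The function names `fold_accents`
and `prefix_match` are hypothetical and exist only for this example.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Strip combining marks and case-fold (a stand-in, not ICU transliteration)."""
    # NFKD splits characters such as "ö" into "o" plus a combining diaeresis.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def prefix_match(query: str, candidate: str) -> bool:
    """Fold the query the same way as the stored text, then prefix-match."""
    return fold_accents(candidate).startswith(fold_accents(query))


if __name__ == "__main__":
    print(fold_accents("öô"))                    # oo
    print(prefix_match("jur", "Jürg Billeter"))  # True
```

The point of `prefix_match` is the one made in the mail: the query has
to go through the same folding as the indexed text, otherwise the two
sides never meet.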


Indeed :)

Thank you for the idea.

As for performance issues with libicu that has never been a problem
for our use cases. There might be faster alternatives, but the speed
is a non-issue for the stuff that I've been doing, and the features
gained vastly outweigh the perf loss.


The options here really are:

- Re-work Jürg's initial fix in order to handle these new cases with
  libunistring. Maybe providing a custom collation method which would
  treat 0x10fffd always as the last char and calling libunistring's
  collator internally.

- Default to libicu instead of libunistring. However, there have been
  bugs reported with use of libicu which are mentioned in the bug
  report above. So we could just be replacing one problem with another.

- We fix strcoll() which we believe is what libunistring is using. This
  is discussed by the libunistring community:

  http://lists.gnu.org/archive/html/bug-libunistring/2010-11/msg00008.html
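
[Editor's sketch] The first option, a custom collation that always
sorts 0x10fffd last and otherwise defers to an existing collator, might
look roughly like the following. It is written in Python purely for
illustration and is not Tracker's or libunistring's code: `locale.strcoll`
stands in for the underlying collator, and `sentinel_last_cmp` and
`codepoint_cmp` are hypothetical names. The sentinel itself is never
passed to the underlying collator; only the text around it is.

```python
import functools
import locale

SENTINEL = "\U0010FFFD"  # the code point mentioned in the first option


def codepoint_cmp(a: str, b: str) -> int:
    """Locale-independent comparator, used here so the demo is deterministic."""
    return (a > b) - (a < b)


def sentinel_last_cmp(a: str, b: str, base_cmp=locale.strcoll) -> int:
    """Compare a and b with SENTINEL ranked after every other character."""
    ia, ib = a.find(SENTINEL), b.find(SENTINEL)
    if ia < 0 and ib < 0:
        # No sentinel involved: defer entirely to the underlying collator.
        return base_cmp(a, b)
    # Index of the earliest sentinel in either string.
    i = min(x for x in (ia, ib) if x >= 0)
    # Compare the text before that position with the underlying collator.
    head = base_cmp(a[:i], b[:i])
    if head != 0:
        return head
    if ia == i and ib == i:
        # Both have the sentinel here: continue with what follows it.
        return sentinel_last_cmp(a[i + 1:], b[i + 1:], base_cmp)
    # Only one string has the sentinel at this position, so it sorts last.
    return 1 if ia == i else -1


if __name__ == "__main__":
    cmp = functools.partial(sentinel_last_cmp, base_cmp=codepoint_cmp)
    words = ["foo" + SENTINEL, "foobar", "fop", "foo"]
    print(sorted(words, key=functools.cmp_to_key(cmp)))
```

With this ordering "foo" followed by the sentinel sorts after "foo" and
"foobar" but before "fop", so it can serve as an upper bound for
everything starting with "foo", whatever the underlying collator does
with unassigned or private-use code points.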

I was hoping we could try these options in the order presented above,
but I've not had any response so far from Jürg on the matter and the
bugs are stopping me automatically switching to libicu as standard. As
you can see, Aleksander has already tried talking to the libunistring
community.


--
Regards,
Martyn

Founder and CEO of Lanedo GmbH.

Follow-Ups:
- Re: GNOME 3.6 Blocker Report (T-16d)
  - From: Jürg Billeter

References:
- GNOME 3.6 Blocker Report (T-16d)
  - From: Frederic Peters
- Re: GNOME 3.6 Blocker Report (T-16d)
  - From: Martyn Russell
- Re: GNOME 3.6 Blocker Report (T-16d)
  - From: Mikkel Kamstrup Erlandsen
