Re: GNOME 3.6 Blocker Report (T-16d)
- From: Martyn Russell <martyn lanedo com>
- To: Mikkel Kamstrup Erlandsen <mikkel kamstrup gmail com>
- Cc: Jürg Billeter <juerg billeter codethink co uk>, Frederic Peters <fpeters gnome org>, desktop-devel-list gnome org, Aleksander Morgado <aleksander lanedo com>
- Subject: Re: GNOME 3.6 Blocker Report (T-16d)
- Date: Tue, 11 Sep 2012 09:40:34 +0100
On 11/09/12 08:38, Mikkel Kamstrup Erlandsen wrote:
On 10 September 2012 17:33, Martyn Russell <martyn lanedo com> wrote:
On 08/09/12 09:20, Frederic Peters wrote:
Hello all,
Hello,
TRACKER
=======
- Empty window in LANG=ko_KR.UTF-8
https://bugzilla.gnome.org/show_bug.cgi?id=666749
Martyn Russell explained it's difficult in the previous blocker bugs
report; no progress.
Hmmm, one workaround/feature here is to do full ASCII transliteration
of all strings used in full text search, no?
I briefly checked with Aleksander Morgado (who did a lot of the work
here for Tracker). He confirms my suspicion that this approach is
orthogonal to the problem.
We store the data as UTF-8 in the database and we do the sorting there.
So we would have to keep two copies of all data we wanted to use this
approach with. That's not really an option. Another alternative is to do
the sorting outside the database, but that's really not something Tracker
would be involved in either.
Zeitgeist FTS and the Unity lenses use this powerful feature from
libicu via some helpers in libdee. You can check the libdee test cases
here to get the idea:
http://bazaar.launchpad.net/~unity-team/dee/trunk/view/head:/tests/test-icu.c#L45
The executive summary: "øöô" transliterates to "ooo",
"Θεοδωράτου, Ελένη" to "Theodoratou, Elene", "たけだ, まさゆき" to "takeda,
masayuki". Of course queries need to be processed this way as well,
which might be unnatural in SPARQL though..?
This has as an awesome side effect that I can find my Greek friend
Ελένη Θεοδωράτου by searching for "ele".
Indeed :)
Thank you for the idea.
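To make the transliteration idea concrete, here is a minimal sketch in
Python. It is not the libdee/ICU code linked above: it only folds Latin
diacritics using the standard library, so Greek or Japanese text is not
romanised the way ICU does it. The names fold(), search() and
EXTRA_FOLDS are hypothetical and exist only for this illustration. The
point it shows is that both the indexed strings and the query go through
the same folding step.

```python
import unicodedata

# Hypothetical hand-written table for letters that have no Unicode
# decomposition (ø is one of them); ICU's transliterator covers far more.
EXTRA_FOLDS = str.maketrans({"ø": "o", "Ø": "O"})


def fold(text):
    """Approximate ASCII folding: decompose, drop combining marks, casefold."""
    decomposed = unicodedata.normalize("NFKD", text.translate(EXTRA_FOLDS))
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def search(names, query):
    """Return the names whose folded form contains the folded query."""
    needle = fold(query)
    return [name for name in names if needle in fold(name)]


names = ["Jürg Billeter", "Frederic Peters", "Aleksander Morgado"]
print(fold("øöô"))            # ooo
print(search(names, "jurg"))  # ['Jürg Billeter']
```

Note that the folded form is computed alongside the original string, which
is the "two copies of the data" cost mentioned above.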
As for performance issues with libicu, that has never been a problem
for our use cases. There might be faster alternatives, but the speed
is a non-issue for the stuff that I've been doing, and the features
gained vastly outweigh the perf loss.
The options here really are:
- Re-work Jürg's initial fix in order to handle these new cases with
libunistring. Maybe providing a custom collation method which would
treat 0x10fffd always as the last char and calling libunistring's
collator internally.
- Default to libicu instead of libunistring. However, there have been
bugs reported with use of libicu which are mentioned in the bug
report above. So we could just be replacing one problem with another.
- Fix strcoll(), which we believe is what libunistring is using. This
is discussed by the libunistring community:
http://lists.gnu.org/archive/html/bug-libunistring/2010-11/msg00008.html
I was hoping we could try these options in the order presented above,
but I've not had any response so far from Jürg on the matter, and the
bugs are stopping me from automatically switching to libicu as standard.
As you can see, Aleksander has already tried talking to the libunistring
community.
--
Regards,
Martyn
Founder and CEO of Lanedo GmbH.