Re: [Tracker] libicu & libunistring based parsers (was:Re: libunistring-based parser in libtracker-fts)

From: Jamie McCracken <jamie mccrack googlemail com>
To: Aleksander Morgado <aleksander lanedo com>
Cc: "Tracker (devel)" <tracker-list gnome org>
Subject: Re: [Tracker] libicu & libunistring based parsers (was:Re: libunistring-based parser in libtracker-fts)
Date: Wed, 05 May 2010 11:12:11 -0400

On Wed, 2010-05-05 at 12:53 +0200, Aleksander Morgado wrote:

Hi Jamie & all,


I will modify the libunistring and libicu based algorithms tomorrow so
that if a word is 7-bit ASCII only, normalization and casefolding are
skipped and each character just goes through tolower(). That should
bring the values closer to those of the glib/custom parser.
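
For illustration, the ASCII-only special case described above could look
roughly like the sketch below. The function names are hypothetical and
not taken from the Tracker sources, and the fallback to full Unicode
normalization and casefolding is left to the caller.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if every byte of the word is 7-bit ASCII (below 0x80). */
static bool word_is_ascii(const char *word, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if ((unsigned char) word[i] >= 0x80)
            return false;
    }
    return true;
}

/* Hypothetical fast path: lowercases an ASCII-only word in place and
 * returns true. If the word contains any non-ASCII byte it is left
 * untouched and false is returned, so the caller can fall back to full
 * normalization and casefolding (not shown here). */
static bool lowercase_if_ascii(char *word, size_t len)
{
    if (!word_is_ascii(word, len))
        return false;
    for (size_t i = 0; i < len; i++)
        word[i] = (char) tolower((unsigned char) word[i]);
    return true;
}
```

The check and the lowercasing are kept as two passes so that a word with
a non-ASCII byte near the end is never left half-converted.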


I just finished the ASCII-only improvement in both libunistring and
libicu, and here are the new results. This time, instead of the mean
of several test runs, I took the minimum.
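
As a rough sketch of the "take the minimum of several runs" approach,
which is less sensitive to noise from other processes than the mean:
the helper names below are hypothetical, and using clock() (CPU time)
is an assumption, since the mail does not say how the times were
measured.

```c
#include <stddef.h>
#include <time.h>

/* Runs fn(arg) `runs` times and returns the smallest elapsed CPU time
 * in seconds. Returns -1.0 if `runs` is zero. */
static double min_cpu_seconds(void (*fn)(void *), void *arg, size_t runs)
{
    double best = -1.0;

    for (size_t i = 0; i < runs; i++) {
        clock_t start = clock();
        fn(arg);
        clock_t end = clock();
        double elapsed = (double) (end - start) / CLOCKS_PER_SEC;

        /* Keep the first measurement, then only smaller ones. */
        if (best < 0.0 || elapsed < best)
            best = elapsed;
    }
    return best;
}

/* Trivial stand-in workload: counts how often it was called. */
static void count_call(void *arg)
{
    (*(size_t *) arg)++;
}
```

In a real test, count_call would be replaced by a function that runs the
parser over the sample file.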

For the 50k ASCII-only file:
 * glib/pango:   0.062
 * libicu:       0.060
 * libunistring: 0.057

For the 200k ASCII-only file:
 * glib/pango:   0.189
 * libicu:       0.200
 * libunistring: 0.119

And for the 182k mixed English/Chinese/Japanese file:
 * glib/pango:   21.4
 * libicu:        0.220
 * libunistring:  0.175

So, with this improvement that treats ASCII-only words as a special
case, libunistring really beats them all.

 
Yeah, libunistring looks like good stuff. I must check the source!

I still note that you need to apply word filtering rules to words
beginning with numbers or symbols. I'm sure that's easy to do?
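
Something along these lines might be enough as a first pass. This is
only a sketch: the function name is hypothetical, it assumes such words
should simply be skipped, and it only looks at ASCII digits and
punctuation in the first byte. A real parser would have to classify the
first Unicode code point instead.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical filter: returns false for empty words and for words
 * whose first byte is an ASCII digit or ASCII punctuation/symbol.
 * Words starting with a non-ASCII byte are accepted here without
 * further checks. */
static bool should_index_word(const char *word)
{
    unsigned char first = (unsigned char) word[0];

    if (first == '\0')
        return false;
    if (first < 0x80 && (isdigit(first) || ispunct(first)))
        return false;
    return true;
}
```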

thanks

jamie

