Re: [Tracker] libicu & libunistring based parsers (was:Re: libunistring-based parser in libtracker-fts)
- From: Aleksander Morgado <aleksander lanedo com>
- To: jamie mccrack gmail com
- Cc: "Tracker \(devel\)" <tracker-list gnome org>
- Subject: Re: [Tracker] libicu & libunistring based parsers (was:Re: libunistring-based parser in libtracker-fts)
- Date: Wed, 05 May 2010 12:53:57 +0200
Hi Jamie & all,
> I will modify the libunistring and libicu based algorithms tomorrow so
> that if ASCII-7 only, normalization and casefolding is not done, just a
> tolower() of each character. That would make the values more approximate
> to the glib/custom parser.
I just finished the ASCII-only improvement in both libunistring and
libicu, and here are the new results. This time I took the minimum
value of several test runs instead of the mean.
For the 50k ASCII-only file:
* glib/pango: 0.062
* libicu: 0.060
* libunistring: 0.057
For the 200k ASCII-only file:
* glib/pango: 0.189
* libicu: 0.200
* libunistring: 0.119
And for the 182k mixed English/Chinese/Japanese file:
* glib/pango: 21.4
* libicu: 0.220
* libunistring: 0.175
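For anyone wanting to reproduce the "minimum of several runs" measurement
used for the numbers above, here is a rough sketch. It is not the code used
for these tests: the names min_elapsed_seconds() and count_call() are
hypothetical, and it assumes CPU time from clock() is an acceptable stand-in
for whatever timer is actually used.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: one full parsing run over a test file. */
typedef void (*bench_fn) (void *user_data);

/* Example workload (assumption): just counts how often it was called.
 * A real benchmark would parse the whole test file here. */
static void
count_call (void *user_data)
{
  int *calls = user_data;
  (*calls)++;
}

/* Runs fn `runs` times and returns the smallest elapsed CPU time in
 * seconds. Taking the minimum rather than the mean reduces the effect
 * of unrelated system activity on the result. */
static double
min_elapsed_seconds (bench_fn fn, void *user_data, int runs)
{
  double best = -1.0;
  int i;

  assert (fn != NULL && runs > 0);

  for (i = 0; i < runs; i++)
    {
      clock_t start = clock ();
      double elapsed;

      fn (user_data);
      elapsed = (double) (clock () - start) / CLOCKS_PER_SEC;

      /* Keep the first measurement, then only smaller ones. */
      if (best < 0.0 || elapsed < best)
        best = elapsed;
    }

  return best;
}
```

Calling min_elapsed_seconds (count_call, &calls, 5) runs the workload five
times and reports the fastest of the five.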
So, with this improvement, which treats ASCII-only words as a special
case, libunistring really beats them all.
libicu and glib/pango remain pretty similar on the ASCII-only files:
libicu seems faster for the smallest file, while glib/pango seems faster
for the biggest one.
For reference, I also added the test with the mixed
English/Chinese/Japanese file, whose results also change with the new
ASCII-only parsing improvement. Now libunistring seems about 20% faster
than libicu (it was around 10% yesterday).
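To make the ASCII-only special case concrete, here is a minimal sketch of
the idea. It is not the actual libtracker-fts code: the function names are
hypothetical, and it assumes the default "C" locale, so that tolower() only
changes the letters A-Z.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if every byte of the word is 7-bit ASCII, i.e. the
 * high bit is clear. Any UTF-8 multibyte sequence has it set. */
static bool
word_is_ascii (const char *word, size_t len)
{
  size_t i;

  for (i = 0; i < len; i++)
    {
      if ((unsigned char) word[i] & 0x80)
        return false;
    }

  return true;
}

/* Hypothetical fast path: if the word is ASCII-only, lowercase it in
 * place with tolower() and return true. Otherwise leave it untouched
 * and return false, so that the caller can fall back to the full
 * normalization and casefolding done by libunistring or libicu. */
static bool
lowercase_if_ascii (char *word, size_t len)
{
  size_t i;

  assert (word != NULL || len == 0);

  if (!word_is_ascii (word, len))
    return false;

  for (i = 0; i < len; i++)
    word[i] = (char) tolower ((unsigned char) word[i]);

  return true;
}
```

With this, "HeLLo" becomes "hello" without touching the Unicode library at
all, while a word such as "Café" is rejected by the fast path and goes
through the slower, complete processing.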
Cheers!
--
Aleksander