Re: [Tracker] Collations



A follow-up to the previous email.


A) Enabled the collation function for all TEXT columns in the database
when creating/altering a table. If indexes are created on these columns,
the given collation function will be used for the index. This is done in
the 'collation' remote branch in gnome git.


I tried to measure the impact of collation when inserting the resources
in the case where we enable collation by default in all text columns. In
this case, the collation function gets called every time we insert data
in a column which has an index, as the index gets sorted based on it.
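To illustrate the point, here is a minimal sketch in Python's sqlite3 module, not the actual Tracker code (which is C). The table name, index name, collation name and the case-folding comparison are all made up for the example. It only counts how often SQLite calls the collation function while inserting rows, with and without an index on the collated column.

```python
import sqlite3

calls = 0  # number of times SQLite invoked the collation function


def counting_collation(a, b):
    """Hypothetical collation: case-insensitive comparison that counts its calls."""
    global calls
    calls += 1
    a, b = a.casefold(), b.casefold()
    return (a > b) - (a < b)


def insert_titles(titles, with_index):
    """Insert titles into a collated TEXT column; return the collation call count."""
    global calls
    calls = 0
    conn = sqlite3.connect(":memory:")
    conn.create_collation("DEMO_COLLATION", counting_collation)
    # The column declares the collation; an index on it uses that collation too.
    conn.execute("CREATE TABLE resources (title TEXT COLLATE DEMO_COLLATION)")
    if with_index:
        conn.execute("CREATE INDEX resources_title ON resources (title)")
    conn.executemany(
        "INSERT INTO resources (title) VALUES (?)", [(t,) for t in titles]
    )
    conn.commit()
    conn.close()
    return calls


if __name__ == "__main__":
    titles = ["banana", "Apple", "cherry", "apple", "Date"]
    print("calls without index:", insert_titles(titles, with_index=False))
    print("calls with index:   ", insert_titles(titles, with_index=True))
```

Without the index the inserts need no comparisons at all, so the collation function is never called. With the index, every new entry has to be placed in sorted order, and that is where the per-insert cost comes from.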

I ran several full first-time indexing passes over around 24k files on
my PC. The values given are the best of 4-5 runs. I disabled FTS for
these tests, so that the effect of the different parsers is not included.
I only thought of that after I had already run the same tests with FTS
enabled, so as an extra I also give the indexing times with FTS enabled.

 * no collation: ~100s
 * libicu:       ~101s --> ~105s with FTS
 * libunistring: ~103s --> ~104s with FTS
 * glib:         ~105s --> ~305s with FTS

So, with libicu the effect of collation while inserting data is hardly
noticeable (about 1% over the no-collation run, FTS disabled).
libunistring and glib, even if slower (about 3% and 5%), also perform
very well.


 * When setting collation in the column (A cases) there seems to be an
impact on the search time, even if ORDER BY is not used (different values
for glib/icu/unistring in case A.1; and case A.1 compared to B1). This
is pretty strange, and I don't really know why. Anyone?

About this comment from the previous email: forget it. I had also enabled
collation on the Uri column of the Resources table, which is wrong. The
numbers above were measured with this fix applied.

Cheers,

-- 
Aleksander



