Re: [Tracker] Collation implementation



Here is a summary of what was decided on IRC.

The basic requirements related to collation are:
 a) Query results are ordered using collation instead of binary
comparison of strings.
 b) Collation is based on the user's locale, so locale changes must
be supported.
 c) Tracker should support returning collation keys for the result set
of a query, e.g. by adding a new tracker:collate() function in SPARQL.
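To illustrate requirements a) and b), here is a minimal C sketch (not Tracker code) that sorts the same strings once by binary comparison (strcmp()) and once by locale-aware collation (strcoll()). It uses the C library's LC_COLLATE rules as a stand-in for the Unicode collation function, and the name sort_titles() is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Binary comparison: orders strings by their raw byte values. */
static int cmp_binary(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Locale-aware comparison: orders strings using the current LC_COLLATE
 * rules (a stand-in here for the Unicode collation function). */
static int cmp_collate(const void *a, const void *b)
{
    return strcoll(*(const char *const *)a, *(const char *const *)b);
}

/* Hypothetical helper: sorts titles in place, either by binary
 * comparison (use_collation == 0) or by collation (use_collation != 0). */
void sort_titles(const char **titles, size_t n, int use_collation)
{
    qsort(titles, n, sizeof *titles, use_collation ? cmp_collate : cmp_binary);
}
```

An application would call setlocale(LC_COLLATE, "") at startup so that strcoll() follows the user's locale. In the default "C" locale both orderings are identical; in a locale such as en_US.UTF-8, collation typically puts "apple" next to "Apple" instead of after every string starting with an uppercase letter.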

For now, we will drop requirement c). Internally, Tracker will always
use the default Unicode collation function, so any application that
wants to, for example, merge the results of two different queries and
sort them can use the same default Unicode collation function as
provided by libicu, libunistring or GLib. Therefore, no collation
function or collation-key generation function will be added to
Tracker's API for now. Note that this may change if we ever decide to
support additional collation methods, such as title collation.
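As an illustration of the client side, here is a minimal sketch of an application merging the rows of two query results and sorting them by precomputed collation keys. GLib's g_utf8_collate_key() (keys compared with strcmp()) or ICU's ucol_getSortKey() would fill that role in practice; to keep the sketch dependency-free it uses the C library's strxfrm(), which follows the process's LC_COLLATE locale and is not necessarily the Unicode Collation Algorithm. The names struct row, merge_and_sort() and free_rows() are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* One row of a merged result set plus its precomputed collation key. */
struct row {
    const char *text;
    char *key;
};

/* Builds a collation key with strxfrm(); comparing two keys with strcmp()
 * gives the same order as strcoll() on the original strings.
 * Returns NULL on allocation failure. */
static char *collation_key(const char *s)
{
    size_t len = strxfrm(NULL, s, 0) + 1;
    char *key = malloc(len);

    if (key != NULL)
        strxfrm(key, s, len);
    return key;
}

static int cmp_rows(const void *a, const void *b)
{
    const struct row *ra = a, *rb = b;
    return strcmp(ra->key, rb->key);
}

/* Hypothetical helper: concatenates the rows of two query results and sorts
 * them by collation. Returns a malloc'd array of n_a + n_b rows (release it
 * with free_rows()), or NULL on allocation failure. */
struct row *merge_and_sort(const char **a, size_t n_a,
                           const char **b, size_t n_b)
{
    size_t n = n_a + n_b;
    struct row *rows = malloc((n ? n : 1) * sizeof *rows);

    if (rows == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        rows[i].text = i < n_a ? a[i] : b[i - n_a];
        rows[i].key = collation_key(rows[i].text);
        if (rows[i].key == NULL) {
            /* Free the keys built so far, then give up. */
            while (i-- > 0)
                free(rows[i].key);
            free(rows);
            return NULL;
        }
    }
    /* Each key is computed once; qsort() then only does cheap strcmp() calls. */
    qsort(rows, n, sizeof *rows, cmp_rows);
    return rows;
}

void free_rows(struct row *rows, size_t n)
{
    if (rows == NULL)
        return;
    for (size_t i = 0; i < n; i++)
        free(rows[i].key);
    free(rows);
}
```

Precomputing keys pays off when sorting, since each string takes part in many comparisons; for a single pass over two already sorted lists, calling strcoll() directly would be enough.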


****************
2) Set the collation method by default on the text columns when creating
the DB tables.
We had actually already agreed on doing this, before realizing that we
also need to be able to return collation keys (requirement c above).

Pros:
+++ Fast on indexed columns, as the collation function is used to build
the index, so at query time the order is already implicit.
+++ Integration with SQLite is direct: just set the collation function
when creating the column, and SQLite will use it wherever needed.
Probably easier to maintain.
+++ Almost no performance degradation when inserting new elements (see
previous emails on this topic).
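For reference, SQLite calls a collation function through a callback registered with sqlite3_create_collation(). Here is a minimal sketch of such a callback with the signature SQLite expects (two length-delimited, not NUL-terminated buffers). It uses strcoll() as a stand-in for the Unicode collation function, and the collation name TRACKER_UNICODE is hypothetical; the registration and the column declaration appear only in comments so the sketch compiles without SQLite.

```c
#include <stdlib.h>
#include <string.h>

/* Copies a length-delimited buffer into a NUL-terminated string. */
static char *dup_n(const void *s, int len)
{
    char *out = malloc((size_t)len + 1);

    if (out != NULL) {
        memcpy(out, s, (size_t)len);
        out[len] = '\0';
    }
    return out;
}

/* Comparison callback matching SQLite's xCompare signature.
 * Registration (SQLite C API, not compiled here; name is hypothetical):
 *   sqlite3_create_collation(db, "TRACKER_UNICODE", SQLITE_UTF8,
 *                            NULL, unicode_collate);
 * Column declaration:
 *   CREATE TABLE ... (title TEXT COLLATE TRACKER_UNICODE);
 * An index on that column is then built in collation order. */
int unicode_collate(void *user_data, int len1, const void *s1,
                    int len2, const void *s2)
{
    (void)user_data;
    char *a = dup_n(s1, len1);
    char *b = dup_n(s2, len2);
    int result;

    if (a == NULL || b == NULL) {
        /* Out of memory: fall back to binary comparison. */
        int n = len1 < len2 ? len1 : len2;
        result = memcmp(s1, s2, (size_t)n);
        if (result == 0)
            result = (len1 > len2) - (len1 < len2);
    } else {
        result = strcoll(a, b);
    }
    free(a);
    free(b);
    return result;
}
```

Because the collation is attached to the column, ORDER BY on that column uses it by default and can be served directly from the index, which is what the first pro relies on.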

Cons:
--- If collation keys are requested, they must be computed for each
query, as they are not stored anywhere.
--- If a query orders by a column without an index, the collation
function is called while sorting the results, so it will be as slow as
method 1).
--- On locale change, indexes must be re-created, which may be slow if
the tables contain many elements. Still, this is not a big deal in the
end.
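The last con could be handled with SQLite's REINDEX statement, which, when given a collation name, rebuilds every index that uses that collation. A minimal sketch, assuming the application stores the LC_COLLATE value the indexes were last built with; the function name build_reindex_sql() and the collation name TRACKER_UNICODE are hypothetical.

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical name; must match the one passed to sqlite3_create_collation(). */
#define COLLATION_NAME "TRACKER_UNICODE"

/* If the current LC_COLLATE differs from indexed_locale (the locale the
 * indexes were last built with), writes "REINDEX <collation>;" into sql
 * and returns 1. Otherwise returns 0 and leaves sql untouched.
 * Assumes the caller already ran setlocale(LC_COLLATE, "") so the process
 * follows the user's locale. */
int build_reindex_sql(const char *indexed_locale, char *sql, size_t sql_len)
{
    /* Passing NULL queries the current locale without changing it. */
    const char *current = setlocale(LC_COLLATE, NULL);

    if (current == NULL || strcmp(current, indexed_locale) == 0)
        return 0;
    snprintf(sql, sql_len, "REINDEX %s;", COLLATION_NAME);
    return 1;
}
```

The resulting statement would then be run (e.g. with sqlite3_exec()) and the new locale stored, so the rebuild only happens once per locale change.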


So, we'll go with option 2), enabling the Unicode collation function by
default on all text columns.

Cheers,

-- 
Aleksander



