Re: [Tracker] Tracker discussions

On Sun, 2009-07-05 at 18:48 -0400, Jamie McCracken wrote:
2) Quad store, or storing more metadata about properties, such as whether
they are embedded or not, when they were last changed (for easy
replication) and possibly their origin. This metadata is dynamic in
nature and so cannot be in the ontology, as we may not know whether
tracker is the primary store for a contact or a secondary index of a
contact that is already defined in EDS, LDAP or an online service.

This is problematic because metadata properties are stored in flattened
tables in tracker-store for speed.
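To make the contrast between the two layouts concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (`quads`, `nmm_Photo`, `nie_title`, `nfo_width`) are hypothetical and chosen for illustration. They are not tracker-store's actual schema. The same resource is stored once as generic subject/predicate/object/graph rows and once as a single row in a per-class table with one column per single-valued property.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Generic quad store: one row per statement, including the graph it belongs to.
conn.execute("CREATE TABLE quads (subject TEXT, predicate TEXT, object TEXT, graph TEXT)")

# Flattened (decomposed) store: one table per class, one column per
# single-valued property, one row per resource. Names are hypothetical.
conn.execute("CREATE TABLE nmm_Photo (id TEXT PRIMARY KEY, nie_title TEXT, nfo_width INTEGER)")

photo = "file:///home/user/photo.jpg"
conn.executemany(
    "INSERT INTO quads VALUES (?, ?, ?, ?)",
    [
        (photo, "nie:title", "Holiday", "urn:graph:files"),
        (photo, "nfo:width", "1024", "urn:graph:files"),
    ],
)
conn.execute("INSERT INTO nmm_Photo VALUES (?, ?, ?)", (photo, "Holiday", 1024))

# Reading all properties of the photo: several rows vs. a single row.
quad_rows = conn.execute(
    "SELECT predicate, object FROM quads WHERE subject = ?", (photo,)
).fetchall()
flat_row = conn.execute(
    "SELECT nie_title, nfo_width FROM nmm_Photo WHERE id = ?", (photo,)
).fetchone()
print(len(quad_rows), flat_row)  # 2 ('Holiday', 1024)
```

The flattened row answers "all properties of this resource" with one lookup, while the quad table keeps the graph of every individual statement.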

One sub-optimal solution is to store everything both in quad form and in
flattened table format, but that's too expensive and cumbersome IMO.

My preferred solution is to store them as additional fields in the
flattened tables. Additional fields in SQLite do not generate extra
overhead if they are not used, and if they have the default NULL value
they cause no extra storage either.
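A minimal sketch of the "additional fields" idea with Python's sqlite3 module, assuming a hypothetical `nco_Contact` table and a hypothetical `nco_fullname_modified` column that holds a per-property last-changed timestamp. `ALTER TABLE ... ADD COLUMN` without a `DEFAULT` clause leaves every existing row reading NULL, so only rows whose property was actually edited carry a value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nco_Contact (id TEXT PRIMARY KEY, nco_fullname TEXT)")
conn.execute("INSERT INTO nco_Contact VALUES ('contact:1', 'Alice')")
conn.execute("INSERT INTO nco_Contact VALUES ('contact:2', 'Bob')")

# Hypothetical per-property metadata column stored next to the property it
# describes. With no DEFAULT clause, existing rows read back NULL.
conn.execute("ALTER TABLE nco_Contact ADD COLUMN nco_fullname_modified INTEGER")

# Only the contact the user edited gets a last-changed value (example Unix time).
conn.execute(
    "UPDATE nco_Contact SET nco_fullname = 'Alice Smith', nco_fullname_modified = 1246834080 "
    "WHERE id = 'contact:1'"
)

rows = conn.execute(
    "SELECT id, nco_fullname, nco_fullname_modified FROM nco_Contact ORDER BY id"
).fetchall()
print(rows)  # [('contact:1', 'Alice Smith', 1246834080), ('contact:2', 'Bob', None)]
```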

There are still a couple of issues with this approach. If we store named
graphs in the decomposed store, we cannot easily get all statements that
are part of a named graph as they are scattered across various tables.
Also, the whole concept of one row for all single-valued properties of a
class does not work anymore with multiple graphs.
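Both problems can be reproduced in a small Python sqlite3 sketch. It assumes, purely for illustration, that each property column in a decomposed table gets a companion `*_graph` column. The table, column and graph names are hypothetical. Collecting one graph then means enumerating every class table and property column, and the one-row-per-resource primary key leaves no room for the same property coming from a second graph.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical decomposed tables: each property column has a companion
# column recording the named graph its value came from.
conn.execute("CREATE TABLE nco_Contact (id TEXT PRIMARY KEY, nco_fullname TEXT, nco_fullname_graph TEXT)")
conn.execute("CREATE TABLE nfo_Document (id TEXT PRIMARY KEY, nie_title TEXT, nie_title_graph TEXT)")

conn.execute("INSERT INTO nco_Contact VALUES ('contact:1', 'Alice', 'urn:graph:eds')")
conn.execute("INSERT INTO nfo_Document VALUES ('file:///media/usb/a.txt', 'Notes', 'urn:graph:usb')")
conn.execute("INSERT INTO nfo_Document VALUES ('file:///home/user/b.txt', 'Todo', 'urn:graph:home')")


def statements_in_graph(conn, graph):
    """Collect all statements of one graph by visiting every table/column pair."""
    # The query must list each class table and property column explicitly,
    # so it grows with the ontology instead of being a single lookup.
    return conn.execute(
        """
        SELECT id, 'nco:fullname', nco_fullname FROM nco_Contact
         WHERE nco_fullname_graph = :g
        UNION ALL
        SELECT id, 'nie:title', nie_title FROM nfo_Document
         WHERE nie_title_graph = :g
        """,
        {"g": graph},
    ).fetchall()


usb_statements = statements_in_graph(conn, "urn:graph:usb")
print(usb_statements)  # [('file:///media/usb/a.txt', 'nie:title', 'Notes')]

# One row per resource also means one value (and one graph) per single-valued
# property: a title for the same resource from a second graph would need a
# second row, which the primary key rejects.
try:
    conn.execute(
        "INSERT INTO nfo_Document VALUES ('file:///media/usb/a.txt', 'Notes v2', 'urn:graph:user')"
    )
    second_graph_rejected = False
except sqlite3.IntegrityError:
    second_graph_rejected = True
print(second_graph_rejected)  # True
```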

It would also not help solve the backup issue, as we could not
separate the indexed tables (~/.cache) from the simple quadstore table.
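For comparison, here is a minimal Python sqlite3 sketch of the separation that a standalone quadstore table would allow. It assumes user statements live in the main database while regenerable indexed tables live in an attached database (in practice a file under ~/.cache; `:memory:` stands in for both here). Table names are hypothetical. `Connection.backup` copies only the `main` database by default, so the indexed data is left out of the backup.

```python
import sqlite3

# Main database: user-owned statements that must be backed up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quads (subject TEXT, predicate TEXT, object TEXT, graph TEXT)")

# Indexed data in a separate, regenerable database; ':memory:' stands in
# for a file under ~/.cache in this sketch.
conn.execute("ATTACH DATABASE ':memory:' AS cache")
conn.execute("CREATE TABLE cache.nfo_Document (id TEXT PRIMARY KEY, nie_title TEXT)")

conn.execute("INSERT INTO quads VALUES ('contact:1', 'nao:hasTag', 'tag:friends', 'urn:graph:user')")
conn.execute("INSERT INTO cache.nfo_Document VALUES ('file:///tmp/a.txt', 'Notes')")
conn.commit()

# Back up only the main database: the quads table is copied, the cache is not.
backup = sqlite3.connect(":memory:")
conn.backup(backup)  # name="main" is the default
tables = [r[0] for r in backup.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]
print(tables)  # ['quads']
```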

For a boolean field you should always use NULL and not 0 to indicate
false, as a NULL causes no extra storage whereas the minimum storage size
for any non-null integer is 1 byte in SQLite (there is no single-bit
type in SQLite).

NULL and 0 have different meanings. NULL means that the property is not
set, which is not the same as `false'.
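The difference is visible directly in SQL. In this Python sqlite3 sketch (the `nmm_Photo` table and `nao_favorite` column are hypothetical), NULL, 0 and 1 are three distinct states. Note that a negated filter also skips the NULL rows, because NOT applied to NULL yields NULL, so a "not set" value cannot silently stand in for `false'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical boolean property column; NULL, 0 and 1 are three distinct states.
conn.execute("CREATE TABLE nmm_Photo (id TEXT PRIMARY KEY, nao_favorite INTEGER)")
conn.executemany(
    "INSERT INTO nmm_Photo VALUES (?, ?)",
    [("photo:1", 1), ("photo:2", 0), ("photo:3", None)],
)


def ids(where):
    return [row[0] for row in conn.execute(f"SELECT id FROM nmm_Photo WHERE {where} ORDER BY id")]


explicit_false = ids("nao_favorite = 0")  # the property is set to false
not_set = ids("nao_favorite IS NULL")     # the property was never set
negated = ids("NOT nao_favorite")         # NOT applied to NULL is NULL, so photo:3 is excluded
print(explicit_false, not_set, negated)   # ['photo:2'] ['photo:3'] ['photo:2']
```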

The last change date for a specific metadata property is only relevant if
it's a non-embedded user metadata value, so it can remain NULL for indexed
metadata and thus cause no extra overhead (the last change date of indexed
metadata is always the last modification date of the entity).
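Reading the effective change date then only needs a fallback from the per-property column to the entity's own modification date. A minimal Python sqlite3 sketch, assuming hypothetical `entity_mtime` and `nie_title_modified` columns on a hypothetical `nfo_Document` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nfo_Document (id TEXT PRIMARY KEY, entity_mtime INTEGER, "
    "nie_title TEXT, nie_title_modified INTEGER)"
)
conn.executemany(
    "INSERT INTO nfo_Document VALUES (?, ?, ?, ?)",
    [
        ("doc:indexed", 1000, "Extracted title", None),  # indexed: no per-property date stored
        ("doc:edited", 1000, "User title", 2000),        # user-edited: has its own change date
    ],
)

# COALESCE returns the per-property date when set, else the entity's mtime.
rows = conn.execute(
    "SELECT id, COALESCE(nie_title_modified, entity_mtime) FROM nfo_Document ORDER BY id"
).fetchall()
print(rows)  # [('doc:edited', 2000), ('doc:indexed', 1000)]
```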

We might use this as a trick to keep the database smaller but we'd also
lose some features like handling of data indexed from removable volumes
via named graphs.

