Re: [Tracker] Tracker Internal Documentation




Hey Philip,

Thanks for the detailed explanation.

On Thu, Apr 11, 2013 at 12:20 PM, Philip Van Hoof <philip codeminded be> wrote:
Hi Vishesh,

It's always a good idea to pose these questions on the Tracker public
mailing list, so I replied with the mailing list in CC:

So you have a denormalized schema in SQL where each multi value field in
Nepomuk is represented by a table, and each RDF class is represented by
a table with the exception of some of the xsd primitive ones (which are
implied because SQLite knows how to handle these things, for example
xsd:int, xsd:string, etc).
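If I'm reading that right, a toy version of such a layout in raw SQLite might look like this (the table and column names here are my own guesses, not Tracker's actual naming):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# One table per RDF class; single-valued properties become columns.
con.execute('''CREATE TABLE "nco:Contact" (
    ID INTEGER PRIMARY KEY,
    "nco:fullname" TEXT    -- an xsd:string maps straight to a TEXT column
)''')

# Each multi-valued property gets its own table linking back to the resource.
con.execute('''CREATE TABLE "nco:Contact_nco:hasEmailAddress" (
    ID INTEGER,            -- the nco:Contact resource this value belongs to
    "nco:hasEmailAddress" TEXT
)''')

con.execute('INSERT INTO "nco:Contact" VALUES (1, ?)', ("Alice",))
con.execute('INSERT INTO "nco:Contact_nco:hasEmailAddress" VALUES (1, ?)',
            ("alice@example.org",))
con.execute('INSERT INTO "nco:Contact_nco:hasEmailAddress" VALUES (1, ?)',
            ("a.work@example.org",))

# One row in the class table, but many rows in the multi-value table.
rows = con.execute('SELECT "nco:hasEmailAddress" '
                   'FROM "nco:Contact_nco:hasEmailAddress" WHERE ID = 1').fetchall()
print(len(rows))  # 2
```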

I'm curious as to how you handle the following:

1. Type inference - Take a query like 'select ?r where { ?r a nco:Contact . }'. Let's say one has some 10 nco:Contacts and some 15 nco:PersonContacts. In this case one would have to iterate over both tables. Does Tracker do that?
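My naive guess is that, unless rdf:type is materialized into the superclass tables as well, such a query would need a UNION over the class table and every subclass table - something like this (made-up table names, just to illustrate the question):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE "nco:Contact" (ID INTEGER PRIMARY KEY)')
con.execute('CREATE TABLE "nco:PersonContact" (ID INTEGER PRIMARY KEY)')
# 10 plain contacts and 15 person contacts, with distinct resource IDs.
con.executemany('INSERT INTO "nco:Contact" VALUES (?)',
                [(i,) for i in range(10)])
con.executemany('INSERT INTO "nco:PersonContact" VALUES (?)',
                [(i,) for i in range(100, 115)])

# 'select ?r where { ?r a nco:Contact . }' would then have to pull from
# every subclass table too, e.g. via a UNION:
rows = con.execute('''
    SELECT ID FROM "nco:Contact"
    UNION
    SELECT ID FROM "nco:PersonContact"
''').fetchall()
print(len(rows))  # 25
```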

2. Property inference - Do you handle cases such as 'select ?r ?l where { ?r nao:prefLabel ?l . }', where nao:prefLabel has not been set explicitly? Let's assume that nie:title has been set instead.

nie:title is an rdfs:subPropertyOf nao:prefLabel.

Does Tracker handle cases like this? Because this is a rather common use case in Nepomuk, where we want to fetch a good label for a resource without having to query specific properties.
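In case it helps to make the question concrete, here is the kind of fallback I mean, spelled as a COALESCE over the subproperty at query time (the column layout is my own invention, not Tracker's):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Pretend the single-valued nao:prefLabel and nie:title both live as
# columns on the resource's class table.
con.execute('''CREATE TABLE "nie:InformationElement" (
    ID INTEGER PRIMARY KEY,
    "nao:prefLabel" TEXT,
    "nie:title" TEXT
)''')
# Only nie:title is set; nao:prefLabel itself stays NULL.
con.execute('INSERT INTO "nie:InformationElement" VALUES (1, NULL, ?)',
            ("Holiday photos",))

# If subproperty inference were done at query time, asking for
# nao:prefLabel could fall back to nie:title
# (nie:title rdfs:subPropertyOf nao:prefLabel):
label, = con.execute('''
    SELECT COALESCE("nao:prefLabel", "nie:title")
    FROM "nie:InformationElement" WHERE ID = 1
''').fetchone()
print(label)  # Holiday photos
```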

3. I read that you have some support for graphs - how is that implemented? From what I understand of your db schema, each property has its own column, so I'm not sure where you would store the graph-related info.

Also, does tracker use graphs for any purpose?

In the Nepomuk KDE world we generally use graphs to group triples based on which application has stored the information. This is especially useful in the case of indexing. When a file has been modified and needs to be re-indexed, we need to throw away the previous data and re-index it. The file could in this case have both indexed and non-indexed data such as tags and ratings. So, we only remove the statements that were added by the indexer and then reindex the file.

This seems like a very common use case. I'm curious as to how tracker solves this problem.
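To make that concrete, my guess would be something like a per-value graph column next to each property column, so that one application's statements can be cleared without touching another's. This layout is purely my invention, not Tracker's actual schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Guess: next to each property value, store the ID of the graph
# (i.e. the application) that asserted it.
con.execute('''CREATE TABLE "nie:InformationElement" (
    ID INTEGER PRIMARY KEY,
    "nie:title" TEXT,             "nie:title:graph" INTEGER,
    "nao:numericRating" INTEGER,  "nao:numericRating:graph" INTEGER
)''')
INDEXER_GRAPH, USER_GRAPH = 1, 2
# The indexer wrote the title; the user wrote the rating.
con.execute('INSERT INTO "nie:InformationElement" VALUES (1, ?, ?, ?, ?)',
            ("report.pdf", INDEXER_GRAPH, 5, USER_GRAPH))

# Re-indexing: clear only what the indexer asserted, keep the user's rating.
con.execute('''UPDATE "nie:InformationElement"
               SET "nie:title" = NULL, "nie:title:graph" = NULL
               WHERE "nie:title:graph" = ?''', (INDEXER_GRAPH,))

title, rating = con.execute(
    'SELECT "nie:title", "nao:numericRating" '
    'FROM "nie:InformationElement" WHERE ID = 1').fetchone()
print(title, rating)  # None 5
```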


Then libtracker-sparql makes a WAL SQLite connection, parses your
SPARQL and generates SQL for it on the fly. This often happens using
subqueries, and without building an AST first - making the
parse-translate phase relatively fast and resource friendly, which is a
design choice, as Tracker is intended to run on devices with few
resources.

This code does that:
https://git.gnome.org/browse/tracker/tree/src/libtracker-data/tracker-sparql-query.vala

That design choice of course has a drawback in that the queries very
often have to be manually optimized, and/or to get things fast enough we
often had to store data in a redundant way. We did this with
domain-specific indexes, which I explained in these blog items:

http://pvanhoof.be/blog/index.php/2010/07/07/domain-indexes-finished-technical-conclusions
http://pvanhoof.be/blog/index.php/2010/07/03/sqlites-wal-deleting-a-domain-specific-index
http://pvanhoof.be/blog/index.php/2010/07/01/working-on-domain-specific-indexes


This function is the code that translates a statement to SQL inserts
(from the SPARQL UPDATE we produce RDF statements, and then we process those):

https://git.gnome.org/browse/tracker/tree/src/libtracker-data/tracker-data-update.c#n732

A lot of the SPARQL UPDATE part is here:
https://git.gnome.org/browse/tracker/tree/src/libtracker-data/tracker-sparql-pattern.vala

Note that we have a buffer where we eliminate duplicates like:

<a> a Class.
<a> Prop1 Value1.
<a> a Class.
<a> Prop2 Value2.
<a> Prop1 Value3.

We translate that to:

<a> a Class; Prop1 Value3; Prop2 Value2.

Except when we utilize the null support:

http://pvanhoof.be/blog/index.php/2011/08/09/support-for-null-with-trackers-insert-or-replace-feature
http://pvanhoof.be/blog/index.php/2011/08/15/null-support-for-insert-or-replace-available-in-master

The second blog item, and my own comments on the first, illustrate the
problem with optimizing statement sets like the one above just like that.
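A toy version of that coalescing buffer (just to restate the behaviour, not the real implementation) could look like:

```python
# Coalesce repeated statements for the same subject before generating SQL:
# exact duplicates collapse, and a later value for a single-valued property
# overrides an earlier one. (rdf:type, written 'a', is treated as
# single-valued here purely for simplicity.)
def coalesce(statements):
    buffer = {}  # (subject, predicate) -> object, insertion-ordered
    for s, p, o in statements:
        buffer[(s, p)] = o
    return [(s, p, o) for (s, p), o in buffer.items()]

stmts = [
    ("<a>", "a", "Class"),
    ("<a>", "Prop1", "Value1"),
    ("<a>", "a", "Class"),       # exact duplicate, dropped
    ("<a>", "Prop2", "Value2"),
    ("<a>", "Prop1", "Value3"),  # overrides Value1
]
print(coalesce(stmts))
# [('<a>', 'a', 'Class'), ('<a>', 'Prop1', 'Value3'), ('<a>', 'Prop2', 'Value2')]
```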

Then we also added a few SQLite functions which are used by the
SPARQL->SQL translation and by most of our own SPARQL extensions (we
ship with a bunch of SPARQL extensions that allow query writers to make
their queries faster):

https://git.gnome.org/browse/tracker/tree/src/libtracker-data/tracker-db-interface-sqlite.c#n826
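In Python terms, registering such a custom function looks like this (on the C side this is done with sqlite3_create_function(); the function name here is invented, not one of Tracker's):

```python
import sqlite3

# A custom scalar function, registered on the connection so that SQL
# generated from SPARQL can call it like any built-in.
def casefold(value):
    return value.casefold() if value is not None else None

con = sqlite3.connect(":memory:")
con.create_function("my_casefold", 1, casefold)

result, = con.execute("SELECT my_casefold('HeLLo')").fetchone()
print(result)  # hello
```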

The data-manager is mostly about creating the SQL tables and doing
preparation. It can also handle a limited amount of ontology changes
(adding and removing classes and properties, and their extensions, like
domain-specific indexes):
https://git.gnome.org/browse/tracker/tree/src/libtracker-data/tracker-data-manager.c


Thanks for all this information.

 
Kind regards,

Philip


On Thu, 2013-04-11 at 05:18 +0530, Vishesh Handa wrote:
> Hey Philip
>
> I'm one of the KDE Nepomuk developers. I've been looking into the
> tracker project for some time now, since it's good to know how other
> people internally implement things. However, it has been very hard for
> me to find any documentation on the inner workings of tracker.
>
>
> I'm specifically interested in how the database schema is designed. I
> did find this "Semantic Social Desktop and Mobile Devices"
> presentation [1] which gives a very rough overview of how each class
> has its own table.
>
>
> Could you perhaps point me to some internal documentation? It would be
> most helpful. Otherwise, could I ask you some detailed questions
> about the tracker internals?
>
> I have looked at the source code, but it's a little hard to understand
> for a newcomer.
>
> [1] https://live.gnome.org/Tracker/Documentation/Presentations
>
> --
> Vishesh Handa
>





--
Vishesh Handa

