Hi Vishesh,
It's always a good idea to pose these questions on the Tracker public
mailing list, so I replied with the mailing list in CC:
So, you have a denormalized SQL schema in which each RDF class is
represented by a table, and so is each multi-valued property of the
Nepomuk ontology. The exceptions are the XSD primitive types (such as
xsd:int and xsd:string), which need no table because SQLite handles
those values natively.
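A minimal sketch of that layout, using Python's sqlite3 module purely for illustration (Tracker itself is written in C and Vala). The ontology fragment, the Resource(ID, Uri) table and the "Class_property" naming of multi-valued property tables are assumptions loosely modeled on Tracker, not copied from its source. Properties whose range is another class (which would store resource IDs) are left out.

```python
import sqlite3

# XSD primitive ranges need no table of their own: SQLite stores them natively.
XSD_TO_SQLITE = {"xsd:string": "TEXT", "xsd:integer": "INTEGER",
                 "xsd:double": "REAL", "xsd:boolean": "INTEGER"}

# Hypothetical ontology fragment: class -> [(property, range, multi_valued)]
ONTOLOGY = {
    "nie:InformationElement": [
        ("nie:title", "xsd:string", False),
        ("nie:keyword", "xsd:string", True),
    ],
}


def create_schema(conn, ontology):
    # Every resource URI maps to an integer ID that the other tables use.
    conn.execute("CREATE TABLE Resource (ID INTEGER PRIMARY KEY, Uri TEXT UNIQUE)")
    for cls, props in ontology.items():
        # One table per class; each single-valued property is a column.
        cols = "".join(f', "{p}" {XSD_TO_SQLITE[r]}'
                       for p, r, multi in props if not multi)
        conn.execute(f'CREATE TABLE "{cls}" (ID INTEGER PRIMARY KEY{cols})')
        # One extra table per multi-valued property, one row per value.
        for p, r, multi in props:
            if multi:
                conn.execute(f'CREATE TABLE "{cls}_{p}" '
                             f'(ID INTEGER NOT NULL, "{p}" {XSD_TO_SQLITE[r]})')


conn = sqlite3.connect(":memory:")
create_schema(conn, ONTOLOGY)
```

A resource with one title and three keywords thus occupies one row in the class table and three rows in the keyword table.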
libtracker-sparql then opens a SQLite connection in WAL mode, parses
your SPARQL and generates the corresponding SQL on the fly. The
generated SQL often uses subqueries, and no AST is built first, which
keeps the parse-and-translate phase relatively fast and resource
friendly. That is a deliberate design choice, as Tracker is intended to
run on devices with few resources.
This code does that:
https://git.gnome.org/browse/tracker/tree/src/libtracker-data/tracker-sparql-query.vala
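To illustrate translating without an AST, here is a toy single-pass translator for one tiny query shape, SELECT ?v... WHERE { ?s a Class ; prop ?v ; ... }. It assumes a hypothetical layout with one table per class, single-valued properties as columns, and a Resource(ID, Uri) table. The names translate and open_db are made up, and the real translator in tracker-sparql-query.vala covers far more of SPARQL.

```python
import re
import sqlite3

# Variables, prefixed names / keywords, and punctuation.
TOKENS = re.compile(r"\?\w+|[A-Za-z_][\w:]*|[{};.]")


def open_db(path):
    conn = sqlite3.connect(path)
    # WAL mode lets readers keep going while a writer commits.
    conn.execute("PRAGMA journal_mode=WAL")
    return conn


def translate(sparql):
    """Translate 'SELECT ?v... WHERE { ?s a Class ; prop ?v ; ... }' to SQL in
    one pass over the tokens. No syntax tree is built; the only state is a
    map from SPARQL variables to SQL expressions."""
    toks = iter(TOKENS.findall(sparql))

    def expect(word):
        tok = next(toks)
        if tok != word:
            raise ValueError(f"expected {word!r}, got {tok!r}")

    expect("SELECT")
    projection = []
    tok = next(toks)
    while tok != "WHERE":
        projection.append(tok)
        tok = next(toks)
    expect("{")
    subject = next(toks)
    expect("a")
    table = f'"{next(toks)}"'
    # The subject resolves to its URI through a correlated subquery.
    bindings = {subject: f"(SELECT Uri FROM Resource WHERE ID = {table}.ID)"}
    conditions = []
    tok = next(toks)
    while tok == ";":
        prop, var = next(toks), next(toks)  # objects must be variables here
        column = f'{table}."{prop}"'
        bindings[var] = column
        # A triple pattern only matches when the property is actually set.
        conditions.append(f"{column} IS NOT NULL")
        tok = next(toks)
    select = ", ".join(bindings[v] for v in projection)
    where = " WHERE " + " AND ".join(conditions) if conditions else ""
    return f"SELECT {select} FROM {table}{where}"


print(translate("SELECT ?s ?t WHERE { ?s a nie:InformationElement ; nie:title ?t }"))
```

Carrying only a variable-to-expression map keeps memory use small, but it also means any optimization has to be expressed by hand in the generated SQL.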
That design choice of course has a drawback: queries very often have to
be optimized manually, and to get things fast enough we often had to
store data redundantly. We did that with domain-specific indexes, which
I explained in these blog posts:
http://pvanhoof.be/blog/index.php/2010/07/07/domain-indexes-finished-technical-conclusions
http://pvanhoof.be/blog/index.php/2010/07/03/sqlites-wal-deleting-a-domain-specific-index
http://pvanhoof.be/blog/index.php/2010/07/01/working-on-domain-specific-indexes
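A rough sketch of the idea behind a domain-specific index, assuming the mechanism is a redundant copy of a superclass property (nie:title here) stored in a subclass table (nmm:MusicPiece) and indexed there. The table layout, the helper names and the manual dual write are illustrative assumptions; the blog posts above describe how Tracker actually maintains the copy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- nie:title normally lives only in the table of its domain class.
    CREATE TABLE "nie:InformationElement" (ID INTEGER PRIMARY KEY, "nie:title" TEXT);
    -- Redundant copy of nie:title in a subclass table, so that it can be
    -- indexed and filtered next to the subclass's own columns, without a join.
    CREATE TABLE "nmm:MusicPiece" (ID INTEGER PRIMARY KEY,
                                   "nmm:trackNumber" INTEGER, "nie:title" TEXT);
    CREATE INDEX music_title_idx ON "nmm:MusicPiece" ("nie:title");
""")


def add_music_piece(conn, resource_id, track, title):
    conn.execute('INSERT INTO "nie:InformationElement" VALUES (?, ?)', (resource_id, title))
    # The price of the shortcut: every write must also update the copy.
    conn.execute('INSERT INTO "nmm:MusicPiece" VALUES (?, ?, ?)', (resource_id, track, title))


def music_ids_by_title(conn, title):
    # Answered from the subclass table and its index alone.
    rows = conn.execute('SELECT ID FROM "nmm:MusicPiece" WHERE "nie:title" = ?', (title,))
    return [row[0] for row in rows]


add_music_piece(conn, 1, 3, "Intro")
add_music_piece(conn, 2, 1, "Outro")
print(music_ids_by_title(conn, "Intro"))  # [1]
```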
This function translates a statement into SQL inserts (a SPARQL UPDATE
is first turned into RDF statements, which we then process):
https://git.gnome.org/browse/tracker/tree/src/libtracker-data/tracker-data-update.c#n732
Much of the SPARQL UPDATE handling is here:
https://git.gnome.org/browse/tracker/tree/src/libtracker-data/tracker-sparql-pattern.vala
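A simplified sketch of what translating one statement into SQL can look like, assuming a hypothetical property table (PROPERTIES) and a one-table-per-class layout: rdf:type adds a row to the class table, a single-valued property updates a column, and a multi-valued property inserts a row into its own table. This is not a port of the function linked above; it assumes rdf:type is processed before the resource's properties and skips error handling.

```python
import sqlite3

# Hypothetical property metadata: property -> (domain class, multi_valued)
PROPERTIES = {
    "nie:title": ("nie:InformationElement", False),
    "nie:keyword": ("nie:InformationElement", True),
}


def resource_id(conn, uri):
    # Look up the integer ID for a URI, allocating one on first use.
    conn.execute("INSERT OR IGNORE INTO Resource (Uri) VALUES (?)", (uri,))
    return conn.execute("SELECT ID FROM Resource WHERE Uri = ?", (uri,)).fetchone()[0]


def insert_statement(conn, subject, predicate, obj):
    sid = resource_id(conn, subject)
    if predicate == "rdf:type":
        # Typing a resource gives it a row in that class's table.
        conn.execute(f'INSERT OR IGNORE INTO "{obj}" (ID) VALUES (?)', (sid,))
        return
    cls, multi = PROPERTIES[predicate]
    if multi:
        # Multi-valued: one row per value in the property's own table.
        conn.execute(f'INSERT INTO "{cls}_{predicate}" (ID, "{predicate}") VALUES (?, ?)',
                     (sid, obj))
    else:
        # Single-valued: set the column on the resource's class row
        # (assumes the rdf:type statement was processed first).
        conn.execute(f'UPDATE "{cls}" SET "{predicate}" = ? WHERE ID = ?', (obj, sid))


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Resource (ID INTEGER PRIMARY KEY, Uri TEXT UNIQUE);
    CREATE TABLE "nie:InformationElement" (ID INTEGER PRIMARY KEY, "nie:title" TEXT);
    CREATE TABLE "nie:InformationElement_nie:keyword" (ID INTEGER NOT NULL, "nie:keyword" TEXT);
""")
for statement in [("urn:a", "rdf:type", "nie:InformationElement"),
                  ("urn:a", "nie:title", "Hello"),
                  ("urn:a", "nie:keyword", "x"),
                  ("urn:a", "nie:keyword", "y")]:
    insert_statement(conn, *statement)
```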
Note that we have a buffer in which we eliminate duplicates. For example:
<a> a Class.
<a> Prop1 Value1.
<a> a Class.
<a> Prop2 Value2.
<a> Prop1 Value3.
We translate that to:
<a> a Class; Prop1 Value3; Prop2 Value2.
Except when the null support is used:
http://pvanhoof.be/blog/index.php/2011/08/09/support-for-null-with-trackers-insert-or-replace-feature
http://pvanhoof.be/blog/index.php/2011/08/15/null-support-for-insert-or-replace-available-in-master
The second post, and my own comments on the first, illustrate why
statement sets like the one above can't simply be merged like that.
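A minimal sketch of such a buffer, assuming that repeated type statements are dropped, that a later value for a single-valued property replaces an earlier one, and that multi-valued properties keep every distinct value. StatementBuffer is a hypothetical name, and the sketch deliberately ignores the null case from the posts above.

```python
class StatementBuffer:
    """Hypothetical per-subject buffer that merges statements before flushing."""

    def __init__(self, multi_valued=()):
        self.multi_valued = set(multi_valued)
        self.subjects = {}  # subject -> {"types": [...], "props": {prop: [values]}}

    def add(self, subject, predicate, obj):
        entry = self.subjects.setdefault(subject, {"types": [], "props": {}})
        if predicate == "a":
            if obj not in entry["types"]:  # drop repeated type statements
                entry["types"].append(obj)
        elif predicate in self.multi_valued:
            values = entry["props"].setdefault(predicate, [])
            if obj not in values:  # keep each distinct value once
                values.append(obj)
        else:
            entry["props"][predicate] = [obj]  # single-valued: last value wins

    def flush(self):
        merged = []
        for subject, entry in self.subjects.items():
            merged += [(subject, "a", t) for t in entry["types"]]
            merged += [(subject, p, v) for p, vs in entry["props"].items() for v in vs]
        self.subjects.clear()
        return merged


buf = StatementBuffer()
for s, p, o in [("<a>", "a", "Class"), ("<a>", "Prop1", "Value1"),
                ("<a>", "a", "Class"), ("<a>", "Prop2", "Value2"),
                ("<a>", "Prop1", "Value3")]:
    buf.add(s, p, o)
print(buf.flush())
# [('<a>', 'a', 'Class'), ('<a>', 'Prop1', 'Value3'), ('<a>', 'Prop2', 'Value2')]
```

Declaring Prop1 multi-valued (StatementBuffer(multi_valued=["Prop1"])) would keep both Value1 and Value3 instead.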
We also added a few custom SQLite functions, which are used by the
SPARQL->SQL translation and by most of our own SPARQL extensions (we
ship a number of SPARQL extensions that let query writers make their
queries faster):
https://git.gnome.org/browse/tracker/tree/src/libtracker-data/tracker-db-interface-sqlite.c#n826
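For example, SQLite has no built-in regular expression matching, so one way to support a SPARQL FILTER(regex(...)) is to register a helper function on the connection. A sketch with Python's sqlite3; sparql_regex is a hypothetical name, and Python's re syntax only approximates the XPath regular expressions SPARQL specifies.

```python
import re
import sqlite3

# SPARQL (XPath) regex flags mapped onto Python's re flags; an approximation.
FLAGS = {"i": re.IGNORECASE, "s": re.DOTALL, "m": re.MULTILINE, "x": re.VERBOSE}


def sparql_regex(text, pattern, flags):
    if text is None:
        return None  # unbound value: the FILTER does not keep the row
    options = 0
    for ch in flags or "":
        options |= FLAGS[ch]
    return re.search(pattern, text, options) is not None


conn = sqlite3.connect(":memory:")
# Registered per connection, so generated SQL can call it like a built-in.
conn.create_function("sparql_regex", 3, sparql_regex)

conn.execute("CREATE TABLE items (title TEXT)")
conn.executemany("INSERT INTO items VALUES (?)", [("Hello",), ("world",), (None,)])
# FILTER(regex(?title, "^he", "i")) might then be translated to:
rows = conn.execute("SELECT title FROM items WHERE sparql_regex(title, '^he', 'i')").fetchall()
```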
The data manager mostly creates the SQL tables and prepares the
database. It can also handle a limited set of ontology changes (adding
and removing classes and properties, and their extensions such as
domain-specific indexes):
https://git.gnome.org/browse/tracker/tree/src/libtracker-data/tracker-data-manager.c
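A sketch of the simplest ontology changes against a hypothetical one-table-per-class layout: a new class or multi-valued property becomes a new table, a new single-valued property becomes an added column, and removing a single-valued property rebuilds the class table without it (which also works on SQLite versions before 3.35, the first to support ALTER TABLE DROP COLUMN). The helper names are made up; the real data manager handles much more than this.

```python
import sqlite3


def table_columns(conn, table):
    return [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]


def add_class(conn, cls):
    # A new class is just a new table keyed by resource ID.
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{cls}" (ID INTEGER PRIMARY KEY)')


def add_property(conn, cls, prop, sql_type, multi_valued):
    if multi_valued:
        # A new multi-valued property gets its own table.
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{cls}_{prop}" '
                     f'(ID INTEGER NOT NULL, "{prop}" {sql_type})')
    elif prop not in table_columns(conn, cls):
        # Existing rows get NULL, i.e. the property is simply unset for them.
        conn.execute(f'ALTER TABLE "{cls}" ADD COLUMN "{prop}" {sql_type}')


def remove_property(conn, cls, prop, multi_valued):
    if multi_valued:
        conn.execute(f'DROP TABLE IF EXISTS "{cls}_{prop}"')
        return
    # Rebuild the class table without the column (its indexes are not recreated).
    others = [(name, typ) for _, name, typ, *_ in conn.execute(f'PRAGMA table_info("{cls}")')
              if name not in ("ID", prop)]
    cols = "".join(f', "{name}" {typ}' for name, typ in others)
    names = ", ".join(["ID"] + [f'"{name}"' for name, _ in others])
    conn.execute(f'CREATE TABLE "{cls}_new" (ID INTEGER PRIMARY KEY{cols})')
    conn.execute(f'INSERT INTO "{cls}_new" ({names}) SELECT {names} FROM "{cls}"')
    conn.execute(f'DROP TABLE "{cls}"')
    conn.execute(f'ALTER TABLE "{cls}_new" RENAME TO "{cls}"')


conn = sqlite3.connect(":memory:")
add_class(conn, "nie:InformationElement")
add_property(conn, "nie:InformationElement", "nie:title", "TEXT", False)
conn.execute('INSERT INTO "nie:InformationElement" VALUES (1, ?)', ("Hello",))
add_property(conn, "nie:InformationElement", "nie:comment", "TEXT", False)
add_property(conn, "nie:InformationElement", "nie:keyword", "TEXT", True)
remove_property(conn, "nie:InformationElement", "nie:title", False)
print(table_columns(conn, "nie:InformationElement"))  # ['ID', 'nie:comment']
```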
Kind regards,
Philip
On Thu, 2013-04-11 at 05:18 +0530, Vishesh Handa wrote:
> Hey Philip
>
> I'm one of the KDE Nepomuk developers. I've been looking into the
> tracker project for some time now, since it's good to know how other
> people internally implement things. However, it has been very hard for
> me to find any documentation on the inner workings of tracker.
>
>
> I'm specifically interested in how the database schema is designed. I
> did find this "Semantic Social Desktop and Mobile Devices"
> presentation [1] which gives a very rough overview of how each class
> has its own table.
>
>
> Could you perhaps point me to some internal documentation? It would be
> most helpful. Otherwise, could I ask you some detailed questions
> about the tracker internals?
>
> I have looked at the source code, but it's a little hard to understand
> for a newcomer.
>
> [1] https://live.gnome.org/Tracker/Documentation/Presentations
>
> --
> Vishesh Handa
>