Re: [Tracker] Tracker Internal Documentation



On Thu, 2013-04-11 at 15:53 +0530, Vishesh Handa wrote:


I'm curious as to how you handle -


1. Type Inference - Say something like this: 'select ?r where { ?r a
nco:Contact . }'. Let's say one has some 10 nco:Contacts and some 15
nco:PersonContacts. In this case one would have to iterate over both
tables. Does tracker do that?

This gets translated to something like

SELECT Uri FROM "nco:Contact"; translating from nco:Contact to the SQL
table "nco:Contact" is something Tracker does internally. I think there
is an environment variable that you can turn on to print the SQLite
statements the first time they are prepared (there is an LRU cache of
SQLite statements). These statements include the query.

Type inference itself (select ?p ?o { nco:Contact ?p ?o }) works
because Tracker stores its own ontology in itself, and an ontology is
just a bunch of RDF statements.
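To illustrate the class-per-table layout described above, here is a minimal Python/sqlite3 sketch. The table layout and the UNION ALL are illustrative, not Tracker's actual generated SQL, but they show why a query for the superclass has to touch the subclass tables too:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# One table per ontology class; quoting the identifier lets the
# prefixed class name itself serve as the table name.
cur.execute('CREATE TABLE "nco:Contact" (ID INTEGER, Uri TEXT)')
cur.execute('CREATE TABLE "nco:PersonContact" (ID INTEGER, Uri TEXT)')
cur.execute('INSERT INTO "nco:Contact" VALUES (?, ?)', (1, "contact:a"))
cur.execute('INSERT INTO "nco:PersonContact" VALUES (?, ?)', (2, "contact:b"))

# 'select ?r where { ?r a nco:Contact }' must also match instances of
# every subclass, so the generated SQL has to read both tables.
rows = cur.execute(
    'SELECT Uri FROM "nco:Contact" '
    'UNION ALL SELECT Uri FROM "nco:PersonContact"'
).fetchall()
print(rows)
```
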

2. Property Inference - Do you handle cases such as 'select ?r ?l
where { ?r nao:prefLabel ?l . }', where the 'nao:prefLabel' has not
been explicitly defined? Let's assume that the 'nie:title' has been
set.

The nie:title is an rdfs:subPropertyOf nao:prefLabel.

I don't remember.

Does tracker handle cases like this? Because this is a rather common
use case in Nepomuk where we want to fetch a good label for the
resource and do not want to query specific properties.

Afaik yes.
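One way this can work, since the ontology (including rdfs:subPropertyOf statements) is stored in the database itself, is to expand the queried property into itself plus its subproperties before generating SQL. The following is a hedged sketch of that idea; the SUBPROPERTIES table and function name are hypothetical, not Tracker's code:

```python
# Hypothetical subproperty map, as it could be loaded from the
# ontology that Tracker stores in its own database.
SUBPROPERTIES = {
    "nao:prefLabel": ["nie:title"],  # nie:title rdfs:subPropertyOf nao:prefLabel
}

def expand_property(prop):
    """Return the property plus all of its (transitive) subproperties."""
    result = [prop]
    for sub in SUBPROPERTIES.get(prop, []):
        result.extend(expand_property(sub))
    return result

# A query on nao:prefLabel then also has to read the nie:title column,
# e.g. by COALESCE-ing or UNION-ing the per-property columns.
print(expand_property("nao:prefLabel"))  # ['nao:prefLabel', 'nie:title']
```
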

3. I read that you have some support for graphs - How is that
implemented? From what I understand from your db schema each property
has their own column, so I'm not sure where you would store the graph
related info.

The graph support is limited. More explanation on its limitations here:
https://live.gnome.org/Tracker/Documentation/SparqlFeatures#Named_Graphs

Also, does tracker use graphs for any purpose?

Only for storing the origin of a statement (which was the only required
use-case for the N9, which we were mainly targeting while developing
Tracker's SPARQL endpoint and Nepomuk ontology support).
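Since each property has its own column, one way to record the origin of a statement without a separate quad table is a companion graph column next to each property column. This is a sketch of that idea only; the table and column names are illustrative, not Tracker's exact schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Per-property column plus a companion column holding the graph
# (origin) of that one statement -- illustrative names.
cur.execute('CREATE TABLE "nie:InformationElement" '
            '(ID INTEGER, "nie:title" TEXT, "nie:title:graph" TEXT)')
cur.execute('INSERT INTO "nie:InformationElement" VALUES (?, ?, ?)',
            (1, "Holiday photos", "urn:graph:miner-fs"))

# A query for the graph of a statement can then read the companion
# column alongside the value itself.
row = cur.execute('SELECT "nie:title", "nie:title:graph" '
                  'FROM "nie:InformationElement" WHERE ID = 1').fetchone()
print(row)
```

A layout like this explains the limitation mentioned above: there is exactly one origin per (subject, property) value, so full named-graph semantics (the same triple in several graphs) cannot be represented.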

In the Nepomuk KDE world we generally use graphs to group triples
based on which application has stored the information. This is
especially useful in the case of indexing. When a file has been
modified and needs to be re-indexed, we need to throw away the
previous data and re-index it. The file could in this case have both
indexed and non-indexed data such as tags and ratings. So, we only
remove the statements that were added by the indexer and then reindex
the file.

This isn't supported by Tracker.

This seems like a very common use case. I'm curious as to how tracker
solves this problem.

It doesn't. Full graph support was not a design consideration, only
limited support for it was.


Kind regards,

Philip

        Then libtracker-sparql makes a WAL SQLite connection, parses
        your SPARQL and generates SQL for it on the fly. This often
        happens using subqueries, and without building an AST first,
        making the parse-translate phase relatively fast and resource
        friendly, which is a design choice as Tracker is intended to
        run on devices with few resources.
        
        This code does that:
        https://git.gnome.org/browse/tracker/tree/src/libtracker-data/tracker-sparql-query.vala
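The WAL connection mentioned above can be reproduced with plain sqlite3; this small sketch only demonstrates the journal-mode switch, not anything Tracker-specific:

```python
import os
import sqlite3
import tempfile

# WAL requires a file-backed database; an in-memory database
# ignores the journal mode.
path = os.path.join(tempfile.mkdtemp(), "sketch.db")
con = sqlite3.connect(path)

# Switch to write-ahead logging, as libtracker-sparql does for its
# connection: readers no longer block the writer, and vice versa.
(mode,) = con.execute("PRAGMA journal_mode=WAL").fetchone()
print(mode)  # 'wal'
```
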
        
        That design choice of course has a drawback in that the queries
        very often have to be manually optimized, and/or, to get things
        fast enough, we often had to store data in a redundant way. We
        did this with domain-specific indexes, which I explained in
        these blog items:
        
        http://pvanhoof.be/blog/index.php/2010/07/07/domain-indexes-finished-technical-conclusions
        http://pvanhoof.be/blog/index.php/2010/07/03/sqlites-wal-deleting-a-domain-specific-index
        http://pvanhoof.be/blog/index.php/2010/07/01/working-on-domain-specific-indexes
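The redundancy idea behind domain-specific indexes can be sketched as copying a frequently filtered property onto a related class's table and indexing that copy there. The tables, columns and index below are purely illustrative (see the blog items above for the real feature), assuming emails are commonly sorted by sender name:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Normal layout: the sender's fullname lives on the contact table,
# so sorting emails by sender name needs a join for every row.
cur.execute('CREATE TABLE "nco:PersonContact" (ID INTEGER, "nco:fullname" TEXT)')
cur.execute('CREATE TABLE "nmo:Email" (ID INTEGER, "nmo:from" INTEGER)')

# Domain-specific index: redundantly store the property on the email
# table as well, and index that copy (illustrative column name).
cur.execute('ALTER TABLE "nmo:Email" ADD COLUMN "nco:fullname" TEXT')
cur.execute('CREATE INDEX idx_email_fullname ON "nmo:Email" ("nco:fullname")')

# The sort can now be satisfied from the index, without a join.
plan = cur.execute('EXPLAIN QUERY PLAN SELECT ID FROM "nmo:Email" '
                   'ORDER BY "nco:fullname"').fetchall()
print(plan)
```
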
        
        
        This function is the code that translates a statement to SQL
        inserts (from the SPARQL UPDATE we make RDF statements, and
        then we process those):
        
        https://git.gnome.org/browse/tracker/tree/src/libtracker-data/tracker-data-update.c#n732
        
        A lot of the SPARQL UPDATE part is here:
        https://git.gnome.org/browse/tracker/tree/src/libtracker-data/tracker-sparql-pattern.vala
        
        Note that we have a buffer where we eliminate duplicates like:
        
        <a> a Class.
        <a> Prop1 Value1.
        <a> a Class.
        <a> Prop2 Value2.
        <a> Prop1 Value3.
        
        We translate that to:
        
        <a> a Class; Prop1 Value3; Prop2 Value2.
        
        Except when we utilize the null support:
        
        
http://pvanhoof.be/blog/index.php/2011/08/09/support-for-null-with-trackers-insert-or-replace-feature
        http://pvanhoof.be/blog/index.php/2011/08/15/null-support-for-insert-or-replace-available-in-master
        
        The second blog item and my own comments in the first
        illustrate the problem with optimizing statement-sets like the
        above in exactly that way.
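The duplicate-eliminating buffer described above can be sketched as keeping the last object seen per (subject, predicate) pair. This is a simplification, not Tracker's real code: it assumes single-valued properties, and as the caveat above says, the null support breaks this kind of blanket optimization:

```python
def coalesce_statements(statements):
    """Collapse duplicate (subject, predicate) pairs, keeping the
    last object seen -- a sketch of the update buffer's idea."""
    buffered = {}   # (subject, predicate) -> last object
    order = []      # first-seen order of the pairs
    for s, p, o in statements:
        if (s, p) not in buffered:
            order.append((s, p))
        buffered[(s, p)] = o
    return [(s, p, buffered[(s, p)]) for s, p in order]

statements = [
    ("<a>", "a", "Class"),
    ("<a>", "Prop1", "Value1"),
    ("<a>", "a", "Class"),
    ("<a>", "Prop2", "Value2"),
    ("<a>", "Prop1", "Value3"),
]
print(coalesce_statements(statements))
# [('<a>', 'a', 'Class'), ('<a>', 'Prop1', 'Value3'), ('<a>', 'Prop2', 'Value2')]
```
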
        
        Then we also added a few SQLite functions which are used by
        the
        SPARQL->SQL translation and for most of our own SPARQL
        extensions (we
        ship with a bunch of SPARQL extensions that allow query
        writers to make
        queries faster):
        
        https://git.gnome.org/browse/tracker/tree/src/libtracker-data/tracker-db-interface-sqlite.c#n826
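Registering such functions uses SQLite's standard custom-function mechanism (sqlite3_create_function in C). A generic Python sketch of the same mechanism; the function name here is illustrative, not one of Tracker's actual extensions:

```python
import sqlite3

con = sqlite3.connect(":memory:")

def starts_with(text, prefix):
    """Scalar SQL function: 1 if text starts with prefix, else 0."""
    return 1 if text is not None and text.startswith(prefix) else 0

# Same mechanism Tracker uses (via the C API) to expose its SPARQL
# extensions to the generated SQL; 'sketch_starts_with' is made up.
con.create_function("sketch_starts_with", 2, starts_with)

(result,) = con.execute(
    "SELECT sketch_starts_with('nco:Contact', 'nco:')").fetchone()
print(result)  # 1
```
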
        
        The data-manager is mostly about creating the SQL tables and
        preparation. It can also handle a limited amount of ontology
        changes (adding and removing classes and properties and their
        extensions, like domain-specific indexes):
        https://git.gnome.org/browse/tracker/tree/src/libtracker-data/tracker-data-manager.c
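Turning ontology definitions into tables can be sketched as emitting one CREATE TABLE per class with one column per single-valued property. The mini-ontology and helper below are hypothetical, only meant to illustrate the shape of what the data-manager prepares:

```python
import sqlite3

# Hypothetical mini-ontology: class name -> single-valued properties.
ONTOLOGY = {
    "nco:Contact": ["nco:fullname", "nco:hasEmailAddress"],
    "nie:InformationElement": ["nie:title"],
}

def create_tables(con, ontology):
    """One table per class, one column per property -- roughly the
    schema preparation tracker-data-manager.c performs at startup."""
    for cls, props in ontology.items():
        cols = ", ".join(f'"{p}" TEXT' for p in props)
        con.execute(f'CREATE TABLE "{cls}" (ID INTEGER PRIMARY KEY, {cols})')

con = sqlite3.connect(":memory:")
create_tables(con, ONTOLOGY)
tables = [r[0] for r in con.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")]
print(sorted(tables))
```
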
        


Thanks for all this information.

 

        Kind regards,
        
        Philip
        
        
        On Thu, 2013-04-11 at 05:18 +0530, Vishesh Handa wrote:
        > Hey Philip
        >
        > I'm one of the KDE Nepomuk developers. I've been looking into
        > the tracker project for some time now, since it's good to know
        > how other people internally implement things. However, it has
        > been very hard for me to find any documentation on the inner
        > workings of tracker.
        >
        >
        > I'm specifically interested in how the database schema is
        > designed. I did find this "Semantic Social Desktop and Mobile
        > Devices" presentation [1] which gives a very rough overview of
        > how each class has its own table.
        >
        >
        > Could you perhaps point me to some internal documentation? It
        > would be most helpful. Otherwise, could I ask you some
        > detailed questions about the tracker internals?
        >
        > I have looked at the source code, but it's a little hard to
        > understand for a newcomer.
        >
        > [1]
        https://live.gnome.org/Tracker/Documentation/Presentations
        >
        > --
        > Vishesh Handa
        >
        
        
        



-- 
Vishesh Handa


-- 
Philip Van Hoof
Software developer
Codeminded BVBA - http://codeminded.be


