Re: [Tracker] Tracker Internal Documentaion
- From: Philip Van Hoof <philip codeminded be>
- To: Vishesh Handa <me vhanda in>
- Cc: Tracker mailing list <tracker-list gnome org>
- Subject: Re: [Tracker] Tracker Internal Documentaion
- Date: Thu, 11 Apr 2013 14:29:21 +0200
On Thu, 2013-04-11 at 15:53 +0530, Vishesh Handa wrote:
> I'm curious as to how you handle -
> 1. Type Inference - Say something like this: 'select ?r where { ?r a
> nco:Contact . }'. Let's say one has some 10 nco:Contacts and some 15
> nco:PersonContacts. In this case one would have to iterate over both
> the tables. Does tracker do that?
This gets translated to something like

SELECT Uri FROM "nco:Contact"

Translating from nco:Contact to the SQL table "nco:Contact" is something
Tracker does internally. I think there is an environment variable that
you can turn on to print the SQLite statements the first time they are
prepared (there is an LRU cache of SQLite statements). These statements
include the query.

Type inference itself (select ?p ?o { nco:Contact ?p ?o }) works
because Tracker stores its own ontology in itself, and an ontology is
just a bunch of RDF statements.
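
As a rough illustration of the class-to-table mapping described above, here is a minimal Python sketch using the standard sqlite3 module. The schema (a single Uri column per class table), the SUPERCLASSES map and the helper names are made up for this example and are not Tracker's actual schema or code. The sketch also assumes that inserting a resource adds a row to the table of every superclass, which is what would let a single SELECT on "nco:Contact" cover nco:PersonContact resources too.

```python
import sqlite3

# Hypothetical, heavily simplified ontology: class -> direct superclass.
SUPERCLASSES = {"nco:PersonContact": "nco:Contact", "nco:Contact": None}

def create_tables(db):
    # One table per class, named after the class (quoted, as it contains ':').
    for cls in SUPERCLASSES:
        db.execute(f'CREATE TABLE "{cls}" (Uri TEXT PRIMARY KEY)')

def insert_resource(db, uri, cls):
    # Assumption: a resource gets a row in its own class table and in the
    # table of every superclass.
    while cls is not None:
        db.execute(f'INSERT OR IGNORE INTO "{cls}" (Uri) VALUES (?)', (uri,))
        cls = SUPERCLASSES[cls]

def type_pattern_to_sql(cls):
    # 'select ?r where { ?r a <cls> }' becomes a plain SELECT on one table.
    return f'SELECT Uri FROM "{cls}"'

db = sqlite3.connect(":memory:")
create_tables(db)
for i in range(10):
    insert_resource(db, f"urn:contact:{i}", "nco:Contact")
for i in range(15):
    insert_resource(db, f"urn:person:{i}", "nco:PersonContact")

rows = db.execute(type_pattern_to_sql("nco:Contact")).fetchall()
print(len(rows))  # 25: the 10 contacts plus the 15 person contacts
```

Under these assumptions the query touches only one table, at the cost of writing one extra row per superclass on insert.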
> 2. Property Inference - Do you handle cases such as 'select ?r ?l
> where { ?r nao:prefLabel ?l . }', where the 'nao:prefLabel' has not
> been explicitly defined? Let's assume that the 'nie:title' has been
> set. The nie:title is a rdfs:subPropertyOf nao:prefLabel.
I don't remember.
> Does tracker handle cases like this? Because this is a rather common
> use case in Nepomuk where we want to fetch a good label for the
> resource and we do not want to query specific properties.
Afaik yes.
> 3. I read that you have some support for graphs - How is that
> implemented? From what I understand from your db schema each property
> has its own column, so I'm not sure where you would store the
> graph-related info.
The graph support is limited. More explanation on its limitations here:
https://live.gnome.org/Tracker/Documentation/SparqlFeatures#Named_Graphs
> Also, does tracker use graphs for any purpose?
Only for storing the origin of a statement (which was the only required
use-case for the N9, which we were mainly targeting while developing
Tracker's SPARQL endpoint and Nepomuk ontology support).
> In the Nepomuk KDE world we generally use graphs to group triples
> based on which application has stored the information. This is
> especially useful in the case of indexing. When a file has been
> modified and needs to be re-indexed, we need to throw away the
> previous data and re-index it. The file could in this case have both
> indexed and non-indexed data such as tags and ratings. So, we only
> remove the statements that were added by the indexer and then re-index
> the file.
This isn't supported by Tracker.
> This seems like a very common use case. I'm curious as to how tracker
> solves this problem.
It doesn't. Full graph support was not a design consideration, only
limited support for it was.
Kind regards,
Philip
Then libtracker-sparql makes a WAL SQLite connection, parses your
SPARQL and generates SQL for it on the fly. This often happens using
subqueries, and without building an AST first, which makes the
parse-translate phase relatively fast and resource friendly. That is a
design choice, as Tracker is intended to run on devices with few
resources.
This code does that:
https://git.gnome.org/browse/tracker/tree/src/libtracker-data/tracker-sparql-query.vala
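
For readers unfamiliar with WAL (write-ahead logging) mode, the following Python sketch shows how a SQLite connection can be switched to it using the standard sqlite3 module. It only illustrates the SQLite feature itself; the temporary database path is an assumption of this example and has nothing to do with where or how Tracker opens its database.

```python
import os
import sqlite3
import tempfile

# WAL needs a file-backed database; this path is a throwaway example.
db_path = os.path.join(tempfile.mkdtemp(), "example.db")

db = sqlite3.connect(db_path)
# The PRAGMA returns the journal mode that is in effect after the call.
mode = db.execute("PRAGMA journal_mode=WAL").fetchone()[0]
print(mode)
db.close()
```

In WAL mode readers do not block the writer and the writer does not block readers, which suits a store that is queried while an indexer keeps writing to it.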
That design choice of course has a drawback: queries very often have
to be optimized manually, and/or to get things fast enough we often had
to store data redundantly. We did this with domain-specific indexes,
which I explained in these blog items:
http://pvanhoof.be/blog/index.php/2010/07/07/domain-indexes-finished-technical-conclusions
http://pvanhoof.be/blog/index.php/2010/07/03/sqlites-wal-deleting-a-domain-specific-index
http://pvanhoof.be/blog/index.php/2010/07/01/working-on-domain-specific-indexes
This function is the code that translates a statement to SQL inserts
(from the SPARQL UPDATE we make RDF statements and then we process
those):
https://git.gnome.org/browse/tracker/tree/src/libtracker-data/tracker-data-update.c#n732
A lot of the SPARQL UPDATE part is here:
https://git.gnome.org/browse/tracker/tree/src/libtracker-data/tracker-sparql-pattern.vala
Note that we have a buffer where we eliminate duplicates like:
<a> a Class.
<a> Prop1 Value1.
<a> a Class.
<a> Prop2 Value2.
<a> Prop1 Value3.
We translate that to:
<a> a Class; Prop1 Value; Prop2 Value3.
Except when we utilize the null support:
http://pvanhoof.be/blog/index.php/2011/08/09/support-for-null-with-trackers-insert-or-replace-feature
http://pvanhoof.be/blog/index.php/2011/08/15/null-support-for-insert-or-replace-available-in-master
The second blog item and my own comments in the first illustrate the
problem with optimizing statement sets like the one above just like
that.
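
The buffering idea can be sketched in a few lines of Python. This is only an illustration, assuming statements arrive as (subject, predicate, object) tuples; the function name is hypothetical and this is not Tracker's buffer code. It drops only exact repeats and groups what remains per subject. How several values for the same property are merged (which depends on cardinality and on the insert-or-replace and null handling discussed in the blog items above) is deliberately left out.

```python
def buffer_statements(statements):
    """Group triples per subject, dropping exact repeats."""
    buffered = {}  # subject -> list of (predicate, object), first-seen order
    seen = set()
    for triple in statements:
        if triple in seen:
            continue  # identical statement is already in the buffer
        seen.add(triple)
        subject, predicate, obj = triple
        buffered.setdefault(subject, []).append((predicate, obj))
    return buffered

statements = [
    ("<a>", "a", "Class"),
    ("<a>", "Prop1", "Value1"),
    ("<a>", "a", "Class"),
    ("<a>", "Prop2", "Value2"),
    ("<a>", "Prop1", "Value3"),
]
buffered = buffer_statements(statements)
print(buffered)
```

Grouping by subject means the five incoming statements can be flushed as one write for <a> instead of five separate ones.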
Then we also added a few SQLite functions which are used by the
SPARQL->SQL translation and for most of our own SPARQL extensions (we
ship with a bunch of SPARQL extensions that allow query writers to make
queries faster):
https://git.gnome.org/browse/tracker/tree/src/libtracker-data/tracker-db-interface-sqlite.c#n826
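
To show what registering a custom SQLite function looks like in general, here is a small Python sketch with the standard sqlite3 module. The function name example_uri_is_parent, its behaviour and the files table are all hypothetical and chosen for this example; they are not taken from Tracker's source, which registers its functions through the SQLite C API.

```python
import sqlite3

def uri_is_parent(parent, uri):
    # Hypothetical helper: 1 if `uri` is a direct child of `parent`, else 0.
    if parent is None or uri is None:
        return 0
    prefix = parent if parent.endswith("/") else parent + "/"
    rest = uri[len(prefix):]
    return int(uri.startswith(prefix) and rest != "" and "/" not in rest)

db = sqlite3.connect(":memory:")
# Make the Python function callable from SQL under the given name.
db.create_function("example_uri_is_parent", 2, uri_is_parent)

db.execute("CREATE TABLE files (Uri TEXT)")
db.executemany(
    "INSERT INTO files VALUES (?)",
    [
        ("file:///home/user/a.txt",),
        ("file:///home/user/docs/b.txt",),
        ("file:///tmp/c.txt",),
    ],
)

children = [
    row[0]
    for row in db.execute(
        "SELECT Uri FROM files WHERE example_uri_is_parent(?, Uri) ORDER BY Uri",
        ("file:///home/user",),
    )
]
print(children)
```

Generated SQL can then call such a function directly, so a SPARQL extension only needs to be translated into a function call rather than into a more complex SQL expression.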
The data-manager is mostly about creating the SQL tables and
preparation. It can also handle a limited amount of ontology changes
(adding and removing classes and properties and their extensions, like
domain-specific indexes):
https://git.gnome.org/browse/tracker/tree/src/libtracker-data/tracker-data-manager.c
Thanks for all this information
Kind regards,
Philip
On Thu, 2013-04-11 at 05:18 +0530, Vishesh Handa wrote:
> Hey Philip
>
> I'm one of the KDE Nepomuk developers. I've been looking into the
> tracker project for some time now, since it's good to know how other
> people internally implement things. However, it has been very hard for
> me to find any documentation on the inner workings of tracker.
>
> I'm specifically interested in how the database schema is designed. I
> did find this "Semantic Social Desktop and Mobile Devices"
> presentation [1] which gives a very rough overview of how each class
> has its own table.
>
> Could you perhaps point me to some internal documentation? It would be
> most helpful. Otherwise, could I ask you some detailed questions
> about the tracker internals?
>
> I have looked at the source code, but it's a little hard to understand
> for a newcomer.
>
> [1] https://live.gnome.org/Tracker/Documentation/Presentations
>
> --
> Vishesh Handa
>
--
Vishesh Handa
--
Philip Van Hoof
Software developer
Codeminded BVBA - http://codeminded.be