Re: [Tracker] tracker and RDF

From: Jamie McCracken <jamiemcc blueyonder co uk>
To: Eyal Oren <eyal oren deri org>
Cc: Tracker List <tracker-list gnome org>
Subject: Re: [Tracker] tracker and RDF
Date: Tue, 31 Oct 2006 01:11:39 +0000

Eyal Oren wrote:

Hi Jamie,
I've just caught up with the huge number of emails on d-d-l. It was a great discussion to follow, even though I feel you were being treated a bit harshly.

I expected to get a much harsher reaction from some of the Novell and pro-beagle lads, but apart from a few grumblings from them it passed off quite well. My sources tell me tracker was generally well received amongst Gnome devs.


I loved the discussion on metadata specs and RDF schema,

given that I am a semantic web researcher (see http://www.activerdf.org).

I see tracker as being a lot simpler, lighter and faster than a full-blown semantic web thingy, but of course anything we can learn from them would be useful so feel free to point things out :)

As you may remember, I tried to point out to you earlier that using RDF (triples) as metadata format would be much more flexible than fixed database tables, and using RDFS rules (such as subPropertyOf) would be great to allow applications to deal with new data without them needing to adjust their database queries.


yes it is more useful (but slower like that) so some tuning will be required

Now seeing the course of the discussion, allow me to pitch in here. You were discussing using librdf to do some RDF query answering. For your information, librdf does not do any RDFS reasoning!! librdf has one single table called "triple(s,p,o)" in which it stores all triples, and then does query rewriting from SPARQL or RDQL to this relational table. However, the algorithms for query rewriting do not consider any RDFS statements (e.g. evolution:workEmail subPropertyOf tracker:email) so librdf actually only provides "pure" RDF answers, without the benefits of RDFS.

yeah the performance of that looks terrible as well as being pretty horrendous with the sql

As James Hendringe pointed out, the flexibility of RDF (we only have triples) allows one to store arbitrary metadata, but querying with a naive implementation (one single relational table) is quite slow, since the database is doing a self-join for each where clause.
I've recently worked on a simple RDF store based on sqlite3 (I call it rdflite). The incentive came from the buggy and complex state of existing RDF stores: I wanted something simple and lightweight that my users can easily deploy and use (hence the choice for sqlite3). In the course of building that, I've got quite some experience with query rewriting from rdf-to-sql.
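
The self-join cost mentioned above can be illustrated with a minimal sketch. It assumes a single sqlite3 table named triple(s, p, o) as described for librdf; the sample data and the helper find_subjects are hypothetical and are not taken from librdf or rdflite.

```python
import sqlite3

# Naive triple store: every statement lives in one table (assumed layout).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triple (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triple VALUES (?, ?, ?)",
    [
        ("file:a.txt", "doc:author", "Alice"),
        ("file:a.txt", "doc:type", "text"),
        ("file:b.txt", "doc:author", "Alice"),
        ("file:b.txt", "doc:type", "image"),
    ],
)


def find_subjects(conn, conditions):
    """Return subjects matching every (predicate, object) pair.

    Each extra condition adds one more self-join on the triple table,
    which is where the cost of the naive layout comes from.
    """
    if not conditions:
        raise ValueError("at least one condition is required")
    joins, wheres, params = [], [], []
    for i, (pred, obj) in enumerate(conditions):
        alias = f"t{i}"
        if i > 0:
            joins.append(f"JOIN triple AS {alias} ON {alias}.s = t0.s")
        wheres.append(f"{alias}.p = ? AND {alias}.o = ?")
        params.extend([pred, obj])
    sql = (
        "SELECT DISTINCT t0.s FROM triple AS t0 "
        + " ".join(joins)
        + " WHERE "
        + " AND ".join(wheres)
        + " ORDER BY t0.s"
    )
    return [row[0] for row in conn.execute(sql, params)]


# Two conditions -> the generated SQL joins the table with itself once.
matches = find_subjects(conn, [("doc:author", "Alice"), ("doc:type", "text")])
```

With n conditions the generated query contains n - 1 self-joins, so the work grows with the number of where clauses unless the table is split or indexed carefully.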

we already have an rdf query implementation so that just needs a bit of tinkering to support automatic searching of all child metadata types, and it's not difficult

for the record: my datastore does not currently do any RDFS reasoning either, but that would not be too hard to implement: we need to adjust the query rewriting to take some RDFS rules into account. In the evolution example: if you ask for all emails, we need to rewrite the query to also consider (in a union) all those properties that have been defined to be subproperties of email (in this case, work-emails).
Let me get to the point (sorry for the long story here): I'm very happy that Wouter Bolsterlee and others showed you the advantage of RDFS and RDF here, and I would be very very happy to explore it with you in the context of tracker. If you want to roll your own rdf solution, I can help you with my current rdf-in-sqlite experience; if you want to use librdf I can help with my experience of using it and programming against it; and if you want to brainstorm a bit about the (dis)advantages of rdf, I'm happy to do that too.
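
The union rewriting described above might look roughly like the sketch below. It assumes the same single triple(s, p, o) table in sqlite3 and stores the subPropertyOf statement as an ordinary triple; the helpers subproperties and values_of, and the sample contact data, are hypothetical and do not come from rdflite or librdf.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triple (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triple VALUES (?, ?, ?)",
    [
        ("contact:bob", "tracker:email", "bob@example.org"),
        ("contact:bob", "evolution:workEmail", "bob@work.example.org"),
        ("contact:bob", "tracker:name", "Bob"),
        # The RDFS rule itself is just another triple.
        ("evolution:workEmail", "rdfs:subPropertyOf", "tracker:email"),
    ],
)


def subproperties(conn, prop):
    """Return prop plus everything declared, directly or indirectly, as its subproperty."""
    seen = {prop}
    todo = [prop]
    while todo:
        current = todo.pop()
        rows = conn.execute(
            "SELECT s FROM triple WHERE p = 'rdfs:subPropertyOf' AND o = ?",
            (current,),
        ).fetchall()
        for (child,) in rows:
            if child not in seen:
                seen.add(child)
                todo.append(child)
    return sorted(seen)


def values_of(conn, subject, prop):
    """Rewrite 'values of prop' into a UNION over prop and its subproperties."""
    props = subproperties(conn, prop)
    branch = "SELECT o FROM triple WHERE s = ? AND p = ?"
    sql = " UNION ".join([branch] * len(props)) + " ORDER BY o"
    params = []
    for p in props:
        params.extend([subject, p])
    return [row[0] for row in conn.execute(sql, params)]


# Asking for tracker:email also returns the evolution:workEmail value.
emails = values_of(conn, "contact:bob", "tracker:email")
```

The application only ever asks for tracker:email; a newly declared subproperty is picked up by the rewriting without any change to the caller's query.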

I can't use librdf directly; as I said on d-d-l, it's too abstracted and not optimised.

Currently we use two tables for metadata storage (one for the service and one for the metadata value - these are further split into string/numeric tables and indexes) so to support triple store I should just need one more table inserted between the existing two - this will allow for metadata to have any number of values at the cost of slower inserts and perhaps a bit more disk space

For the metadata types themselves we already have a table for storing them and we just need one more flattened table to store the relationships between the various types. This should in effect give us the subproperty type stuff.
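
One way to read the "flattened table" idea is as a closure table that stores every ancestor/descendant pair of metadata types, so child types can be found with a plain join and no recursion. The sketch below is only an illustration under that assumption: the table names metadata_types and metadata_type_closure and the helpers add_type and descendants are hypothetical, not tracker's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE metadata_types (
        id   INTEGER PRIMARY KEY,
        name TEXT UNIQUE
    );
    -- Flattened relationships: one row per (ancestor, descendant) pair,
    -- including each type paired with itself.
    CREATE TABLE metadata_type_closure (
        ancestor_id   INTEGER,
        descendant_id INTEGER,
        PRIMARY KEY (ancestor_id, descendant_id)
    );
    """
)


def add_type(conn, name, parent=None):
    """Register a type and link it to the parent and all of the parent's ancestors."""
    type_id = conn.execute(
        "INSERT INTO metadata_types (name) VALUES (?)", (name,)
    ).lastrowid
    conn.execute(
        "INSERT INTO metadata_type_closure VALUES (?, ?)", (type_id, type_id)
    )
    if parent is not None:
        # Every ancestor of the parent (and the parent itself) becomes an ancestor.
        conn.execute(
            "INSERT INTO metadata_type_closure (ancestor_id, descendant_id) "
            "SELECT c.ancestor_id, ? FROM metadata_type_closure AS c "
            "JOIN metadata_types AS t ON t.id = c.descendant_id "
            "WHERE t.name = ?",
            (type_id, parent),
        )
    return type_id


def descendants(conn, name):
    """Return the type itself plus all child types, found with a single join."""
    rows = conn.execute(
        "SELECT d.name FROM metadata_type_closure AS c "
        "JOIN metadata_types AS a ON a.id = c.ancestor_id "
        "JOIN metadata_types AS d ON d.id = c.descendant_id "
        "WHERE a.name = ? ORDER BY d.name",
        (name,),
    )
    return [row[0] for row in rows]


add_type(conn, "email")
add_type(conn, "work_email", parent="email")
add_type(conn, "work_email_primary", parent="work_email")
email_family = descendants(conn, "email")
```

The trade-off is the one already noted for the value tables: inserts do more work and the table takes more space, in exchange for cheap lookups of all child types at query time.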

It's not difficult to do - just a lot of work plumbing it in to our existing framework and altering all our stored procedures and queries.

We don't need any rdf syntax/magic here as it's pretty simple stuff and we can expose a nice dbus api for managing the metadata types and their inter-relationships.

Our dbus api would also need to change so that "get metadata" returns an array instead of a single value, as each entry can now have multiple metadata values etc. So unless there is anything I have missed it's just elbow grease really :)
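
As a rough sketch of the storage change and the array-returning lookup: the schema below puts a link table between a services table and a values table, so one entry can carry any number of values for the same key. All table, column and function names here are hypothetical, the string/numeric split is left out, and no dbus layer is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE services (
        id   INTEGER PRIMARY KEY,
        path TEXT UNIQUE
    );
    CREATE TABLE metadata_values (
        id         INTEGER PRIMARY KEY,
        meta_key   TEXT,
        meta_value TEXT
    );
    -- The extra table inserted between the two existing ones.
    CREATE TABLE service_metadata (
        service_id INTEGER REFERENCES services(id),
        value_id   INTEGER REFERENCES metadata_values(id)
    );
    """
)


def add_metadata(conn, path, key, value):
    """Attach one more value to an entry; existing values are kept."""
    conn.execute("INSERT OR IGNORE INTO services (path) VALUES (?)", (path,))
    service_id = conn.execute(
        "SELECT id FROM services WHERE path = ?", (path,)
    ).fetchone()[0]
    value_id = conn.execute(
        "INSERT INTO metadata_values (meta_key, meta_value) VALUES (?, ?)",
        (key, value),
    ).lastrowid
    conn.execute(
        "INSERT INTO service_metadata VALUES (?, ?)", (service_id, value_id)
    )


def get_metadata(conn, path, key):
    """Return all values for a key as a list (empty if there are none)."""
    rows = conn.execute(
        "SELECT v.meta_value FROM services AS s "
        "JOIN service_metadata AS l ON l.service_id = s.id "
        "JOIN metadata_values AS v ON v.id = l.value_id "
        "WHERE s.path = ? AND v.meta_key = ? ORDER BY v.id",
        (path, key),
    )
    return [row[0] for row in rows]


add_metadata(conn, "/home/user/paper.pdf", "doc:author", "Alice")
add_metadata(conn, "/home/user/paper.pdf", "doc:author", "Bob")
authors = get_metadata(conn, "/home/user/paper.pdf", "doc:author")
```

Callers that previously expected a single value would need to handle a list, including the empty list for a key that has not been set.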


--
Mr Jamie McCracken
http://jamiemcc.livejournal.com/
