Re: [Tracker] tracker and RDF



Eyal Oren wrote:
Hi Jamie,

I've just caught up with the huge number of emails on d-d-l. It was a great discussion to follow, even though I feel you were being treated a bit harshly.

I expected to get a much harsher reaction from some of the Novell and pro-beagle lads, but apart from a few grumblings from them it passed off quite well. My sources tell me tracker was generally well received amongst Gnome devs.

I loved the discussion on metadata specs and RDF schema,
given that I am a semantic web researcher (see http://www.activerdf.org).

I see tracker as being a lot simpler, lighter and faster than a full-blown semantic web thingy, but of course anything we can learn from them would be useful, so feel free to point things out :)


As you may remember, I tried to point out to you earlier that using RDF (triples) as the metadata format would be much more flexible than fixed database tables, and that using RDFS rules (such as subPropertyOf) would be a great way to let applications deal with new data without needing to adjust their database queries.

Yes, it is more useful (but slower like that), so some tuning will be required.

Now seeing the course of the discussion, allow me to pitch in here. You were discussing using librdf to do some RDF query answering. For your information, librdf does not do any RDFS reasoning!! librdf has one single table called "triple(s,p,o)" in which it stores all triples, and then does query rewriting from SPARQL or RDQL to this relational table. However, the algorithms for query rewriting do not consider any RDFS statements (e.g. evolution:workEmail subPropertyOf tracker:email) so librdf actually only provides "pure" RDF answers, without the benefits of RDFS.

yeah the performance of that looks terrible as well as being pretty horrendous with the sql


As James Hendringe pointed out, the flexibility of RDF (we only have triples) allows one to store arbitrary metadata, but querying with a naive implementation (one single relational table) is quite slow, since the database has to do a self-join for each where clause.
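
To make the self-join cost concrete, here is a minimal sketch in Python with sqlite3. It assumes the single triple(s,p,o) table described above; the predicate names (tracker:artist, tracker:genre), the sample rows and the helper find_subjects are made up for illustration and are not taken from librdf or tracker.

```python
import sqlite3

def build_store():
    # One single table holding every triple (the "naive" layout).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triple (s TEXT, p TEXT, o TEXT)")
    conn.executemany("INSERT INTO triple VALUES (?, ?, ?)", [
        ("file:a.mp3", "tracker:artist", "Bach"),       # hypothetical data
        ("file:a.mp3", "tracker:genre", "Classical"),
        ("file:b.mp3", "tracker:artist", "Bach"),
        ("file:b.mp3", "tracker:genre", "Jazz"),
    ])
    return conn

def find_subjects(conn, patterns):
    """patterns is a list of (predicate, object) pairs that must all match.

    Every pattern after the first adds one more alias of the triple
    table, i.e. one more self-join on the subject column.
    """
    joins = ["triple AS t0"]
    where = ["t0.p = ? AND t0.o = ?"]
    params = list(patterns[0])
    for i, (p, o) in enumerate(patterns[1:], start=1):
        joins.append("JOIN triple AS t%d ON t%d.s = t0.s" % (i, i))
        where.append("t%d.p = ? AND t%d.o = ?" % (i, i))
        params.extend([p, o])
    sql = ("SELECT DISTINCT t0.s FROM " + " ".join(joins) +
           " WHERE " + " AND ".join(where) + " ORDER BY t0.s")
    return sql, [row[0] for row in conn.execute(sql, params)]

conn = build_store()
sql, subjects = find_subjects(conn, [("tracker:artist", "Bach"),
                                     ("tracker:genre", "Classical")])
print(sql)
print(subjects)
```

With two where clauses the generated SQL already contains one JOIN of the table against itself; a query with n clauses needs n-1 of them, which is where the slowdown comes from.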

I've recently worked on a simple RDF store based on sqlite3 (I call it rdflite). The incentive came from the buggy and complex state of existing RDF stores: I wanted something simple and lightweight that my users can easily deploy and use (hence the choice for sqlite3). In the course of building that, I've got quite some experience with query rewriting from rdf-to-sql.

we already have an rdf query implementation, so that just needs a bit of tinkering to support automatic searching of all child metadata types, and it's not difficult


For the record: my datastore does not currently do any RDFS reasoning either, but that would not be too hard to implement: we need to adjust the query rewriting to take some RDFS rules into account. In the evolution example: if you ask for all emails, we need to rewrite the query to also consider (in a union) all those properties that have been defined as subproperties of email (in this case, work emails).
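
A rough sketch of that rewriting step, again in Python with sqlite3. This is not how rdflite or librdf does it; it only assumes a triple(s,p,o) table in which the subPropertyOf statements are stored as ordinary triples. The contact data, the tracker:name property and the helper names (sub_properties, values_of) are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triple (s TEXT, p TEXT, o TEXT)")
conn.executemany("INSERT INTO triple VALUES (?, ?, ?)", [
    # Schema statement from the evolution example.
    ("evolution:workEmail", "rdfs:subPropertyOf", "tracker:email"),
    # Hypothetical instance data.
    ("contact:1", "tracker:email", "home@example.org"),
    ("contact:1", "evolution:workEmail", "work@example.org"),
    ("contact:1", "tracker:name", "Alice"),
])

def sub_properties(conn, prop):
    """Return prop followed by all its (transitive) subproperties."""
    seen = [prop]
    todo = [prop]
    while todo:
        current = todo.pop()
        rows = conn.execute(
            "SELECT s FROM triple WHERE p = 'rdfs:subPropertyOf' AND o = ?",
            (current,)).fetchall()
        for (child,) in rows:
            if child not in seen:
                seen.append(child)
                todo.append(child)
    return seen

def values_of(conn, subject, prop):
    """Rewrite 'values of prop' into a UNION over prop and its subproperties."""
    props = sub_properties(conn, prop)
    branch = "SELECT o FROM triple WHERE s = ? AND p = ?"
    sql = " UNION ".join([branch] * len(props)) + " ORDER BY o"
    params = []
    for p in props:
        params.extend([subject, p])
    return [row[0] for row in conn.execute(sql, params)]

emails = values_of(conn, "contact:1", "tracker:email")
print(emails)
```

Asking for tracker:email on contact:1 then returns both the plain email and the work email, without the application knowing that evolution:workEmail exists.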

Let me get to the point (sorry for the long story here): I'm very happy that Wouter Bolsterlee and others showed you the advantage of RDFS and RDF here, and I would be very very happy to explore it with you in the context of tracker. If you want to roll your own rdf solution, I can help you with my current rdf-in-sqlite experience; if you want to use librdf I can help with my experience of using it and programming against it; and if you want to brainstorm a bit about the (dis)advantages of rdf, I'm happy to do that too.

I can't use librdf directly; as I said on d-d-l, it's too abstracted and not optimised.

Currently we use two tables for metadata storage (one for the service and one for the metadata value - these are further split into string/numeric tables and indexes), so to support a triple store I should just need one more table inserted between the existing two. This will allow metadata to have any number of values, at the cost of slower inserts and perhaps a bit more disk space.

For the metadata types themselves we already have a table for storing them, and we just need one more flattened table to store the relationships between the various types. This should in effect give us the sub property type stuff.
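
One way such a flattened relationship table could look, sketched in Python with sqlite3. This is only an illustration and not tracker's actual schema: the table names (type_closure, metadata), the column names, the example:assistantEmail type and the flatten helper are all assumptions. The idea is to store every (ancestor, descendant) pair up front so that a lookup needs a single join rather than any recursion at query time.

```python
import sqlite3

def flatten(direct_parents):
    """Turn a child -> parent mapping into a set of (ancestor, descendant)
    pairs, including each type paired with itself."""
    pairs = set()
    for t in set(direct_parents) | set(direct_parents.values()):
        pairs.add((t, t))
        node = t
        # Walk up the parent chain; stop early if a pair repeats (cycle guard).
        while node in direct_parents and (direct_parents[node], t) not in pairs:
            node = direct_parents[node]
            pairs.add((node, t))
    return pairs

# Hypothetical type hierarchy, two levels deep.
direct = {
    "evolution:workEmail": "tracker:email",
    "example:assistantEmail": "evolution:workEmail",
}
closure = flatten(direct)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE type_closure (ancestor TEXT, descendant TEXT)")
conn.execute("CREATE TABLE metadata (service_id INTEGER, type TEXT, value TEXT)")
conn.executemany("INSERT INTO type_closure VALUES (?, ?)", sorted(closure))
conn.executemany("INSERT INTO metadata VALUES (?, ?, ?)", [
    (1, "tracker:email", "home@example.org"),
    (1, "evolution:workEmail", "work@example.org"),
    (1, "example:assistantEmail", "pa@example.org"),
    (2, "tracker:email", "other@example.org"),
])

def get_metadata(conn, service_id, type_name):
    """Return all values of type_name or any of its child types, as a list."""
    rows = conn.execute(
        "SELECT m.value FROM metadata AS m "
        "JOIN type_closure AS c ON c.descendant = m.type "
        "WHERE m.service_id = ? AND c.ancestor = ? ORDER BY m.value",
        (service_id, type_name))
    return [row[0] for row in rows]

print(get_metadata(conn, 1, "tracker:email"))
```

The cost is paid when a type relationship is added or changed, since the closure rows have to be regenerated; queries themselves stay a plain indexed join.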

It's not difficult to do - just a lot of work plumbing it in to our existing framework and altering all our stored procedures and queries.

We don't need any rdf syntax/magic here as it's pretty simple stuff, and we can expose a nice dbus api for managing the metadata types and their inter-relationships.

Our dbus api would also need to change so that "get metadata" returns an array instead of a single value, as each entry can now have multiple metadata values etc. So unless there is anything I have missed, it's just elbow grease really :)

--
Mr Jamie McCracken
http://jamiemcc.livejournal.com/



