Re: [Tracker] tracker and RDF
- From: Jamie McCracken <jamiemcc blueyonder co uk>
- To: Eyal Oren <eyal oren deri org>
- Cc: Tracker List <tracker-list gnome org>
- Subject: Re: [Tracker] tracker and RDF
- Date: Tue, 31 Oct 2006 01:11:39 +0000
Eyal Oren wrote:
Hi Jamie,
I've just caught up with the huge number of emails on d-d-l. It was a
great discussion to follow, even though I feel you were being treated a
bit harshly.
I expected to get a much harsher reaction from some of the Novell and
pro-beagle lads but apart from a few grumblings from them it passed off
quite well. My sources tell me tracker was generally well received
amongst Gnome devs.
I loved the discussion on metadata specs and RDF schema,
given that I am a semantic web researcher (see http://www.activerdf.org).
I see tracker as being a lot simpler, lighter and faster than a full
blown semantic web thingy but of course anything we can learn from them
would be useful so feel free to point things out :)
As you may remember, I tried to point out to you earlier that using RDF
(triples) as metadata format would be much more flexible than fixed
database tables, and using RDFS rules (such as subPropertyOf) would be
great to allow applications to deal with new data without them needing
to adjust their database queries.
yes it is more useful (but slower like that) so some tuning will be required
Now seeing the course of the discussion, allow me to pitch in here. You
were discussing using librdf to do some RDF query answering. For your
information, librdf does not do any RDFS reasoning!! librdf has one
single table called "triple(s,p,o)" in which it stores all triples, and
then does query rewriting from SPARQL or RDQL to this relational table.
However, the algorithms for query rewriting do not consider any RDFS
statements (e.g. evolution:workEmail subPropertyOf tracker:email) so
librdf actually only provides "pure" RDF answers, without the benefits
of RDFS.
yeah the performance of that looks terrible as well as being pretty
horrendous with the sql
As James Hendringe pointed out, the flexibility of RDF (we only have
triples) allows one to store arbitrary metadata, but querying with a
naive implementation (one single relational table) is quite slow, since the
database is doing a self-join for each where clause.
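
To make the self-join cost concrete, here is a minimal sketch of the naive single-table layout described above, using Python's sqlite3 module. The table name `triple`, the `doc:` properties and the helpers `build_query` and `find_subjects` are illustrative assumptions; they are not taken from librdf or tracker.

```python
import sqlite3

# Hypothetical single-table triple store: every fact is one (s, p, o) row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triple (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triple VALUES (?, ?, ?)",
    [
        ("file:a.txt", "doc:author", "Alice"),
        ("file:a.txt", "doc:type", "text"),
        ("file:b.txt", "doc:author", "Bob"),
        ("file:b.txt", "doc:type", "image"),
    ],
)


def build_query(patterns):
    """Turn [(predicate, object), ...] into SQL using one alias of
    `triple` per pattern, i.e. one self-join per where clause."""
    tables, where, params = [], [], []
    for i, (p, o) in enumerate(patterns):
        alias = f"t{i}"
        tables.append(f"triple AS {alias}")
        where.append(f"{alias}.p = ? AND {alias}.o = ?")
        params += [p, o]
        if i > 0:
            # Every additional pattern must match the same subject as t0.
            where.append(f"{alias}.s = t0.s")
    sql = (
        f"SELECT DISTINCT t0.s FROM {', '.join(tables)} "
        f"WHERE {' AND '.join(where)}"
    )
    return sql, params


def find_subjects(patterns):
    sql, params = build_query(patterns)
    return [row[0] for row in conn.execute(sql, params)]


matches = find_subjects([("doc:author", "Alice"), ("doc:type", "text")])
```

A query with two conditions already reads the table twice; each further condition adds another alias of `triple` to the join.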
I've recently worked on a simple RDF store based on sqlite3 (I call it
rdflite). The incentive came from the buggy and complex state of
existing RDF stores: I wanted something simple and lightweight that my
users can easily deploy and use (hence the choice for sqlite3). In the
course of building that, I've got quite some experience with query
rewriting from rdf-to-sql.
we already have an rdf query implementation so that just needs a bit of
tinkering to support automatic searching of all child metadata types and
it's not difficult
for the record: my datastore does not currently do any RDFS reasoning either,
but that would not be too hard to implement: we need to adjust the query
rewriting to take some RDFS rules into account. In the evolution
example: if you ask for all emails, we need to rewrite the query, to
also consider (in a union) all those properties that have been defined
to be subproperties of email (in this case, work-emails).
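
The rewriting step described above could look roughly like the following sketch, again with Python's sqlite3 module. It is not rdflite's or tracker's code: the `subproperty` table and the helpers `expand_property` and `values_for` are assumptions made for illustration, and only the property names `tracker:email` and `evolution:workEmail` come from the example in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE triple (s TEXT, p TEXT, o TEXT);
CREATE TABLE subproperty (child TEXT, parent TEXT);
""")
conn.execute(
    "INSERT INTO subproperty VALUES ('evolution:workEmail', 'tracker:email')"
)
conn.executemany(
    "INSERT INTO triple VALUES (?, ?, ?)",
    [
        ("contact:1", "tracker:email", "home@example.org"),
        ("contact:1", "evolution:workEmail", "work@example.org"),
        ("contact:1", "tracker:name", "Example Person"),
    ],
)


def expand_property(prop):
    """Return prop plus every direct or indirect subproperty of it."""
    seen, todo = {prop}, [prop]
    while todo:
        current = todo.pop()
        rows = conn.execute(
            "SELECT child FROM subproperty WHERE parent = ?", (current,)
        ).fetchall()
        for (child,) in rows:
            if child not in seen:
                seen.add(child)
                todo.append(child)
    return sorted(seen)


def values_for(subject, prop):
    """Rewrite a one-property lookup into a UNION over its subproperties."""
    props = expand_property(prop)
    branch = "SELECT o FROM triple WHERE s = ? AND p = ?"
    sql = " UNION ".join([branch] * len(props)) + " ORDER BY o"
    params = [x for p in props for x in (subject, p)]
    return [row[0] for row in conn.execute(sql, params)]


emails = values_for("contact:1", "tracker:email")
```

Asking for `tracker:email` now also returns the value stored under `evolution:workEmail`, without the caller knowing that the subproperty exists.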
Let me get to the point (sorry for the long story here): I'm very happy
that Wouter Bolsterlee and others showed you the advantage of RDFS and
RDF here, and I would be very very happy to explore it with you in the
context of tracker. If you want to roll your own rdf solution, I can
help you with my current rdf-in-sqlite experience; if you want to use
librdf I can help with my experience of using it and programming against
it; and if you want to brainstorm a bit about the (dis)advantages of
rdf, I'm happy to do that too.
I can't use librdf directly; as I said on d-d-l, it's too abstracted and not
optimised.
Currently we use two tables for metadata storage (one for the service
and one for the metadata value - these are further split into
string/numeric tables and indexes) so to support triple store I should
just need one more table inserted between the existing two - this will
allow for metadata to have any number of values at the cost of slower
inserts and perhaps a bit more disk space
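
As a rough illustration of the three-table layout just described, the sketch below puts a link table between a service table and a value table so that one entry can carry several values of the same type. All table, column and function names here are assumptions for the example and do not reflect tracker's actual schema; the string/numeric split is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE service (id INTEGER PRIMARY KEY, uri TEXT UNIQUE);
CREATE TABLE metadata_value (id INTEGER PRIMARY KEY, value TEXT);
-- Hypothetical link table sitting between the two tables above.
CREATE TABLE service_metadata (
    service_id INTEGER REFERENCES service(id),
    type_name  TEXT,
    value_id   INTEGER REFERENCES metadata_value(id)
);
CREATE INDEX service_metadata_idx
    ON service_metadata (service_id, type_name);
""")


def add_value(uri, type_name, value):
    """Attach one more value of the given type to a service."""
    conn.execute("INSERT OR IGNORE INTO service (uri) VALUES (?)", (uri,))
    service_id = conn.execute(
        "SELECT id FROM service WHERE uri = ?", (uri,)
    ).fetchone()[0]
    value_id = conn.execute(
        "INSERT INTO metadata_value (value) VALUES (?)", (value,)
    ).lastrowid
    conn.execute(
        "INSERT INTO service_metadata VALUES (?, ?, ?)",
        (service_id, type_name, value_id),
    )


def get_values(uri, type_name):
    """Return every value of type_name for a service, in insertion order."""
    rows = conn.execute(
        """
        SELECT v.value
        FROM service AS s
        JOIN service_metadata AS sm ON sm.service_id = s.id
        JOIN metadata_value AS v ON v.id = sm.value_id
        WHERE s.uri = ? AND sm.type_name = ?
        ORDER BY v.id
        """,
        (uri, type_name),
    )
    return [row[0] for row in rows]


add_value("file:///home/user/song.ogg", "audio:artist", "First Artist")
add_value("file:///home/user/song.ogg", "audio:artist", "Second Artist")
artists = get_values("file:///home/user/song.ogg", "audio:artist")
```

Each insert now touches two or three tables instead of one, which is where the slower inserts and extra disk space mentioned above would come from.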
For the metadata types themselves we already have a table for storing
them and we just need one more flattened table to store the
relationships between the various types. This should in effect give us
the sub property type stuff.
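
One way to read "flattened" here is a table that stores every type together with all of its ancestors, so that finding a type and its children is a single indexed lookup rather than a recursive walk. The sketch below assumes that reading; the table names, the helper functions and the type `example:teamEmail` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE metadata_type (name TEXT PRIMARY KEY);
-- Flattened hierarchy: one row per (type, ancestor) pair, however deep.
CREATE TABLE type_ancestor (
    type_name TEXT,
    ancestor  TEXT,
    PRIMARY KEY (type_name, ancestor)
);
""")


def add_type(name, parent=None):
    conn.execute("INSERT INTO metadata_type VALUES (?)", (name,))
    # Each type counts as its own ancestor, so one lookup returns
    # the type itself as well as everything below it.
    conn.execute("INSERT INTO type_ancestor VALUES (?, ?)", (name, name))
    if parent is not None:
        # Copy the parent's ancestor rows (including the parent itself).
        conn.execute(
            "INSERT INTO type_ancestor "
            "SELECT ?, ancestor FROM type_ancestor WHERE type_name = ?",
            (name, parent),
        )


def types_matching(name):
    """Return name plus all direct and indirect child types."""
    rows = conn.execute(
        "SELECT type_name FROM type_ancestor "
        "WHERE ancestor = ? ORDER BY type_name",
        (name,),
    )
    return [row[0] for row in rows]


add_type("tracker:email")
add_type("evolution:workEmail", parent="tracker:email")
add_type("example:teamEmail", parent="evolution:workEmail")  # hypothetical
email_types = types_matching("tracker:email")
```

The cost is paid when a type is registered; a search for `tracker:email` can then join against `type_ancestor` once to cover every child type.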
It's not difficult to do - just a lot of work plumbing it in to our
existing framework and altering all our stored procedures and queries.
We don't need any rdf syntax/magic here as it's pretty simple stuff and
we can expose a nice dbus api for managing the metadata types and their
inter-relationships.
Our dbus api would also need to change so that "get metadata" returns an
array instead of a single value as each entry can now have multiple
metadata values etc. So unless there is anything I have missed it's just
elbow grease really :)
--
Mr Jamie McCracken
http://jamiemcc.livejournal.com/