Re: Proposing Tracker for inclusion into GNOME 2.18



Hello Ross,

On Thu, 2006-10-26 at 08:48 +0100, Ross Burton wrote:
> There is very little difference between a database with three columns,
> and a true triplestore ("full semantic web thing") apart from the fact
> that the former has lots of RDF's magic inherently available to it.
> Which, incidentally, is already coded up in librdf.
> 
> As I said before, librdf contains database-backed triplestores, RDF
> query parsers, and more.  All one needs to do is glue it together with an
> indexer.

Have you (or anyone else here) actually tested Redland (librdf) with one
of the database backends recently? Last time I checked, Redland wasn't
able to translate queries into SQL, which means queries are going to be
very slow on all but very small data sets. Redland's storage module
overview [1] seems to still imply this, since all database-based storage
modules are described as "Indexed but not optimized". AFAIK, this means
that queries (in RDQL or SPARQL) are not optimized, and that only simple
Redland patterns will be fast thanks to the database indexes.
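
To make the difference concrete, here is a minimal sketch in Python with
SQLite. It is my own illustration, not Redland's code: the table layout,
the index names and both function names are assumptions. The first
function answers a two-pattern query with one indexed lookup per pattern
and joins the results in the application; the second hands the whole
join to the database in one statement.

```python
import sqlite3

# Hypothetical three-column triple table; not Redland's actual schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.execute("CREATE INDEX idx_po ON triples (p, o)")
conn.execute("CREATE INDEX idx_sp ON triples (s, p)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("file:a.txt", "type", "Document"),
        ("file:a.txt", "author", "Ross"),
        ("file:b.txt", "type", "Document"),
        ("file:b.txt", "author", "Jamie"),
        ("file:c.png", "type", "Image"),
    ],
)


def authors_pattern_by_pattern(conn):
    """Join in the application: one indexed lookup per matched subject."""
    statements = 1
    results = []
    subjects = conn.execute(
        "SELECT s FROM triples WHERE p = 'type' AND o = 'Document'"
    ).fetchall()
    for (s,) in subjects:
        statements += 1
        lookup = conn.execute(
            "SELECT o FROM triples WHERE s = ? AND p = 'author'", (s,)
        )
        for (o,) in lookup:
            results.append((s, o))
    return sorted(results), statements


def authors_single_sql(conn):
    """Join in the database: the whole query is one SQL statement."""
    rows = conn.execute(
        "SELECT t1.s, t2.o FROM triples AS t1 "
        "JOIN triples AS t2 ON t2.s = t1.s "
        "WHERE t1.p = 'type' AND t1.o = 'Document' AND t2.p = 'author'"
    ).fetchall()
    return sorted(rows), 1


rows_a, statements_a = authors_pattern_by_pattern(conn)
rows_b, statements_b = authors_single_sql(conn)
print(rows_a == rows_b, statements_a, statements_b)
```

Each individual lookup in the first variant is fast thanks to the
indexes, but the number of statements grows with the number of
intermediate matches, and the database never gets a chance to plan the
join as a whole.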

The other thing I don't like about Redland and similar libraries such as
Jena [2] is that they put a pretty thick abstraction layer between the
database and whatever interface you use to access the data, like, for
example, the query interface. They do that to be able to handle
different storage backends transparently, but this always has a cost in
efficiency. About a year ago, two colleagues of mine and I imported a
large RDF model [3] into Jena using MySQL as the backend. We tried
running queries through Jena (in RDQL at the time) and directly against
MySQL, by just opening the database and issuing the queries by hand in
SQL. SQL was up to 100 times faster for some tasks.

I think for a desktop metadata database, we need a really lightweight
solution, hopefully with just one, very well optimized storage backend.
My candidate would be SQLite because of its ease of configuration, but I
still don't really know how it performs when optimizing complex queries
(and believe me, SPARQL queries translated into SQL can become really,
really ugly...). In this respect, Tracker seems to be on the right path,
but I fear Jamie and his contributors must stop hacking for a short
while and dedicate some hours to reading and understanding the RDF
specifications. The specs don't tell you how to implement things
efficiently, but they handle a lot of the really difficult theoretical
stuff in a very nice and, above all, standard way.
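
To give an idea of how the SQL grows, here is a rough sketch of a
translator from SPARQL-style triple patterns to a single SQL query over
a three-column table. Everything in it is an assumption made for
illustration: the triples(s, p, o) layout, the function name
patterns_to_sql, and the sample data. It handles only plain conjunctive
patterns (no OPTIONAL, FILTER or UNION, which is where things get really
ugly), and it is not how Tracker or Redland do it.

```python
import sqlite3


def patterns_to_sql(patterns, select):
    """Translate triple patterns into one SQL query over triples(s, p, o).

    Terms starting with '?' are variables; anything else is a constant.
    Every pattern becomes one more self-join on the triples table.
    """
    where, params, bound = [], [], {}
    for i, pattern in enumerate(patterns):
        for col, term in zip("spo", pattern):
            ref = f"t{i}.{col}"
            if not term.startswith("?"):
                # Constant: compare against a bound parameter.
                where.append(f"{ref} = ?")
                params.append(term)
            elif term in bound:
                # Variable seen before: add a join condition.
                where.append(f"{ref} = {bound[term]}")
            else:
                # First occurrence of a variable: remember its column.
                bound[term] = ref
    tables = ", ".join(f"triples AS t{i}" for i in range(len(patterns)))
    columns = ", ".join(bound[v] for v in select)
    sql = f"SELECT {columns} FROM {tables}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params


# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("doc1", "type", "Document"),
        ("doc1", "author", "p1"),
        ("p1", "name", "Ross"),
        ("doc2", "type", "Image"),
        ("doc2", "author", "p1"),
    ],
)

# Roughly: SELECT ?doc ?name WHERE { ?doc type Document .
#                                    ?doc author ?who . ?who name ?name }
patterns = [
    ("?doc", "type", "Document"),
    ("?doc", "author", "?who"),
    ("?who", "name", "?name"),
]
sql, params = patterns_to_sql(patterns, ["?doc", "?name"])
rows = conn.execute(sql, params).fetchall()
print(sql)
print(rows)
```

Three patterns already give a three-way self-join; how well SQLite plans
such queries on a large table is exactly the part that would need
measuring, for instance with EXPLAIN QUERY PLAN.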

Hope this helps,

M. S.

[1] http://librdf.org/docs/storage.html
[2] http://jena.sourceforge.net/
[3] http://www.xml.com/pub/r/967



