Re: Finding and Reminding, tech issues, 3.0 and beyond

On Fri, 2010-04-09 at 18:09 -0400, Owen Taylor wrote:

> Tracker
> =======
> In some testing, Tracker 0.8 seems enormously better behaved
> than Tracker 0.6. It has very significant optimizations in how
> it stores the tracker database on disk, and also, by default,
> only indexes defined subdirs of $HOME. So, as of right now,
> system-impact of Tracker isn't a big concern of mine, as it
> would be for 0.6.
> Possible concerns and considerations with Tracker:
>  * RDF + SPARQL + a large collection of ontologies does present
>    a significant new barrier to someone coming to the GNOME
>    platform. While the basic concepts of RDF are quite simple,
>    RDF serialization formats and SPARQL are new learning people
>    will have to do, and there are some intimidating terms
>    like "ontology"

Ontology = schema, so it's just semantics really :)

>    RDF is also popularly (and perhaps unfairly) seen as
>    yesterday's fad.

RDF is indeed not as nice or succinct as it could be, but what would the
alternative be?

Also bear in mind that the Nepomuk ontology, which Tracker uses, is shared
with KDE, so it should result in some nice freedesktop.org cooperation
and allow Tracker to meet the needs of KDE apps as well as GNOME ones.

It would also be unwise to store shareable metadata without some
ontology/schema that all apps can agree on.

>  * There is a large abstraction barrier between the application
>    and the underlying data storage. It's very hard to decipher
>    or influence how storing data in RDF and running SPARQL queries
>    maps into low-level database operations.

I'm not sure why that is relevant.

FYI, each type of resource is stored as its own table, and the properties
of a resource are fields in that table, so the end result is not much
different from a traditional SQL database.

Indeed, it's nothing like an off-the-shelf triple store, which stores
individual properties as rows in one gigantic table; that design scales
poorly (although it is very extensible). Tracker should therefore be seen
as an optimised SQL database that uses RDF/SPARQL as its schema and query
language rather than SQL.
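To make the contrast concrete, here is a minimal sketch using Python's
sqlite3 module. The table and column names (triples, music_piece, title,
artist) and the data are hypothetical and are not Tracker's actual schema;
the point is only the shape of the query in each layout. A generic triple
table needs a self-join for every extra property you ask about, while a
per-type table answers the same question with a single-table lookup.

```python
import sqlite3

# Hypothetical data: two music files, each with a title and an artist.
DATA = [
    ("file:///music/a.mp3", "Song A", "Artist X"),
    ("file:///music/b.mp3", "Song B", "Artist Y"),
]


def triple_store(conn):
    """Generic triple store: every property value is a row in one big table."""
    conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
    for uri, title, artist in DATA:
        conn.executemany(
            "INSERT INTO triples VALUES (?, ?, ?)",
            [(uri, "title", title), (uri, "artist", artist)],
        )
    # Reading two properties of the same resource needs one self-join per property.
    return conn.execute(
        """SELECT t.object, a.object
           FROM triples t JOIN triples a ON t.subject = a.subject
           WHERE t.predicate = 'title' AND a.predicate = 'artist'
             AND a.object = 'Artist X'"""
    ).fetchall()


def class_tables(conn):
    """Per-type layout (hypothetical): one table per resource type, one column per property."""
    conn.execute(
        "CREATE TABLE music_piece (uri TEXT PRIMARY KEY, title TEXT, artist TEXT)"
    )
    conn.executemany("INSERT INTO music_piece VALUES (?, ?, ?)", DATA)
    # Same question, answered by a plain lookup on a single table.
    return conn.execute(
        "SELECT title, artist FROM music_piece WHERE artist = 'Artist X'"
    ).fetchall()


if __name__ == "__main__":
    print(triple_store(sqlite3.connect(":memory:")))
    print(class_tables(sqlite3.connect(":memory:")))
```

Both functions return the same answer; what differs is how much joining the
database has to do to get there, which is what hurts a naive triple store
as the data grows.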

> Zeitgeist
> =========
> The "properties of files" approach of Tracker works for a lot
> of things. However, it is pretty much unsuitable for storing
> time-based histories of actions. We can store the last time
> a file was edited as a Tracker property. It's slightly harder
> to store all the times the file was edited. It's considerably
> harder to store all the times the file was edited including
> the editing application for each access.
> (Of course, anything can be stored in RDF; it's a perfectly
> general format; however, the more that we have to create
> anonymous nodes, the more different structures that we are
> storing in the tracker triple store, the harder it is going
> to be to optimize, and the less suitable a straightforward
> implementation of the triple-store backed by a sqlite database
> is.)
> My understanding is that the Tracker people have disclaimed
> the log storage problem. 

Not really. Storing timeline info is not a big deal for Tracker. Just as
a file can have many tags in Tracker, it could also have many histories
or audit trails. It could also simply be a multi-valued date property if
all you stored was the datetime stamp.
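A rough sketch of how that could look at the storage level, again in
Python/sqlite3. The table names (file, file_tag, file_history) and the data
are hypothetical, not anything Tracker actually defines. A multi-valued
property is just one extra row per value, the same way tags are handled,
and an audit trail is the same idea with extra columns per event, such as
the application that did the editing.

```python
import sqlite3


def build(conn):
    # Hypothetical schema for illustration only.
    conn.executescript("""
        CREATE TABLE file (uri TEXT PRIMARY KEY);
        -- Multi-valued property: one row per (file, tag).
        CREATE TABLE file_tag (uri TEXT REFERENCES file(uri), tag TEXT);
        -- Audit trail: one row per edit, with its timestamp and application.
        CREATE TABLE file_history (uri TEXT REFERENCES file(uri), ts TEXT, app TEXT);
    """)
    conn.execute("INSERT INTO file VALUES ('file:///doc.odt')")
    conn.executemany(
        "INSERT INTO file_tag VALUES ('file:///doc.odt', ?)",
        [("work",), ("draft",)],
    )
    # Made-up edit history: ISO 8601 strings sort correctly as text.
    conn.executemany(
        "INSERT INTO file_history VALUES ('file:///doc.odt', ?, ?)",
        [("2010-04-05T15:30:00", "editor-b"), ("2010-04-01T10:00:00", "editor-a")],
    )


def edits(conn, uri):
    """All recorded edits of a file, oldest first, with the editing application."""
    return conn.execute(
        "SELECT ts, app FROM file_history WHERE uri = ? ORDER BY ts", (uri,)
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build(conn)
    print(edits(conn, "file:///doc.odt"))
```

If only the timestamps were needed, the app column could be dropped and
file_history would reduce to a plain multi-valued date property.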

We would want a timeline ontology to be part of Nepomuk if possible, so
discussions with them would be needed first. Failing that, a
Tracker-specific timeline property could easily be added to all objects.

Tracker is definitely the right place to add timeline info if you intend
to do queries like "get me all music files I played last week" or "get
me all documents I viewed recently with author blah". I have heard of
one project which uses Tracker to get data but then uses Zeitgeist to
filter it for timeline info, which is clearly not a good solution and
won't scale if the Tracker results are huge.

General event logging could also be added to Tracker, but its usefulness
is not as great as, say, timelining, and we don't really want to step on
the Zeitgeist team's toes at this point. In the future, Zeitgeist may
well decide to use Tracker as an event logging framework, but that is
their decision at the end of the day.
