Re: Finding and Reminding, tech issues, 3.0 and beyond

From: Martyn Russell <martyn lanedo com>
To: Owen Taylor <otaylor redhat com>
Cc: gnome-shell-list gnome org, desktop-devel-list gnome org
Subject: Re: Finding and Reminding, tech issues, 3.0 and beyond
Date: Mon, 12 Apr 2010 12:03:02 +0100

On 09/04/10 23:09, Owen Taylor wrote:

Hi Owen,

Tracker
=======

In some testing, Tracker 0.8 seems enormously better behaved
than Tracker 0.6. It has very significant optimizations in how
it stores the tracker database on disk, and also, by default,
only indexes defined subdirs of $HOME. So, as of right now,
system-impact of Tracker isn't a big concern of mine, as it
would be for 0.6.

I should just add, we are now starting the performance optimisations. Up until now we have not really been focusing on this but more on feature completion.


Good to know that you think it is much more reasonable than 0.6 ;)

Possible concerns and considerations with Tracker:

  * RDF + SPARQL + a large collection of ontologies does present
    a significant new barrier to someone coming to the GNOME
    platform. While the basic concepts of RDF are quite simple,
    RDF serialization formats and SPARQL are new learning people
    will have to do, and there are some intimidating terms
    like "ontology"

I see your point with this. However, we tried using an API in 0.6 and it quickly got deprecated or needed updating with changes to the ontologies. The current approach allows much more flexibility. It is also much more powerful than an API. A common complaint from Nokia application developers for the N900 was that the 0.6 API was not powerful enough to query the way they wanted, and that had to be addressed.

    RDF is also popularly (and perhaps unfairly) seen as
    yesterday's fad.

  * There is a large abstraction barrier between the application
    and the underlying data storage. It's very hard to decipher
    or influence how storing data in RDF and running SPARQL queries
    maps into low-level database operations.

Hmm, it is not so different to what we had in 0.6 (not that it is a good comparison :)

About mapping RDF/SPARQL to low-level database operations, I can give you a quick idea right now:


  libtracker-client (wrapper around D-Bus + convenience functions)
    -> D-Bus (IPC)
      -> tracker-store (queue and event management daemon)
        -> libtracker-data (SPARQL <--> SQL)
          -> libtracker-db (database connections/functions)
            -> SQLite
              -> libtracker-fts (FTS module loaded by SQLite)

Schema/ontology wise, if you look at the database using the sqlite command line or the sqlitebrowser UI, you can get a pretty good idea of how things are laid out. The layout is generated entirely from the files that describe the ontology in data/ontology (done in libtracker-{db|data}).
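For anyone who would rather script that inspection than click through sqlitebrowser, here is a minimal sketch using Python's standard sqlite3 module. The function name list_tables is made up for illustration, and the database location is an assumption: pass in whichever SQLite file your Tracker install actually uses.

```python
import sqlite3


def list_tables(db_path):
    """Return (table name, [column names]) pairs for every table in a SQLite file."""
    conn = sqlite3.connect(db_path)
    try:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        tables = []
        for name in names:
            # Quote the identifier so names containing ':' or other
            # punctuation are handled safely.
            quoted = name.replace('"', '""')
            # PRAGMA table_info returns one row per column; index 1 is the name.
            columns = [row[1] for row in
                       conn.execute('PRAGMA table_info("%s")' % quoted)]
            tables.append((name, columns))
        return tables
    finally:
        conn.close()


# Usage (the path is an assumption, adjust it to your setup):
#   for name, columns in list_tables("/path/to/tracker/meta.db"):
#       print(name, columns)
```

Comparing the table and column names this prints with the class and property names in data/ontology should make the generated layout easier to follow.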

  * Indexing only a subset of the filesystem, while it does
    avoid performance traps like indexing into large GIT
    repositories, could result in odd behavior from a user's
    point of view. If you edit a file in an unindexed part
    of your home directory, is it invisible when looking at
    your history?

Yes it will be. It has been suggested that we don't index source directories unless explicitly enabled in config. I think this would be a good idea to add at some point.
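To make the suggestion concrete, here is a rough sketch of how a crawler might skip source directories unless a config switch is set. This is not how Tracker does it. The marker list, the function name indexable_directories and the index_source_dirs flag are all hypothetical, and "contains a version control marker" is only one possible heuristic for "source directory".

```python
import os

# Hypothetical heuristic: a directory holding one of these entries is
# treated as a source checkout.
VCS_MARKERS = {".git", ".svn", ".hg", ".bzr"}


def indexable_directories(root, index_source_dirs=False):
    """Return the directories under root that a crawler would index.

    Unless index_source_dirs is True, any directory containing a VCS
    marker is skipped together with everything below it.
    """
    result = []
    for dirpath, dirnames, filenames in os.walk(root):
        is_checkout = bool(VCS_MARKERS.intersection(dirnames + filenames))
        if is_checkout and not index_source_dirs:
            dirnames[:] = []  # prune: os.walk will not descend further
            continue
        # Never descend into the VCS metadata directories themselves.
        dirnames[:] = [d for d in dirnames if d not in VCS_MARKERS]
        result.append(dirpath)
    return result
```

With the default setting a checkout such as a large git tree costs nothing beyond the one directory listing needed to spot the marker.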

In the end the inotify limit is really what causes us the most problems. With the inclusion of FANotify¹ in newer distributions, we will add support and significantly improve things here.

On modern hardware (which I use) I don't even know when it is indexing (other than perhaps initially, when I see my disk use increase reasonably in the system-monitor applet).

    This may be partly satisfied by feeding accessed files
    into the Tracker indexed set file-by-file, either directly
    or via Zeitgeist.

We have a bug about this. I do plan to add support for such an API. I think Bastien submitted² it, and it is a fair point. I think supporting an application-fed API is useful, especially for vendors that don't want the whole FS mining feature enabled by default. Also, as Bastien rightly points out, applications which update files know when they are updating a file and can let us know directly instead of us finding out via file system notifications. Both are important I would say.

  * Even when limiting Tracker to a subset of the home directory,
    it's likely still possible to run the system out of inotify
    handles.

True. This is completely subjective and depends on the user and the inotify limit (which can change of course). I am of the opinion that people with more than ~8k directories (the default inotify limit on Ubuntu) are not such normal GNOME desktop users. Even with all my music/images/etc I am not going over that. I also have a lot of tracker test data.
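If you want to check where you stand, a recursive monitor needs roughly one inotify watch per directory, so counting directories and comparing against the kernel limit gives a fair estimate. A minimal sketch follows. The function names are made up for illustration, and the /proc path is the usual place Linux exposes the per-user watch limit, so it will not exist on other systems.

```python
import os


def count_directories(root):
    """Count root plus every directory below it (symlinks are not followed)."""
    total = 0
    for _dirpath, _dirnames, _filenames in os.walk(root):
        total += 1
    return total


def read_inotify_watch_limit(path="/proc/sys/fs/inotify/max_user_watches"):
    """Return the per-user inotify watch limit, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def fits_in_watch_limit(root, limit):
    """True if watching every directory under root stays within limit."""
    return count_directories(root) <= limit


# Usage:
#   limit = read_inotify_watch_limit()
#   if limit is not None:
#       print(fits_in_watch_limit(os.path.expanduser("~"), limit))
```

Keep in mind the limit is shared by every process the user runs, so the real headroom is smaller than this comparison suggests.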

That said, I don't think this is any defense and we want to overcome this with FANotify¹.

  * Using Tracker to extract and index metadata from files is
    pretty uncontroversial. Using Tracker as the primary store
    of information (such as tags) is more controversial - suddenly
    the user's data is dependent on the use of Tracker.

We agree. That's one of the reasons we created the journal to try to make Tracker much more robust in case of failure conditions.

Conclusions?
============

Not much yet - I think it will definitely be hard to implement
our ideas without something that looks a lot like Tracker, and
since we have Tracker something that looks a lot like Tracker
is most likely Tracker :-) Zeitgeist seems less centrally crucial,
but there is a role for event logging here.

Further UI design is definitely needed to figure out what we
can do short-term for Nautilus/GtkFileChooser, etc.

For event logging, I agree. Currently Tracker is supported in both Nautilus and GtkFileChooser, of course. We could improve things there (allowing more fine-grained search options).


¹ http://lwn.net/Articles/339253/
² https://bugzilla.gnome.org/show_bug.cgi?id=613252

--
Regards,
Martyn

Follow-Ups:
- Re: Finding and Reminding, tech issues, 3.0 and beyond
  - From: Martyn Russell

References:
- Finding and Reminding, tech issues, 3.0 and beyond
  - From: Owen Taylor
