Re: Finding and Reminding, tech issues, 3.0 and beyond



On 09/04/10 23:09, Owen Taylor wrote:

Hi Owen,

Tracker
=======

In some testing, Tracker 0.8 seems enormously better behaved
than Tracker 0.6. It has very significant optimizations in how
it stores the tracker database on disk, and also, by default,
only indexes defined subdirs of $HOME. So, as of right now,
system-impact of Tracker isn't a big concern of mine, as it
would be for 0.6.

I should just add, we are only now starting the performance optimisations. Up until now we have not really been focusing on this, but more on feature completion.

Good to know that you think it is much more reasonable than 0.6 ;)

Possible concerns and considerations with Tracker:

  * RDF + SPARQL + a large collection of ontologies does present
    a significant new barrier to someone coming to the GNOME
    platform. While the basic concepts of RDF are quite simple,
    RDF serialization formats and SPARQL are new learning people
    will have to do, and there are some intimidating terms
    like "ontology"

I see your point with this. However, we tried using an API in 0.6 and it quickly got deprecated or needed updating whenever the ontologies changed. The current approach allows much more flexibility, and it is also much more powerful than an API. A common complaint from Nokia application developers for the N900 was that the 0.6 API was not powerful enough to query the way they wanted, and that had to be addressed.

    RDF is also popularly (and perhaps unfairly) seen as
    yesterday's fad.

  * There is a large abstraction barrier between the application
    and the underlying data storage. It's very hard to decipher
    or influence how storing data in RDF and running SPARQL queries
    maps into low-level database operations.

Hmm, it is not so different to what we had in 0.6 (not that it is a good comparison :)

About mapping RDF/SPARQL to low level database operations, I can give you a quick idea right now:

  libtracker-client (wrapper around D-Bus + convenience functions)
    -> D-Bus (IPC)
      -> tracker-store (queue and event management daemon)
        -> libtracker-data (SPARQL <--> SQL)
          -> libtracker-db (database connections/functions)
            -> SQLite
              -> libtracker-fts (FTS module loaded by SQLite)

Schema/ontology-wise, if you look at the database using the sqlite command line or the sqlitebrowser UI, you can get a pretty good idea of how things are laid out. The layout is generated entirely from the files that describe the ontology in data/ontology (done in libtracker-{db|data}).
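If you would rather script that inspection than click through sqlitebrowser, here is a minimal sketch using Python's standard sqlite3 module. It is only an illustration, not part of Tracker: describe_schema is a hypothetical helper name, and you have to supply the path to the database file yourself, since its location depends on the Tracker version and configuration. Tables that need a module Python cannot load (the FTS table, I would assume) are reported with None instead of a column list.

```python
import sqlite3


def describe_schema(db_path):
    """Return {table_name: [column names]} for every table in a SQLite file.

    A table whose columns cannot be read (for example a virtual table whose
    module is not available in this SQLite build) maps to None.
    """
    # Open read-only so that inspecting the file cannot modify it.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        schema = {}
        for name in names:
            # Table names may contain characters such as ':', so quote them.
            quoted = name.replace('"', '""')
            try:
                rows = conn.execute(f'PRAGMA table_info("{quoted}")').fetchall()
            except sqlite3.OperationalError:
                schema[name] = None
                continue
            # In each table_info row, index 1 holds the column name.
            schema[name] = [row[1] for row in rows]
        return schema
    finally:
        conn.close()
```

Calling describe_schema() on the database file and printing the result gives one entry per table, which should be enough to compare against the ontology files.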

  * Indexing only a subset of the filesystem, while it does
    avoid performance traps like indexing into large GIT
    repositories, could result in odd behavior from a user's
    point of view. If you edit a file in an unindexed part
    of your home directory, is it invisible when looking at
    your history?

Yes, it will be. It has been suggested that we don't index source directories unless that is explicitly enabled in the config. I think this would be a good idea to add at some point.
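To make that suggestion concrete, here is a rough sketch of how a crawler could skip source checkouts by default. This is not how Tracker does it today; the names VCS_MARKERS, indexable_directories and index_source_dirs are made up for illustration, and treating "contains .git/.svn/.hg/.bzr" as the definition of a source directory is my assumption.

```python
import os

# Hypothetical marker names; a directory holding any of these is treated
# as the top of a source checkout.
VCS_MARKERS = {".git", ".svn", ".hg", ".bzr"}


def indexable_directories(root, index_source_dirs=False):
    """Yield directories under root, skipping source checkouts unless enabled."""
    for dirpath, dirnames, filenames in os.walk(root):
        # A marker may be a directory or, in some setups, a plain file.
        entries = set(dirnames) | set(filenames)
        if not index_source_dirs and VCS_MARKERS & entries:
            # Clearing dirnames in place stops os.walk from descending further.
            dirnames[:] = []
            continue
        yield dirpath
```

With the default setting a checkout and everything below it is left out; passing index_source_dirs=True restores the plain recursive walk.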

In the end the inotify limit is really what causes us the most problems. With the inclusion of FANotify¹ in newer distributions, we will add support and significantly improve things here.

On modern hardware (which I use) I don't even notice when it is indexing, other than perhaps initially, when I see my disk use increase noticeably in the system-monitor applet.

    This may be partly satisfied by feeding accessed files
    into the Tracker indexed set file-by-file, either directly
    or via Zeitgeist.

We have a bug about this², which I think Bastien submitted, and I do plan to add support for such an API. It is a fair point: supporting an application-fed API is useful, especially for vendors that don't want the whole FS mining feature enabled by default. Also, as Bastien rightly points out, applications that update files know when they are doing so and can tell us directly, instead of us finding out via file system notifications. I would say both are important.

  * Even when limiting Tracker to a subset of the home directory,
    it's likely still possible to run the system out of inotify
    handles.

True. This is completely subjective and depends on the user and the inotify limit (which can of course be changed). I am of the opinion that people with more than ~8k directories (the default inotify limit on Ubuntu) are not typical GNOME desktop users. Even with all my music/images/etc. I am not going over that, and I also have a lot of Tracker test data.

That said, I don't think this is any defense and we want to overcome this with FANotify¹.
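If anyone wants to check where they stand, something like the following would do. It is a quick sketch, not Tracker code: it assumes one inotify watch per directory, reads the limit from /proc/sys/fs/inotify/max_user_watches on Linux, and the function names are made up for illustration.

```python
import os

INOTIFY_LIMIT_FILE = "/proc/sys/fs/inotify/max_user_watches"


def count_directories(root):
    """Count root plus every directory beneath it, without following symlinks."""
    total = 0
    # os.walk visits each real directory exactly once when followlinks is False.
    for _dirpath, _dirnames, _filenames in os.walk(root, followlinks=False):
        total += 1
    return total


def read_inotify_limit(path=INOTIFY_LIMIT_FILE):
    """Return the per-user watch limit, or None if it cannot be read."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


def watches_fit(root, limit):
    """True if watching every directory under root stays within limit."""
    return count_directories(root) <= limit
```

For example, watches_fit(os.path.expanduser("~"), read_inotify_limit() or 8192) tells you whether a full watch of your home directory would fit; the 8192 fallback is just the Ubuntu default mentioned above.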

  * Using Tracker to extract and index metadata from files is
    pretty uncontroversial. Using Tracker as the primary store
    of information (such as tags) is more controversial - suddenly
    the user's data is dependent on the use of Tracker.

We agree. That's one of the reasons we created the journal to try to make Tracker much more robust in case of failure conditions.

Conclusions?
============

Not much yet - I think it will definitely be hard to implement
our ideas without something that looks a lot like Tracker, and
since we have Tracker something that looks a lot like Tracker
is most likely Tracker :-) Zeitgeist seems less centrally crucial,
but there is a role for event logging here.

Further UI design is definitely needed to figure out what we
can do short-term for Nautilus/GtkFileChooser, etc.

For event logging, I agree. Tracker is currently supported in both Nautilus and GtkFileChooser, and we could improve things there (for example, by allowing more fine-grained search options).

¹ http://lwn.net/Articles/339253/
² https://bugzilla.gnome.org/show_bug.cgi?id=613252

--
Regards,
Martyn

