Re: Finding and Reminding, tech issues, 3.0 and beyond
- From: Martyn Russell <martyn lanedo com>
- To: Owen Taylor <otaylor redhat com>
- Cc: gnome-shell-list gnome org, desktop-devel-list gnome org
- Subject: Re: Finding and Reminding, tech issues, 3.0 and beyond
- Date: Mon, 12 Apr 2010 12:03:02 +0100
On 09/04/10 23:09, Owen Taylor wrote:
Hi Owen,
> Tracker
> =======
> In some testing, Tracker 0.8 seems enormously better behaved
> than Tracker 0.6. It has very significant optimizations in how
> it stores the tracker database on disk, and also, by default,
> only indexes defined subdirs of $HOME. So, as of right now,
> the system impact of Tracker isn't a big concern of mine, as it
> would be for 0.6.
I should just add that we are only now starting on the performance
optimisations. Until now we have not really been focusing on this, but
more on feature completion.
Good to know that you think it is much more reasonable than 0.6 ;)
> Possible concerns and considerations with Tracker:
> * RDF + SPARQL + a large collection of ontologies does present
>   a significant new barrier to someone coming to the GNOME
>   platform. While the basic concepts of RDF are quite simple,
>   RDF serialization formats and SPARQL are new things people
>   will have to learn, and there are some intimidating terms
>   like "ontology".
I see your point. However, we tried providing a fixed API in 0.6, and it
quickly got deprecated or needed updating whenever the ontologies
changed. The current approach allows much more flexibility, and it is
also much more powerful than a fixed API. The 0.6 API not being powerful
enough to query the way they wanted was a common complaint from Nokia
application developers for the N900, and it had to be addressed.
>   RDF is also popularly (and perhaps unfairly) seen as
>   yesterday's fad.
> * There is a large abstraction barrier between the application
>   and the underlying data storage. It's very hard to decipher
>   or influence how storing data in RDF and running SPARQL queries
>   maps into low-level database operations.
Hmm, it is not so different from what we had in 0.6 (not that that is a
good comparison :)
As for how RDF/SPARQL maps to low-level database operations, I can give
you a quick idea right now:
libtracker-client (wrapper around D-Bus + convenience functions)
  -> D-Bus (IPC)
  -> tracker-store (queue and event management daemon)
  -> libtracker-data (SPARQL <--> SQL)
  -> libtracker-db (database connections/functions)
  -> SQLite
     -> libtracker-fts (FTS module loaded by SQLite)
Schema-wise, if you look at the database with the sqlite command-line
tool or the sqlitebrowser UI, you can get a pretty good idea of how
things are laid out. The schema is generated entirely from the files
that describe the ontology in data/ontology (this is done in
libtracker-{db|data}).
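To get the same overview programmatically, here is a minimal sketch using Python's standard sqlite3 module. It only reads sqlite_master and PRAGMA table_info, so it works on any SQLite file. The function name describe_schema is made up for illustration, and the location of the Tracker database file is not given here, so you have to pass the path yourself. Tables backed by a virtual-table module that Python's SQLite does not have loaded (such as an FTS module like libtracker-fts) may not be introspectable. The sketch assumes that case raises sqlite3.OperationalError and records None for that table.

```python
import sqlite3


def describe_schema(db_path):
    """Return {table_name: [column names] or None} for an SQLite file.

    Roughly what the .tables / .schema commands of the sqlite3 shell show.
    """
    conn = sqlite3.connect(db_path)
    try:
        # sqlite_master lists every table, index, view and trigger in the file.
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        layout = {}
        for name in tables:
            quoted = '"' + name.replace('"', '""') + '"'
            try:
                # PRAGMA table_info returns one row per column; index 1 is the name.
                cols = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
                layout[name] = [col[1] for col in cols]
            except sqlite3.OperationalError:
                # Assumed case: a virtual table whose module is not loaded here.
                layout[name] = None
        return layout
    finally:
        conn.close()
```

Printing describe_schema(path) for the Tracker database should give a table-by-table view similar to what sqlitebrowser shows.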
> * Indexing only a subset of the filesystem, while it does
>   avoid performance traps like indexing into large Git
>   repositories, could result in odd behavior from a user's
>   point of view. If you edit a file in an unindexed part
>   of your home directory, is it invisible when looking at
>   your history?
Yes, it will be. It has been suggested that we don't index source
directories unless that is explicitly enabled in the configuration. I
think this would be a good idea to add at some point.
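As a rough illustration of that suggestion, a crawler could prune anything that looks like a source checkout unless a setting turns it back on. This is a minimal sketch, not Tracker code. The marker directory names and the index_source_dirs flag are hypothetical stand-ins for whatever detection rule and configuration key would actually be used.

```python
import os

# Hypothetical rule: a directory containing any of these is a source checkout.
SOURCE_MARKERS = {".git", ".svn", ".hg", ".bzr"}


def directories_to_index(root, index_source_dirs=False):
    """Yield directories under root that the indexer would crawl.

    index_source_dirs is a hypothetical config option. When it is False,
    any directory that looks like a source checkout is skipped together
    with everything beneath it.
    """
    for dirpath, dirnames, _filenames in os.walk(root):
        if not index_source_dirs and SOURCE_MARKERS.intersection(dirnames):
            # Clearing dirnames in place stops os.walk from descending further.
            dirnames[:] = []
            continue
        yield dirpath
```

Pruning at the top of the checkout means the many directories inside a large repository never cost a crawl or an inotify watch.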
Ultimately, the inotify limit is what causes us the most problems. As
FANotify¹ is included in newer distributions, we will add support for
it, which should significantly improve things here.
On modern hardware (which is what I use), I don't even notice when it is
indexing, except perhaps at first, when I see my disk usage increase
noticeably in the system-monitor applet.
>   This may be partly satisfied by feeding accessed files
>   into the Tracker indexed set file-by-file, either directly
>   or via Zeitgeist.
We have a bug about this; I think Bastien filed it², and I do plan to
add support for such an API. It is a fair point: an application-fed API
is useful, especially for vendors that don't want the whole
filesystem-mining feature enabled by default. Also, as Bastien rightly
points out, applications that update files know when they are doing so
and can tell us directly, instead of us finding out via file system
notifications. I would say both approaches are important.
> * Even when limiting Tracker to a subset of the home directory,
>   it's likely still possible to run the system out of inotify
>   handles.
True. This depends entirely on the user and on the inotify limit (which
can, of course, be changed). I am of the opinion that people with more
than ~8k directories (the default inotify limit on Ubuntu) are not
typical GNOME desktop users. Even with all my music, images, etc., plus
a lot of Tracker test data, I don't go over that.
That said, I don't think this is much of a defense, and we want to
overcome this with FANotify¹.
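Because inotify watches are not recursive, watching a tree takes roughly one watch per directory. Here is a quick sketch for checking whether a tree would fit. Linux exposes the per-user limit in /proc/sys/fs/inotify/max_user_watches. The function names are made up for illustration, and the count ignores watches already held by other applications, which share the same limit.

```python
import os

# Linux exposes the per-user inotify watch limit here.
INOTIFY_LIMIT_PATH = "/proc/sys/fs/inotify/max_user_watches"


def read_inotify_limit(path=INOTIFY_LIMIT_PATH):
    """Return the per-user inotify watch limit, or None if it can't be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def count_directories(root):
    """Count root plus every directory beneath it (about one watch each)."""
    return sum(1 for _ in os.walk(root))


def fits_inotify_limit(root, limit):
    """True if watching every directory under root needs at most `limit` watches."""
    return count_directories(root) <= limit
```

For example, fits_inotify_limit(os.path.expanduser("~"), read_inotify_limit()) answers the question for a whole home directory, provided the limit could be read. read_inotify_limit() returns None otherwise.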
> * Using Tracker to extract and index metadata from files is
>   pretty uncontroversial. Using Tracker as the primary store
>   of information (such as tags) is more controversial - suddenly
>   the user's data is dependent on the use of Tracker.
We agree. That's one of the reasons we created the journal: to make
Tracker much more robust in the face of failures.
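In this sense, a journal is an append-only log of changes that can be replayed to rebuild the store after a crash or corruption. The sketch below shows that general technique only. Its JSON-lines format and function names are hypothetical and say nothing about the format of Tracker's actual journal. It also simplifies by keeping one value per (subject, predicate).

```python
import json
import os


def append_entry(journal_path, subject, predicate, value):
    """Append one change and force it to disk before returning."""
    with open(journal_path, "a", encoding="utf-8") as f:
        f.write(json.dumps([subject, predicate, value]) + "\n")
        f.flush()
        os.fsync(f.fileno())


def replay(journal_path):
    """Rebuild {(subject, predicate): value} by replaying the journal in order."""
    state = {}
    if not os.path.exists(journal_path):
        return state
    with open(journal_path, encoding="utf-8") as f:
        for line in f:
            try:
                subject, predicate, value = json.loads(line)
            except ValueError:
                # A torn last line left by a crash is ignored.
                break
            # Later entries overwrite earlier ones for the same key.
            state[(subject, predicate)] = value
    return state
```

If the main database is lost or damaged, replaying the journal from the start recovers user-entered data such as tags.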
> Conclusions?
> ============
> Not much yet - I think it will definitely be hard to implement
> our ideas without something that looks a lot like Tracker, and
> since we have Tracker, something that looks a lot like Tracker
> is most likely Tracker :-) Zeitgeist seems less centrally crucial,
> but there is a role for event logging here.
> Further UI design is definitely needed to figure out what we
> can do short-term for Nautilus/GtkFileChooser, etc.
I agree about event logging. Tracker is, of course, already supported
in both Nautilus and GtkFileChooser, though we could certainly improve
things there (for example, by allowing more fine-grained search options).
¹ http://lwn.net/Articles/339253/
² https://bugzilla.gnome.org/show_bug.cgi?id=613252
--
Regards,
Martyn