Re: Proposing Tracker for inclusion into GNOME 2.18



Jos van den Oever wrote:
Hi all,

Hi Jos, great to have you in on this discussion.


Strigi has a few features that are not in Tracker or Beagle and misses
a number of features that the other programs have. But the core
functionality of Strigi, indexing data, is something that it shares.
One important distinction has to be made straightaway: the difference
between indexing metadata and storing metadata. Strigi only indexes
metadata. If you think your disk is full, you can just throw away
the index, because there is no data of value in there. All that's in
there is an index that allows you to find your data quickly.
Personally, I think _storing_ metadata in an indexer is not a good
idea. (I do think that an index on a metadata store is a good idea,
but that's a different matter). This is a large difference with
Tracker which does act as a metadata store of 'first class objects'
whatever that means. Beagle is also mainly an index. (Is any
non-redundant data lost if I delete my Beagle index, Joe?)

First, to clarify: Tracker is not a dedicated indexer (like Beagle and Strigi) but first and foremost a database that has indexing as a side feature.

Our metadata store (sqlite) is quite separate from our full text indexer (QDBM), whose index can be deleted if not required; the data there is just as expendable as in Strigi's and Beagle's case. No metadata is "stored" in the full text indexer, although indexable metadata is of course indexed in it.

Tracker can also be run as a standalone metadata store/server without any indexing if desired (with the --disable-indexing command line option).
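
A rough sketch of that separation, for illustration only. It assumes a made-up three-column table (not Tracker's real schema) and uses a plain Python dict as a stand-in for the QDBM full text index. The point it shows is that the index is derived from the store, so it can be deleted and rebuilt without losing anything:

```python
import sqlite3
from collections import defaultdict

# Metadata store: the authoritative copy (Tracker uses sqlite for this).
# The table layout below is hypothetical, not Tracker's real schema.
store = sqlite3.connect(":memory:")
store.execute("CREATE TABLE metadata (uri TEXT, key TEXT, value TEXT)")
store.executemany(
    "INSERT INTO metadata VALUES (?, ?, ?)",
    [
        ("file:///home/me/song.ogg", "title", "Blue Train"),
        ("file:///home/me/song.ogg", "tag", "favourite"),
        ("file:///home/me/notes.txt", "tag", "todo"),
    ],
)


def build_index(db):
    """Derive a word -> set-of-URIs index from the store (stand-in for QDBM)."""
    index = defaultdict(set)
    for uri, _key, value in db.execute("SELECT uri, key, value FROM metadata"):
        for word in value.lower().split():
            index[word].add(uri)
    return index


index = build_index(store)
hits_before = index["blue"]

# Throw the index away: nothing of value is lost, it can be rebuilt.
del index
index = build_index(store)
hits_after = index["blue"]
```

Running the store on its own, as with --disable-indexing, corresponds here to simply never calling the hypothetical build_index().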

[snip]

So if this is not a sales talk, what is it? It's a call for
standardization. This discussion between competing programs is a great
time to start talking about common functionality. With regards to
desktop search there are many things that can be standardized:
- query language
- metadata names and meaning
- test suites
- DBus APIs
- index formats

I won't discuss index formats because, even though Beagle and Strigi
both use the Lucene index format, this is an implementation detail and
defines performance and disk usage and should not be frozen into a
standard.

The query language as used by Beagle and Strigi is very similar (no
coincidence) and is a good start for standardization. The largest
drawback of the language used is the ambiguity of the field
specifiers.

Now that DBus v1 is almost upon us, the barriers between GNOME and KDE
are diminishing. Functionality defined by a DBus API can be
implemented in any language and as such, I think GNOME should choose a
DBus API to use and share with KDE and

Yes, this is my desire also.


Test suites. I'd love there to be a common test suite that says: if
you index this data with these parameters, you should get these
results from this query. Strigi will develop such tests naturally.
Being able to share them across projects would mean that programs
would compete on merit and without the usual prejudices and license
and library incompatibilities.
Strigi has a DBus interface for searching, so does Tracker. We should
compare them and find a common interface. Of course the respective
GNOME and KDE developers should decide which DBus API should be used
by their applications. Freedesktop.org would be a good place to define
these interfaces.

We should have an org.freedesktop.indexer interface that we can all share. Implementation-specific stuff can then reside in each project's own interface.
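
To make the idea concrete, here is a minimal sketch of what "one shared interface, many implementations, one common test" could look like. Nothing here is an agreed API: the method names index_text() and search(), the ToyIndexer class and the shared_test() function are all hypothetical, and a real version would be defined over DBus rather than as a Python class.

```python
from abc import ABC, abstractmethod
from typing import List


class Indexer(ABC):
    """Hypothetical shape of a shared search interface."""

    @abstractmethod
    def index_text(self, uri: str, text: str) -> None: ...

    @abstractmethod
    def search(self, query: str) -> List[str]: ...


class ToyIndexer(Indexer):
    """Trivial in-memory implementation, standing in for any real backend."""

    def __init__(self):
        self._words = {}

    def index_text(self, uri, text):
        for word in text.lower().split():
            self._words.setdefault(word, set()).add(uri)

    def search(self, query):
        return sorted(self._words.get(query.lower(), set()))


def shared_test(indexer: Indexer) -> bool:
    """Index known data, run a known query, compare against the expected hits."""
    indexer.index_text("file:///a.txt", "desktop search")
    indexer.index_text("file:///b.txt", "metadata store")
    return indexer.search("search") == ["file:///a.txt"]
```

Any backend that implements the two methods can be passed to shared_test(), which is the sense in which a common test suite would let implementations compete on merit.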


Metadata naming and meaning. This is something which is rather hard.
Dublin Core is part of it. It names some types of metadata. I've
already mailed about this with Jamie in the past. In my opinion, the
issue should be separated into smaller definitions that say what
metadata fields can be extracted from certain filetypes. Indexer
plugins could then advertise that they implement this functionality.
The names for the metadata names should also be used when searching
and there, for convenience, they should be abbreviated as is current
practice.
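
One possible reading of this proposal, as a small sketch. The registry contents, the "dc:" prefix convention and the abbreviations are assumptions made up for illustration (only "title" and "creator" are actual Dublin Core element names); it shows plugins advertising fields per file type and abbreviated field names being expanded at search time:

```python
# Hypothetical registry: which metadata fields can be extracted per file type.
# "title" and "creator" are Dublin Core element names; the rest is made up.
FIELDS_BY_TYPE = {
    "audio/ogg": {"dc:title", "dc:creator"},
    "text/plain": {"dc:title"},
}

# Hypothetical short forms for use in queries.
ABBREVIATIONS = {"title": "dc:title", "creator": "dc:creator"}


def parse_term(term):
    """Split 'field:value' into (full field name or None, value)."""
    field, sep, value = term.partition(":")
    if sep and field in ABBREVIATIONS:
        return ABBREVIATIONS[field], value
    return None, term  # no known field specifier: plain full-text term


def types_supporting(field):
    """List the file types whose plugins advertise the given field."""
    return sorted(t for t, fields in FIELDS_BY_TYPE.items() if field in fields)
```

With a fixed table of field names like this, a term such as "creator:coltrane" resolves to exactly one field, while a term with an unknown prefix falls back to a plain full-text search.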

So, rather a long mail that can be summarized in: please accept an API
for searching and not a suite of programs (indexer + guis to it) and
start thinking about standardizing _indexable_ metadata (other
metadata is a whole different can of worms that I won't touch). This is
still possible since neither KDE nor GNOME have agreed on a program
for indexing and by adopting only an API, programs will be forced to
collaborate to adhere to the API as well as possible, meaning the user
wins.

I agree from the indexing point of view, but Gnome requires a reference implementation to be available. Where there have been multiple candidates, Gnome has always blessed one (e.g. Epiphany vs Galeon), but that does not mean distros use the blessed one (e.g. Firefox is more likely to be the dominant web browser, even though I think Epiphany is better in a Gnome setting).

The other (somewhat unique) features of Tracker, such as desktop-wide tagging and extensible metadata, are still vital ingredients for Gnome. That's one of the other reasons for proposing Tracker, and we need it to be more integrated if Gnome is to become more integrated in this regard.

We also have big problems with lots of #ifdef'ing in code, so standardising would be a big win. I'm sure that when I try to implement Epiphany's next generation bookmark/history stuff in Tracker's first class object database, they would prefer it not to be #ifdef'ed.

So with Tracker usable as a standalone metadata store without any indexing, there shouldn't be a need to confine what goes into Gnome to pure indexing. That leaves the door open to just Tracker, Tracker+Beagle, or Tracker+Strigi, with the other indexer in the latter two cases taking ownership of the shared indexing DBus interface and Tracker confined to metadata storage only.

Some people might not like that, but I think it's a practical compromise. With Tracker being the only one written in pure C, it is the only one that can *ultimately* get into the Gnome platform and be fully integrated (at the moment I am just proposing it for the desktop, which is just a simple blessing, nothing more).

I hope a shared interface for the pure indexing case will address the concerns other indexers have and allow us to integrate Tracker. Otherwise we risk restricting innovation and integration to a pure indexing solution, which would mean missing out on the more exciting features of Tracker and their usefulness to Epiphany and other apps.


--
Mr Jamie McCracken
http://jamiemcc.livejournal.com/



