Re: [Tracker] Revisiting indexer/daemon architecture

Actually, I have a meeting on Friday, so tomorrow at the same time
would be best for me.

jamie

On Wed, 2009-03-11 at 11:13 -0400, Jamie McCracken wrote:
Hi Carlos,

there are indeed advantages and disadvantages to doing things
differently

I'm really busy today and tomorrow, but could we have an IRC meeting on
Friday at the usual time, 14:00 GMT (9am for me), to discuss this?

jamie

On Tue, 2009-03-10 at 19:56 +0100, Carlos Garnacho wrote:
Hi all,

We (as in the contributors from Nokia) have been revisiting the idea of
the daemon/indexer architecture, which has proven to be a much better
approach, both conceptually and code-wise; however, we still feel there
are some issues left.

Current situation
=================
Tracker is now split into a daemon (which handles requests, does file
monitoring, etc.) and an indexer (which collects metadata from files
and saves it in the Tracker database). As things stand, the indexer
has few reasons to stay alive after indexing has been performed, and
file operations (which aren't that common in normal user interaction)
would trigger the indexer again to do its job.

What we think could be the future
=================================
Metadata could come from virtually any source (emails, RSS feeds, ...),
and it could also arrive at any time, not necessarily requiring user
interaction; the more metadata sources there are, the busier Tracker
would get. This makes the separation between indexer and daemon less
useful, since the indexer would be active more and more of the time, as
it's also responsible for writing data to the databases.

Also, as the indexer is asked to handle more information, the DBus
communication between the two processes might become a bottleneck in
this scenario.

Solution
========
We feel it would be better to move the write functionality of the
indexer back into the daemon, and have the indexer just hand the
metadata over to the daemon so that it can store it.

As for the communication medium between the daemon and the indexer, it
has to be compact and fast, so sending Turtle (TTL) files through a
pipe sounds close to optimal.
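
A minimal sketch of how that hand-off could look, in C with GLib
(which Tracker already uses): a plain POSIX pipe inside one process
for brevity, with a hypothetical serialize_to_turtle() helper and a
made-up ex: ontology standing in for the real extractors and schema.

#include <glib.h>
#include <string.h>
#include <unistd.h>

/* Sketch only: the helper below and the ex: ontology are invented for
 * illustration; the real indexer would use its metadata extractors and
 * Tracker's actual ontologies. */
static gchar *
serialize_to_turtle (const gchar *uri,
                     const gchar *mime_type)
{
    return g_strdup_printf ("@prefix ex: <http://example.org/ns#> .\n"
                            "<%s> ex:mimeType \"%s\" .\n",
                            uri, mime_type);
}

int
main (void)
{
    int fds[2];
    gchar *ttl;
    gchar buf[4096];
    gssize len;

    if (pipe (fds) != 0)
        return 1;

    /* Indexer side: serialize one file's metadata and push it down
     * the pipe. */
    ttl = serialize_to_turtle ("file:///home/user/report.odt",
                               "application/vnd.oasis.opendocument.text");
    if (write (fds[1], ttl, strlen (ttl)) < 0)
        return 1;
    close (fds[1]);
    g_free (ttl);

    /* Daemon side: read the Turtle document back; the real daemon
     * would parse it and store the triples in the database. */
    len = read (fds[0], buf, sizeof (buf) - 1);
    if (len > 0) {
        buf[len] = '\0';
        g_print ("daemon received:\n%s", buf);
    }
    close (fds[0]);

    return 0;
}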

This architecture change would also ensure that the part responsible
for storing metadata and monitoring external sources is always kept
alive.

So summing up, these would be the roles:

trackerd:
     - Database read and writes
     - File system monitoring
     - Importing metadata to the database using the TTL file format
     - Handle remote/virtual data

tracker-indexer:
     - File system crawling
     - Exporting metadata in the TTL file format for the daemon (see the
example below)
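
To make the hand-off concrete, the kind of document the indexer would
push to the daemon could look like the following Turtle snippet (the
ex: prefix and property names are purely illustrative; the real
ontology would differ):

@prefix ex: <http://example.org/ns#> .

<file:///home/user/photo.jpg>
    ex:mimeType "image/jpeg" ;
    ex:width 1024 ;
    ex:height 768 .

The daemon would parse such a document and write the resulting triples
to the database, so the indexer never needs to touch the database
directly.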

This solution would imply (again) plenty of changes in the code base,
and it would take some time to get it ready for inclusion in trunk, but
we think it could be beneficial enough to be worth the pain. Some pros
would be:

    - More efficient communication than DBus.
    - Overall memory footprint is smaller. There's no need to duplicate
strings on indexer/daemon sides for every file we crawl.
    - Faster response times for user requests when inserting and then
looking up virtual data (based on various ontologies).
    - Much more lightweight daemon than we have now.
    - Less code duplication (e.g. indexer/daemon crawling).

Opinions? Issues?

Regards,
   Carlos


