Re: [Tracker] [PATCH] "Daemonize" metadata extractor

From: "Mikkel Kamstrup Erlandsen" <mikkel kamstrup gmail com>
To: "Jamie McCracken" <jamiemcc blueyonder co uk>
Cc: tracker-list gnome org
Subject: Re: [Tracker] [PATCH] "Daemonize" metadata extractor
Date: Sat, 1 Mar 2008 14:16:54 +0100

On 29/02/2008, Jamie McCracken <jamiemcc blueyonder co uk> wrote:

On Fri, 2008-02-29 at 11:34 +0100, Carlos Garnacho wrote:
> Hi!,
>
> I've attached a patch in bug #519337 to keep the extractor alive between
> operations. This greatly improves performance, as it avoids having to
> spawn/initialize the extractor constantly for each new file. With the
> patch, the extractor shuts down by itself after 30 seconds of
> inactivity. Any testing is appreciated.
>
> Besides, I've been thinking a bit about this subject. Right now trackerd
> waits synchronously for the metadata extractor output (and the same
> happens for thumbnailing, even when such data isn't immediately
> necessary), so only one file is processed at a time.
>
> Has there been any thinking/work on making that parallelizable? I'm sure
> there'd be performance improvements if there was a pool of extractors
> which asynchronously processed a queue of filenames.
>
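
For illustration only, here is a minimal Python sketch of the "pool of extractors working through a queue of filenames" idea. It is not taken from the patch in bug #519337. extract_metadata() and index_files() are hypothetical names, and the extractor itself is a stand-in that only reports the file extension.

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed


def extract_metadata(path):
    # Hypothetical stand-in for handing one file to an extractor;
    # a real extractor would parse the file contents.
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    return {"path": path, "ext": ext}


def index_files(paths, workers=4):
    """Process a batch of filenames with a fixed-size pool of workers."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Submit every filename up front; the pool queues them internally.
        futures = {pool.submit(extract_metadata, p): p for p in paths}
        for fut in as_completed(futures):
            path = futures[fut]
            try:
                results[path] = fut.result()
            except Exception as exc:
                # One bad file should not stop the rest of the batch.
                results[path] = {"path": path, "error": str(exc)}
    return results
```

Results are collected in the calling thread only, so the workers share no mutable state, which sidesteps most of the synchronisation concerns for this simple case.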

Yeah, although it's tricky with threads (synchronisation and deadlock
issues).

The plan for 0.7 is to split trackerd into:

1) Always active main daemon that does watching and processes search
requests

2) tracker-file-indexer - called by (1) via D-Bus to index files. Runs at
nice +19 and is ioniced. Exits when indexing is complete. D-Bus activated
again after a crash or when new stuff to index comes along

3) tracker-email-indexer - called by (1) to index emails. Same as (2).
File attachments would need to be handled by similar code to (1), which
is a disadvantage though

4) xesam extractors - some extractors can be built into (1) and (2) so
as to become daemonised extractors; others will be specified by xesam
and called out of process by (1)

5) xesam crawlers - as (4) but for containerised objects like news feeds

The above would be faster and much leaner on memory, as memory
consumed by indexing would be released when indexing has finished. It
should also be more maintainable and less complex than a monolithic trackerd.

there would also need to be private shared libs for the above components
to enhance code reuse

the xesam stuff would easily allow 3rd party extractors and crawlers to
be implemented

Anyway, to cut a long story short, daemonizing tracker-extract is not the
way to go. Better to embed common and reliable (e.g. not crash-prone)
formats in a tracker-file-indexer daemon. It should use D-Bus of course
for flexibility. It could be threaded, as it would be less complex than
trackerd is at the moment.
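
To make the split concrete, here is a rough Python sketch: formats considered reliable are parsed inside the indexer, and everything else goes to a separate process so that a crash only kills the child. All names here (EMBEDDED, HELPER, extract) are hypothetical, the helper command is a stand-in for a real external extractor, and a plain subprocess is used instead of D-Bus to keep the example self-contained.

```python
import os
import subprocess
import sys


def extract_plain_text(path):
    # Trusted format: parsed directly inside the indexer process.
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        return {"words": str(len(f.read().split()))}


# Hypothetical registry of formats safe enough to handle in-process.
EMBEDDED = {".txt": extract_plain_text}

# Stand-in for an external extractor binary that prints "key=value" lines.
HELPER = [sys.executable, "-c",
          "import os, sys; print('size=%d' % os.path.getsize(sys.argv[1]))"]


def extract_out_of_process(path, command=HELPER, timeout=10):
    """Run an external extractor; return None if it crashes or hangs."""
    try:
        proc = subprocess.run(command + [path], capture_output=True,
                              text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return None
    if proc.returncode != 0:
        return None
    return dict(line.split("=", 1)
                for line in proc.stdout.splitlines() if "=" in line)


def extract(path):
    # Embedded handlers first, out-of-process fallback for the rest.
    handler = EMBEDDED.get(os.path.splitext(path)[1].lower())
    if handler is not None:
        return handler(path)
    return extract_out_of_process(path)
```

A failing or hanging helper shows up as None, so the indexer can skip that file and carry on.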

Designing the above will be tricky but should go hand in hand with
refactoring. If that's something you or others want to work on then we
should discuss on IRC

Jamie, if you have more in-depth design ideas, it would be a good idea to post them on the Xesam mailing list, specifically ideas about the shared Xesam metadata extractors and crawlers. There has not been much concrete discussion on these topics.

Cheers,
Mikkel
