Re: [Tracker] [PATCH] "Daemonize" metadata extractor
- From: Carlos Garnacho <carlos imendio com>
- To: Jamie McCracken <jamiemcc blueyonder co uk>
- Cc: tracker-list gnome org
- Subject: Re: [Tracker] [PATCH] "Daemonize" metadata extractor
- Date: Tue, 04 Mar 2008 15:19:20 +0100
On Fri, 2008-02-29 at 09:10 -0500, Jamie McCracken wrote:
On Fri, 2008-02-29 at 11:34 +0100, Carlos Garnacho wrote:
I've attached a patch in bug #519337 to keep the extractor alive between
operations. This greatly improves performance, as it avoids having to
spawn/initialize the extractor constantly for each new file. With the
patch, the extractor shuts down by itself after 30 seconds of
inactivity. Any testing is appreciated.
Besides, I've been thinking a bit about this subject. Right now trackerd
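To illustrate the idle shutdown described above, here is a minimal sketch in Python. It is not the code from the patch: the function name `run_extractor`, the use of a `queue.Queue` as the request channel and the `extract` callback are all assumptions made for the example. Only the 30-second default comes from the description above.

```python
import queue


def run_extractor(requests, extract, idle_timeout=30.0):
    """Serve extraction requests until none arrives for idle_timeout seconds.

    `requests` is a queue.Queue of file paths and `extract` is any callable
    returning metadata for a path (both are hypothetical stand-ins).
    """
    results = []
    while True:
        try:
            # Block until a request arrives or the idle timeout expires.
            path = requests.get(timeout=idle_timeout)
        except queue.Empty:
            # Nothing to do for idle_timeout seconds: shut down by ourselves.
            break
        results.append((path, extract(path)))
    return results
```

The point of the loop is that startup cost is paid once per burst of files rather than once per file, while an idle extractor still goes away on its own.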
waits synchronously for the metadata extractor output (and the same
happens for thumbnailing, even when such data isn't immediately
necessary), so only one file is processed at a time.
Has there been any thinking/work on making that parallelizable? I'm sure
there'd be performance improvements if there was a pool of extractors
which asynchronously processed a queue of filenames.
yeah, although it's tricky with threads (synchronisation and deadlock
I didn't plan to use threads here. I've developed a small test extractor
that spawns several extractors and manages them asynchronously
through watches, it requires the patched tracker-extractor from bug
#519337. You can run it with:
./test-extract [num-extractors] [path-to-extract]
Being a test, it just gets metadata from mp3 files, but the
tracker-extractor-pool.[ch] files can be easily adapted to tracker
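The general shape of such a pool (several long-lived extractor processes, no threads, the parent reacting to I/O watches) can be sketched as follows. This is a Python illustration for POSIX systems, not the tracker-extractor-pool.[ch] code: the `WORKER` script is a hypothetical stand-in for the patched tracker-extractor, the one-line-per-request protocol is an assumption, and the "metadata" it returns is just the length of the path.

```python
import selectors
import subprocess
import sys

# Hypothetical stand-in for a long-lived extractor: reads one path per line
# on stdin, answers with one "path<TAB>value" line on stdout, stays alive.
WORKER = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    path = line.rstrip('\\n')\n"
    "    print(path + '\\t' + str(len(path)), flush=True)\n"
)


def extract_all(paths, num_extractors=2):
    """Process paths with a pool of extractor processes, without threads."""
    pending = list(paths)
    results = {}
    sel = selectors.DefaultSelector()
    workers = [
        subprocess.Popen([sys.executable, "-c", WORKER],
                         stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                         text=True, bufsize=1)
        for _ in range(num_extractors)
    ]

    def feed(proc):
        # Hand the next queued path to an idle extractor, if any is left.
        if not pending:
            return False
        proc.stdin.write(pending.pop() + "\n")
        proc.stdin.flush()
        return True

    busy = 0
    for proc in workers:
        if feed(proc):
            # Watch the extractor's stdout instead of blocking on it.
            sel.register(proc.stdout, selectors.EVENT_READ, proc)
            busy += 1

    while busy:
        for key, _events in sel.select():
            proc = key.data
            path, _sep, value = proc.stdout.readline().rstrip("\n").partition("\t")
            results[path] = int(value)
            if not feed(proc):
                sel.unregister(proc.stdout)
                busy -= 1

    for proc in workers:
        proc.stdin.close()  # EOF on stdin makes the worker exit
        proc.wait()
    sel.close()
    return results
```

Each extractor has at most one outstanding request, so the parent never blocks on a single slow file while other extractors have results ready.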
anyway, to cut a long story short, daemonizing tracker-extract is not the
way to go but rather to embed common and reliable (e.g. not crash-prone)
formats in a tracker-file-indexer daemon. It should use dbus of course
for flexibility. It could be threaded as it would be less complex than
trackerd is at the moment
What would be the criteria for marking an extractor as reliable? I'd be
extra-careful there, since extractors deal with unknown data. Also, threading
brings other complexities, like the underlying libraries not being
thread-safe, or extractors that resort to command-line calls that are not
thread-aware at all.
It's nice to know about your plans, they sound really great overall.