Re: [Tracker] Enhancement proposal: Automatic reindexing when adding new extractors



On Wed, 10 Feb 2010 14:54:36 +0100, Carlos Garnacho <carlos lanedo com>
wrote:
Hi!,

On mar, 2010-02-09 at 21:31 +0100, Adrien Bustany wrote:
Hi all, 

this proposal is about automatically reindexing a mime type when a new
extractor is added/updated. There's already a "Reindex" call in
tracker-miner-fs.

I think we should take several use cases into account here.

      * A new tracker-extractor module is installed
      * As mbiebl pointed out, a tracker-extractor module gains some new
        capability.
      * The library a tracker extractor relies on gains new capabilities
        (i.e. GStreamer, poppler)

IMHO the trickiest one is the 3rd, which either requires integration
from packagers, or some way for extractors to probe the file types
supported. The second option would largely depend on whether the library
is able to tell us that, which I don't think happens often, so we might
just be forced into the first option.

For the second usecase, we clearly need some way to version the
extractors, so it is known when to re-extract. The keyfile with version
info approach looks quite sane to me, we should provide some command
line tool to bump the version number for a given extractor.


Philip told me he'd like to keep tracker-extract as stupid as possible,
so the logic here would be implemented in tracker-miner-fs, at init
time.
All extractor modules provide a function to know the mime types they can
index, but we want to avoid loading all the modules at start. Therefore,
a solution using desktop files is favoured. The desktop files would be
installed in ${datadir}/tracker/extractors and would have the format

This makes a lot of sense to me, since the extractor could not run at all
if tracker-miner-fs thinks everything is up-to-date. Certainly, having
complete info about the mimetypes the extractors know about in the
miner would avoid queries about unsupported files to the extractor, as
happens right now.


[Extractor]
Name=Foobar extractor
MimeTypes=application/foobar
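
To make the format above concrete, here is a minimal sketch of how such a
description file could be read. It is written in Python purely for
illustration (the miner itself would use its own keyfile parser). The
function name, the semicolon-separated MimeTypes list and the optional
Version key are assumptions, not something settled in this thread.

```python
import configparser


def parse_extractor_description(text):
    """Parse one extractor description in the [Extractor] keyfile format.

    Assumption: MimeTypes is a ';'-separated list, as in desktop files,
    and Version is an optional key.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key names case-sensitive
    parser.read_string(text)
    section = parser["Extractor"]
    raw_mimes = section.get("MimeTypes", "")
    return {
        "name": section.get("Name", ""),
        "mime_types": [m.strip() for m in raw_mimes.split(";") if m.strip()],
        "version": section.get("Version"),  # None when the key is absent
    }
```

Reading the file this way means the miner never has to load the extractor
module itself to learn which mime types it handles.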

What I propose is keeping the mimetypes info separate from the
versioning info. I think this way we can provide reasonable support for
the 3 usecases mentioned above:

      * If tracker-extractor-foo 0.15 provides better information than
        the previous release 0.14, the version file installed by the
        package bumps its number, tracker-miner-fs notices the version
        bump and does its job.
      * If some gstreamer package is added, package scripts use some CLI
        tracker tool if available to add new mimetypes for the gstreamer
        extract module, tracker-miner-fs notices the changes in
        mimetypes supported and does its job. The version number isn't
        actually changed, since the extract module hasn't changed.
      * A new extractor is installed, tracker-miner-fs notices no prior
        info about it and does its job, more or less like a version
        bump.
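
The three cases above could be decided roughly as follows. This is only a
sketch in Python, assuming the miner keeps, per extractor, the last seen
version and set of mimetypes; the function name and the data layout are
hypothetical.

```python
def mime_types_to_reindex(previous, current):
    """Return the set of mimetypes that need re-extraction.

    previous and current map an extractor name to a
    (version, frozenset_of_mime_types) tuple. This layout is an
    assumption made for the example.
    """
    stale = set()
    for name, (version, mimes) in current.items():
        if name not in previous:
            # New extractor: no prior info, treat like a version bump.
            stale |= mimes
            continue
        old_version, old_mimes = previous[name]
        if version != old_version:
            # Version bump: everything this extractor handles is stale.
            stale |= mimes
        else:
            # Same version: only newly supported mimetypes are stale.
            stale |= mimes - old_mimes
    return stale
```

Removed extractors are deliberately ignored here, since nothing new can be
extracted for them.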

So, I guess there should be some $(datadir)/tracker/extractors/ with
version info and a $(datadir)/tracker/extractor-mimetypes/ with a
mimetype->extractor mapping. The main caveat I see here is how the
initial mimetype mapping would be done for certain modules (gstreamer yet
again in mind :). This could yet again require packagers' help.

Rather than having a CLI tool to add mimetypes, I'd allow "partial
definition" of modules. E.g. we could have, in two separate files:

[Extractor]
Name=Gstreamer base extractor
MimeTypes=audio/ogg

and

[Extractor]
Name=Gstreamer ffmpeg extractor
MimeTypes=audio/mp3

We don't actually care about the mapping to the .so file, since it's done by
tracker-extract. We're just enumerating the supported mimetypes here.
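
As a rough illustration of the "partial definition" idea, the miner would
simply take the union of the MimeTypes from every description file it
finds. The sketch below is in Python for brevity; the function name and
the semicolon separator are assumptions.

```python
import configparser


def supported_mime_types(descriptions):
    """Union the MimeTypes of several [Extractor] description texts.

    descriptions is an iterable of keyfile contents (strings). Which
    module handles which mimetype is not tracked, only the overall set.
    """
    supported = set()
    for text in descriptions:
        parser = configparser.ConfigParser(interpolation=None)
        parser.read_string(text)
        raw = parser.get("Extractor", "MimeTypes", fallback="")
        supported.update(m.strip() for m in raw.split(";") if m.strip())
    return supported
```

With the two files above, this would yield audio/ogg and audio/mp3 without
the miner knowing that both are served by the same .so.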

I'm not totally sure either about separating version from MimeType... What
happens, for example, if my old gstreamer-ffmpeg couldn't read the vorbis
tags (only id3) for ogg files, and the new one can? The extractor version
doesn't change, and the mimetypes don't change either (ogg was always
supported). Therefore, I'd prefer one Version key per desktop file.

Cheers

Adrien


Besides, we also need to take into account restarting tracker-extract if
it's alive at the time of the update, and making things persistent so
that if tracker-miner-fs shuts down, pending file checks aren't lost.
This is going to be tricky :)

Good point, maybe we could add a "RequestShutdown" call to tracker-extract
that would wait for any pending task to finish and then shut it down.
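
The "finish pending tasks, then stop" behaviour could look something like
the sketch below. It is plain Python with a worker thread, only meant to
show the drain-then-exit idea; the class and method names are
hypothetical and this is not how tracker-extract is actually structured.

```python
import queue
import threading


class ExtractWorker:
    """Toy stand-in for a process that handles extraction requests."""

    def __init__(self):
        self._tasks = queue.Queue()
        self._accepting = True
        self.done = []  # paths processed so far, in order
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, path):
        if not self._accepting:
            raise RuntimeError("shutdown requested, not accepting tasks")
        self._tasks.put(path)

    def _run(self):
        while True:
            path = self._tasks.get()
            if path is None:  # sentinel queued by request_shutdown()
                break
            self.done.append(path)  # real extraction would happen here

    def request_shutdown(self):
        """Refuse new tasks, let queued ones finish, then stop."""
        self._accepting = False
        self._tasks.put(None)  # queued behind all pending tasks
        self._thread.join()
```

Because the sentinel goes to the back of the queue, everything submitted
before the shutdown request is still processed. The sketch assumes a
single caller, so submit() and request_shutdown() are not locked against
each other.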


Cheers,
   Carlos


At startup, the FS miner loads all the description files and checks
whether an extractor has been added or removed, or whether the mtime of
its description file has changed. If so, it calls the reindex method with
the appropriate mime type.
To detect a change in a desktop file, a list of each desktop file with
its modification time is kept in cache by the FS miner.
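
The mtime check described above could be sketched like this (Python, for
illustration only). The cache is assumed to be a simple filename -> mtime
mapping, and both function names are hypothetical.

```python
import os


def scan_descriptions(directory):
    """Return {filename: mtime} for every regular file in directory."""
    return {
        entry.name: entry.stat().st_mtime
        for entry in os.scandir(directory)
        if entry.is_file()
    }


def diff_descriptions(cached, current):
    """Compare the cached listing with a fresh one.

    Returns three sorted lists: added, removed and changed filenames.
    """
    added = sorted(set(current) - set(cached))
    removed = sorted(set(cached) - set(current))
    changed = sorted(
        name
        for name in set(cached) & set(current)
        if cached[name] != current[name]
    )
    return added, removed, changed
```

The miner would then call the reindex method for the mime types listed in
each added or changed file, and store the fresh listing as the new cache.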

Ideas:
- Describe several extractors in one file
  That makes it much more difficult to detect a change, since only one
  extractor in a file listing 10 might have changed when the file
  modification time changes.
- Add a version number to the desktop file, to avoid relying only on
  the mtime of the desktop file.

Please tell me your thoughts

Adrien
_______________________________________________
tracker-list mailing list
tracker-list gnome org
http://mail.gnome.org/mailman/listinfo/tracker-list

