Re: [Tracker] Enhancement proposal: Automatic reindexing when adding new extractors



Hi!

On Tue, 2010-02-09 at 21:31 +0100, Adrien Bustany wrote:
Hi all, 

This proposal is about automatically reindexing a mime type when a new
extractor is added/updated. There's already a "Reindex" call in
tracker-miner-fs.

I think we should take several use cases into account here.

      * A new tracker-extractor module is installed
      * As mbiebl pointed out, a tracker-extractor module has gained
        some new capability.
      * The library a tracker extractor relies on gains new capabilities
        (e.g. GStreamer, poppler)

IMHO the trickiest one is the 3rd, which requires either integration
from packagers or some way for extractors to probe the file types
supported. The latter would largely depend on whether the library is
able to tell us that, which I don't think happens often, so we might
just be forced into the first option.

For the second use case, we clearly need some way to version the
extractors, so it is known when to re-extract. The keyfile with version
info approach looks quite sane to me; we should also provide some
command line tool to bump the version number for a given extractor.


Philip told me he'd like to keep tracker-extract as stupid as possible,
so the logic here would be implemented in tracker-miner-fs, at init time.
All extractor modules provide a function to know the mime types they can
index, but we want to avoid loading all the modules at start. Therefore,
a solution using desktop files is favoured. The desktop files would be
installed in ${datadir}/tracker/extractors and would have the format

This makes a lot of sense to me, since the extractor might not run at
all if tracker-miner-fs thinks everything is up-to-date. Also, having
complete info in the miner about the mimetypes the extractors know
about would avoid sending queries about unsupported files to the
extractor, as happens right now.


[Extractor]
Name=Foobar extractor
MimeTypes=application/foobar
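
For illustration only, a description file in the format above could be
read with a standard INI-style parser. This is a minimal Python sketch,
not Tracker code: the function name is hypothetical, and the ';'
separator for several mime types is an assumption borrowed from desktop
files, since the proposal only shows a single mime type.

```python
import configparser


def parse_extractor_description(text):
    """Return (name, mime_types) parsed from a description file's text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case, e.g. "MimeTypes"
    parser.read_string(text)
    section = parser["Extractor"]
    raw = section.get("MimeTypes", fallback="")
    # Assumption: several mime types would be separated by ';'.
    mime_types = [m.strip() for m in raw.split(";") if m.strip()]
    return section.get("Name", fallback=""), mime_types


EXAMPLE = """[Extractor]
Name=Foobar extractor
MimeTypes=application/foobar
"""

name, mime_types = parse_extractor_description(EXAMPLE)
```

Reading only these small files would let the miner learn the supported
mime types without loading any extractor module.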

What I propose is keeping the mimetypes info separate from the
versioning info. I think this way we can provide reasonable support for
the 3 use cases mentioned above:

      * If tracker-extractor-foo 0.15 provides better information than
        the previous release 0.14, the version file installed by the
        package bumps its number, tracker-miner-fs notices the version
        bump and does its job.
      * If some GStreamer package is added, package scripts use some
        Tracker CLI tool, if available, to add new mimetypes for the
        GStreamer extract module. tracker-miner-fs notices the changes
        in the supported mimetypes and does its job. The version number
        isn't actually changed, since the extract module hasn't changed.
      * A new extractor is installed, tracker-miner-fs notices no prior
        info about it and does its job, more or less like a version
        bump.
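
The three cases above could be sketched roughly as follows. This is a
Python illustration under my own assumptions, not an actual
tracker-miner-fs implementation: the function name and the shape of the
data (a mapping from extractor name to a version and a set of mime
types) are hypothetical.

```python
def mime_types_to_reindex(cached, current):
    """Both arguments map extractor name -> (version, set of mime types).

    `cached` is what the miner remembered from its last run, `current`
    is what is installed now. Returns the mime types to reindex.
    """
    to_reindex = set()
    for name, (version, mimes) in current.items():
        if name not in cached:
            # New extractor: no prior info, treat like a version bump.
            to_reindex |= mimes
            continue
        old_version, old_mimes = cached[name]
        if version != old_version:
            # Version bump: re-extract everything this extractor handles.
            to_reindex |= mimes
        else:
            # Same version: only newly supported mime types need work.
            to_reindex |= mimes - old_mimes
    return to_reindex
```

Keeping the version and the mime type list as separate pieces of state
is what allows the third branch: a library gaining formats only
triggers work for the added mime types.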

So, I guess there should be some $(datadir)/tracker/extractors/ with
version info and a $(datadir)/tracker/extractor-mimetypes/ with a
mimetype->extractor mapping. The main caveat I see here is how the
initial mimetype mapping would be done for certain modules (GStreamer
yet again in mind :). This could, yet again, require help from
packagers.

Besides, we also need to take into account restarting tracker-extract if
it's alive at the time of the update, and making things persistent so
that pending file checks aren't lost if tracker-miner-fs shuts down.
This is going to be tricky :)

Cheers,
   Carlos


At startup, the FS miner loads all the description files and checks
whether an extractor has been added or removed, or whether the mtime of
its description file has changed. If so, it calls the reindex method
with the appropriate mime type. To detect a change in a desktop file,
the FS miner keeps a cached list of each desktop file with its
modification time.
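
The mtime comparison described above could look something like this
Python sketch. It is only an illustration: the function name and the
cache layout (a mapping from file name to mtime) are assumptions, and
how the cache is stored between runs is left out.

```python
import os


def diff_description_files(directory, cached_mtimes):
    """Compare description files in `directory` with a cached mtime map.

    Returns (added, removed, changed, current), where `current` is the
    fresh file name -> mtime map to store for the next startup.
    """
    current = {}
    for entry in os.scandir(directory):
        if entry.is_file():
            current[entry.name] = entry.stat().st_mtime
    added = sorted(set(current) - set(cached_mtimes))
    removed = sorted(set(cached_mtimes) - set(current))
    changed = sorted(
        name for name in current
        if name in cached_mtimes and current[name] != cached_mtimes[name]
    )
    return added, removed, changed, current
```

Files in `added` and `changed` would lead to a reindex call for the
mime types they list. A changed mtime does not prove that the content
changed, which is one reason to consider a version number as well.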

Ideas:
- Describe several extractors in one file.
  That makes it much more difficult to detect a change, since only one
  extractor in a file listing 10 might have changed when the file
  modification time changes.
- Add a version number to the desktop file, to avoid relying only on
  the mtime of the desktop file.

Please tell me your thoughts.

Adrien




