Re: [Tracker] Writing custom extractors docs?



On 05/02/10 04:32, Spivak, Max wrote:
Hi there,

Hi,

I've been looking at tracker and I like it a lot. Great job.

Thank you!

I'm looking to write custom extractors for the 0.7.x version. I'm
wondering what docs exist.

OK.

I found
http://library.gnome.org/devel/libtracker-extract/unstable/libtracker-extract-tracker-extract.html
and http://library.gnome.org/devel/libtracker-common/unstable/ -- is
this valid and current? Any other docs?

The best place to start is:

  http://live.gnome.org/Tracker/Documentation/

On that page, as you have already found, there are links to libtracker-extract and libtracker-common.

  http://library.gnome.org/devel/libtracker-extract/unstable/
  http://library.gnome.org/devel/libtracker-common/unstable/

How is the custom extractor registered with tracker-extract? Is it
just that the libextract-<abc>.so is present in the
lib/tracker-extract/tracker-0.7/extract-modules directory or is
something else necessary?

I think the existing documentation is perhaps not enough here. We should add to it to make this step clearer.

Essentially, yes. All that has to happen is your .so has to be in the directory:

  $prefix/lib/tracker-0.7/extract-modules/

Some other checks are made when the library is loaded, such as whether it exports the right functions. Specifically:

  tracker_extract_get_data().
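
As a quick sanity check before dropping a module into that directory, something like the following can confirm that the library loads and exports the expected entry point. This is only a sketch using Python's ctypes; it is not what tracker-extract itself runs, and the helper name exports_symbol is made up for illustration.

```python
import ctypes


def exports_symbol(library_path, symbol):
    """Return True if the shared library loads and exports `symbol`.

    library_path may be None, which (on POSIX) inspects the symbols
    already visible to the running process.
    """
    try:
        library = ctypes.CDLL(library_path)
    except OSError:
        # Missing file, wrong architecture, unresolved dependencies, etc.
        return False
    # ctypes resolves the symbol on attribute access and raises
    # AttributeError when it cannot be found, which hasattr() absorbs.
    return hasattr(library, symbol)
```

For a hypothetical module, the call would be exports_symbol("$prefix/lib/tracker-0.7/extract-modules/libextract-abc.so", "tracker_extract_get_data"), with $prefix expanded by hand to your install prefix.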

Is there a registry that maps a document's file extension to its
mimetype? Say I have a <filename>.abc -- what maps it to
libextract-abc.so? This is especially interesting if I have custom
documents for which I invented an extension and a mime type.

This is a good question. Ultimately, the file has to have a mime type, and that mime type is what you put in your extractor, as documented in the example for libtracker-extract.

If your mime type is not registered, you need to do some magic with shared-mime-info to fix that. See:

  $prefix/share/doc/shared-mime-info/shared-mime-info-spec.pdf

I can't remember the exact details right now, but it isn't too difficult from what I remember. If you need more help with this, let me know.
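
For reference, a shared-mime-info package for a made-up type usually looks like the output of the sketch below. The type name application/x-abc, the *.abc glob and the function name mime_package_xml are assumptions for illustration; the mime-info / mime-type / glob layout is the one described in the shared-mime-info spec.

```python
from xml.sax.saxutils import escape, quoteattr

# XML namespace used by shared-mime-info package files.
MIME_INFO_NS = "http://www.freedesktop.org/standards/shared-mime-info"


def mime_package_xml(mime_type, glob_pattern, comment):
    """Build a minimal shared-mime-info package describing one type."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<mime-info xmlns="{MIME_INFO_NS}">\n'
        f'  <mime-type type={quoteattr(mime_type)}>\n'
        f'    <comment>{escape(comment)}</comment>\n'
        # The glob is what ties the invented extension to the mime type.
        f'    <glob pattern={quoteattr(glob_pattern)}/>\n'
        '  </mime-type>\n'
        '</mime-info>\n'
    )
```

Saving the result of mime_package_xml("application/x-abc", "*.abc", "ABC document") as ~/.local/share/mime/packages/abc.xml and then running update-mime-database ~/.local/share/mime should register the type for the current user. The same mime type string is then the one to put in your extractor.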

I've run across some posts that tracker will/may use
LibStreamAnalysers from Strigi. Should I use LSA for my extractors or
not?

Not right now. It doesn't push the data into tracker-store correctly (mostly because it needs updating after some recent changes), and it is also mutually exclusive with our own extractors: we don't extract with both our in-house/3rd-party extractors AND LSA, but with one or the other. Some work is needed both to allow them to be used together and to fix the LSA extractor.

The single biggest problem, assuming everything else works for the LSA extractor, is that our ontology and the ontology LSA uses do not exactly match, and this causes quite a few warnings in tracker-store's logs.

--
Regards,
Martyn


