Re: [Tracker] Reviving the libstreamanalyzer based extractor module

From: Ivan Frade <ivan frade gmail com>
To: Philip Van Hoof <philip codeminded be>
Cc: Tracker mailing list <tracker-list gnome org>
Subject: Re: [Tracker] Reviving the libstreamanalyzer based extractor module
Date: Fri, 14 Jun 2013 10:36:20 -0700

Hi Philip,

On Fri, Jun 14, 2013 at 7:23 AM, Philip Van Hoof <philip codeminded be> wrote:

Hi team,

During a Tracker/Nepomuk/SPARQL training I gave at one of my customers, I noticed interest in extractors that can dive into archives and into document types that contain a tree of other documents (like MIME documents).

Just today another message on this mailing list mentioned it :)

Either that, or libtracker-extract should allow stream- or buffer-based extraction, and/or file-descriptor-based extraction. In that case the e-mail client could create a pipe, pass its FD to the extractor modules (the ones now used only by tracker-extract), and write the Base64-decoded data to that pipe FD, or something along those lines. Unfortunately, tracker-extract is currently entirely FILE based (not really FD based, nor stream based).

FD passing and buffered extraction are both good ideas. They are also independent: we could implement either of them without the other.
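To make the pipe idea concrete, here is a rough sketch in Python rather than in Tracker's own C. Everything in it is an assumption for illustration: extract_from_fd(), feed_attachment() and extract_attachment() are hypothetical names, not libtracker-extract API, and the "metadata" is just a byte count and a PNG signature check. The writer runs in its own thread so that a large attachment cannot block on a full pipe buffer.

```python
import base64
import os
import threading


def extract_from_fd(fd):
    """Hypothetical FD-based extractor: reads until EOF, never touches a file path."""
    size = 0
    head = b""
    with os.fdopen(fd, "rb") as stream:
        while True:
            chunk = stream.read(4096)
            if not chunk:
                break
            if len(head) < 8:
                # Keep the first 8 bytes for a trivial signature check.
                head += chunk[: 8 - len(head)]
            size += len(chunk)
    return {"size": size, "is_png": head.startswith(b"\x89PNG")}


def feed_attachment(encoded, write_fd):
    """What the e-mail client would do: write the Base64-decoded data to the pipe."""
    with os.fdopen(write_fd, "wb") as sink:
        sink.write(base64.b64decode(encoded))


def extract_attachment(encoded):
    """Create the pipe, feed it from a thread, and run the extractor on the read end."""
    read_fd, write_fd = os.pipe()
    writer = threading.Thread(target=feed_attachment, args=(encoded, write_fd))
    writer.start()
    try:
        return extract_from_fd(read_fd)
    finally:
        writer.join()
```

The same extract_from_fd() would work on a descriptor obtained from a regular file, which is why FD-based extraction does not depend on the buffered variant.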

I think it would be a great first addition if the tracker-extract .rule file based environment were adapted to have two levels of matching: first on container and then on MimeType. For all of the native extractors the first level would be "Just File", and for libstreamanalyzer's it would be "MIMEDocument" and "Archive". The second level would be the same as now. Ideally this level system could also be used for multimedia files (videos have first a MIME type and then a codec type, for example).

Is this two-level matching really needed? In the end, we recognize the containers by their mime-types (e.g. application/x-tgz). With the current .rules files, we can assign those "container mime-types" to the topanalyzer.
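A small sketch of what single-level matching would look like, again in Python and under stated assumptions: the RULES table, the module file names and pick_module() are all hypothetical and only mimic the idea of a rule mapping mime-type patterns to an extractor module. It is not the actual .rule file format or the tracker-extract matching code.

```python
from fnmatch import fnmatch

# Hypothetical rule table: (extractor module, mime-type patterns it handles).
# Container mime-types are simply listed under the topanalyzer module.
RULES = [
    ("libextract-topanalyzer.so",
     ["application/x-tgz", "application/zip", "message/rfc822"]),
    ("libextract-gstreamer.so", ["audio/*", "video/*"]),
    ("libextract-pdf.so", ["application/pdf"]),
]


def pick_module(mime_type):
    """Return the first module whose patterns match mime_type, or None."""
    for module, patterns in RULES:
        if any(fnmatch(mime_type, pattern) for pattern in patterns):
            return module
    return None
```

With a table like this, pick_module("application/x-tgz") selects the topanalyzer module while the native types keep their native extractors, without a separate container level.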

Then it would become possible for an extractor module like tracker-topanalyzer.cpp to get kicked into action for diving into archive files and MIME documents (and the native ones would still operate on native file types).

Also, tracker-topanalyzer.cpp should be fixed. It has been a long time since it was last tested and I don't expect it to still work. For it to work, libstreamanalyzer would probably need to be adapted to follow Tracker's Nepomuk adaptations (right now libstreamanalyzer doesn't know about the nmm ontology, afaik).

I wonder if Jos is still working on it. We could bring that topanalyzer extractor back to life, use it for compressed files and move on from there.

Best Regards,

Ivan

