Re: [Tracker] [PATCH] Moving towards internal metadata extractors



Alrighty .. here's a new patch (against cvs, not the previous patch)
for pdf using poppler-glib and Fabien's ogg/vorbis extractor.

On 9/25/06, Jamie McCracken <jamiemcc blueyonder co uk> wrote:
Edward Duffy wrote:
> Here's a patch for src/trackerd/tracker-metadata.c that can use
> internal metadata extractors, depending on mimetype.  If no internal
> extractor is provided, it falls back to the old method using
> tracker-extract.  I've included extractors for Oasis (Open Office
> files), postscript, and AbiWord.
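For readers skimming the thread, the dispatch-with-fallback idea described above can be sketched roughly as follows. This is not the code from the attached patch: the `extractor_fn` signature, the stub extractors, the `lookup_extractor` helper and the exact mimetype strings are all assumptions made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical extractor signature (an assumption, not the tracker API):
 * writes a short label into `out` and returns 0 on success. */
typedef int (*extractor_fn)(const char *path, char *out, size_t out_len);

/* Copy a label into the output buffer, always NUL-terminating it. */
static int copy_label(const char *label, char *out, size_t out_len)
{
    if (out == NULL || out_len == 0)
        return -1;
    strncpy(out, label, out_len - 1);
    out[out_len - 1] = '\0';
    return 0;
}

/* Stub extractors: a real one would parse the file at `path`. */
static int extract_oasis(const char *path, char *out, size_t n)
{ (void)path; return copy_label("internal:oasis", out, n); }

static int extract_postscript(const char *path, char *out, size_t n)
{ (void)path; return copy_label("internal:postscript", out, n); }

static int extract_abiword(const char *path, char *out, size_t n)
{ (void)path; return copy_label("internal:abiword", out, n); }

/* Stand-in for the old method of running the external tracker-extract. */
static int extract_external(const char *path, char *out, size_t n)
{ (void)path; return copy_label("external:tracker-extract", out, n); }

/* Mimetype -> internal extractor table (mimetype strings are illustrative). */
static const struct {
    const char  *mime;
    extractor_fn fn;
} internal_extractors[] = {
    { "application/vnd.oasis.opendocument.text", extract_oasis },
    { "application/postscript",                  extract_postscript },
    { "application/x-abiword",                   extract_abiword },
};

/* Return the internal extractor for `mime`, or the external fallback. */
extractor_fn lookup_extractor(const char *mime)
{
    size_t count = sizeof internal_extractors / sizeof internal_extractors[0];

    if (mime != NULL) {
        for (size_t i = 0; i < count; i++) {
            if (strcmp(internal_extractors[i].mime, mime) == 0)
                return internal_extractors[i].fn;
        }
    }
    return extract_external;
}
```

A caller would do something like `lookup_extractor(mime)(path, buf, sizeof buf)`. Adding a new format then only means adding one row to the table.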

cool - thanks for these. Will try and review your patch tonight.

Btw, libgsf has metadata extraction for OpenDocument files, but never
mind, as your stuff looks good enough (although we will want libgsf for
MS Office files).

> I'm looking at the PDF parser in
> libextractor's repos, and will hopefully have a patch for that soon.

The libextractor one is crap, by the way, and does not work properly if
you have evince/poppler installed (at least on Ubuntu, as it replaces
some of the PDF libs when evince is installed).

libpoppler has C GLib bindings, so I think that's the way to go.

(When you download poppler, there's a test-poppler-glib.c example that
gets the metadata - should be a fairly simple cut and paste job.)

The libextractor ones that might be useful are for things like gif and
tiff. For png, libpng would be better and libexif for jpeg (as these
allow metadata to be written).

Bear in mind, keeping memory low and avoiding leaks is paramount in
tracker so potentially large files need to be either mmap'ed (like
libextractor does) or extracted externally.
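As a rough illustration of the mmap approach mentioned above, here is a minimal POSIX sketch that searches a file for a byte marker without copying the file onto the heap. The `find_marker` name and its interface are hypothetical; this is not taken from libextractor or tracker.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the byte offset of the first occurrence of `marker` in the file
 * at `path`, or -1 if it is absent or the file cannot be mapped. */
long find_marker(const char *path, const char *marker)
{
    size_t mlen = strlen(marker);
    struct stat st;
    long found = -1;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* Empty files cannot be mapped, and an empty marker is meaningless. */
    if (fstat(fd, &st) != 0 || st.st_size <= 0 ||
        mlen == 0 || (size_t)st.st_size < mlen) {
        close(fd);
        return -1;
    }

    size_t len = (size_t)st.st_size;

    /* Map read-only: pages are faulted in on demand and can be dropped
     * by the kernel, so nothing is copied into the process heap. */
    const char *data = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (data == MAP_FAILED)
        return -1;

    /* Simple linear scan over the mapped bytes. */
    for (size_t i = 0; i + mlen <= len; i++) {
        if (memcmp(data + i, marker, mlen) == 0) {
            found = (long)i;
            break;
        }
    }

    munmap((void *)data, len);
    return found;
}
```

The mapping is released on every path before returning, so a long-running daemon does not accumulate mapped files.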


Attached are ogg extractors which I modified from source that Fabien wrote
(I have not tested these nor integrated them yet). Feel free to test
and add them. We need equivalents for mp3 and flac.

Thanks for your work in this area - it's saving me lots of time :)

--
Mr Jamie McCracken
http://jamiemcc.livejournal.com/



Attachment: tracker-internal-metadata-extractors.patch
Description: Text Data


