Re: [Tracker] tracker as full text index/search tool for a large collection of pdf, ps, djvu, dvi documents?



On Mon, 2010-08-16 at 01:08 -0300, Ezequiel Birman wrote:
"MB" == Michael Biebl <mbiebl-Re5JQEeQqe8AvxtiuMwx3w public gmane org> writes:

    > 2008/10/4 Meik Hellmund
    > <Meik.Hellmund-o02PS0xoJP/FlEURE6xjZxvVK+yQ3ZXh public gmane org>:
    >> 
    >> Dear tracker developers,
    >> 
    >> I have a collection of ca. 10000 documents, mostly in PostScript,
    >> PDF, DjVu and DVI format, and I am looking for a full-text
    >> index/search tool. I tried tracker 0.6.6 from Debian/unstable and
    >> now have some questions whose answers I could not find in the
    >> documentation or FAQ.

Please don't try 0.6.<ANYTHING> :)

It is slow, indexes less data, etc.

    >> - Tracker works fine and great with PDF documents. Full points!
    >> That's what I am looking for.  But:
    >> 
    >> - It seems that Postscript, Dvi and Djvu documents are not fully
    >> indexed, only the metadata are used. How can I change this?

For 0.6.x you can't, it is no longer supported.

    > For djvu, there is already a filter
    > /usr/lib/tracker/filters/text/djvu_filter

All versions >= 0.7.x no longer support filters.

    > It should index the content of djvu files, but it requires the
    > djvulibre-bin package to be installed. (The tracker deb package
    > has a recommends on this package).

    > Ivan already posted instructions on how to create filters for ps
    > and dvi.

    > If you have created working filters for these mimetypes feel free
    > to send them to us so we can include them upstream.

    > Cheers, Michael

    > -- Why is it that all of the instruments seeking intelligent life
    > in the universe are pointed away from Earth?

I have the same problem as Meik, plus I have no
/usr/lib64/tracker/filters directory. Do fedora packagers omit those
files?

No, as said above we no longer have filters. They were largely
unsupported and a very bad approach to what we wanted to do.

I'm on Rawhide. This is 'uname -a' output:
Linux david.espiga4.com.ar 2.6.36-0.0.rc0.git1.fc15.x86_64 #1 SMP Wed
Aug 4 16:26:35 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux

# rpm -q tracker
tracker-0.8.15-1.fc14.x86_64

To get your files indexed, you need to write an extractor module. To do
that, see the examples/libtracker-extract/ code from the code base (or
the existing extractors). They should be fairly painless to set up.

Then you just need to install the .so to the right directory (see man
tracker-extract for the details). Also, if you want to reindex ALL your
files of a given MIME type using your new extractor, you can do that
too using:

  tracker-control --reindex-mime-type=MIME

You can provide multiple switches of --reindex-mime-type too ;)
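
For the three formats discussed in this thread, the multi-switch form
could be put together roughly as in the sketch below. The MIME type
strings (application/postscript, image/vnd.djvu, application/x-dvi) are
assumptions on my part, so check what your system actually reports for
your files. The build_reindex_cmd helper is hypothetical and is not part
of tracker; it only prints the command line so you can inspect it before
running it.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with tracker): build a tracker-control
# command line with one --reindex-mime-type switch per argument.
build_reindex_cmd() {
    cmd="tracker-control"
    for mime in "$@"; do
        cmd="$cmd --reindex-mime-type=$mime"
    done
    printf '%s\n' "$cmd"
}

# Assumed MIME types for PostScript, DjVu and DVI; verify them locally.
# This only prints the command; nothing is reindexed at this point.
build_reindex_cmd application/postscript image/vnd.djvu application/x-dvi
```

Once your extractor .so is installed, run the printed command by hand.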

-- 
Regards,
Martyn



