Re: [Tracker] tracker as full text index/search tool for a large collection of pdf, ps, djvu, dvi documents?



"MR" == Martyn Russell <martyn-bhGbAngMcJvQT0dZR+AlfA public gmane org> writes:

    > On Mon, 2010-08-16 at 01:08 -0300, Ezequiel Birman wrote:
    >> >>>>> "MB" == Michael Biebl
    >> <mbiebl-Re5JQEeQqe8AvxtiuMwx3w-XMD5yJDbdMReXY1tMh2IBg public gmane org>
    >> writes:
    >> 
    >> > 2008/10/4 Meik Hellmund
    >> > <Meik.Hellmund-o02PS0xoJP/FlEURE6xjZxvVK+yQ3ZXh-XMD5yJDbdMReXY1tMh2IBg public gmane org>:
    >> >> 
    >> >> Dear tracker developers,
    >> >> 
    >> >> I have a collection of ca 10000 documents, mostly Postscript,
    >> >> PDF, DjVu and DVI format and I am looking for a full text
    >> >> index/search tool. I tried tracker 0.6.6 from Debian/unstable and
    >> >> have now some questions where I didn't find the answer in the
    >> >> docu and faq.

    > Please don't try 0.6.<ANYTHING> :)

    > Slow, less data indexed, etc.

Sorry. I was quoting an old message. My fault.

    >> >> - Tracker works fine and great with PDF documents. Full
    >> >> points! That's what I am looking for.  But:
    >> >> 
    >> >> - It seems that Postscript, Dvi and Djvu documents are not
    >> >> fully indexed, only the metadata are used. How can I change
    >> >> this?

    > For 0.6.x you can't, it is no longer supported.

    >> > For djvu, there is already a filter
    >> > /usr/lib/tracker/filters/text/djvu_filter

    > All versions >= 0.7.x no longer support filters.

    >> > It should index the content of djvu files, but it requires the
    >> > djvulibre-bin package being installed. (The tracker deb package
    >> > has a recommends on this package).
    >> 
    >> > Ivan already posted instructions on how to create filters for ps
    >> > and dvi.
    >> 
    >> > If you have created working filters for these mimetypes feel
    >> > free to send them to us so we can include them upstream.
    >> 
    >> > Cheers, Michael
    >> 
    >> > -- Why is it that all of the instruments seeking intelligent
    >> > life in the universe are pointed away from Earth?
    >> 
    >> I have the same problem as Meik, plus I have no
    >> /usr/lib64/tracker/filters directory. Do Fedora packagers omit
    >> those files?

    > No, as said above we no longer have filters. They were largely
    > unsupported and a very bad approach to what we wanted to do.

    >> I'm on Rawhide. This is 'uname -a' output:
    >> Linux david.espiga4.com.ar 2.6.36-0.0.rc0.git1.fc15.x86_64 #1 SMP Wed Aug 4 16:26:35 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux
    >> 
    >> # rpm -q tracker
    >> tracker-0.8.15-1.fc14.x86_64

    > To get your files indexed, you need to write an extractor
    > module. To do that, see the examples/libtracker-extract/ code from
    > the code base (or the existing extractors). They should be fairly
    > painless to set up.

    > Then you just need to install the .so to the right directory (man
    > tracker-extract for the details). Also, if you want to reindex ALL
    > your files using your new extractor, you can do that too using:

    >   tracker-control --reindex-mime-type=MIME

    > You can provide multiple switches of --reindex-mime-type too ;)

    > -- Regards, Martyn

Thank you Martyn. I'll check the djvulibre libraries. I hope to be able
to contribute some code soon.

Is there any chance to change the gmane backend to allow posting?

-- 
Ezequiel Birman


