Re: Will Beagle index PDFs?

From: "Srikant Jakilinki" <sriks dcs gla ac uk>
To: dashboard-hackers gnome org
Cc:
Subject: Re: Will Beagle index PDFs?
Date: Tue, 20 Jul 2004 19:19:12 +0100

It is quite easy to index PDFs. Just use "pdftotext" as a filter, and so, yes, in the very near future it will be done, Ralph. In fact, if you can spare a few moments, you can do it yourself on your *nix box now. Just dump all your PDFs into a directory, have a for loop run the pdftotext command on each one, redirecting the text into corresponding text files, and do a simple grep on this directory. I use it this way; let me know if you want the script. I use it in conjunction with "libextractor", which can extract metadata from PDFs as well...
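
For anyone who wants to try the convert-then-grep approach described above, here is a rough sketch in Python. It is not the script mentioned in this message, only an illustration under a few assumptions: the "pdftotext" command (from xpdf or poppler) is on your PATH, and the function names convert_pdfs and search_text, along with the directory layout, are made up for this example.

```python
import subprocess
from pathlib import Path
from typing import List, Tuple


def convert_pdfs(pdf_dir: Path, text_dir: Path) -> List[Path]:
    """Run pdftotext on every *.pdf in pdf_dir, writing one .txt file per PDF."""
    text_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf in sorted(pdf_dir.glob("*.pdf")):
        out = text_dir / (pdf.stem + ".txt")
        # Basic usage is: pdftotext <input.pdf> <output.txt>
        # Assumes the pdftotext binary is installed; raises if it fails.
        subprocess.run(["pdftotext", str(pdf), str(out)], check=True)
        written.append(out)
    return written


def search_text(text_dir: Path, keyword: str) -> List[Tuple[str, int, str]]:
    """Case-insensitive substring match, roughly what `grep -i -n` reports."""
    needle = keyword.lower()
    hits = []
    for txt in sorted(text_dir.glob("*.txt")):
        # errors="replace" keeps going if the extracted text is not valid UTF-8.
        with txt.open(encoding="utf-8", errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if needle in line.lower():
                    hits.append((txt.name, lineno, line.rstrip("\n")))
    return hits
```

Calling convert_pdfs(Path("pdfs"), Path("pdfs/text")) once and then search_text(Path("pdfs/text"), "beagle") returns (file name, line number, line) tuples for every match. This is plain keyword matching only, with no ranking and no metadata.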

If you are looking for an open-source indexer that handles PDFs, OpenOffice documents and so on, Google for "Docsearcher" and you should see something...

The important thing is what we need to do with the indexed files. Simple keyword matching is not too intelligent, I guess... more later

On Tue, 20 Jul 2004 19:35:53 +0200, Ralph Aichinger <ralph mail pangea at> wrote:

Hello!

I just love Beagle, as unfinished as it is, especially searching
OpenOffice documents.

One thing would make it cool though: Searching PDFs.
This would make all the stuff you can put on a scanner and OCR
within the reach of Beagle.

On the Lucene (Java) page it says that filters like xpdf can
be used to index PDFs. Can this be done with Beagle too? Would
it be hard? Will it be implemented out of the box one day?

TIA
/ralph

--
Regards,
Srikant (http://www.dcs.gla.ac.uk/~sriks)

Follow-Ups:
- Re: Will Beagle index PDFs?
  - From: Christopher Orr

References:
- Will Beagle index PDFs?
  - From: Ralph Aichinger
