Re: Will Beagle index PDFs?

It is quite easy to index PDFs: just use "pdftotext" as a filter. So yes, Ralph, it will be done in the very near future. In fact, if you can spare a few moments, you can do it yourself on your *nix box right now. Dump all your PDFs into a directory, run a for loop that calls pdftotext on each one and sends the text into a corresponding text file, then do a simple grep on that directory. That is how I use it; let me know if you want the script. I use it in conjunction with "libextractor", which can extract metadata from PDFs as well...
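
For anyone who wants to try this before the script turns up, here is a rough sketch of the loop-and-grep idea in Python. It is not the script mentioned above, just an illustration. It assumes the pdftotext command is installed and on your PATH, and the function names (pdftotext_convert, convert_all, grep_dir) and the directory arguments are made up for this example.

```python
import subprocess
import sys
from pathlib import Path


def pdftotext_convert(pdf, txt):
    """Call the external pdftotext tool: pdftotext input.pdf output.txt."""
    # Assumption: pdftotext is installed and reachable on PATH.
    subprocess.run(["pdftotext", str(pdf), str(txt)], check=True)


def convert_all(pdf_dir, text_dir, convert=pdftotext_convert):
    """Write one .txt file per *.pdf in pdf_dir; return the text file paths."""
    pdf_dir, text_dir = Path(pdf_dir), Path(text_dir)
    text_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf in sorted(pdf_dir.glob("*.pdf")):
        txt = text_dir / (pdf.stem + ".txt")
        convert(pdf, txt)
        written.append(txt)
    return written


def grep_dir(text_dir, keyword):
    """Case-insensitive keyword search over the extracted text files.

    Returns a list of (file name, line number, line) tuples.
    """
    needle = keyword.lower()
    hits = []
    for txt in sorted(Path(text_dir).glob("*.txt")):
        with open(txt, encoding="utf-8", errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if needle in line.lower():
                    hits.append((txt.name, lineno, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Hypothetical usage: python pdfgrep.py <pdf_dir> <text_dir> <keyword>
    pdf_dir, text_dir, keyword = sys.argv[1:4]
    convert_all(pdf_dir, text_dir)
    for name, lineno, line in grep_dir(text_dir, keyword):
        print(f"{name}:{lineno}: {line}")
```

The conversion step is passed in as a parameter so that another extractor could be swapped in later. Note that this is still plain keyword matching, nothing smarter.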

If you are looking for an open-source indexer that handles PDFs, OpenOffice documents and the like, Google for "Docsearcher" and you should find something...

The important thing is what we do with the indexed files. Simple keyword matching is not too intelligent, I guess... more later.

On Tue, 20 Jul 2004 19:35:53 +0200, Ralph Aichinger <ralph mail pangea at> wrote:


I just love Beagle, as unfinished as it is, especially searching
OpenOffice documents.

One thing would make it cool though: Searching PDFs.
This would make all the stuff you can put on a scanner and OCR
within the reach of Beagle.

On the Lucene (Java) page it says that filters like xpdf can
be used to index PDFs. Can this be done with Beagle too? Would
it be hard? Will it be implemented out of the box one day?


Dashboard-hackers mailing list
Dashboard-hackers gnome org

Srikant (
