Re: Will Beagle index PDFs?
- From: "Srikant Jakilinki" <sriks dcs gla ac uk>
- To: dashboard-hackers gnome org
- Cc:
- Subject: Re: Will Beagle index PDFs?
- Date: Tue, 20 Jul 2004 19:19:12 +0100
It is quite easy to index PDFs. Just use "pdftotext" as a filter, so
yes, Ralph, it will be done in the very near future. In fact, if you
can spare a few moments, you can do it yourself on your *nix box now: dump
all your PDFs into a directory, run a for loop that calls pdftotext on
each one and writes the output to a corresponding text file, then do a
simple grep on that directory. I use it this way; let me know if you want
the script. I use it in conjunction with "libextractor", which can
extract metadata from PDFs as well...
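A minimal sketch of the loop-and-grep approach described above, written in Python rather than as the shell script mentioned in this message (which is not reproduced here). It assumes the pdftotext command from xpdf or poppler-utils is on the PATH; the function names convert_pdfs and grep_texts are hypothetical, chosen only for illustration, and the libextractor metadata step is left out.

```python
import re
import subprocess
from pathlib import Path


def run_pdftotext(pdf_path, txt_path):
    """Convert one PDF to plain text with the pdftotext command-line tool."""
    # Assumption: pdftotext is installed and reachable on PATH.
    subprocess.run(["pdftotext", str(pdf_path), str(txt_path)], check=True)


def convert_pdfs(pdf_dir, txt_dir, converter=run_pdftotext):
    """Write one .txt file per *.pdf in pdf_dir and return the text paths."""
    pdf_dir, txt_dir = Path(pdf_dir), Path(txt_dir)
    txt_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf_path in sorted(pdf_dir.glob("*.pdf")):
        txt_path = txt_dir / (pdf_path.stem + ".txt")
        converter(pdf_path, txt_path)
        written.append(txt_path)
    return written


def grep_texts(txt_dir, pattern):
    """Return (file name, line number, line) for every matching line."""
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for txt_path in sorted(Path(txt_dir).glob("*.txt")):
        # Older pdftotext builds may not emit UTF-8, so undecodable
        # bytes are replaced instead of raising an error.
        text = txt_path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((txt_path.name, number, line.strip()))
    return hits
```

Calling convert_pdfs("pdfs", "text") once and then grep_texts("text", "beagle") as often as needed mirrors the two steps: the conversion is the slow part, and the search is a plain case-insensitive keyword match, nothing smarter.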
If you are looking for an open-source tool that indexes PDFs, OpenOffice
documents and more, Google for "Docsearcher" and you should see
something...
The important thing is what we do with the indexed files. Simple
keyword matching is not too intelligent, I guess... more later.
On Tue, 20 Jul 2004 19:35:53 +0200, Ralph Aichinger <ralph mail pangea at>
wrote:
Hello!
I just love Beagle, as unfinished as it is, especially searching
OpenOffice documents.
One thing would make it cool though: Searching PDFs.
This would make all the stuff you can put on a scanner and OCR
within the reach of Beagle.
On the Lucene (Java) page it says that filters like xpdf can
be used to index PDFs. Can this be done with Beagle too? Would
it be hard? Will it be implemented out of the box one day?
TIA
/ralph
_______________________________________________
Dashboard-hackers mailing list
Dashboard-hackers gnome org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers
--
Regards,
Srikant (http://www.dcs.gla.ac.uk/~sriks)