Re: pdf/word filters for beagle



On Tue, 2004-10-05 at 13:37 -0400, christopher taylor wrote:
> I saw your name in Nat's roadmap and was wondering if I could help you
> out with some beagle filters - I've recently started writing a pdf and
> word document filter for the project and when I noticed someone was
> formally designated for the task - I figured hey, why not just ask to
> help out with the effort. ;)
> 
WOW!! That's really cool.  However, Word/MS Office document filters are
already in development.  clahey and I are working on them.

PDF would be a good bet.  Currently, Beagle uses the "pdftotext" command
to extract text from PDF files.  That was a "quick" hack to get it
working; however, we really need a good PDF parser[1] that can extract
text along with attributes such as bold, italic, underline, etc.
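
For illustration, the "pdftotext" approach amounts to roughly the
following.  This is only a sketch in Python, not the actual Beagle
filter code; the function names are made up for the example, and it
assumes the pdftotext binary is on the PATH.  Passing "-" as the output
file makes pdftotext write to stdout.

```python
import shutil
import subprocess


def build_pdftotext_command(pdf_path, encoding="UTF-8"):
    """Return the argv list for pdftotext (hypothetical helper).

    "-" as the output file tells pdftotext to write to stdout.
    """
    return ["pdftotext", "-enc", encoding, pdf_path, "-"]


def extract_text(pdf_path):
    """Run pdftotext and return the plain text it produces.

    Only raw text comes back; attributes such as bold or italic are
    lost, which is the limitation a real PDF parser would address.
    """
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    result = subprocess.run(
        build_pdftotext_command(pdf_path),
        capture_output=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Calling extract_text("some.pdf") would return the document's text as one
string, with no formatting information attached.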


V. Varadhan.

[1] - We prefer using a stable, well-maintained, actively developed
library over writing our own PDF parser.



