Re: pdf/word filters for beagle

From: Veerapuram Varadhan <vvaradhan novell com>
To: christopher taylor <christopher paul taylor gmail com>
Cc: dashboard-hackers gnome org
Subject: Re: pdf/word filters for beagle
Date: Wed, 06 Oct 2004 11:23:07 -0700

On Tue, 2004-10-05 at 13:37 -0400, christopher taylor wrote:
> I saw your name in Nat's roadmap and was wondering if I could help you
> out with some beagle filters - I've recently started writing a pdf and
> word document filter for the project and when I noticed someone was
> formally designated for the task - I figured hey, why not just ask to
> help out with the effort. ;)
> 
WOW!! That's really cool. However, Word/MS Office document filters are
already in development; clahey and I are working on them.

PDF would be a good bet. Currently, the PDF filter uses the "pdftotext"
command to extract text from PDFs. That was a "quick" hack to get it
working; what we really need is a good PDF parser[1] that can extract text
along with its attributes, such as bold, italic, and underline.
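As a rough illustration of the current pdftotext approach, here is a minimal sketch in Python. Beagle itself is written in C#, so this is not its actual code. The sketch assumes the pdftotext tool (from xpdf/poppler) is on the PATH. Passing "-" as the output file makes pdftotext write the extracted text to standard output. The helper name extract_pdf_text is hypothetical.

```python
import subprocess
from typing import Callable


def extract_pdf_text(path: str, run: Callable = subprocess.run) -> str:
    """Return the plain text of a PDF by shelling out to pdftotext.

    Hypothetical helper that mirrors the "quick hack" of calling the
    external pdftotext tool. Only plain text comes back; attributes such
    as bold, italic, or underline are lost.
    """
    # "-" as the output file tells pdftotext to write to stdout.
    result = run(
        ["pdftotext", path, "-"],
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode the output as str rather than bytes
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

For example, `extract_pdf_text("report.pdf")` returns the document's text as a single string. The `run` parameter exists only so the external command can be swapped out, for instance in tests. The sketch also shows the limitation described above: a proper parser library would be needed to recover formatting attributes.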

V. Varadhan.

[1] - We prefer using stable, well-maintained, actively developed
libraries over writing our own PDF parser.
