Re: pdf/word filters for beagle
- From: Veerapuram Varadhan <vvaradhan novell com>
- To: christopher taylor <christopher paul taylor gmail com>
- Cc: dashboard-hackers gnome org
- Subject: Re: pdf/word filters for beagle
- Date: Wed, 06 Oct 2004 11:23:07 -0700
On Tue, 2004-10-05 at 13:37 -0400, christopher taylor wrote:
> I saw your name in Nat's roadmap and was wondering if I could help you
> out with some beagle filters - I've recently started writing a pdf and
> word document filter for the project and when I noticed someone was
> formally designated for the task - I figured hey, why not just ask to
> help out with the effort. ;)
>
Wow, that's really cool! However, the Word/MS Office document filters are
already in development; clahey and I are working on them.
PDF would be a good bet. Currently, Beagle uses the "pdftotext" command to
extract text from PDF files. That was a "quick" hack to get it working;
however, we really need a good PDF parser[1] that can extract text along
with its attributes, such as bold, italic, and underline.
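
To illustrate the kind of "quick" hack described above, here is a minimal
sketch in Python of shelling out to "pdftotext" and capturing plain text.
It is not the actual Beagle filter code; the function names are
hypothetical, and it assumes a "pdftotext" binary on the PATH that writes
to stdout when "-" is given as the output file.

```python
import shutil
import subprocess
from typing import List, Optional


def build_pdftotext_command(pdf_path: str, binary: str = "pdftotext") -> List[str]:
    """Build the argument list; "-" asks pdftotext to write to stdout."""
    return [binary, pdf_path, "-"]


def extract_pdf_text(pdf_path: str, binary: str = "pdftotext") -> Optional[str]:
    """Return the plain text of a PDF, or None if extraction is not possible.

    Hypothetical helper: only plain text comes back, so attributes such as
    bold, italic, or underline are lost.
    """
    # Bail out early if the external tool is not installed.
    if shutil.which(binary) is None:
        return None
    result = subprocess.run(
        build_pdftotext_command(pdf_path, binary),
        capture_output=True,
        check=False,
    )
    # A non-zero exit code means pdftotext could not process the file.
    if result.returncode != 0:
        return None
    return result.stdout.decode("utf-8", errors="replace")
```

Because the result is a flat string, formatting information never reaches
the indexer, which is why a real parser library would be preferable.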
V. Varadhan.
[1] - We prefer using stable, well-maintained, actively developed
libraries over writing our own PDF parser.