Re: beagle problem with indexing pdf files
- From: "Kevin Kubasik" <kevin kubasik net>
- To: "D Bera" <dbera web gmail com>
- Cc: dashboard-hackers gnome org
- Subject: Re: beagle problem with indexing pdf files
- Date: Tue, 25 Apr 2006 20:37:12 -0400
itextsharp cannot do what we need. Some extensive searching has turned
up PdfSharp, a library that shows promise. It can do what we need, but
has not undergone any Linux/Mono testing. While the project imports
into MonoDevelop without issue, the build seems to rely on some
pre-processing steps specific to the Windows build system. If anyone
wants to take a look and can get a makefile working, then the actual
implementation of text extraction should be pretty easy.
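
As for the original question of tracking down which files send
pdftotext off the rails: a rough sketch (untested against beagle
itself) is to run pdftotext over each PDF by hand with a time and
memory cap and note which files hit the cap. The function names, the
30 second timeout and the 512 MB limit below are arbitrary assumptions
of mine, not anything beagle does; the memory cap relies on the POSIX
resource module and is skipped where that is unavailable.

```python
import subprocess
import sys
from pathlib import Path

try:
    import resource  # POSIX only; used to cap the child's address space
except ImportError:
    resource = None


def _limit_memory(max_bytes):
    """Build a hook that caps address space in the child before exec."""
    def apply():
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))
    return apply


def check_pdf(path, command=("pdftotext",), timeout_s=30,
              max_bytes=512 * 1024 * 1024):
    """Return 'ok', 'timeout' or 'failed' for a single file.

    The extractor is invoked as: <command> <path> -
    ('-' asks pdftotext to write the text to stdout, which is discarded).
    """
    argv = list(command) + [str(path), "-"]
    preexec = _limit_memory(max_bytes) if resource and max_bytes else None
    try:
        proc = subprocess.run(argv,
                              stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL,
                              timeout=timeout_s,
                              preexec_fn=preexec)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising
        return "timeout"
    return "ok" if proc.returncode == 0 else "failed"


def find_bad_pdfs(root, **kwargs):
    """Walk root and list (path, status) for every PDF that is not 'ok'."""
    bad = []
    for pdf in sorted(Path(root).rglob("*.pdf")):
        status = check_pdf(pdf, **kwargs)
        if status != "ok":
            bad.append((str(pdf), status))
    return bad


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, status in find_bad_pdfs(root):
        print(status, path)
```

Point it at a directory beagle indexes; anything reported as "timeout"
or "failed" is a candidate to attach to a bug report.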
Cheers,
Kevin Kubasik
On 4/25/06, D Bera <dbera web gmail com> wrote:
> > On Mon, 2006-04-24 at 21:11 +0200, Jan Falkenhagen wrote:
> > > I've got a problem with beagle indexing PDF documents. Sometimes PDF
> > > indexing causes very high CPU and memory consumption by the pdftotext
> > > process (>1.5 GB of RAM). However, I guess that this occurs only with
> > > some broken PDF files. Is there any simple way to trace this
> > > behaviour back to the files that are causing the problem?
> >
> > This is a bug in the xpdf software (from which pdftotext comes), and I'd
> > suggest you report a bug to their developers.
> >
> > If you'd like, you can attach a broken PDF to a beagle bug and we can
> > see what we can do to work around it.
>
> Or if someone wants to write a managed PDF parser (in C#), that would
> be cool too :). There are some C# PDF libraries out there that could be
> used, e.g. itextsharp (it's under the LGPL; is that compatible with
> beagle licensing?)
> http://itextsharp.sourceforge.net/
>
> --
> -----------------------------------------------------
> Debajyoti Bera @ http://dbera.blogspot.com
> beagle / KDE fan
> Mandriva / Inspiron-1100 user
> _______________________________________________
> Dashboard-hackers mailing list
> Dashboard-hackers gnome org
> http://mail.gnome.org/mailman/listinfo/dashboard-hackers
>
--
Cheers,
Kevin Kubasik
http://blog.kubasik.net/