On Tue, Apr 25, 2006 at 02:12:29PM -0400, D Bera wrote:
> > On Mon, 2006-04-24 at 21:11 +0200, Jan Falkenhagen wrote:
> > > i've got a problem with beagle indexing pdf documents. Sometimes pdf
> > > indexing causes very high cpu and memory consumption of the pdftotext
> > > process (>1.5 GB of RAM). however i guess that this occurs only with
> > > some broken pdf files. is there any simple way to track down this
> > > behaviour to the files that are causing this problem?
> >
> > This is a bug in the xpdf software (from which pdftotext comes), and I'd
> > suggest you report a bug to their developers.
> >
> > If you'd like, you can attach a broken PDF to a beagle bug and we can
> > see what we can do to work around it.
>
> Or if someone wants to write a managed PDF parser (in C#), that would
> be cool too :). There are some C# pdf libraries out there that can be
> used e.g. itextsharp (its under LGPL, is that compatible with beagle
> licensing ?)

Or poppler-sharp.

-- 
Tomasz Torcz                 RIP is irrevelant. Spoofing is futile.
zdzichu irc -nie spam- pl    Your routes will be aggreggated. -- Alex Yuriev