beagle-extract-content question: PDF docs

From: Stephan Hegel <stephan hegel gmx de>
To: dashboard-hackers gnome org
Subject: beagle-extract-content question: PDF docs
Date: Sun, 11 Nov 2007 05:36:39 +0100

Hi all,

I've got a few PDF docs in which beagle cannot find any
content:

> beagle-extract-content gebackene.zucchini.pdf
Debug: Loaded 52 filters from /usr/lib/beagle/Filters/Filters.dll
Filter: Beagle.Filters.FilterPdf (determined in .12s)
MimeType: application/pdf

Properties:
  Timestamp = 2007-11-11 04:21:32 (Utc)
  dc:appname = ESP Ghostscript 8.15
  fixme:page-count = 1

Content:
(no content)
HotContent:
(no hot content)

Text extracted in .02s


The files were created by printing a web page to a PostScript file
and converting it to a PDF file with the ps2pdf utility. Xpdf and
Acrobat Reader can display these PDFs fine.

And no, it is not just an embedded image: the file size is only
40k for one A4 page (with one image in it), and I'm able to select
parts of the text in Acrobat Reader.
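
One way to cross-check the "selectable text" observation outside of
beagle is sketched below. It assumes the pdftotext tool from
xpdf/poppler is installed and in PATH; the helper names (has_text,
extract_text, report) are made up for this illustration and are not
part of beagle. It only shows whether another extractor sees text in
the file. It says nothing about how Beagle.Filters.FilterPdf works
internally.

```python
import shutil
import subprocess


def has_text(extracted: str) -> bool:
    """True if the extractor output holds any non-whitespace character.

    pdftotext ends each page with a form feed, which strip() treats
    as whitespace, so an "empty" page still counts as no text.
    """
    return bool(extracted.strip())


def extract_text(pdf_path: str) -> str:
    """Run `pdftotext <file> -` and return what it writes to stdout.

    Assumption: pdftotext (xpdf/poppler) is installed and in PATH.
    """
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found in PATH")
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],  # "-" sends the text to stdout
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=True,
    )
    return result.stdout


def report(pdf_path: str) -> str:
    """Return a one-line summary for the given PDF."""
    text = extract_text(pdf_path)
    if has_text(text):
        return f"{pdf_path}: {len(text.split())} words extracted"
    return f"{pdf_path}: no text extracted"
```

Calling report("gebackene.zucchini.pdf") would then tell whether a
second extractor also comes back empty-handed for the file, or whether
the problem is specific to beagle's filter.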

Is this a known issue or limitation?

Beagle is 0.2.18 from openSUSE 10.3.

Kind regards,
  Stephan.

Follow-Ups:
- Re: beagle-extract-content question: PDF docs
  - From: Debajyoti Bera
