Re: [Tracker] Fwd: Possible GSoC project idea

On Mon, 22 Mar 2010 16:13:26 +0000, Martyn Russell <martyn lanedo com> wrote:
On 22/03/10 11:52, Adrien Bustany wrote:
On Mon, 22 Mar 2010 09:40:33 +0000, Martyn Russell <martyn lanedo com> wrote:
Not off the top of my head.

I think Mukund is thinking about OCR'ing all the images in PDFs
and adding that to the "full text" part of Tracker indexing. So that would
be part of the PDF extractor.

Extracting text for FTS indexing for PDFs is already done.

But what if I scan a document without OCR? I get a PDF with a bitmap
inside, and the text is not FTS'ed. Or is it currently? I'm not sure that
part should be a Tracker extractor; I'd rather see it as external
(for obvious performance reasons). But still, it'd be interesting.
Combined with writeback, you could write the OCR'd text back to the PDF.
Writing this, I realize that this should actually be done by evince, which
would write the text back to the PDF, and Tracker would index it. So yeah,
nothing to do on the Tracker side :)
