Re: [Tracker] Fwd: Possible GSoC project idea

On 22/03/10 17:08, Adrien Bustany wrote:
On Mon, 22 Mar 2010 16:13:26 +0000, Martyn Russell <martyn lanedo com>
Extracting text for FTS indexing for PDFs is already done.

But what if I scan a document without OCR? I get a PDF with a
bitmap image inside, and the text is not FTS'ed. Or is it currently?

Ah, I see your point. No, that is not supported.

I'm not sure the OCR part should be a tracker extractor, I'd rather see
it as an external module (for obvious performance reasons).

I would have to agree. PDFs generally can have quite a number of images in them and I think this would not be too useful in the majority of cases.

Still, it'd be an interesting project. Combined with writeback, you
could writeback the OCR'd text to the PDF.

That's quite a good idea.

Well, writing this I
realize that this should actually be done by evince, which would
write the text back to the PDF, and Tracker would index it. So yeah,
nothing to do on Tracker side :)

From your OCR app, you could just update the FTS for the PDF yourself, I suppose. :)

