Re: [Tracker] Fwd: Possible GSoC project idea

From: Adrien Bustany <abustany gnome org>
To: Martyn Russell <martyn lanedo com>
Cc: Tracker list <tracker-list gnome org>
Subject: Re: [Tracker] Fwd: Possible GSoC project idea
Date: Mon, 22 Mar 2010 18:08:12 +0100

On Mon, 22 Mar 2010 16:13:26 +0000, Martyn Russell <martyn lanedo com> wrote:

On 22/03/10 11:52, Adrien Bustany wrote:

On Mon, 22 Mar 2010 09:40:33 +0000, Martyn Russell <martyn lanedo com> wrote:

Not off the top of my head.


I think Mukund is thinking about OCR'ing all the images in PDF documents
and adding that to the "full text" part of Tracker indexing. So that would
be part of the PDF extractor.


Extracting text for FTS indexing for PDFs is already done.


But what if I scan a document without OCR? I get a PDF with a bitmap image
inside, and the text is not FTS'ed. Or is it currently? I'm not sure the OCR
part should be a Tracker extractor; I'd rather see it as an external module
(for obvious performance reasons). But still, it'd be an interesting project.
Combined with writeback, you could write the OCR'd text back to the PDF.
Well, writing this I realize that this should actually be done by Evince,
which would write the text back to the PDF, and Tracker would index it. So
yeah, nothing to do on the Tracker side :)
