[Gnome-OCR] Integration of Tesseract-OCR...



Hi all,

As I'm pretty new here, please forgive me if this is not the right place. :)

I am an associate professor at Bordeaux-I University (France), and I have
submitted a student project on integrating Tesseract-OCR into
Gnome (the student project only starts in January).

My plan is, more or less, to:
- have them develop libgnome-ocr as a wrapper around Tesseract-OCR,
- clean up the Tesseract-OCR code,
- refactor Tesseract-OCR within the Gnome libs, and
- try to add some extra features.

(I think the students will stop at the first item, but you never know!)

I don't know exactly what the API should be or how such a library
would be used, but with the help of Étienne Bersac (the author of
libgnome-scan) I thought of a few examples:
- A plug-in for Abiword (also outputting formatting information);
- A plug-in for e-mail readers (image spam analysis);
- ... and so forth ...

I guess that the API should include an initialization function and
functions to set the image input format, the output format, and some settings
(recognition strategy, drawing recognition, where to store the output, etc.).
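
To make this a bit more concrete, here is a rough sketch in C of what such
an API could look like. Every name in it (GnomeOcrContext, gnome_ocr_init,
the enum values, and so on) is hypothetical: nothing exists yet, the two
image formats are only placeholders, and no call to Tesseract-OCR is made.
It only illustrates the "initialize, then configure" shape described above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch only: none of these names exist in any library. */

typedef enum {
    GNOME_OCR_INPUT_TIFF,          /* placeholder input formats */
    GNOME_OCR_INPUT_PNG
} GnomeOcrInputFormat;

typedef enum {
    GNOME_OCR_OUTPUT_PLAIN_TEXT,   /* text only */
    GNOME_OCR_OUTPUT_FORMATTED     /* text plus formatting information */
} GnomeOcrOutputFormat;

typedef enum {
    GNOME_OCR_STRATEGY_FAST,
    GNOME_OCR_STRATEGY_ACCURATE
} GnomeOcrStrategy;

/* All settings mentioned above, gathered in one context object. */
typedef struct {
    GnomeOcrInputFormat  input_format;
    GnomeOcrOutputFormat output_format;
    GnomeOcrStrategy     strategy;
    bool                 recognize_drawings;
    const char          *output_path;  /* NULL: keep the result in memory */
} GnomeOcrContext;

/* Initialization function: fill the context with assumed defaults. */
void gnome_ocr_init(GnomeOcrContext *ctx)
{
    ctx->input_format       = GNOME_OCR_INPUT_TIFF;
    ctx->output_format      = GNOME_OCR_OUTPUT_PLAIN_TEXT;
    ctx->strategy           = GNOME_OCR_STRATEGY_FAST;
    ctx->recognize_drawings = false;
    ctx->output_path        = NULL;
}

void gnome_ocr_set_input_format(GnomeOcrContext *ctx, GnomeOcrInputFormat fmt)
{
    ctx->input_format = fmt;
}

void gnome_ocr_set_output_format(GnomeOcrContext *ctx, GnomeOcrOutputFormat fmt)
{
    ctx->output_format = fmt;
}

void gnome_ocr_set_strategy(GnomeOcrContext *ctx, GnomeOcrStrategy strategy)
{
    ctx->strategy = strategy;
}

void gnome_ocr_set_drawing_recognition(GnomeOcrContext *ctx, bool enabled)
{
    ctx->recognize_drawings = enabled;
}

/* Where to store the output. Returns false (and changes nothing) when the
 * path is NULL or empty. The string is not copied, so the caller must keep
 * it alive as long as the context uses it. */
bool gnome_ocr_set_output_path(GnomeOcrContext *ctx, const char *path)
{
    if (path == NULL || path[0] == '\0')
        return false;
    ctx->output_path = path;
    return true;
}
```

A caller would first call gnome_ocr_init() on a GnomeOcrContext, then
override only the settings it cares about. A real library would also need
a function that actually runs the recognition, which is left out here.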

For now, while waiting for the project to start in January, I'm trying to
port Tesseract-OCR to 64-bit platforms... and I'm a bit horrified by
the way they handle basic types and data structures... My guess is that
a lot of cleaning is needed there. :-/

Anyway, is this project of interest to the Gnome community? Do you
have any comments, advice, or objections?

Regards
-- 
Emmanuel Fleury              | Office: 261
Associate Professor,         | Phone: +33 (0)5 40 00 69 34
LaBRI, Domaine Universitaire | Fax:   +33 (0)5 40 00 66 69
351, Cours de la Libération  | email: emmanuel fleury labri fr
33405 Talence Cedex, France  | URL: http://www.labri.fr/~fleury


