Re: Subtitle extraction via OCR




On Tue, 2007-07-31 at 01:13 +0200, Mathias Brodala wrote:
Hi Liam.

(You don't need to CC me, I'm subscribed.)
Evolution really needs a "swap To and Cc fields" button.

I've yet to use any open source OCR package that has been less effort than
rekeying -- commercial OCR software is workable though.

Hm, is libgocr that bad? (As an example.)
Yes.

There's one from Google that may be slightly better,
tesseract, as it's based on what was originally proprietary
code written in the 1980s.
http://code.google.com/p/tesseract-ocr/

I don't know if the ABBYY FineReader API is available for Linux;

Seems like[2]:

ABBYY FineReader SDKs […] provide developers with an Application Programming
Interface (API) for integrating the functionality of ABBYY FineReader into
applications built for Windows or Linux platforms.

yes.

[2] http://www.abbyy.com/for_developers/

If you have some samples I can run them through ABBYY FineReader and also
gocr (and maybe tesseract) if it is of use to you.

If you have only one font, you might be able to do well with some
pre-processing, and by training the software.  Watch out for
italic or bold emphasised words though, which count as a different
font.
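
As a rough illustration of the kind of pre-processing meant here, below is
a minimal sketch that binarises a greyscale subtitle frame before it is
handed to an OCR engine. It uses Otsu's method to pick the threshold, which
is my choice for the example and not something any of the packages above
require. The function names are hypothetical, the image is assumed to be a
list of rows of 0-255 grey values, and the text is assumed to be lighter
than the background, as subtitles usually are.

```python
def otsu_threshold(pixels):
    """Return the grey level (0-255) that best splits pixels into two classes.

    Otsu's method: try every level and keep the one that maximises the
    between-class variance of the "dark" and "light" groups.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_level, best_variance = 0, -1.0
    weight_dark = 0   # number of pixels at or below the current level
    sum_dark = 0      # sum of grey values at or below the current level

    for level in range(256):
        weight_dark += hist[level]
        if weight_dark == 0:
            continue          # nothing in the dark class yet
        weight_light = total - weight_dark
        if weight_light == 0:
            break             # everything is in the dark class, stop
        sum_dark += level * hist[level]
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        variance = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if variance > best_variance:
            best_variance, best_level = variance, level
    return best_level


def binarize(rows, threshold):
    """Map each pixel to 1 (assumed text, lighter than threshold) or 0."""
    return [[1 if p > threshold else 0 for p in row] for row in rows]


# Tiny made-up frame: dark background with a few bright "text" pixels.
frame = [
    [10, 12, 200],
    [11, 210, 205],
]
flat = [p for row in frame for p in row]
level = otsu_threshold(flat)
mask = binarize(frame, level)
```

The resulting mask would still need to be written out in whatever image
format the OCR tool expects, and it does nothing about italic or bold
glyphs, which need their own training data.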

Liam

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org www.advogato.org



