[orca-list] New a11y tool "ocrpdf"



Hi List,

I have been working on a new little tool that gives quick access to "image" PDF files (such as scans).
It is in an early state for now.

It can (try to) detect the layout and page orientation (thanks to tesseract and ocrad :) ), cut the PDF file into images and OCR every image via tesseract. It works multithreaded to be more efficient. It should also be able to OCR any other picture file (not just PDF).
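
Roughly, the multithreaded OCR step can be pictured like the sketch below. This is not the actual ocrpdf code: the function names are made up for illustration, it assumes the pages have already been cut into image files, and it calls the tesseract command line (with "stdout" as the output base) instead of going through tesserwrap.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def tesseract_ocr(image_path, lang="eng"):
    """OCR a single image via the tesseract CLI (hypothetical helper).

    Passing "stdout" as the output base makes tesseract print the
    recognized text instead of writing a .txt file.
    """
    result = subprocess.run(
        ["tesseract", image_path, "stdout", "-l", lang],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def ocr_pages(image_paths, ocr_func=tesseract_ocr, max_workers=4):
    """Run ocr_func on every page image in a pool of worker threads.

    pool.map() returns results in input order, so the text comes back
    page by page even if later pages finish first.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(ocr_func, image_paths))
```

Something like ocr_pages(["page-1.png", "page-2.png"]) would then return one text string per page. Threads are enough here because the heavy work happens in the external tesseract process, not in Python itself.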

example:
ocrpdf -f /path/to/your/file.pdf -l deu

ocrpdf -h

A small window in ocrdesktop style pops up with the content.
AUR (a little out of date, I will update it soon):
https://aur.archlinux.org/packages/ocrpdf/
GIT:
https://github.com/chrys87/ocrpdf

dependencies:
GTK3
pythonmagick
python-pillow
python-tesserwrap
tesseract
tesseract-yourlanguage

My girlfriend has been using it really successfully for a month now, so I decided to make it public for you.

You could also add it to Nemo as an action, for example (so you can just OCR via the context menu):

[Nemo Action]
Active=true
Name=OCR file %N
Comment=OCR File
Exec=ocrpdf -l eng -f %F
Quote=double
Selection=S
Extensions=pdf;jpg;tiff;png;jpeg

cheers chrys

