Re: [orca-list] Anyone able to OCR a PDF file?



Janina Sajka <janina rednote net> wrote:
 
> I know people do this on other OS's. Has anyone suggestions on how to do
> this in Linux?

I haven't tried it, but here's an outline of a procedure that could work (with
suitable modifications).

Step 1: Use pdfimages (shipped with xpdf and with poppler-utils) to extract
the images from the PDF file.

Step 2: If necessary, use convert (part of the ImageMagick package) to convert
the images into a format that your OCR tool accepts.

Step 3: Run your favourite OCR tool.

Step 4: Write a shell script to automate the above.
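
As a rough, untested-in-anger sketch of how steps 1 to 4 might fit together:
it assumes pdfimages and ImageMagick's convert are installed, and it picks
Tesseract as the OCR tool purely as an example (substitute whichever tool you
prefer). The function name ocr_pdf and the file names are made up for
illustration.

```shell
#!/bin/sh
# Sketch only: OCR a scanned PDF by extracting its images and running an
# OCR engine over each one.
# Assumptions: pdfimages (xpdf or poppler-utils), convert (ImageMagick) and
# tesseract are on the PATH. Tesseract is just one possible OCR tool.

ocr_pdf() {
    pdf=$1      # input PDF file
    out=$2      # text file that collects the OCR output
    workdir=$(mktemp -d) || return 1

    # Step 1: extract the embedded images. Without format options,
    # pdfimages writes files named <root>-NNN.ppm or <root>-NNN.pbm.
    pdfimages "$pdf" "$workdir/page" || { rm -rf "$workdir"; return 1; }

    : > "$out"  # start with an empty output file
    for img in "$workdir"/page-*; do
        [ -e "$img" ] || continue   # nothing was extracted

        # Step 2: convert to TIFF, which OCR tools commonly accept.
        convert "$img" "$img.tif" || continue

        # Step 3: "tesseract IMAGE BASE" writes its text to BASE.txt.
        tesseract "$img.tif" "$img" >/dev/null 2>&1 || continue
        cat "$img.txt" >> "$out"
    done

    rm -rf "$workdir"
}

# Step 4: run the whole thing when called with two arguments.
if [ "$#" -eq 2 ]; then
    ocr_pdf "$1" "$2"
fi
```

Saved as, say, ocrpdf.sh, it would be run as "sh ocrpdf.sh scan.pdf scan.txt".
Note that pdfimages extracts every embedded image, so a PDF whose pages are
split into several image strips will need extra handling.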



