Re: [orca-list] pdftotext help.



Greetings again,

I am providing a workaround that I just found. Rather than use
JSTOR's direct download link for PDFs, I am using the print option to
save a PDF to my hard drive. I suppose that this makes it effectively
a scan, because pdftotext could extract nothing from it. I then
installed ocrmypdf and ran it on the file, and practically all of the
missing content now shows!
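
For anyone who would rather script this, here is a minimal Python
sketch of the two steps described above: run ocrmypdf to add a text
layer, then run pdftotext on the result. It is only an illustration,
not exactly what I typed. The function names, the "-ocr.pdf" suffix,
and the output file naming are my own assumptions, and it assumes
that both ocrmypdf and pdftotext are installed and on the PATH.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def build_commands(src: Path, ocr_pdf: Path, txt: Path) -> list[list[str]]:
    """Return the two command lines: OCR the PDF, then extract its text.

    The helper name and argument names are hypothetical; only the basic
    "ocrmypdf INPUT OUTPUT" and "pdftotext PDF TEXTFILE" forms are used.
    """
    return [
        # Add a searchable text layer to the image-only PDF.
        ["ocrmypdf", str(src), str(ocr_pdf)],
        # Write the text layer of the OCRed PDF to a plain-text file.
        ["pdftotext", str(ocr_pdf), str(txt)],
    ]


def ocr_and_extract(src: Path) -> Path:
    """OCR src and return the path of the extracted text file."""
    # Assumed naming scheme: article.pdf -> article-ocr.pdf, article.txt
    ocr_pdf = src.with_name(src.stem + "-ocr.pdf")
    txt = src.with_suffix(".txt")
    for cmd in build_commands(src, ocr_pdf, txt):
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} is not installed or not on PATH")
        # check=True raises CalledProcessError if the tool reports failure.
        subprocess.run(cmd, check=True)
    return txt
```

Calling ocr_and_extract(Path("article.pdf")), where article.pdf is a
placeholder for the printed file, should leave article.txt next to
it.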

hth for anyone else out there with a similar problem,

Hwaen Ch'uqi


On 11/25/21, Hwaen Ch'uqi <hwaenchuqi gmail com> wrote:
Greetings All,

Following up on my earlier question about searching PDF files in
Firefox and other browsers, I am now using pdftotext. The results are
generally fine, but I have noticed that PDFs from a certain site that
I use quite often - namely, JSTOR - seem consistently to be missing
characters. It almost seems as if pdftotext assumes margins that are
narrower than the documents', as if A4 horizontal margins are being
assumed rather than letter size. This is just a guess. Has anyone run
across this kind of thing and come up with a solution? I tried
playing with the -x and -y flags, setting them to 0, but the results
are the same.

I realize that this isn't precisely an Orca question, but I thank you
for any help.

Hwaen Ch'uqi


