Re: [orca-list] pdftotext help.



Hey there,

Have you tried mupdf-tools?


sudo apt install mupdf-tools

mutool draw -o output.txt input.pdf


I'm using it as the default option instead of pdftotext, though I can't
honestly recall why.

Sometimes it treats every line of the text as a paragraph and places
empty lines accordingly. I guess this is a consequence of how some
PDFs are made.

I have a Python script for text processing, which I usually let handle
these things, so it's not a big issue.
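
The kind of clean-up described above could be sketched roughly as follows.
This is not the script mentioned in the message, only a minimal
illustration. The function name merge_spurious_paragraphs and the rule it
uses (a chunk is glued to the previous one unless that one already ends
with sentence-final punctuation) are assumptions, and the rule will
misjudge headings, lists and abbreviations.

```python
import re

# Characters that are taken to mark the end of a real paragraph (an assumption).
SENTENCE_END = (".", "!", "?", ":", '"', "'", ")")


def merge_spurious_paragraphs(text: str) -> str:
    """Rejoin lines that were split into one-line 'paragraphs'."""
    # Split on blank lines and collapse the whitespace inside each chunk.
    chunks = [" ".join(chunk.split()) for chunk in re.split(r"\n\s*\n", text)]
    paragraphs = []
    for chunk in chunks:
        if not chunk:
            continue
        if paragraphs and not paragraphs[-1].endswith(SENTENCE_END):
            # The previous chunk looks unfinished, so continue it.
            paragraphs[-1] += " " + chunk
        else:
            paragraphs.append(chunk)
    return "\n\n".join(paragraphs) + "\n"


if __name__ == "__main__":
    import sys

    # Reads text on stdin and writes the cleaned text to stdout.
    sys.stdout.write(merge_spurious_paragraphs(sys.stdin.read()))
```

Saved under a hypothetical name such as fix_paragraphs.py, it would be run
as: python3 fix_paragraphs.py < output.txt > cleaned.txt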


Best regards


Rastislav


On 25. 11. 2021 at 23:14, Hwaen Ch'uqi via orca-list wrote:
Greetings again,

I am providing a workaround that I just found. Rather than
use JSTOR's direct download link for PDFs, I am using the print
option to save a PDF to my hard drive. I suppose that this makes
it a scan, because pdftotext could do nothing with it. Then I
installed and ran ocrmypdf on it, and practically all of the missing
content now shows!
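
For anyone who wants to automate this workaround, here is a minimal Python
sketch under some assumptions: pdftotext and ocrmypdf are both on the PATH,
and a PDF counts as "needing OCR" when pdftotext returns fewer than 20
visible characters. The names looks_empty and extract_text, the threshold,
and the "-ocr.pdf" output name are all made up for illustration.

```python
import subprocess
from pathlib import Path


def looks_empty(text: str, min_chars: int = 20) -> bool:
    """Heuristic: almost no visible characters means no usable text layer."""
    return len("".join(text.split())) < min_chars


def extract_text(pdf: Path, run=subprocess.run) -> str:
    """Try pdftotext first; if nothing comes out, OCR the file and retry."""

    def pdftotext(path: Path) -> str:
        # A "-" as the output file makes pdftotext write to stdout.
        result = run(["pdftotext", str(path), "-"],
                     capture_output=True, text=True, check=True)
        return result.stdout

    text = pdftotext(pdf)
    if not looks_empty(text):
        return text

    # ocrmypdf takes an input PDF and writes a new PDF with a text layer.
    ocr_pdf = pdf.with_name(pdf.stem + "-ocr.pdf")
    run(["ocrmypdf", str(pdf), str(ocr_pdf)], check=True)
    return pdftotext(ocr_pdf)
```

Calling extract_text(Path("article.pdf")) returns the text as a string. The
run parameter exists only so that the external commands can be replaced in
tests.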

hth for anyone else out there with a similar problem,

Hwaen Ch'uqi


On 11/25/21, Hwaen Ch'uqi <hwaenchuqi gmail com> wrote:
Greetings All,

Following up on my earlier question about searching PDF files in
Firefox and other browsers, I am now using pdftotext. The results are
generally fine, but I have noticed that PDFs from a certain site that
I use quite often - namely, JSTOR - seem consistently to be missing
characters. It almost seems as if pdftotext assumes margins
that are narrower than the documents', as if A4 horizontal margins are
being assumed rather than letter size. This is just a guess. Has
anyone run across this kind of thing and come up with a solution? I
tried playing with the -x and -y flags, setting them to 0, but the
results are the same.

I realize that this isn't precisely an orca question, but I thank you
for any help.

Hwaen Ch'uqi

_______________________________________________
orca-list mailing list
orca-list gnome org
https://mail.gnome.org/mailman/listinfo/orca-list
Orca wiki: https://wiki.gnome.org/Projects/Orca
Orca documentation: https://help.gnome.org/users/orca/stable/
GNOME Universal Access guide: https://help.gnome.org/users/gnome-help/stable/a11y.html


