[evince] Copy/pasted lines out of order



I often prefer text files to PDFs - they are more compatible
with the text-file-based systems I use for keeping information
on my computer. I can reformat text files, add notes, search
more flexibly, extract selected information, and write macros to
use the information.  But PDFs have their utility, and of course
information is often received as PDFs.

Recently I got the membership list from our Unitarian Society
as a PDF and made a text file with the information.  I used
select all, then copy/paste, from Evince 3.4.0 on my Ubuntu 12.04
system with Unity 2D into a text file.

The PDF file has multi-line member household entries.
Entries are in two columns, in alphabetical order down each
column; the next entry after the bottom left of a page is at the
upper right of that page.

The text file had the entries in one column in one continuous
alphabetical order, and each entry line became one line of text
(even though a line across the PDF page could hold a line from
each column). BUT some lines were out of order - maybe 10% of
the lines. There was no clear, consistent pattern that I could
see in which lines were out of order or how. Here is an attempt
at a much simplified illustration of the transformation.

PDF
word11  word12   word13             word41  word42   word43
word21  word22   word23             word51  word52   word53
word31  word32   word33             word61  word62   word63

TXT
word11  word12   word13
word21  word22   word23
word41  word42   word43
word51  word52   word53
word61  word62   word63
word31  word32   word33

(Note that the last line of the text version is out of order.)
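
For concreteness, the order expected in the illustration above can
be stated as a rule: left column top to bottom, then right column
top to bottom. The following is only a minimal Python sketch of
that rule, not a description of how Evince extracts text. The
(x, y, text) model, the coordinates, and the names reading_order
and column_split are all assumptions made up for the example.

```python
# Hypothetical model, not how Evince works internally: each line of
# text on the page is (x, y, text), where x is the left edge and y is
# the distance from the top of the page.
def reading_order(fragments, column_split):
    """Return texts left column first (top to bottom), then right column."""
    def by_y(fragment):
        return fragment[1]

    left = [f for f in fragments if f[0] < column_split]
    right = [f for f in fragments if f[0] >= column_split]
    ordered = sorted(left, key=by_y) + sorted(right, key=by_y)
    return [text for _x, _y, text in ordered]


# The simplified page from the illustration; the coordinates are made up.
page = [
    (0, 10, "word11  word12   word13"),
    (300, 10, "word41  word42   word43"),
    (0, 20, "word21  word22   word23"),
    (300, 20, "word51  word52   word53"),
    (0, 30, "word31  word32   word33"),
    (300, 30, "word61  word62   word63"),
]

for line in reading_order(page, column_split=150):
    print(line)
```

With this rule the word31 line comes out third, before the word41
line, instead of last as in the TXT version above.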

I tried copy/paste from Adobe PDF reader on MS Windows XP to text,
and it produced a similar text file, but with no lines out of order.

At other times, copy/paste from PDFs gives other permutations of
the order of words.  I think I once saw a permutation something like:

PDF:
word11  word12   word13
word21  word22   word23
word31  word32   word33

TXT:
word11
word21
word31
word12
word22
word32
word13
word23
word33
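
The permutation just shown amounts to reading the words column by
column instead of row by row. A small Python sketch of that
reading, offered only as an illustration (the function name
column_major is made up, and nothing here is known about the
program that produced the text):

```python
def column_major(rows):
    """Flatten a grid of words column by column instead of row by row."""
    return [word for column in zip(*rows) for word in column]


# The three PDF lines from the example, split into words.
pdf_rows = [
    ["word11", "word12", "word13"],
    ["word21", "word22", "word23"],
    ["word31", "word32", "word33"],
]

for word in column_major(pdf_rows):
    print(word)
```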

Is it a goal of Evince for copied and pasted text to have the same
arrangement in the text file as in the PDF?

Note that a similar transformation happens when copying tables
from web pages displayed in Firefox to text.  Is it something about
how copy/paste is implemented?  If so, to whom should this question
be addressed?  I do not know where copy/paste is implemented.

Fred  (who does not subscribe to this list, cc of replies appreciated)

--
Fred H. Olson  Minneapolis,MN 55411  USA        (near north Mpls)
     Email:        fholson at cohousing.org      612-588-9532
My Link Pg: http://fholson.cohousing.org         My org:
Communications for Justice -- Free, superior listserv's w/o ads


