[Evolution] Importing from Outlook (using outport) - summary



OK, I now have all my old emails imported into Evolution under Linux
from Outlook pst files... for the benefit of the archives, process as
follows:

1) Export from Outlook under Windows using outport, yielding separate
files per message.

2) Under Linux, for each message:

    Convert to Unix format (I used the dos2unix util, 
      but sed would do the job happily.)

    Process message header to strip excess whitespace between field
names and contents - evo (at least 1.2.2) really doesn't like tabs in
this position, and outport seems to use them for some messages but not
all.
  
    Process message header to convert long-format month names in the
'Sent' field to the short format (e.g. November --> Nov); otherwise evo
fails to recognise any month other than May and defaults to January for
everything :)

    Process message header to replace the square brackets around the
sender's email address with angle brackets (I really don't know why
outport insists on using square brackets - I don't think that's legal,
so it's not evo's fault that it refuses to parse the sender until it's
fixed)

    Process mail header to change 'Sent:' field to 'Date:', as evo seems
to only act on the Date field.

    Finally pipe message through the 'formail' utility to correctly
format any included headers in the message text and to generate the
correct initial 'From' line before the message header fields.

3) Concatenate relevant messages into a plain-text mbox file.

4) Create empty placeholder folders within evo for the various mbox
files I built up, quit evo, and overwrite the new zero-byte mbox files
which evo had created within its disk structure with my own populated
mbox files. When I next start evo it refreshes the relevant data
structures to reflect the new mbox contents automatically.
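
As a rough sketch of steps 2 and 3 in shell (not the exact script used
above): the names clean_message and build_mbox are made up for
illustration, the header is assumed to end at the first blank line, and
formail is assumed to be installed and is run with no options. The
bracket and month fixes from the Notes below would slot into
clean_message ahead of the Sent: to Date: rename.

```shell
#!/bin/sh

# clean_message (hypothetical name): read one exported message on stdin,
# write the tidied message to stdout.
#  - tr drops DOS carriage returns, roughly what dos2unix does
#  - sed only touches the header, i.e. line 1 up to the first blank line:
#      * squeeze any tabs/spaces after a field name down to one space
#      * rename the Sent: field to Date:
clean_message() {
    tr -d '\r' |
    sed -e '1,/^$/{
        s/^\([A-Za-z-][A-Za-z-]*:\)[[:space:]]*/\1 /
        s/^Sent:/Date:/
    }'
}

# build_mbox (hypothetical name): build_mbox OUT.mbox MSG...
# Appends each cleaned message to OUT.mbox; formail is relied on to
# supply the leading "From " line before the header fields.
build_mbox() {
    mbox=$1
    shift
    for msg in "$@"; do
        clean_message < "$msg" | formail >> "$mbox"
    done
}
```

Usage would be something like "build_mbox Inbox.mbox exported/*", where
exported/ is a placeholder for wherever the outport files ended up.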


Notes:

I could have greatly improved things by downloading the source to
outport and hacking that, only I didn't have a Windows C compiler handy
:-)

Some of the old mail I have came from an old MS Exchange server, and the
headers in messages exported using outport don't contain proper email
addresses, just people's names. The lack of a user@domain address
doesn't seem to bother evo, except that it seems to stick an extra space
at the start of such names when displaying the 'from' column in the GUI.

I didn't write anything to cope with re-encoding attachments and adding
them to the mbox file because I didn't need to. If anyone needs to do
that, you're on your own :-)

Piping message content through:
  
  sed -e 's/^From:[[:space:]]*\(.*\) \[\(.*\)\]$/From: \1 <\2>/'

... seems to happily cope with translating square brackets around email
addresses to angle brackets.

Similarly, piping the message through:

   sed -e 's/^Sent:[[:space:]]*\([0-9][0-9]\) January/Sent: \1 Jan/'

... handles the month translation for Sent: fields (repeat for other
months, although there's probably a nifty way of doing it all in one sed
command)
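
One possible way to handle every month in a single command is sketched
below. It assumes a sed that accepts -E for extended regular expressions
(GNU and BSD sed both do) and that the Sent: field always has the day
number immediately before the month name, as in the example above;
shorten_months is just an illustrative name.

```shell
#!/bin/sh

# shorten_months (hypothetical name): cut a long month name in the Sent:
# field down to its first three letters, e.g. November -> Nov.
# The pattern keeps "Sent:", the day and the first three letters of the
# month, and drops the rest of the month name. Whatever whitespace
# follows "Sent:" is left as it was.
shorten_months() {
    sed -E 's/^(Sent:[[:space:]]*[0-9]{1,2} (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec))[a-z]*/\1/'
}
```

Run it as a filter, e.g. "shorten_months < message", before the Sent:
field gets renamed to Date:.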

The relevant email header RFC is a bit vague about what needs to be a
single space, what can be multiple spaces, and what can be any
whitespace (i.e. tabs too), so I'm not sure whether the problems I had
with stray tabs between header field names and data were actually a bug
in outport or a bug in evo (in that it didn't cope with parsing them
correctly).

Phew. Anyway, I now have several thousand messages spanning several
years all in one convenient place (and better still, in a portable, open
and searchable format) so I'm happy...

cheers

Jules


