Re: [Evolution] remove duplicate emails



On Thu, 2005-12-01 at 12:40 +0100, Martin Klaffenboeck wrote:
Hi there!

Duplicate emails result from downloading an email from a POP server
more than once (accidentally), from someone sending a mail to two or
more of my own addresses that are collected into one folder
(unavoidably), or from someone pressing 'reply to all' when answering
one of my mailing list posts.

How can I get rid of these duplicate messages?

Note that there is an ambiguity in how you define duplicate messages.
Downloading a message more than once from a POP server results in true
duplicates, but "reply to all" generally does not (the headers will have
minor differences).
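
To make the ambiguity concrete, here is a minimal Perl sketch comparing two possible definitions of "duplicate": the mbox "From " separator line and the Message-ID header. The helper names (key_from_line, key_message_id, dedupe) and the sample messages are made up for illustration, and the Message-ID lookup assumes the header is not folded onto a second line.

```perl
use strict;
use warnings;

# Definition 1: the mbox "From " separator line (first line of the message).
sub key_from_line { my ($msg) = @_; return $msg->[0] }

# Definition 2: the Message-ID header (assumed to sit on a single line).
sub key_message_id {
    my ($msg) = @_;
    for my $line (@$msg) {
        last if $line =~ /^\r?$/;                      # blank line ends the headers
        return $1 if $line =~ /^Message-ID:\s*(\S+)/i;
    }
    return;                                            # no Message-ID found
}

# Keep the first message seen for each key, preserving mailbox order.
# Messages for which no key can be computed are always kept.
sub dedupe {
    my ($keyfn, @msgs) = @_;
    my %seen;
    return grep { my $k = $keyfn->($_); !defined $k || !$seen{$k}++ } @msgs;
}

# Hypothetical sample: one mail delivered to two of the recipient's addresses.
my @msgs = (
    [ "From a\@example.org Thu Dec  1 12:40:00 2005\n", "To: me\@example.org\n",
      "Message-ID: <1\@example.org>\n", "\n", "hello\n" ],
    [ "From a\@example.org Thu Dec  1 12:40:05 2005\n", "To: me2\@example.org\n",
      "Message-ID: <1\@example.org>\n", "\n", "hello\n" ],
);
printf "by From line: %d, by Message-ID: %d\n",
    scalar(dedupe(\&key_from_line, @msgs)),
    scalar(dedupe(\&key_message_id, @msgs));
```

With this sample the two copies count as distinct under the "From " line definition but as one message under the Message-ID definition, so which key to use depends on which kind of duplicate you want gone.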

Here is a Perl script for removing duplicates (defined in yet a third
way). It requires the Mail::Util module, which is part of the MailTools
distribution on CPAN. Watch out for lines that were folded (wrapped) by
the mailer. Find the relevant folders in .evolution, stop Evolution, run
the script on each mailbox file, and let Evolution fix up its indexes,
etc. the next time you open it.

#!/usr/bin/perl -w
# Takes one argument, the name of a mailbox file (in standard Unix mbox
# format). It eliminates duplicate messages from this file by reading the
# file, then writing only one copy of each message back to the file.
# It defines duplicate messages as those having identical "From " lines,
# which must be the first line of the message. The first copy is kept and
# the original mailbox order is preserved.
use strict;
use Mail::Util qw(read_mbox);

my $file = $ARGV[0];
die "Usage: $0 mailbox-file\n" unless defined $file;

$| = 1;
my %seen;       # "From " lines already encountered
my @uniques;    # first copy of each message, in mailbox order
print "Working on file $file ";
my @ans = read_mbox($file);    # list of array refs, one per message
my $cnt = scalar @ans;
print "$cnt messages\n";
exit if $cnt == 0;    # nothing read; leave the file untouched
foreach my $mref (@ans) {
    my $first = $mref->[0];
    print "Malformed first line:\n$first" unless $first =~ /^From /;
    push @uniques, $mref unless $seen{$first}++;
}
my $ucnt = scalar @uniques;
print "$ucnt unique values\n";
open my $outf, '>', $file or die "Cannot write $file: $!\n";
foreach my $mref (@uniques) {
    print $outf @$mref;    # message lines keep their own newlines
    print $outf "\n";
}
close $outf or die "Cannot close $file: $!\n";
exit;
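
Because the script overwrites the mailbox in place, it may be worth copying the file first. A minimal sketch using the core File::Copy module follows; the backup_mbox name and the ".bak" suffix are arbitrary choices for illustration, not part of the script above.

```perl
use strict;
use warnings;
use File::Copy qw(copy);

# Copy an mbox file to "<name>.bak" before it is rewritten.
# Refuses to overwrite an existing backup.
sub backup_mbox {
    my ($file) = @_;
    my $backup = "$file.bak";
    die "Backup $backup already exists\n" if -e $backup;
    copy($file, $backup) or die "Cannot copy $file to $backup: $!\n";
    return $backup;
}

# Back up the mailbox named on the command line, if one was given.
if (@ARGV) {
    my $backup = backup_mbox($ARGV[0]);
    print "Saved a copy as $backup\n";
}
```

Run it on the mailbox file before the duplicate remover, and delete the .bak copy once Evolution shows the folder as expected.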



-- 
Graham Campbell <gc bnl gov>



