Re: [Evolution] Evo filters

From: Jon Biddell <jon mandrake net au>
To: Ron Johnson <ron l johnson cox net>
Cc: evolution lists ximian com
Subject: Re: [Evolution] Evo filters
Date: Sat, 15 Jan 2005 11:54:53 +0000

On Fri, 2005-01-14 at 18:14 -0600, Ron Johnson wrote:

On Sat, 2005-01-15 at 10:57 +0000, Jon Biddell wrote:

I've just switched (back !!) to Evo from kmail, and have recently found
a very nice script that kills duplicate messages quite nicely


You know what we're going to ask, right?


:-)

OK - here it is - I take no credit for it, apart from a little tweak for
my machine....  Credit for this script goes to Dan Jones
( ddjones riddlemaster org )


Rather than start hacking the source of Evolution, which would have had
quite a learning curve, I took a different approach.  I've written a
Perl script to handle it.  It looks for matching Message-ID headers,
then compares the MD5 hashes of each message to ensure that they
actually are dupes.  (I suppose that this isn't absolutely guaranteed,
but the chance of two messages having the same ID and the same MD5 hash
and not being duplicates is infinitesimal.)  You can call it with the
names of one or more mailboxes (you must either be in the directory
holding the mailbox or pass it the full path):

rdbox Inbox
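
For anyone who wants to see the duplicate check in isolation, here is a
minimal sketch of the idea described above: remember the MD5 digest of
the body for each Message-ID, and drop a message only when both match.
It is not the script's exact code; the helper name unique_messages and
the sample messages are made up for illustration, and it assumes each
message is already a separate string.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: returns the messages that are not duplicates.
# A message counts as a duplicate only if an earlier message had the
# same Message-ID header and the same MD5 digest of its body.
sub unique_messages {
    my @messages = @_;
    my (%seen, @kept);
    for my $msg (@messages) {
        my ($head, $body) = split /\n\n/, $msg, 2;
        $body = '' unless defined $body;
        # List-context match: $id is undef when there is no header.
        my ($id) = $head =~ /^Message-ID:\s*(.*)$/mi;
        # Keep anything without an ID; it cannot be matched safely.
        unless (defined $id) { push @kept, $msg; next; }
        my $digest = md5_hex($body);
        next if defined $seen{$id} && $seen{$id} eq $digest;  # true dupe
        $seen{$id} = $digest unless exists $seen{$id};
        push @kept, $msg;
    }
    return @kept;
}

# Made-up sample data: the second message repeats the first.
my @sample = (
    "From a\nMessage-ID: <1\@example>\n\nhello\n",
    "From a\nMessage-ID: <1\@example>\n\nhello\n",
    "From b\nMessage-ID: <2\@example>\n\nworld\n",
);
my @kept = unique_messages(@sample);
print scalar(@kept), " of ", scalar(@sample), " messages kept\n";
```

Note that a message with a repeated ID but a different body is kept,
which matches the "false positive" branch in the script below.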

You can also use the -d switch, with the name of a directory.  If you
add the -r switch, it'll recurse subdirectories.  For my setup, using
Evolution on Novell Linux Desktop, I used the following command:

rdbox -r -d /home/mylogin/.evolution/mail/local

and it finds every mailbox and checks for dupes.
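
To preview which mailboxes a recursive run would pick up, something
like the following sketch may help.  It assumes, as the pattern in the
script below suggests, that each mbox file has a companion file named
<mailbox>.ibex.index next to it.  The helper name find_mailboxes is
hypothetical, and File::Find is used here instead of the script's own
directory walk.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: list mailbox paths under $top by looking for the
# "<mailbox>.ibex.index" companion files that the script keys on.
sub find_mailboxes {
    my ($top) = @_;
    my @found;
    find(sub {
        # $_ is the file's basename; $File::Find::name is its path
        # starting from $top.
        if (-f $_ && $File::Find::name =~ /^(.*)\.ibex\.index$/s) {
            push @found, $1;
        }
    }, $top);
    my @sorted = sort @found;
    return @sorted;
}

# Print one mailbox path per line for the directory given as argument.
my $top = shift @ARGV;
if (defined $top) {
    print "$_\n" for find_mailboxes($top);
}
```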

The script doesn't touch (other than reading) your actual mailboxes.  It
creates a new mailbox with a .clean extension, which contains all of
your messages without dupes.  You can then rename your originals or move
them to a safe location and rename the new files by stripping off the
.clean extension.  I'll probably automate this in the next iteration of
the script.
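
Until then, the rename step could be sketched roughly as follows.  This
is an illustration only: the helper name swap_in_clean and the .orig
backup suffix are assumptions, not part of rdbox.  Make sure your mail
client is closed before running anything like it.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the original as "<mbox>.orig" and move
# "<mbox>.clean" into its place.  Refuses to overwrite an old backup.
sub swap_in_clean {
    my ($mbox) = @_;
    my ($clean, $backup) = ("$mbox.clean", "$mbox.orig");
    die "No cleaned file $clean\n" unless -e $clean;
    die "Backup $backup already exists\n" if -e $backup;
    rename $mbox, $backup or die "Can't rename $mbox: $!\n";
    rename $clean, $mbox  or die "Can't rename $clean: $!\n";
    return $backup;
}

# Swap every mailbox named on the command line.
swap_in_clean($_) for @ARGV;
```

Keeping the original under a different name, rather than deleting it,
means a bad run can be undone by renaming the .orig file back.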

It uses two modules, Digest::MD5 and Getopt::Long, which are available
at CPAN.

This is essentially alpha software - it works on my system but hasn't
been extensively tested.  As I said, it shouldn't touch your original
mailbox but doesn't come with any guarantees.  Bug and problem reports
are welcome at bugsATriddlemaster.org

To create the script, just cut and paste it into a file called rdbox.pl
and set execute permissions on the file.  Hope you find it helpful!

-----------------------------------------------------------------------

#!/usr/bin/perl

use strict;
use warnings;

use Digest::MD5 qw(md5);
use Getopt::Long;

#SUB DECLARATIONS
sub ProcFile($);
sub ProcDirectory($);
sub ProcMessage($);
sub PrintUsage();

#GLOBAL VARIABLES
my (%MessageStore, $FileWrites);

#COMMANDLINE ARGUMENTS
my (@directories, @files);
my $recurse = '';
my $verbose = '';
my $usage = '';
my $global = '';

my $result = GetOptions("directory=s" => \@directories,
                                                "file=s" => \@files,
                                                "recurse" => \$recurse,
                                                "verbose" => \$verbose,
                                                "usage" => \$usage,
                                                "global" => \$global);


my $GoodArg = 0;
if(@files)
{
        $GoodArg = 1;
        for(@files)
        {
                ProcFile($_);
        }
}

if(@directories)
{
        $GoodArg = 1;
        for(@directories)
        {
                ProcDirectory($_);
        }
}

if(@ARGV)
{
        $GoodArg = 1;
        for(@ARGV)
        {
                ProcFile($_);
        }
}

PrintUsage() unless $GoodArg;

sub ProcFile($) 
{
        my $mbox = shift;
 
        print "Processing file $mbox\n";
        open MAILBOX, "<$mbox" or die "Can't open $mbox\n";
        open CLEAN, ">$mbox.clean" or die "Can't open $mbox.clean\n";
 
        %MessageStore = () unless $global;
 
        $FileWrites = 0;
 
        local $/ = "\n\nFrom ";

        my $Counter = 1;
        $_ = <MAILBOX>;
        $_ =~ s/\n\nFrom $//;
        ProcMessage($_);
        #print "Processed Message \#$Counter\n";
 
        while(<MAILBOX>) 
        {
                $Counter++;
                $_ =~ s/\n\nFrom $//;
                ProcMessage("\n\nFrom $_");
                print "Processed Message \#$Counter\n" if $verbose;
        }

        close MAILBOX;
        close CLEAN;
}

sub ProcDirectory($)
{
        my $directory = shift;
 
        chdir $directory or die "Can't change to directory $directory\n";
 
        opendir DIRECTORY, $directory or die "Can't open directory $directory\n";
        my @DirList = grep !/^\.\.?$/, readdir DIRECTORY;
        for(@DirList)
        {
                if(-d)
                {
                        print "Found directory $_\n" if $verbose;
                        if($recurse) 
                        {
                                ProcDirectory("$directory/$_");
                                chdir $directory;
                        }
                }
                elsif(/(.*)\.ibex\.index$/){
                        print "Found file $1\n" if $verbose;
                        ProcFile($1);
                }
        }
        print "\n";
}

sub ProcMessage($)
{
        my $Message = shift;
        my @MessageParts;
        my $HashValue;
        my $MessageId;
 
        my $InitWS;
        my $WSLength;
 
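        # Measure any leading whitespace but leave $Message intact, so
        # the blank-line separator before "From " survives in the output
        # and substr() below skips the whitespace only once.
        $WSLength = 0;
        if($Message =~ /^(\s+)/) {
                $InitWS = $1;
                $WSLength = length $InitWS;
        }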
 
        @MessageParts = split /\n\n/, substr($Message, $WSLength), 2;
        unless($MessageParts[1])
        {
                print "Error in message!\n$Message\n\n";
                return;
        }
 
        $HashValue = md5($MessageParts[1]);
 
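        # List-context match: $MessageId is undef when no header is
        # found, instead of inheriting a stale $1 from an earlier match.
        ($MessageId) = $MessageParts[0] =~ /Message-I[dD]: (.*)/;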
 
        unless($MessageId)
        {
                print STDERR "Can't find Id in this message:\n$MessageParts[0]";
                return;
        }
 
        if(exists $MessageStore{$MessageId}) 
        {
                if($MessageStore{$MessageId} eq $HashValue)
                {
                        print "Found dupe of MessageID $MessageId!\n" if $verbose;
                }
                else
                {
                        print CLEAN $Message;
                        print "False positive of $MessageId\n" if $verbose;
                }
        }
        else
        {
                $MessageStore{$MessageId} = $HashValue;
                unless ($FileWrites) {
                        $Message =~ s/^\n+//;
                }
                print CLEAN $Message;
                $FileWrites++;
                print "Storing Message number $FileWrites, ID#: $MessageId\n" if $verbose;
        }
}

sub PrintUsage() {
 
print <<USAGE;
rdbox - utility to remove duplicates from mbox files.
usage: rdbox [options] filename
       rdbox [options] -d directoryname

options:
        -d/-directory   name of directory containing mbox files

        -r/-recurse     recurses subdirectories below
 
        -u/-usage               print this message
 
        -f/-file                name of mbox file[s]
 
        -v/-verbose             print extra messages
 
        -g/-global              check for duplicates across all mailboxes
 
USAGE
}




_________________________________________________
Jon Biddell
HRMIS Systems Administrator
PREOD, Mount St Mary Campus
Australian Catholic University Limited
ABN 15 050 192 660
25 A Barker Road   Strathfield NSW 2135
Tel: (61 +2) 9701 4248   Fax: (61 +2)  9701 4105
Mobile: 0419 422 537
