Re: [Evolution] Deleting duplicate emails



On Thu, 2003-08-21 at 08:56, guenther wrote:
> run md5sum on the mail message body and store the resulting string in
> a file then compare each message against this list in the file, if the
> md5sums of the message body are the same then the message is
> guaranteed to be the same.

Nope.

If the md5sum hashes differ, the messages are guaranteed to be
different. If the hashes are the same, there is always a slight
probability that the messages are *NOT* the same.
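A minimal Python sketch of the md5sum idea with that caveat built in: equal digests only mark *candidate* duplicates, and the bodies are then compared byte for byte before anything is dropped. It assumes each message is available as raw bytes and that the body starts after the first blank line. It keeps the digests in memory rather than in a file, and the function names `split_body` and `dedupe_by_body` are hypothetical, not part of any existing tool.

```python
import hashlib

def split_body(raw: bytes) -> bytes:
    """Return the body of a raw RFC 822 message (everything after the first blank line)."""
    # Headers end at the first empty line; accept both CRLF and LF line endings
    # and use whichever separator occurs first.
    found = [(raw.find(sep), sep) for sep in (b"\r\n\r\n", b"\n\n") if raw.find(sep) != -1]
    if not found:
        return b""  # no blank line: headers only, empty body
    idx, sep = min(found)
    return raw[idx + len(sep):]

def dedupe_by_body(messages: list[bytes]) -> list[int]:
    """Return the indices of messages to keep; later copies of a body are dropped."""
    seen: dict[str, list[bytes]] = {}  # md5 hex digest -> bodies already kept
    keep = []
    for i, raw in enumerate(messages):
        body = split_body(raw)
        digest = hashlib.md5(body).hexdigest()
        bucket = seen.setdefault(digest, [])
        # Equal digests only suggest equality; compare the bytes to be certain.
        if any(body == other for other in bucket):
            continue
        bucket.append(body)
        keep.append(i)
    return keep
```

Because only the body is hashed, two messages that differ only in their headers (Subject, Date, Received) count as duplicates here, which matches the "message body" proposal quoted above.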

True, it's a one-in-multiple-billions chance, so I will take my chances; after all, the chance of getting a bit error on my hard disk is bigger than that. Your solution with formail is much easier on the CPU, but the probability of email systems generating the same Message-Id header is much, much larger than that of an md5sum clash... ;-)


With a hash value of limited length, you cannot guarantee distinct
hashes for longer data chunks.
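Both points can be put in numbers. MD5 digests are 128 bits, so any body longer than 16 bytes has more possible values than there are digests, and by the pigeonhole principle some distinct bodies must share a digest. How likely that is in practice can be estimated with the union (birthday) bound, assuming MD5 behaves like an ideal random 128-bit function for accidental collisions; it is known to be vulnerable to deliberately crafted collisions, which this bound does not cover. The figure of one million messages is purely illustrative.

```latex
% Pigeonhole: bodies of L bytes vs. 128-bit digests
2^{8L} > 2^{128} \quad \text{for } L > 16
% Union bound for n messages and an ideal b-bit hash (b = 128 for MD5)
P(\text{any collision}) \le \binom{n}{2}\, 2^{-b} = \frac{n(n-1)}{2^{b+1}}
% Example: n = 10^6 messages
\frac{10^{6}\,(10^{6}-1)}{2^{129}} \approx \frac{10^{12}}{6.8 \times 10^{38}} \approx 1.5 \times 10^{-27}
```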


> > In some folder, for some reasons I have duplicate mails (same mail, two
> > or three times).

Vincent,

I have posted a small hack (shell script using formail) to delete
duplicate messages based on the Message-Id: header. Search the archive
for it and read my notes carefully.
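The Message-Id approach can be sketched with Python's standard `mailbox` module. This is not the formail script mentioned above, only an illustration of the same rule: keep the first message carrying a given Message-Id and delete later ones. It assumes the folder is a plain mbox file; since the file is rewritten in place, run it on a copy or with the mail client closed. Messages without a Message-Id header are kept, because there is nothing to compare. The function name `dedupe_mbox_by_message_id` is hypothetical.

```python
import mailbox

def dedupe_mbox_by_message_id(path: str) -> int:
    """Delete later copies of messages whose Message-Id was already seen.

    Returns the number of messages removed.
    """
    box = mailbox.mbox(path)
    box.lock()
    try:
        seen = set()
        to_remove = []
        # Keys are visited in file order, so the first copy is the one kept.
        for key, msg in box.iteritems():
            mid = msg.get("Message-ID")
            if mid is None:
                continue  # no header: cannot tell, so keep the message
            mid = mid.strip()
            if mid in seen:
                to_remove.append(key)
            else:
                seen.add(mid)
        for key in to_remove:
            box.remove(key)
        box.flush()  # write the pruned mailbox back to disk
        return len(to_remove)
    finally:
        box.unlock()
        box.close()
```

This trusts the Message-Id completely: two different messages that happen to share an id would lose one of them, which is exactly the risk raised in the reply above.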

As I got some feedback, and since it currently is not wise to run it more
than once [1], I had already planned to rewrite it and post it again. Silly
me, I even sort of announced it without having the time to code it.

This seems like a good opportunity to actually rewrite it and release
it...

...guenther

