Re: [Evolution] Deleting duplicate emails
- From: HvR <hvrietsc myrealbox com>
- To: guenther <guenther rudersport de>
- Cc: Vincent Birebent <vincent birebent ohmforce com>, evolution lists ximian com
- Subject: Re: [Evolution] Deleting duplicate emails
- Date: 21 Aug 2003 12:24:08 -0700
On Thu, 2003-08-21 at 08:56, guenther wrote:
> Run md5sum on the mail message body and store the resulting string in
> a file, then compare each message against the list in that file. If the
> md5sums of two message bodies are the same, then the messages are
> guaranteed to be the same.
Nope.
If the md5sum hashes are different, the messages are guaranteed to be
different. If the hashes are the same, there is always a slight
probability that the messages are *NOT* the same.
True, but that is a one-in-many-billions chance, so I will take my
chances. After all, the chance of getting a bit error on my hard disk is
bigger than that. Your solution with formail is much easier on the CPU,
but the probability of email systems generating the same Message-Id
header is much, much larger than that of an md5sum clash...
With a hash value of limited length, you cannot guarantee distinct
hashes for longer data chunks.
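
One way to settle the disagreement above is to use the hash only as a cheap
first filter and confirm any match with a byte-for-byte comparison. The
following is a minimal Python sketch of that idea, not anyone's actual
script. The function name find_duplicates and the assumption that the
message bodies are already available as byte strings are hypothetical.

```python
import hashlib


def find_duplicates(bodies):
    """Return the indices of bodies that exactly repeat an earlier body.

    `bodies` is assumed to be a sequence of bytes objects, one per
    message body (a hypothetical input format for this sketch).
    """
    seen = {}        # md5 hex digest -> distinct bodies seen with that digest
    duplicates = []
    for index, body in enumerate(bodies):
        digest = hashlib.md5(body).hexdigest()
        candidates = seen.setdefault(digest, [])
        # Equal digests only suggest equality; comparing the bytes confirms it.
        if any(body == earlier for earlier in candidates):
            duplicates.append(index)
        else:
            candidates.append(body)
    return duplicates
```

The extra comparison only runs when two digests match, so it costs almost
nothing in the common case and removes the "slight probability" from the
argument.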
> > In some folders, for some reason, I have duplicate mails (the same
> > mail two or three times).
Vincent,
I have posted a small hack (shell script using formail) to delete
duplicate messages based on the Message-Id: header. Search the archive
for it and read my notes carefully.
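
For readers who cannot find that post, the general idea of dropping
messages whose Message-Id: header has been seen before can be sketched in
Python as below. This is an illustration only and is not the formail
script mentioned above. The function name drop_duplicate_ids and the
input format (a list of raw messages as bytes) are assumptions.

```python
from email import message_from_bytes


def drop_duplicate_ids(raw_messages):
    """Keep the first message for each Message-Id and drop later repeats.

    `raw_messages` is assumed to be a list of raw RFC 822 messages as
    bytes (hypothetical input). Messages without a Message-Id header are
    always kept, since there is nothing to compare them by.
    """
    seen_ids = set()
    kept = []
    for raw in raw_messages:
        # Header lookup in the email package is case-insensitive.
        msg_id = message_from_bytes(raw)["Message-ID"]
        if msg_id is not None:
            msg_id = str(msg_id).strip()
            if msg_id in seen_ids:
                continue  # a message with this id was already kept
            seen_ids.add(msg_id)
        kept.append(raw)
    return kept
```

Note that this treats two messages with the same Message-Id as duplicates
even if their bodies differ, which is exactly the risk raised earlier in
the thread.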
As I got some feedback, and as it currently is not wise to run it more
than once [1], I had already planned to rewrite it and post it again.
Silly me even sort of announced it without having the time to code.
This seems like a good opportunity to actually rewrite it and release
it...
...guenther