Re: [Evolution] Deleting duplicate emails



HvR,

> run md5sum on the mail message body and store the resulting string in
> a file then compare each message against this list in the file, if the
> md5sums of the message body are the same then the message is
> guaranteed to be the same.

Nope.

Calming down and reading RFC 1321...


If the md5sum hashes are different, the messages are guaranteed to be
different. If the hashes are the same, there is always a slight
probability that the messages are *NOT* the same.

With a hash value of limited length, you cannot guarantee that longer
data chunks map to distinct values.
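A minimal sketch of the approach in Python, assuming the message bodies are already available as `bytes` (extracting them from an Evolution mailbox is not shown). The helper names `body_digest` and `find_duplicates` are hypothetical. Equal digests are only treated as candidates, and each candidate is then confirmed with a full byte comparison, which is what turns "probably the same" into "the same":

```python
import hashlib
from collections import defaultdict


def body_digest(body: bytes) -> str:
    """MD5 hex digest of a message body, the same string md5sum prints."""
    return hashlib.md5(body).hexdigest()


def find_duplicates(bodies):
    """Return lists of indices whose bodies are byte-for-byte identical.

    Equal MD5 digests only mark candidates; every candidate is compared
    in full, because equal digests do not guarantee equal data.
    """
    # Bucket the messages by digest (the "list in a file" from the proposal).
    by_digest = defaultdict(list)
    for index, body in enumerate(bodies):
        by_digest[body_digest(body)].append(index)

    groups = []
    for indices in by_digest.values():
        remaining = list(indices)
        while remaining:
            first = remaining[0]
            # Full comparison: a (rare) collision between different bodies
            # lands in the same bucket but is never reported as a duplicate.
            same = [i for i in remaining if bodies[i] == bodies[first]]
            if len(same) > 1:
                groups.append(same)
            remaining = [i for i in remaining if bodies[i] != bodies[first]]
    return groups
```

For example, `find_duplicates([b"hello\n", b"world\n", b"hello\n"])` returns `[[0, 2]]`. Only messages that share a digest are ever compared byte by byte, so the expensive comparison stays rare.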

The MD5 algorithm is indeed designed for the purpose mentioned: to
"reliably" identify mails by a short checksum. And it is very widely
used for this purpose.

So you are quite right.


The only thing that bothered me was the guarantee:

As an md5sum is limited to 128 bits, there are only 2^128 different
fingerprints; therefore, feeding in 2^128 + 1 different messages must
produce at least one fingerprint that is shared by two different mails
(the pigeonhole principle).
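The counting argument can be tried out on a small scale. This hypothetical sketch keeps only the first 16 bits of each MD5 digest, so there are just 2^16 possible fingerprints, and then feeds in up to 2^16 + 1 distinct made-up messages; the loop is guaranteed to find two different messages with the same fingerprint. The same argument holds for the full 128 bits, only the numbers are far too large to try:

```python
import hashlib


def short_fingerprint(message: bytes, bits: int = 16) -> int:
    """Keep only the first `bits` bits of the 128-bit MD5 digest."""
    digest = int.from_bytes(hashlib.md5(message).digest(), "big")
    return digest >> (128 - bits)


def find_collision(bits: int = 16):
    """Feed up to 2**bits + 1 distinct messages into the short fingerprint.

    There are only 2**bits possible fingerprints, so by the pigeonhole
    principle two different messages must end up sharing one.
    """
    seen = {}  # fingerprint -> first message that produced it
    for n in range(2 ** bits + 1):
        message = f"mail #{n}".encode()  # distinct, made-up message bodies
        fp = short_fingerprint(message, bits)
        if fp in seen:
            return seen[fp], message, fp
        seen[fp] = message
    raise AssertionError("unreachable: pigeonhole principle violated")
```

In practice the loop stops much earlier than 2^16 + 1 messages because of the birthday bound (roughly 2^(bits/2) messages), but the pigeonhole limit is what guarantees that it stops at all.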

...guenther


-- 
char *t="\10pse\0r\0dtu\0  ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}