On Mon, 2004-11-01 at 15:50 -0800, Dan Stromberg wrote:
On Mon, 2004-11-01 at 15:14, Jon Biddell wrote:

If I recall correctly, this was considered and not implemented-- it's not clear that nearly-identical messages can be identified properly without a lot of processing.

Interesting - I wonder how the kmail guys do it - it seems to work pretty well, with only the rarest of false deletions - I had a mailbox with 16k messages (deliberately created to test it) which was a double-import of another mailbox - I *knew* there would be exactly 8192 duplicates, and kmail shot through the file in less than 30 seconds.

If you want to be quick, you can just delete all but one copy of anything with the same Message-id: header. If you want to be more thorough, you could additionally generate sha-1 or md5 hashes of all messages as they come in, perhaps inserting them into a heap.
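
A rough sketch of the two approaches described above (quick: same Message-id; thorough: additionally a SHA-1 hash of each message), written in Python rather than taken from any mail client. The function name find_duplicates, its thorough flag, and the (key, raw_bytes) input shape are all made up for illustration, and a plain set is swapped in for the heap mentioned above, since only membership tests are needed here.

```python
import email
import hashlib


def find_duplicates(messages, thorough=False):
    """Return the keys of messages that repeat an earlier message.

    messages: iterable of (key, raw_bytes) pairs. The key is any
    hypothetical identifier, for example a Maildir file name.
    """
    seen_ids = set()      # Message-id values already encountered
    seen_digests = set()  # SHA-1 digests already encountered (thorough mode)
    duplicates = []
    for key, raw in messages:
        # Header lookup is case-insensitive, so "Message-id" matches too.
        header = email.message_from_bytes(raw).get("Message-ID")
        msg_id = str(header).strip() if header is not None else None
        # Assumption: the digest covers the whole raw message, headers included.
        digest = hashlib.sha1(raw).hexdigest() if thorough else None

        if msg_id is not None and msg_id in seen_ids:
            duplicates.append(key)
            continue
        if thorough and digest in seen_digests:
            duplicates.append(key)
            continue

        # First time this message is seen: remember it and keep it.
        if msg_id is not None:
            seen_ids.add(msg_id)
        if thorough:
            seen_digests.add(digest)
    return duplicates
```

For a Maildir, the pairs could presumably come from the standard mailbox module, e.g. ((k, box.get_bytes(k)) for k in box.iterkeys()) with box = mailbox.Maildir(path). Note that hashing the full raw message only catches byte-identical copies; if an import rewrites any header, only the Message-id check will match.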
That's the exact idea I had, except I was thinking of bash and Maildir...

--
-----------------------------------------------------------------
Ron Johnson, Jr.
Jefferson, LA USA
PGP Key ID 8834C06B

"Experience hath shewn, that even under the best forms [of government] those entrusted with power have, in time, and by slow operations, perverted it into tyranny."
Thomas Jefferson