On Mon, 2004-11-01 at 16:19, Ron Johnson wrote:
On Mon, 2004-11-01 at 15:50 -0800, Dan Stromberg wrote:
On Mon, 2004-11-01 at 15:14, Jon Biddell wrote:

If I recall correctly, this was considered and not implemented -- it's not clear that nearly-identical messages can be identified properly without a lot of processing.

Interesting - I wonder how the kmail guys do it - it seems to work pretty well, with only the rarest of false deletions - I had a mailbox with 16k messages (deliberately created to test it) which was a double-import of another mailbox - I *knew* there would be exactly 8192 duplicates, and kmail shot through the file in less than 30 seconds.

If you want to be quick, you can just delete all but one copy of anything with the same Message-id: header. If you want to be more thorough, you could additionally generate sha-1 or md5 hashes of all messages as they come in, perhaps inserting them into a heap.

That's the exact idea I had, except I was thinking of bash and Maildir...

Yeah, that's good too. BTW, deduping the address book would be nice as well. A lot of my address book entries list the same e-mail address twice, probably as a result of a failed experiment in which I tried to migrate from jpilot to evolution for syncing with my palm.
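
For what it's worth, the Message-id-then-hash idea quoted above could be sketched roughly like this in Python. This is only an illustration, not how kmail or evolution actually does it: the function name `find_duplicates` and the (key, raw bytes) input shape are made up for the example, plain sets stand in for the heap mentioned above, and hashing the whole raw message is an assumption (copies that differ in any header will only be caught by their Message-id).

```python
import hashlib
from email import message_from_bytes


def find_duplicates(messages):
    """Return the keys of messages that duplicate an earlier message.

    `messages` is an iterable of (key, raw_bytes) pairs; this input
    shape is a hypothetical choice made for the sketch.
    """
    seen_ids = set()      # Message-ID values already encountered
    seen_hashes = set()   # SHA-1 digests of raw messages already encountered
    duplicates = []

    for key, raw in messages:
        msg = message_from_bytes(raw)
        # Header lookup is case-insensitive; a missing header gives "".
        msg_id = str(msg.get("Message-ID") or "").strip()
        # Assumption: hash the full raw message, headers included.
        digest = hashlib.sha1(raw).hexdigest()

        # Quick check first (same Message-ID), then the thorough one
        # (byte-identical content).
        if (msg_id and msg_id in seen_ids) or digest in seen_hashes:
            duplicates.append(key)
            continue

        if msg_id:
            seen_ids.add(msg_id)
        seen_hashes.add(digest)

    return duplicates
```

For a Maildir, the pairs could presumably come from the standard `mailbox` module, e.g. `(k, box.get_bytes(k)) for k in box.iterkeys()` with `box = mailbox.Maildir(path, create=False)`. The first copy seen is the one that is kept, so the iteration order decides which copy survives.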