On Sat, Sep 14, 2002 at 01:00:54AM -0400, Jeffrey Stedfast wrote:
On Fri, 2002-09-13 at 23:46, PeterKorman wrote:
On Wed, Aug 28, 2002 at 02:36:30PM -0400, Peter Williams wrote:
On Wed, 2002-08-28 at 14:14, Mertens Bram wrote:
On Wed, 2002-08-28 at 20:06, Antonio Bemfica wrote:

This is trivial to do using procmail (by checking the Message-ID header). You can get a bit more sophisticated and do an MD5 hash of the body of incoming messages and store it on a database (dbtool, for example: http://www.daemon.de/dbtool/). If you or anyone else is interested I can post a recipe that does the above.

Well I certainly am interested! I don't know anything about procmail though, would I have to configure much to get this working? I am running Evo 1.0.8 btw...

You don't want to detect duplicates based on message-id; it's trivial for an attacker to prevent you from seeing a given message, and the same problem could happen even without malicious intent.

Peter

What would an attacker gain by falsifying a message-id?

If an attacker can gain by spoofing an ip address, why not faking a message-id? There are probably numerous possibilities, only one of which is blocking the recipient from seeing that particular message.
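The Message-ID check and body hash described in the quoted thread can be sketched roughly as follows. This is not the procmail recipe offered above; it is a minimal Python illustration, and the `seen` table, the `open_store` and `check_and_record` helpers, and the use of SQLite in place of dbtool are all assumptions made for the example. The body is taken to be everything after the first blank line, with no MIME handling.

```python
import hashlib
import sqlite3
from email import message_from_string


def open_store(path=":memory:"):
    """Open (or create) the table of seen Message-IDs and body digests.

    SQLite is a stand-in for the database mentioned in the thread.
    """
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS seen (msgid TEXT, body_md5 TEXT)")
    return db


def check_and_record(db, raw):
    """Return (id_seen, body_seen) for this message, then record it."""
    msgid = (message_from_string(raw)["Message-ID"] or "").strip()
    # Hypothetical simplification: the body is whatever follows the
    # first blank line, hashed byte-for-byte.
    _headers, _sep, body = raw.partition("\n\n")
    digest = hashlib.md5(body.encode("utf-8")).hexdigest()

    id_seen = db.execute(
        "SELECT 1 FROM seen WHERE msgid = ?", (msgid,)
    ).fetchone() is not None
    body_seen = db.execute(
        "SELECT 1 FROM seen WHERE body_md5 = ?", (digest,)
    ).fetchone() is not None

    db.execute("INSERT INTO seen VALUES (?, ?)", (msgid, digest))
    db.commit()
    return id_seen, body_seen
```

Returning the two results separately keeps the objection visible: a message that reuses a known Message-ID but carries a different body comes back as `(True, False)`, so a filter that acts on the ID alone would discard a message that is not actually a duplicate.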
I agree that message-id's are not guaranteed unique. I agree that an attacker could spoof almost any piece of an email message. But if he can spoof my PGP signature, then he's a lot smarter than me and he likely already has complete access to every machine I control, including my PDA.

The problem of message duplication has, for me, been limited to mailing lists like this. I try to limit concern for low-likelihood malfunction to processes that can cause injury and death. It is possible to use email methodology for certain high-level process control functions, say, oil refinery process monitoring. Treating message-id's as unique in that environment should result in lifelong prohibition from any programming occupation -- something a bit less severe than what happened to Kevin Mitnick. Probably, any Majordomo control channel should not use this treatment of message ID's either. For things like that nothing less than MD5 matching would suffice. We might even agree on that.

JPK
I don't wish to provoke jihad, but the keystroke sequence "D~=<CR>" is all mutt needs to mark (all but 1) messages with duplicate ID's for deletion.

Ever stop to think how many non-identical messages you've wiped out that way? Message-Ids are not guaranteed to be unique. Theoretically, removing duplicate messages based on message-id is not much better than removing duplicate messages based on the Subject header (it's only better because it is assumed that msg-id is generated using at least a somewhat random sequence of characters... but how random is random? If you've ever played with rand() you know that it is a pretty poor random number generator as it will often spew out the same sequence over and over again - can you guarantee that your client doesn't use rand()?)

Of course this is the evolution list so most readers probably are not using mutt. Perl's Mail::Audit tools could probably do the job transparently.

Evolution doesn't use Perl, so there's no way this could do *anything* transparently for Evolution.

MD5 Hash, even if it were instantaneous, would not fit the bill for dup messages that arise from CC artifacts.

Nor would it even work in 99% of the cases anyway. Often you get duplicates because they have gone through different paths to arrive at your machine and thus their md5s would be different. Usually at least one of these paths is due to a message going to a mailing list that you are subscribed to. What is the first thing most mailing list software does the instant it receives a message? It munges the Subject header and often adds mailing-list headers. Possibly even changes the Reply-To header and god knows what else.

I guess you could combine the 2 methods.

What does this gain you? Nothing. We've already seen that Message-Ids can be spoofed and aren't even necessarily unique even if we assumed no one would spoof them.
We've also seen that md5 is useless for detecting duplicates.

Select deletion candidates by nominating via dup message-id and only run MD5 against header-stripped versions of the dup-ID nominations.

Question: which header(s) do you strip? You can't trust that the mailing-list manager left any of the headers alone, and Received/Delivered-To/etc will likely be different anyway. That pretty much leaves the following headers:

MIME-Version: 1.0

whoopty doo (and who knows, maybe the mailing list manager might even modify this one ;-)

Then delete all but 1 of the messages that share the same MD5.

But they won't share the same md5. Okay, let's presume for a moment that we decided to strip *all* headers because after all, we can't be sure that the mailing list didn't modify them and/or they are different due to a different routing. Do we just md5sum the message body? What do we even mean by the message body? The first text part we find? Or maybe the entire MIME structure? Well, first text part is probably not a good bet - the best bet is probably the entire MIME structure. Okay, now we just md5sum this, right? Ho ho ho. Wrong again. Oops, the mailing list munged the MIME structure to add its footer or whatever. Now was it one of those smarter mailing lists that add the signature as a new MIME part if the message was a multipart/*, or is it one of those brain-dead ones that just append the signature without a care in the world as to whether the message is a multipart or not? And then we have those brain-dead mailing lists that append their footer without changing it to the same Content-Transfer-Encoding that the message is in (for when the only part is a text/* part but is base64 encoded, for example). Oh goody, now what?

It would probably run pretty fast as long as you didn't need a separate image activation every time you run MD5.

Yea...
because parsing every single message in a folder and comparing message-ids is *fast* (actually, Evolution's MIME parser is pretty damn fast but that's beside the point). </sarcasm>

Okay, so here's something I just thought of - each mbox/maildir/mh/etc folder could have another file associated with it containing the message-id of each message. When appending, scan them all for an identical message-id and then do something to determine if they are identical or not. If so, don't append the message. Sound good? Sure, I suppose... but that is assuming that you just want to eliminate duplicates in the same folder. Is that all you want? Or? Even if that *is* all you want, you STILL need a fool-proof way of determining if 2 messages are identical or not. And your method just won't work. Period. End result? Back to step one... but step 2 is solved?

The way I see it, this feature has no business being in Evolution - if you really want the feature then I suggest you implement it yourself using a perl script and have it act on the mail either via having Evolution fork/exec it in the filter code or by you having your perl program handle it *before* Evolution touches it. It's the only way we can both be happy :-)

Jeff

--
Jeffrey Stedfast
Evolution Hacker - Ximian, Inc.
fejj ximian com - www.ximian.com
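The per-folder index idea quoted above, combined with the "nominate by message-id, then compare MD5" step, might look something like the sketch below. It is only an illustration under stated assumptions: `FolderIndex`, `split_message`, and the sample messages are hypothetical names and data invented here, the index lives in memory rather than in a file next to the folder, and "identical" is naively taken to mean "same MD5 of the raw body".

```python
import hashlib
from email import message_from_string


def split_message(raw):
    """Return (Message-ID, MD5 of the raw body) for one message."""
    msg_id = (message_from_string(raw)["Message-ID"] or "").strip()
    # All headers are stripped: only the text after the first blank
    # line is hashed.
    _headers, _sep, body = raw.partition("\n\n")
    return msg_id, hashlib.md5(body.encode("utf-8")).hexdigest()


class FolderIndex:
    """Hypothetical per-folder index: Message-ID -> set of body digests."""

    def __init__(self):
        self.digests_by_id = {}

    def append(self, raw):
        """Record the message; return False if it is judged a duplicate."""
        msg_id, digest = split_message(raw)
        known = self.digests_by_id.setdefault(msg_id, set())
        if digest in known:
            return False  # same Message-ID and same body digest
        known.add(digest)
        return True


# Invented sample data: one direct copy, one copy via a list that
# rewrote the Subject and appended a footer.
direct = "Message-ID: <42@example.org>\nSubject: dups\n\nSame text.\n"
via_list = (
    "Message-ID: <42@example.org>\nSubject: [evolution] dups\n\n"
    "Same text.\n_______\nlist footer\n"
)

index = FolderIndex()
print(index.append(direct))    # True: first copy is stored
print(index.append(direct))    # False: byte-identical copy is rejected
print(index.append(via_list))  # True: the footer changed the digest
```

The last call shows the objection raised in the quoted text: the copy that went through the list shares the Message-ID but not the body digest, so this naive comparison keeps both copies.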
-- GnuPG: ECBA EA08 C3C1 251E 5FB5 D196 F8C8 F8B7 AB60 234D