On Sat, Sep 14, 2002 at 02:39:13PM +0200, Tony Earnshaw wrote:
Ever stop to think how many non-identical messages you've wiped out that way? Message-Ids are not guaranteed to be unique. Theoretically, removing duplicate messages based on message-id is not much better than removing duplicate messages based on the Subject header (it's only better because it is assumed that msg-id is generated using at least a somewhat random sequence of characters... but how random is random? If you've ever played with rand() you know that it is a pretty poor random number generator as it will often spew out the same sequence over and over again - can you guarantee that your client doesn't use rand()?)

The man who wrote what follows is Exim's (SMTP mailserver/MTA) daddy, Philip Hazel, and a doctor of applied math at Cambridge University:

<quote>
3.3 Message identification

Every message handled by Exim is given a "message id" which is sixteen characters long. It is divided into three parts, separated by hyphens, for example "16VDhn-0001bo-00". Each part is a sequence of letters and digits, normally representing a number in base 62. However, in the Darwin operating system (Mac OS X) and when Exim is compiled to run under Cygwin, base 36 is used instead, because the names of files in those systems are not case-sensitive.

The first six characters are the time the message was received, as a number in seconds - the normal Unix way of representing a time of day. If the clock goes backwards (due to resetting) in a process that is receiving more than one message, the later time is retained.

After the first hyphen, the next six characters are the id of the process that received the message.

The final two characters, after the second hyphen, are used to ensure uniqueness of the id. There are two different formats:

(a) If the "localhost_number" option is not set, uniqueness is required only within the local host. This portion of the id is "00" except when a process receives more than one message in a single second, when the number is incremented for each additional message.

(b) If the "localhost_number" option is set, uniqueness among a set of hosts is required. This portion of the id is set to the base 62 encoding of

    <sequence number> * 256 + <host number>

where <sequence number> is the count of messages received by the current process within the current second. As the maximum value of the host number is 255, this allows for a maximum value of 14 for the sequence number. If this limit is reached, a delay of one second is imposed before reading the next message, in order to allow the clock to tick and the sequence number to get reset.
</quote>
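For anyone who wants to poke at the scheme described in the quote, here is a rough Python sketch of how such an id could be put together. It is not Exim's actual code. The digit order (0-9, then A-Z, then a-z) is an assumption, since the quoted text only says "base 62", and the names to_base62, make_message_id and max_sequence_with_host_number are made up for illustration.

```python
import string

# Assumed digit order: 0-9, A-Z, a-z. The quoted text only says "base 62".
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase
BASE = len(ALPHABET)  # 62


def to_base62(value, width):
    """Encode a non-negative integer as a fixed-width base 62 string."""
    if value < 0:
        raise ValueError("value must be non-negative")
    digits = []
    for _ in range(width):
        value, remainder = divmod(value, BASE)
        digits.append(ALPHABET[remainder])
    if value:
        raise ValueError("value does not fit in %d base 62 digits" % width)
    return "".join(reversed(digits))


def make_message_id(received_time, pid, sequence, host_number=None):
    """Build a time-pid-suffix id following the quoted description.

    sequence is the number of messages this process has already
    received in the current second (0 for the first one).
    """
    if host_number is None:
        # Case (a): uniqueness within the local host only.
        suffix = sequence
    else:
        # Case (b): fold the host number into the last two characters.
        if not 0 <= host_number <= 255:
            raise ValueError("host number must be in 0..255")
        suffix = sequence * 256 + host_number
    return "-".join(
        (to_base62(received_time, 6), to_base62(pid, 6), to_base62(suffix, 2))
    )


def max_sequence_with_host_number():
    """Largest sequence number that still fits when the host number is 255."""
    largest_suffix = BASE * BASE - 1  # two base 62 digits hold 0..3843
    return (largest_suffix - 255) // 256
```

With that assumed digit order, make_message_id(1012231703, 6188, 0) gives back the example id from the quote, "16VDhn-0001bo-00", and max_sequence_with_host_number() comes out at 14, which matches the limit stated in the quote: 14 * 256 + 255 = 3839 fits in two base 62 digits, 15 * 256 + 255 = 4095 does not.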
Well thatz a good deal more unique than I imagined:-)

JPK
-- 
GnuPG: ECBA EA08 C3C1 251E 5FB5 D196 F8C8 F8B7 AB60 234D