Re: [Evolution] Mailbox formats



On Tue, 2002-10-29 at 13:59, Jørn Christensen wrote:
Can anyone tell me the difference between the mailbox formats?
 - mbox
 - mh
 - maildir

mbox - classic unix mail format, lots of messages in one file, separated
by a specially formatted From_ line (that's "From" followed by a space),
which means any body line starting with From_ has to be escaped.
Compact, easy to append to, a pain to modify (rewrite the whole thing),
requires locking (using various schemes - or all of them just to make
sure) so that updates don't break it.  Metadata has to be included
within the message headers inside the mailbox file itself - so it
requires a complete file rewrite to mark a message as read.  Dangerous
to use on NFS mounted filesystems (unless your locking daemon/protocol
is really reliable).  Nice and easy to just use more to read/search the
messages, and it compresses well.
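
As a rough illustration of the From_ separator and the escaping it
forces, here is a minimal Python sketch.  It assumes the simplest
escaping convention (prefix ">" to any body line starting with "From ");
the function names and sample addresses are made up for the example and
real mail software differs in the details.

```python
def escape_body(body: str) -> str:
    """Prefix '>' to every body line that starts with 'From '."""
    return "\n".join(
        ">" + line if line.startswith("From ") else line
        for line in body.split("\n")
    )


def append_message(mbox_text: str, sender: str, date: str,
                   headers: str, body: str) -> str:
    """Return mbox_text with one more message appended at the end."""
    from_line = f"From {sender} {date}"
    return mbox_text + f"{from_line}\n{headers}\n\n{escape_body(body)}\n\n"


def split_messages(mbox_text: str) -> list[str]:
    """Split on lines starting with 'From ', the only separator mbox has."""
    messages: list[str] = []
    for line in mbox_text.split("\n"):
        if line.startswith("From "):
            messages.append(line)          # a new message starts here
        elif messages:
            messages[-1] += "\n" + line    # continuation of current message
    return messages


mbox = ""
mbox = append_message(mbox, "alice@example.org", "Tue Oct 29 13:59:00 2002",
                      "Subject: one", "Hello\nFrom here on it gets tricky.")
mbox = append_message(mbox, "bob@example.org", "Tue Oct 29 14:05:00 2002",
                      "Subject: two", "Short reply.")
messages = split_messages(mbox)
```

Without the escaping step the body line "From here on it gets tricky."
would be taken as the start of a third message.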

mh - file per message, directory per mailbox format.  Very similar to
a usenet news spool - each message is in a file which is named by a
sequential number.  Appending (i.e. adding a new message) is easy, other
than deciding the next number in sequence, which is subject to a race
condition or requires locking.  Deleting/updating is also easy.  Unix
tools work well on one-message-per-file systems.  Tends to have some
metadata in external files - i.e. sequences (unread etc).  Not NFS safe
(mh defines no locking - and it really is needed for adding a new
message).  Burns some space in that it uses a cluster (or your fs
equivalent) per message, eats inodes for breakfast.  Not very
compressible - each file has too little text for the algorithms to get
going on, and compressing a file that is already smaller than the basic
block size gains nothing.
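
The "next number" race can be sketched in Python as below.  This is not
something mh itself defines: it is one possible workaround on a local
filesystem, using an exclusive create so that two writers who pick the
same number cannot both succeed.  The function names are made up for
the example, and whether exclusive create can be trusted over NFS
depends on the NFS version and client, so this does not cure the NFS
problem.

```python
import os
import tempfile


def next_number(folder: str) -> int:
    """Highest numeric filename in the folder, plus one."""
    numbers = [int(name) for name in os.listdir(folder) if name.isdecimal()]
    return max(numbers, default=0) + 1


def deliver(folder: str, message: bytes) -> str:
    """Store message under the next free number and return its path."""
    while True:
        path = os.path.join(folder, str(next_number(folder)))
        try:
            # O_EXCL makes the create fail if another writer grabbed this
            # number between our listdir() and our open().
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        except FileExistsError:
            continue  # lost the race, work out a new number and retry
        with os.fdopen(fd, "wb") as f:
            f.write(message)
        return path


folder = tempfile.mkdtemp()
paths = [deliver(folder, b"Subject: test %d\n\nbody\n" % i) for i in range(3)]
```

A plain open() in place of the exclusive create would let the second
writer silently overwrite the first one's message.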

maildir - basically a file per message, directory per mailbox format,
with a twist that there are 3 directories (tmp, new and cur; 4 with the
parent) per mailbox.  Appending, deleting and updating are easy.  Files
are named in an inherently unique and sortable fashion, and metadata is
typically recorded by adding to the basic filename, which has
limitations.  Fairly easy to work on with unix tools.  Mostly NFS safe -
in that basic operations are defined in a way that is safe even without
any locking, *but* more complex operations (rewrites) may hit
limitations and require a degree of synchronisation.  Burns some space
in that it uses a cluster (or your fs equivalent) per message, eats
inodes for breakfast plus elevenses.  Not very compressible - each file
has too little text for the algorithms to get going on, and compressing
a file that is already smaller than the basic block size gains nothing.
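
A minimal Python sketch of the lock-free delivery and the
filename-as-metadata idea follows.  It assumes the usual tmp/new/cur
layout and the ":2,S" suffix for "seen"; the unique-name recipe
(time, pid, counter, hostname) and the function names are simplified
stand-ins for illustration, not a complete implementation of the
maildir rules.

```python
import itertools
import os
import socket
import tempfile
import time

_counter = itertools.count()


def make_maildir(path: str) -> None:
    """Create the three subdirectories a maildir needs."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(path, sub), exist_ok=True)


def unique_name() -> str:
    """Time-prefixed name, so names sort by arrival and should not collide."""
    host = socket.gethostname().replace("/", "_").replace(":", "_")
    return f"{int(time.time())}.{os.getpid()}_{next(_counter)}.{host}"


def deliver(maildir: str, message: bytes) -> str:
    """Write into tmp/, then rename into new/ once the file is complete."""
    name = unique_name()
    tmp_path = os.path.join(maildir, "tmp", name)
    with open(tmp_path, "xb") as f:      # "x": fail rather than overwrite
        f.write(message)
        f.flush()
        os.fsync(f.fileno())
    # Readers only look in new/ and cur/, so they never see a partial file.
    os.rename(tmp_path, os.path.join(maildir, "new", name))
    return name


def mark_seen(maildir: str, name: str) -> str:
    """Record 'seen' by moving the file to cur/ with a flag suffix."""
    seen_name = name + ":2,S"
    os.rename(os.path.join(maildir, "new", name),
              os.path.join(maildir, "cur", seen_name))
    return seen_name


maildir = tempfile.mkdtemp()
make_maildir(maildir)
name = deliver(maildir, b"Subject: hello\n\nbody\n")
seen_name = mark_seen(maildir, name)
```

Marking a message as read is a single rename here, where mbox needs a
rewrite of the mailbox file.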

Any pros and cons? Speed?

On speed, the mh/maildir formats stress your kernel/fs directory
operations significantly.  Courier have some benchmarks (which tend to
favour their maildir leanings - however they look OK to me).
        http://www.courier-mta.org/mbox-vs-maildir/

Maildir on NetApp (which has keyed btree type directory lookups) kicks
ass.

Personally I have all my mail in Maildir structures under a courier imap
server (apart from a set of mbox files which serve as archives -
incoming mail is delivered directly into that day's archive, which is
compressed at the end of the day - I do not expect to ever actually use
those archives, but they are there for safety).

        Nigel.

-- 
[ Nigel Metheringham           Nigel Metheringham InTechnology co uk ]
[ - Comments in this message are my own and not ITO opinion/policy - ]




