GNOME Message Management System (Storage Mechanism)



This used to be the mail client thread, but we started discussing
things well beyond a simple mail client.

Here's my take on a decent Message storage approach.  I'm trying to
optimize for the following:
    fast Message storage
    ease of Message grouping, allowing multiple groupings per Message
    ease of Message ordering
    scalable Message group sizes
    minimal space wasted
    fast Message retrieval
    ability to regenerate corrupted indexes (for ordering and grouping)
    fast Message searching abilities

In most cases, the end user is not going to care about or see the 
storage mechanism used.  They'll only see how well it implements the
above features.

I think this storage architecture should meet the above goals rather
well.  Here we go.  Some of this has shown up in earlier emails; this
is mostly a condensation of that material, plus an expansion on a
couple of points.
    Each Message is given a unique Message ID.
    Message Composition
        <Message ID>
        [group1 [,groupN]* ]*
        Original Message Header
        Original Message Body
    Message ID Composition
        (date of arrival)-(random 10 character string)
        The date of arrival is a 32-bit integer: seconds since the epoch
        The random string is open for discussion; it is basically just
            there to make the Message IDs unique.
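A minimal Python sketch of the ID and file layout above.  The alphabet
for the random part, writing the groups as a single comma-separated
line, and the function names are my assumptions, not settled parts of
the proposal.

```python
import random
import string
import time

# Assumed alphabet for the random part; the proposal leaves this open.
ALPHABET = string.ascii_letters + string.digits
SUFFIX_LEN = 10  # "random 10 character string"

_rng = random.SystemRandom()

def make_message_id(arrival=None):
    """Return '<arrival, seconds since epoch>-<random suffix>'."""
    if arrival is None:
        arrival = int(time.time())
    # The proposal stores the arrival date as a 32-bit integer.
    if not 0 <= arrival < 2**32:
        raise ValueError("arrival time does not fit in 32 bits")
    suffix = "".join(_rng.choice(ALPHABET) for _ in range(SUFFIX_LEN))
    return f"{arrival}-{suffix}"

def compose_message(message_id, groups, original):
    """Lay out a stored Message file (hypothetical helper).

    Line 1 is the Message ID and line 2 is a comma-separated group list,
    written even when empty so a reader always knows where the original
    message starts.  The untouched original header and body follow.
    """
    return f"{message_id}\n{','.join(groups)}\n{original}"
```

With 62 possible characters in each of 10 positions, two Messages
arriving in the same second are very unlikely to collide, but a writer
could still check that no file with that name exists before storing.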

    Message storage begins at a single base directory on disk
        Symbolic links (softlinks) out to other directories are allowed
        This simplifies the code for accessing the Messages themselves
        This makes it easy for people to move their Message store

    Messages are stored across several directories under the base dir
        The target directory is chosen by a hash of the Message ID
        Multiple levels of hashing are allowed
            This reduces directory size
        Each Message is written to a separate file
            File name = Message ID
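One way to pick the target directory, sketched in Python.  The choice
of MD5 and of one decimal digit per directory level (matching the
Store/[0-9] layout shown later) are assumptions; any stable hash would
do.

```python
import hashlib
import os

def store_path(base, message_id, levels=1, fanout=10):
    """Return base/Store/<d1>/.../<dN>/<Message ID> for N hash levels.

    MD5 is used only as a stable, well-spread hash.  Python's built-in
    hash() is salted per process and would put a Message in a different
    directory on every run.
    """
    digest = int.from_bytes(hashlib.md5(message_id.encode("ascii")).digest(), "big")
    parts = []
    for _ in range(levels):
        # Peel off one base-`fanout` digit per directory level.
        digest, bucket = divmod(digest, fanout)
        parts.append(str(bucket))
    return os.path.join(base, "Store", *parts, message_id)
```

Adding a level leaves the earlier digits unchanged, so the directory a
Message lands in with N levels is always inside the one it used with
N-1 levels.  That property is what lets an over-full directory be split
in place by moving its files one level down.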

    The Main Message Index can contain pointers to sub-indexes
        Allows for sub-groupings
        An Index exists for each Grouping defined
        A Message may be Indexed under multiple Groups
    A Message Index entry looks like:
        <Message ID> <Subject> <Location>
        The Location is the path to the Message file
            This is relative to the base Message directory
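Because a Subject can contain spaces, a space-separated entry is only
unambiguous if the Message ID and Location never do.  That holds for
IDs built as above, and I'm assuming base-relative paths without
spaces.  A sketch of writing and reading one entry under that
assumption (IndexEntry and the helper names are hypothetical):

```python
from typing import NamedTuple

class IndexEntry(NamedTuple):
    message_id: str
    subject: str
    location: str  # path relative to the base Message directory

def format_entry(entry):
    # Tabs or newlines in a Subject would break a line-based index,
    # so collapse all whitespace runs to single spaces.
    subject = " ".join(entry.subject.split())
    return f"{entry.message_id} {subject} {entry.location}"

def parse_entry(line):
    # The ID and Location contain no spaces but the Subject may,
    # so split off the first and last fields and keep the middle intact.
    message_id, rest = line.rstrip("\n").split(" ", 1)
    subject, location = rest.rsplit(" ", 1)
    return IndexEntry(message_id, subject, location)
```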

Some thoughts on scaling this system.  I'll try to be brief, since
you're no doubt tired of reading. :)

If a Message directory grows to more than a thousand or so entries,
accessing individual Messages will be slow, since most directory
lookups are linear.  This system can be made self-balancing, though.
On startup, we could check whether any directory has more than some
arbitrary number of entries, say 600.  If one does, we add 10 numbered
subdirectories to it and rehash the Messages currently stored there
into them.  This shouldn't be as slow as it sounds, since we can
probably just relink the Message files into the new directories.  So
it's mostly a batch of directory table updates (and we never let these
directories get huge), without much copying.
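A rough sketch of that startup check for one directory.  It assumes
the digit-per-level MD5 hashing described earlier, and it uses
os.rename rather than a separate link/unlink pair; both are
illustrative choices, not part of the proposal.

```python
import hashlib
import os

THRESHOLD = 600  # "some arbitrary number of entries ... say 600"
FANOUT = 10      # "another 10 numbered entries"

def bucket(message_id, depth, fanout=FANOUT):
    """Digit number `depth` (0-based, base `fanout`) of the ID's MD5 hash."""
    digest = int.from_bytes(hashlib.md5(message_id.encode("ascii")).digest(), "big")
    return (digest // fanout ** depth) % fanout

def rebalance(directory, depth, threshold=THRESHOLD):
    """Split an over-full Store directory into FANOUT numbered subdirectories.

    `depth` is the number of hash levels above `directory`, so its new
    subdirectories use hash digit `depth`.  Returns {old_path: new_path}
    so the caller can fix the Location field in the Group indexes.
    """
    files = sorted(name for name in os.listdir(directory)
                   if os.path.isfile(os.path.join(directory, name)))
    if len(files) <= threshold:
        return {}
    for d in range(FANOUT):
        os.makedirs(os.path.join(directory, str(d)), exist_ok=True)
    moved = {}
    for name in files:
        old = os.path.join(directory, name)
        new = os.path.join(directory, str(bucket(name, depth)), name)
        # rename() within one filesystem only rewrites directory entries,
        # the same effect as link() followed by unlink(); no data is copied.
        os.rename(old, new)
        moved[old] = new
    return moved
```

For example, Store/3 sits one hash level deep, so it would be checked
with rebalance(os.path.expanduser("~/Message/Base/Store/3"), depth=1).
Since each Location in the indexes is a path, the returned mapping
matters: those entries have to be rewritten after a split.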

The end result is a storage system that looks like this:

~/Message/Base/Group1.db
~/Message/Base/GroupN.db
~/Message/Base/Store/[0-9]
~/Message/Base/Store/[0-9]/924436539-j3902mdnijH
~/Message/Base/Store/[0-9]/924432923-02df23ds93j
~/Message/Base/Store/[0-9]/924433610-923oiUIHli8
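Storing the ID and groups inside every Message file is what makes the
"regenerate corrupted indexes" goal practical: the Group indexes can be
rebuilt by scanning Store/.  A rough sketch, assuming a file layout of
an ID line, then a comma-separated group line, then the original
message.  It returns index lines per group instead of writing the .db
files, whose on-disk format isn't specified here.  Folded
(multi-line) Subject headers are not handled.

```python
import os
from collections import defaultdict

def rebuild_indexes(base):
    """Rebuild {group: [index line, ...]} by scanning every file under base/Store."""
    indexes = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(os.path.join(base, "Store")):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                message_id = f.readline().strip()
                groups = [g.strip() for g in f.readline().split(",") if g.strip()]
                subject = ""
                for line in f:
                    if not line.strip():  # a blank line ends the original header
                        break
                    if line.lower().startswith("subject:"):
                        subject = " ".join(line[len("subject:"):].split())
            location = os.path.relpath(path, base)  # relative to the base dir
            for group in groups:
                indexes[group].append(f"{message_id} {subject} {location}")
    return indexes
```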

Comments?
--
Scott Wimer
play  --->    scottw@cgibuilder.com         http://www.cgibuilder.com/
work  --->    scottw@corp.earthlink.net     http://www.earthlink.net/



