Summary store, impl n. 7



Hi there,

This is my latest experimental summary store code.

o. Makes strings unique in the mmap, which reduces VmSize. That matters
   less in itself, although unique strings also reduce VmRss a little
   bit. But more important is ...

o. Sorts strings by usage in the mmap, which reduces VmRss (fewer pages
   needed). This is what Tinymail's mmap Camel patch doesn't do.

o. Stores flags in a separate file, because flags are the only type of
   data in the summary that changes relatively often.

o. Stores "ftell()" offsets in a simplistic index file. Parsing this
   file can be improved (I'm using fscanf, which might not even work if
   the data file grows to terabytes -- but let's be honest and accept
   that a summary's data file is very unlikely to ever get that
   big --).

o. Has a way to expunge an item (and the sequence numbers of higher
   items are adapted automatically)

   Neither the index nor the mmap is rewritten yet when you do this.
   This is a (relatively trivial) TODO item: simply rewrite the
   SummaryBlock that got changed in the expunge methods. I'm still
   making up my mind about this rewriting, though. (#z)

   You can expunge items by sequence number and by uid. These are vital
   for the new VANISHED in IMAP with QRESYNC and for EXPUNGE in normal
   IMAP.

   It's not yet possible to pass uidsets (this is a TODO). With a uidset
   you could get rid of a range of items more efficiently.

o. Has a way to get an item by sequence number quickly (hashtable
   lookup)

o. Has a way to get an item by uid quickly (hashtable lookup)

o. Has a way to create new items easily. Deals with out-of-order adding
   of items (when requesting an item with a sequence number that doesn't
   yet exist, you'll simply get NULL)

o. Scales to > 200,000 items easily

o. Should be relatively thread safe (at least some attempts at making it
   that way are already in place). (although I'd love to use lock-free
   hashtables, I simply used mutexes for this)

o. The hashtable keys of the sorter don't get fsck-ed up anymore, as
   they did in mytest6.tar.gz :-)
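
To make the first two items (unique strings, sorted by usage) a bit more
concrete, here is a minimal sketch in plain C. It is not the code from
the tarball: StrEntry, intern(), by_uses_desc() and layout() are
hypothetical names, the linear search stands in for a hashtable, and
"most used first" is only my assumption about what sorting by usage
means. The idea is that each string is stored once, and the strings that
are referenced most often end up next to each other, so fewer pages of
the mmap need to be touched.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One unique string plus the number of summary items referencing it.
 * (Hypothetical type, for illustration only.) */
typedef struct {
    const char *str;
    unsigned int uses;
} StrEntry;

/* Record one use of s in tab, which currently holds n unique strings.
 * Returns the new number of unique strings. The caller must make sure
 * tab has room for one more entry. A linear search keeps the sketch
 * short; a real store would use a hashtable here. */
static size_t intern(StrEntry *tab, size_t n, const char *s)
{
    assert(tab != NULL && s != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(tab[i].str, s) == 0) {
            tab[i].uses++;      /* already known: only bump the count */
            return n;
        }
    }
    tab[n].str = s;             /* first time we see this string */
    tab[n].uses = 1;
    return n + 1;
}

/* qsort comparator: most used strings first. */
static int by_uses_desc(const void *a, const void *b)
{
    const StrEntry *x = a, *y = b;
    return (x->uses < y->uses) - (x->uses > y->uses);
}

/* Sort the table by usage and return the number of bytes needed to
 * write every unique string once, each with its terminating NUL. */
static size_t layout(StrEntry *tab, size_t n)
{
    size_t bytes = 0;
    qsort(tab, n, sizeof *tab, by_uses_desc);
    for (size_t i = 0; i < n; i++)
        bytes += strlen(tab[i].str) + 1;
    return bytes;
}
```

After layout(), writing tab[0], tab[1], ... in order to the data file
gives the file layout described above; items would then refer to strings
by offset.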

Future:

o. Indexed on blocks of 1000 items: I still need to make the
   per-sequence getter calculate the block where it can find the item

o. Writing out per block, instead of one large mmapped file. All magic
   is already in place for this, I just need to make a small algorithm
   to define the filename for a sequence number.

o. A freeze/thaw API for Dave so that he can make multiple changes. This
   integrates with (#z) of course.
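
The block calculation and the filename algorithm from the list above
could look roughly like this. This is only a sketch under assumptions:
sequence numbers are taken to be 1-based (as in IMAP), the block size is
the 1000 items mentioned above, and the "<prefix>_<block>.mmap" naming
scheme as well as the function names are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: 1000 items per block, as in the list above. */
#define ITEMS_PER_BLOCK 1000u

/* Block that holds the item with this sequence number.
 * Sequence numbers start at 1, so 1..1000 map to block 0. */
static unsigned int block_for_seq(unsigned int seq)
{
    assert(seq >= 1);
    return (seq - 1) / ITEMS_PER_BLOCK;
}

/* 0-based position of the item inside its block. */
static unsigned int slot_in_block(unsigned int seq)
{
    assert(seq >= 1);
    return (seq - 1) % ITEMS_PER_BLOCK;
}

/* Hypothetical naming scheme: "<prefix>_<block>.mmap".
 * Returns what snprintf returns: the length the full name needs,
 * so a result >= len means the name was truncated. */
static int block_filename(char *buf, size_t len, const char *prefix,
                          unsigned int seq)
{
    return snprintf(buf, len, "%s_%u.mmap", prefix, block_for_seq(seq));
}
```

Note that an expunge shifts the sequence numbers of all higher items, so
a fixed mapping like this only holds as long as the blocks get rewritten
(or renumbered) along with it, which is where (#z) comes in.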


Let me know what you guys think of this ...

-- 
Philip Van Hoof, freelance software developer
home: me at pvanhoof dot be 
gnome: pvanhoof at gnome dot org 
http://pvanhoof.be/blog
http://codeminded.be



Attachment: mytest7.tar.gz
Description: application/compressed-tar


