Hi there,

This is my latest experimental summary store code.

o. Makes strings unique in the mmap, which reduces VmSize. That alone matters less, except that unique strings also reduce VmRss a little. But more important is ...

o. Sorts strings on usage in the mmap, which reduces VmRss (fewer pages are needed). This is what Tinymail's mmap Camel patch doesn't do.

o. Stores flags in a separate file. Flags are the only type of data in the summary that changes relatively often, hence the separate file.

o. Stores "ftell()" offsets in a simplistic index file. Parsing this file can be improved: I'm using fscanf, which might not even work if the data file grows to terabytes. But let's be honest and just accept that a summary's data file is very unlikely to ever get that big.

o. Has a way to expunge an item, and the sequence numbers of higher items are adapted automatically. Neither the index nor the mmap is written out yet when you do this. This is a (relatively trivial) TODO item (#z): simply rewrite the SummaryBlock that got changed in the expunge methods, but I'm still making up my mind about this rewriting.

   You can expunge items by sequence number and by uid. These are vital for the new VANISHED response in IMAP with QRESYNC and for EXPUNGE in normal IMAP.

   It's not yet possible to pass uidsets (this is a TODO). With a uidset you could get rid of a range of items more efficiently.

o. Has a way to get an item by sequence number quickly (hashtable lookup).

o. Has a way to get an item by uid quickly (hashtable lookup).

o. Has a way to create new items easily. It deals with out-of-order adding of items: when you request an item with a sequence number that doesn't exist yet, you simply get NULL.

o. Scales easily to more than 200,000 items.

o. Should be relatively thread safe; at least some attempts at making it that way are already in place. (Although I'd love to use lock-free hashtables, I simply used mutexes for this.)
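The string uniqueness from the first item can be sketched as a simple interning pool. This is a minimal sketch, assuming all strings are appended to one contiguous buffer (standing in for the bytes that end up in the mmap) and that items refer to strings by offset; `StringPool` and `pool_intern` are hypothetical names, not taken from mytest7.tar.gz, and the linear search stands in for whatever lookup the real code uses.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical string pool: every distinct string is stored once in a
 * contiguous buffer (the bytes that would end up in the mmap), and callers
 * get back an offset into that buffer. Linear search keeps the sketch
 * short; a real store would use a hash table. */
typedef struct {
    char   *buf;   /* concatenated NUL-terminated strings */
    size_t  len;   /* bytes used in buf */
    size_t  cap;   /* bytes allocated for buf */
} StringPool;

/* Returns the offset of s inside pool->buf, appending s only if an equal
 * string is not already present. Returns (size_t)-1 on allocation failure. */
static size_t pool_intern(StringPool *pool, const char *s)
{
    size_t off = 0;
    while (off < pool->len) {
        const char *cur = pool->buf + off;
        if (strcmp(cur, s) == 0)
            return off;                 /* already stored: reuse it */
        off += strlen(cur) + 1;         /* skip to the next string */
    }

    size_t need = strlen(s) + 1;        /* include the terminating NUL */
    if (pool->len + need > pool->cap) {
        size_t cap = pool->cap ? pool->cap * 2 : 64;
        while (cap < pool->len + need)
            cap *= 2;
        char *nb = realloc(pool->buf, cap);
        if (!nb)
            return (size_t)-1;
        pool->buf = nb;
        pool->cap = cap;
    }
    memcpy(pool->buf + pool->len, s, need);
    off = pool->len;
    pool->len += need;
    return off;
}
```

Because a repeated sender or subject is stored only once, the buffer grows only with the distinct strings, which is where the VmSize saving comes from.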
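Sorting strings on usage pays off because a mapped file is paged in page by page: if the most frequently referenced strings sit next to each other at the start of the file, touching them faults in fewer pages. A hedged sketch of that layout step, assuming the usage counts are already known when the file is written; `StrUse` and `layout_by_usage` are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry: a distinct string and how many summary items use it. */
typedef struct {
    const char *str;
    unsigned    uses;
} StrUse;

/* Most-used first; ties broken alphabetically so the layout is deterministic. */
static int cmp_by_uses(const void *a, const void *b)
{
    const StrUse *x = a, *y = b;
    if (x->uses != y->uses)
        return x->uses > y->uses ? -1 : 1;
    return strcmp(x->str, y->str);
}

/* Sorts the entries and copies the strings, NUL-terminated, into out (which
 * must be large enough). offsets[i] receives the position of entries[i]
 * after sorting. Returns the number of bytes written. */
static size_t layout_by_usage(StrUse *entries, size_t n, char *out, size_t *offsets)
{
    qsort(entries, n, sizeof *entries, cmp_by_uses);
    size_t pos = 0;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(entries[i].str) + 1;
        memcpy(out + pos, entries[i].str, len);
        offsets[i] = pos;
        pos += len;
    }
    return pos;
}
```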
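Keeping flags in their own file works well when each item's flags occupy a fixed-size record, because a flag change then becomes a small in-place write that leaves the string data untouched. The following is only an illustration under that assumption (one native-endian `uint32_t` per item, addressed by a 0-based item index); `flags_write` and `flags_read` are hypothetical helpers.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical layout: one fixed-width 32-bit flags word per item, stored at
 * offset index * sizeof(uint32_t) in native byte order. Changing an item's
 * flags rewrites 4 bytes in place. */
static int flags_write(FILE *f, unsigned long index, uint32_t flags)
{
    if (fseek(f, (long)(index * sizeof flags), SEEK_SET) != 0)
        return -1;
    if (fwrite(&flags, sizeof flags, 1, f) != 1)
        return -1;
    return fflush(f) == 0 ? 0 : -1;
}

/* Reads the flags word of item `index`; returns 0 on success, -1 otherwise. */
static int flags_read(FILE *f, unsigned long index, uint32_t *flags)
{
    if (fseek(f, (long)(index * sizeof *flags), SEEK_SET) != 0)
        return -1;
    return fread(flags, sizeof *flags, 1, f) == 1 ? 0 : -1;
}
```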
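On the fscanf concern in the index item: a common trap is reading offsets into a `long`, which is only 32 bits on some platforms, and `ftell()` itself returns `long` (POSIX offers `ftello()`, returning `off_t`, for large files). A sketch of a one-offset-per-line index that keeps the text format but uses fixed-width `uint64_t` with the `<inttypes.h>` format macros; `index_append` and `index_load` are hypothetical names.

```c
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical text index: one data-file offset per line. Using uint64_t
 * with PRIu64/SCNu64 instead of long and "%ld" keeps offsets above
 * 2^31 - 1 representable even where long is 32 bits. */
static int index_append(FILE *f, uint64_t offset)
{
    return fprintf(f, "%" PRIu64 "\n", offset) < 0 ? -1 : 0;
}

/* Reads up to max offsets from the start of f into out; returns how many. */
static size_t index_load(FILE *f, uint64_t *out, size_t max)
{
    size_t n = 0;
    rewind(f);   /* also required between writing and reading an update stream */
    while (n < max && fscanf(f, "%" SCNu64, &out[n]) == 1)
        n++;
    return n;
}
```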
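The automatic renumbering on expunge falls out naturally if sequence numbers are positions rather than stored fields. A minimal sketch, assuming items are kept in an array ordered by sequence number (1-based, as in IMAP) and identified by uid strings; `ItemList`, `expunge_seq` and `expunge_uid` are hypothetical, and the uid variant uses a linear scan where the store described above has a hash table.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory item list: sequence number N lives at uids[N - 1],
 * so removing one entry renumbers every later item by shifting the tail of
 * the array down one slot. The list owns the uid strings. */
typedef struct {
    char  **uids;
    size_t  count;
} ItemList;

/* Removes the item with sequence number seq; returns 0, or -1 if out of range. */
static int expunge_seq(ItemList *l, size_t seq)
{
    if (seq == 0 || seq > l->count)
        return -1;
    free(l->uids[seq - 1]);
    memmove(&l->uids[seq - 1], &l->uids[seq],
            (l->count - seq) * sizeof *l->uids);
    l->count--;
    return 0;
}

/* Expunge by uid (e.g. for a VANISHED response): find the item's sequence
 * number, then reuse expunge_seq. Returns -1 if the uid is unknown. */
static int expunge_uid(ItemList *l, const char *uid)
{
    for (size_t i = 0; i < l->count; i++)
        if (strcmp(l->uids[i], uid) == 0)
            return expunge_seq(l, i + 1);
    return -1;
}
```

A uidset variant could collect all positions first and compact the array in a single pass, rather than calling `memmove` once per expunged item.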
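The fast getters and the out-of-order adding can be illustrated together. This sketch swaps the sequence-number hash table for a plain array indexed by sequence number whose unfilled slots stay NULL, so asking for a sequence number that was never added returns NULL; uid lookup uses a fixed-size open-addressing table with the djb2 string hash. `Store`, `Item`, `store_add`, `store_get_seq` and `store_get_uid` are hypothetical names, not the API in mytest7.tar.gz, and the sketch has no locking.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical item: 1-based sequence number plus uid string. */
typedef struct {
    unsigned    seq;
    const char *uid;
} Item;

#define NBUCKETS 1024u  /* fixed size for the sketch; a real table would grow */

typedef struct {
    Item  **by_seq;            /* by_seq[seq - 1]; NULL where nothing was added */
    size_t  seq_cap;
    Item   *by_uid[NBUCKETS];  /* open addressing with linear probing */
} Store;

static unsigned hash_str(const char *s)
{
    unsigned h = 5381;                       /* djb2 */
    while (*s)
        h = h * 33u + (unsigned char)*s++;
    return h;
}

/* Adds it in any order; returns 0, or -1 on bad seq, full table or OOM. */
static int store_add(Store *st, Item *it)
{
    if (it->seq == 0)
        return -1;                           /* sequence numbers are 1-based */

    /* Find the uid slot first so a full table leaves the store unchanged. */
    unsigned h = hash_str(it->uid) % NBUCKETS, i;
    for (i = 0; i < NBUCKETS; i++, h = (h + 1) % NBUCKETS)
        if (!st->by_uid[h] || strcmp(st->by_uid[h]->uid, it->uid) == 0)
            break;
    if (i == NBUCKETS)
        return -1;

    if (it->seq > st->seq_cap) {             /* grow; new slots start out NULL */
        size_t cap = st->seq_cap ? st->seq_cap : 16;
        while (cap < it->seq)
            cap *= 2;
        Item **nb = realloc(st->by_seq, cap * sizeof *nb);
        if (!nb)
            return -1;
        memset(nb + st->seq_cap, 0, (cap - st->seq_cap) * sizeof *nb);
        st->by_seq = nb;
        st->seq_cap = cap;
    }
    st->by_seq[it->seq - 1] = it;
    st->by_uid[h] = it;
    return 0;
}

/* Returns the item with this sequence number, or NULL if it was never added. */
static Item *store_get_seq(const Store *st, unsigned seq)
{
    return (seq == 0 || seq > st->seq_cap) ? NULL : st->by_seq[seq - 1];
}

/* Probes from the uid's bucket until a match or an empty slot. */
static Item *store_get_uid(const Store *st, const char *uid)
{
    unsigned h = hash_str(uid) % NBUCKETS;
    for (unsigned i = 0; i < NBUCKETS && st->by_uid[h]; i++, h = (h + 1) % NBUCKETS)
        if (strcmp(st->by_uid[h]->uid, uid) == 0)
            return st->by_uid[h];
    return NULL;
}
```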
The hashtable keys of the sorter don't get fsck-ed up anymore, like they did in mytest6.tar.gz :-)

Future:

o. Indexing on blocks of 1000 items: I still need to make the per-sequence getter calculate the block in which it can find the item.

o. Writing out per block, instead of one large mmapped file. All the magic for this is already in place; I just need a small algorithm that defines the filename for a sequence number.

o. A freeze/thaw API for Dave so that he can make multiple changes. This integrates with (#z), of course.

Let me know what you guys think of this ...

-- 
Philip Van Hoof, freelance software developer
home: me at pvanhoof dot be
gnome: pvanhoof at gnome dot org
http://pvanhoof.be/blog
http://codeminded.be
Attachment:
mytest7.tar.gz
Description: application/compressed-tar