Basic terminology:

o. A summary contains n summary blocks
o. A summary block contains ~1000 summary items
o. A summary item has flags, cc, uid, from, subject, etc.

New:

o. Writing the data (persisting it)
o. Freezing and thawing
o. Keeping state about flag changes, expunges and appends

Missing:

o. Defining the filenames of summary blocks. Right now only one summary
   block (number 0) is created for all items, resulting in three files:
   data_0.mmap, index_0.idx and flags_0.idx. In future the idea is to
   have data_n.mmap, index_n.idx and flags_n.idx files (three per
   summary block created). 50,000 items will then effectively result in
   50 data_n.mmap files being mmap()ed if the entire folder is needed,
   but if a search on the summary's data hits only 10 of the 50 files,
   then only those 10 files get mmap()ed. Hence the grouping by
   sequence number: most searches yield results that are close together
   in time, and sequence numbers on IMAP servers are usually also
   (relatively) grouped in time, depending on various things. (See the
   filename sketch after this list.)

o. Error strategy: what if there's not enough space to write the
   summary files? What if a file has gone missing?

o. A flock(): what if a second process tries to access the same mapped
   files? I think that by just flock()-ing the persisting functions we
   are relatively safe already (I just wonder what happens to my
   read-only mapping if a rename()-overwrite happens on a mapped file).
   Of course the advice for application developers is to either use a
   new cache dir per process or to have a service that hands the data
   to both applications over an IPC system. But that's beside the point
   of protecting the processes from influencing each other: what if the
   app developer still did it wrong? What can we do about that? I know
   this is hard to cope with; perhaps just a g_critical() and an
   abort() if we detect this situation? (How do we detect it? See the
   locking sketch after this list.)
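To make the filename scheme concrete, here is a minimal sketch. The
function names, the cache_dir parameter and ITEMS_PER_BLOCK are
invented for illustration; only the data_n.mmap/index_n.idx/flags_n.idx
pattern and the ~1000 items per block come from the description above:

    #include <glib.h>

    #define ITEMS_PER_BLOCK 1000

    static guint
    summary_block_for_seq (guint seq)
    {
        /* Consecutive sequence numbers land in the same block, so a
         * search that hits a narrow time range touches few files. */
        return seq / ITEMS_PER_BLOCK;
    }

    static void
    summary_block_filenames (const gchar *cache_dir, guint n,
                             gchar **data, gchar **index, gchar **flags)
    {
        *data  = g_strdup_printf ("%s/data_%u.mmap", cache_dir, n);
        *index = g_strdup_printf ("%s/index_%u.idx", cache_dir, n);
        *flags = g_strdup_printf ("%s/flags_%u.idx", cache_dir, n);
    }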
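And for the flock() question, a sketch of one way to lock the
persisting functions: a dedicated lock file per cache dir, tried
non-blockingly first so the second-process situation can be detected
and turned into the g_critical()/abort() suggested above. The lock-file
name and the helper names are assumptions, and note that flock() locks
are advisory: they only help if every process takes them:

    #include <sys/file.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <errno.h>
    #include <stdlib.h>
    #include <glib.h>

    static int
    summary_lock (const gchar *cache_dir)
    {
        gchar *path = g_strdup_printf ("%s/summary.lock", cache_dir);
        int fd = open (path, O_CREAT | O_RDWR, 0600);

        g_free (path);

        if (fd == -1) {
            g_critical ("cannot open lock file: %s", g_strerror (errno));
            abort ();
        }

        /* Non-blocking attempt: EWOULDBLOCK means another process is
         * persisting to the same cache dir right now. */
        if (flock (fd, LOCK_EX | LOCK_NB) == -1) {
            if (errno == EWOULDBLOCK)
                g_critical ("summary files locked by another process");
            else
                g_critical ("flock failed: %s", g_strerror (errno));
            abort ();
        }

        return fd;
    }

    static void
    summary_unlock (int fd)
    {
        flock (fd, LOCK_UN);
        close (fd);
    }

Each persisting function would call summary_lock() before writing and
summary_unlock() when done. As for the rename()-overwrite worry: an
existing read-only mapping keeps referencing the old inode after a
rename(), so readers keep seeing the old (but consistent) data until
they re-mmap() the new file.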
--

Writing strategy:

o. I keep a "has_flagchg", a "has_expunges" and a "has_appends". These
   are the three types of changes that are possible for a summary. I
   keep these booleans per summary block.

o. The functions summary_item_set_flags and summary_add_item, plus the
   functions summary_expunge_item_by_uid and summary_expunge_item_by_seq,
   modify the values of those booleans.

o. The summary_freeze function makes a function called
   summary_block_persist refrain from actually writing, for every
   summary block in the summary passed as parameter to summary_freeze.

o. The summary_thaw function unsets the freeze on each summary block in
   the summary passed as parameter to summary_thaw. On top of that it
   calls summary_block_persist for each summary block in that summary.

o. The summary_block_persist function picks the best write strategy by
   evaluating the booleans has_flagchg, has_expunges and has_appends
   (see the freeze/thaw sketch after this list):

   - If has_flagchg but neither has_expunges nor has_appends, then a
     function that just writes the flags-file is used to persist the
     summary block.

   - Else, if either has_expunges or has_appends, then a function that
     writes the flags-file, the index-file and the data-file is used to
     persist the summary block.

o. Persisting a summary block happens by first sorting all strings by
   occurrence and making them unique. The unique strings are written in
   that sort order to the data-file, and the offsets of the strings are
   updated in the summary item pointers using ftell(). The index-file
   is then written using those pointers of the summary items, while the
   flags-file is written using the flags of the summary items. Finally
   the data-file is mapped and the summary items re-prepared. This way
   the summary block is persisted in a VmRss-friendly way.

o. When adding summary items to the summary (which selects a summary
   block for the item based on the requested sequence number), the
   caller should try to avoid string duplicates for the CC and TO
   fields of the items by sorting the addresses in the comma separated
   strings of the items (see the address-sorting sketch after this
   list). Currently the experimental example does this for you. This
   further reduces VmRss, as more of the data gets singled out as
   duplicate and stored only once.

Please test :)
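A sketch of the freeze/thaw and write-strategy logic from the list
above. The struct layouts, the "frozen" field and the stub helpers
write_flags_file()/write_all_files() are assumptions; the function
names and the three booleans come from the description:

    #include <glib.h>

    typedef struct {
        gboolean frozen;        /* set by summary_freeze, cleared by summary_thaw */
        gboolean has_flagchg;
        gboolean has_expunges;
        gboolean has_appends;
        /* ... the ~1000 summary items, fds, mappings ... */
    } SummaryBlock;

    typedef struct {
        GPtrArray *blocks;      /* of SummaryBlock* */
    } Summary;

    static void
    write_flags_file (SummaryBlock *block)
    {
        /* ... rewrite flags_n.idx from the items' flags ... */
    }

    static void
    write_all_files (SummaryBlock *block)
    {
        /* ... rewrite data_n.mmap, index_n.idx and flags_n.idx ... */
    }

    static void
    summary_block_persist (SummaryBlock *block)
    {
        if (block->frozen)
            return;     /* summary_thaw will get us called again */

        if (block->has_flagchg && !block->has_expunges && !block->has_appends)
            write_flags_file (block);   /* flags only: cheapest write */
        else if (block->has_expunges || block->has_appends)
            write_all_files (block);    /* offsets change: all three files */

        block->has_flagchg  = FALSE;
        block->has_expunges = FALSE;
        block->has_appends  = FALSE;
    }

    static void
    summary_freeze (Summary *summary)
    {
        guint i;

        for (i = 0; i < summary->blocks->len; i++)
            ((SummaryBlock *) summary->blocks->pdata[i])->frozen = TRUE;
    }

    static void
    summary_thaw (Summary *summary)
    {
        guint i;

        for (i = 0; i < summary->blocks->len; i++) {
            SummaryBlock *block = summary->blocks->pdata[i];

            block->frozen = FALSE;
            summary_block_persist (block);
        }
    }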
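And a sketch of the address sorting a caller could do before
summary_add_item, so that equal recipient sets always serialise to the
identical comma separated string; sort_address_list() is an invented
name, not existing code:

    #include <glib.h>
    #include <stdlib.h>
    #include <string.h>

    static int
    cmp_address (const void *a, const void *b)
    {
        return strcmp (*(const gchar * const *) a,
                       *(const gchar * const *) b);
    }

    static gchar *
    sort_address_list (const gchar *addresses)
    {
        gchar **parts = g_strsplit (addresses, ",", -1);
        guint len = g_strv_length (parts);
        gchar *result;
        guint i;

        for (i = 0; i < len; i++)
            g_strstrip (parts[i]);

        qsort (parts, len, sizeof (gchar *), cmp_address);

        result = g_strjoinv (", ", parts);
        g_strfreev (parts);

        return result;
    }

    /*
     * sort_address_list ("c@z, a@x, b@y") and
     * sort_address_list ("a@x, c@z, b@y") both yield "a@x, b@y, c@z",
     * so the data-file ends up storing that string only once.
     */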
--
Philip Van Hoof, freelance software developer
home: me at pvanhoof dot be
gnome: pvanhoof at gnome dot org
http://pvanhoof.be/blog
http://codeminded.be

Attachment:
mytest9.tar.gz
Description: application/compressed-tar