Hi there fellow hackers,

This is some code that I have lying around that will someday replace
the summary storage. Probably a few weeks or days after Tinymail 1.0
gets released, in Tinymail 2.0's then-new branch.

I kindly invite all the crazy people to check it out, investigate it,
and comment on what could be better. I'll make a quick guide.

First, let's repeat the story of the summary:

o. The summary is the information about a folder's messages that you
   want to see when requesting an overview. This means: to, from,
   subject, cc, flags, size, uid.
o. Because this data is quantitatively large, it consumes most of your
   E-mail client's memory unless you are smart. Tinymail tries to be
   smart by mmapping this data.
o. This data is read often, changes seldom, and has a lot of duplicate
   strings (really a lot). When it changes, it's either an append, a
   delete or a flag change. Once appended, an item never changes other
   than by flag changes or deletion.
o. Some numbers to give you an idea:
   o. 30,000 items consume on average 10 MB of mmapped data (strings)
   o. plus 6 MB of administration (pointers)
   o. if not using GStringChunk, add 2 MB of heap administration to this
   o. Evolution triples these numbers (if not more)

Then, let's discuss the requirements, problems, details and ideas:

o. The core idea is locality of the memory (and mmap) data.
o. Mmap is fine and all, but if your data is spread around, the kernel
   must map many more pages into real RAM. By putting the most
   referenced strings close together at the beginning of the file, we
   make the kernel load fewer pages. The aim of this is to reduce the
   VmRSS size.
o. Only unique strings are stored, saving disk space and therefore also
   mmap size, and thus less VM size. The aim of this is to reduce the
   VmSize.
o. Fewer pages that need to be accessed means fewer disk seeks.
o. Fewer pages (in RAM) that need to be accessed means fewer operations
   on the data bus (mostly interesting for mobiles).
o. We'll need fewer writes of the summary data.
o. Right now, rewriting summary.mmap *IS* what makes Tinymail slow when
   fetching a large folder (larger than 15,000 items; you'll notice
   this). The solution is to work in blocks instead.
o. Blocks (in this experimental code) are sized at 1,000 items. This
   will always be fast, even on slow devices.
o. The flags are put in a separate flat, sequential file.
o. Wipes just get marked; when a lot of items are wiped, a rewrite of
   the block is scheduled (the only drastic rewrite occasion). (A wipe
   is an expunge or vanish that got locally synced.)
o. An append means that a new block is created, in appending mode (for
   new items that got added).
o. Searches don't consume the memory and the mmap of an entire folder.
o. Because of the blocks, the summary items that a search returns can
   hold a reference on a single block instead of on the entire folder's
   summary mmap. This makes it possible to do modest searches: each hit
   keeps just its own block of 1,000 items loaded, and if multiple hits
   occur in one block, it's just one block with multiple references in
   memory.

The solution: a three-file one. Per block you have:

o. An index file
o. A flags data file
o. A mmapped data file

The index contains records like:

  4 uid0 10 2048 94 88 84 80

This means:

o. The uid is 4 bytes long
o. Followed by the 4 bytes of the uid itself ("uid0")
o. The sequence number is 10
o. The size of the E-mail is 2048 octets
o. The subject is at offset 94
o. The from is at offset 88
o. The to is at offset 84
o. The cc is at offset 80

The flags data file contains records like:

  10 18910

This means: the message with sequence number 10 has flags = 18910.

The data file has \0-delimited strings. The nice thing about this file
is that the strings that get used most are put at the front of the file
(the file is sorted on usage). The index file's offsets are the number
of bytes since the start of this data file.

A few rough sketches of what working with these files could look like
follow below.
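To make the index record format concrete, here is a minimal sketch of
parsing one record. I'm assuming each numeric field is a 32-bit
unsigned integer in host byte order and that the uid is length-prefixed
as in the example above; the widths in the real code may differ:

  /* Sketch: parse one index record at *p, advancing *p past it. */
  #include <stdint.h>
  #include <stdlib.h>
  #include <string.h>

  typedef struct {
      char     *uid;          /* length-prefixed on disk */
      uint32_t  seq;          /* sequence number */
      uint32_t  size;         /* size of the E-mail in octets */
      uint32_t  subject_off;  /* offsets into the block's data file */
      uint32_t  from_off;
      uint32_t  to_off;
      uint32_t  cc_off;
  } IndexRecord;

  static int
  index_record_parse (const unsigned char **p, IndexRecord *rec)
  {
      uint32_t uid_len;

      memcpy (&uid_len, *p, 4); *p += 4;
      rec->uid = malloc (uid_len + 1);
      if (!rec->uid)
          return -1;
      memcpy (rec->uid, *p, uid_len); *p += uid_len;
      rec->uid[uid_len] = '\0';
      memcpy (&rec->seq,         *p, 4); *p += 4;
      memcpy (&rec->size,        *p, 4); *p += 4;
      memcpy (&rec->subject_off, *p, 4); *p += 4;
      memcpy (&rec->from_off,    *p, 4); *p += 4;
      memcpy (&rec->to_off,      *p, 4); *p += 4;
      memcpy (&rec->cc_off,      *p, 4); *p += 4;
      return 0;
  }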
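A flag change then never touches the index or the data file. Assuming
fixed-width records of two 32-bit integers, and that a record's
position in the flags file matches its sequence number (an assumption;
the real code may key this differently), a flag change becomes a small
in-place overwrite instead of a rewrite of the whole summary:

  /* Sketch: in-place flag update in the flat, sequential flags file. */
  #include <stdint.h>
  #include <stdio.h>

  typedef struct {
      uint32_t seq;    /* sequence number, e.g. 10    */
      uint32_t flags;  /* flags value,     e.g. 18910 */
  } FlagRecord;

  static int
  flags_file_set (FILE *f, uint32_t seq, uint32_t flags)
  {
      FlagRecord rec = { seq, flags };

      if (fseek (f, (long) (seq * sizeof (FlagRecord)), SEEK_SET) != 0)
          return -1;
      if (fwrite (&rec, sizeof rec, 1, f) != 1)
          return -1;
      return 0;
  }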
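Resolving the offsets works directly on the mmapped data file: an
offset is just a byte count from the start of the file, and since the
strings are \0-delimited, the pointer can be used as-is without any
copying. The filename here is made up:

  /* Sketch: mmap a block's data file and resolve an index offset. */
  #include <fcntl.h>
  #include <stdio.h>
  #include <sys/mman.h>
  #include <sys/stat.h>
  #include <unistd.h>

  int
  main (void)
  {
      struct stat st;
      int fd = open ("block-0000.data", O_RDONLY); /* hypothetical name */

      if (fd < 0 || fstat (fd, &st) < 0)
          return 1;
      const char *data = mmap (NULL, st.st_size, PROT_READ,
                               MAP_SHARED, fd, 0);
      if (data == MAP_FAILED)
          return 1;

      /* Offset taken from the example record above. */
      unsigned int subject_off = 94;
      printf ("subject: %s\n", data + subject_off);

      munmap ((void *) data, st.st_size);
      close (fd);
      return 0;
  }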
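And the per-block referencing for searches could be as simple as a
refcount per block: each hit pins only its 1,000-item block, and the
block's mmap is released when the last hit drops its reference (names
here are hypothetical, not the Tinymail API):

  /* Sketch: per-block reference counting for search results. */
  #include <stdlib.h>
  #include <sys/mman.h>

  typedef struct {
      void   *map;      /* mmapped data file of this block */
      size_t  map_len;
      int     refcount; /* one per live summary item of this block */
  } Block;

  static void
  block_ref (Block *b)
  {
      b->refcount++;
  }

  static void
  block_unref (Block *b)
  {
      if (--b->refcount == 0) {
          munmap (b->map, b->map_len);
          free (b);
      }
  }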
Have fun reading the code ...

-- 
Philip Van Hoof, freelance software developer
home: me at pvanhoof dot be
gnome: pvanhoof at gnome dot org
http://pvanhoof.be/blog
http://codeminded.be
Attachment: mytest3.tar.gz (application/compressed-tar)