Reworked version that has the hashtables for looking up the items by
either uid or sequence.

On Thu, 2007-12-06 at 23:51 +0100, Philip Van Hoof wrote:
> Hi there fellow hackers,
>
> This is some code that I have lying around, that will someday replace
> the summary storage. Probably a few weeks or days after Tinymail 1.0
> gets released, in Tinymail 2.0's then-new branch.
>
> I kindly invite all the crazy people to check it out, investigate,
> comment what could be better, etc.
>
> I'll make a quick guide.
>
> First let's repeat the story of the summary:
>
> o. The summary is what you want to see of a folder when requesting
>    an overview. This means: to, from, subject, cc, flags, size, uid.
>
> o. Because this data is quantitatively large, it consumes most of your
>    E-mail client's memory, unless you are smart. Tinymail tries to be
>    smart by mmapping this data.
>
> o. This data is read often, changes seldom and has a lot of duplicate
>    strings (really a lot). When it changes, it is either an append, a
>    delete or a flag change. Once appended, an item never changes other
>    than by a flag change or a delete.
>
> o. Some numbers to give you an idea:
>
>    o. 30,000 items consume on average 10 MB mmapped data (strings)
>    o. 6 MB admin (pointers)
>    o. If not using GStringChunk, add 2 MB heap admin to this
>    o. Evolution triples these numbers (if not more)
>
> Then, let's discuss the requirements, problems, details, ideas:
>
> o. The core idea is locality of memory (and mmap) data
>
>    o. Mmap is fine and all, but if your data is spread around then
>       the kernel must map many more pages into physical RAM.
>
>       By putting the most referenced strings close together at the
>       beginning of the file, we make the kernel need to load fewer
>       pages.
>
>       The aim of this is to reduce the VmRSS size.
>
>    o. Only unique strings are stored, saving disk space and
>       therefore also mmap size. Therefore less VM size.
>
>       The aim of this is to reduce the VmSize.
>
>    o. Fewer pages that need to be accessed means fewer disk seeks.
>
>    o. Fewer pages (in RAM) that need to be accessed means fewer
>       operations on the data bus (mostly interesting for mobiles)
>
> o. We'll need fewer writes of the summary data
>
>    o. Right now rewriting the summary.mmap *IS* what makes Tinymail
>       slow when fetching a large folder (larger than 15,000 items,
>       you'll notice this). The solution is to work in blocks
>       instead.
>
>    o. Blocks (in this experiment code) are sized at 1000 items.
>       This will always be fast, even on slow devices
>
>    o. The flags are put in a separate flat sequential file
>
>    o. Wipes just get marked. When a lot of items are wiped, a
>       rewrite of the block is scheduled (the only occasion for a
>       drastic rewrite). A wipe is an expunge or vanish that got
>       locally synced.
>
>    o. An append means that a new block is created, in appending mode
>       (new items that got added)
>
> o. Searches don't consume the memory and the mmap for an entire folder
>
>    o. Thanks to the blocks, the summary items that a search returns
>       only need to hold a reference on their block, instead of
>       needing to keep a reference on the entire folder's summary
>       mmap.
>
>       This makes it possible to do modest searches. Each hit keeps
>       at least one block of 1000 items loaded. If multiple hits
>       occur in one block, it's just one block with multiple
>       references in memory.
>
>
> The solution: a three-file one.
>
> Per block you have:
>    o. An index
>    o. A flags data file
>    o. A mmap file
>
> The index contains records like:
>
>    4 uid0 10 2048 94 88 84 80
>
> This means:
>    o. The uid is 4 bytes long
>    o. The 4 bytes of the uid (uid0)
>    o. The sequence number is 10
>    o. The size of the E-mail is 2048 octets
>    o. The subject is at offset 94
>    o. The from is at offset 88
>    o. The to is at offset 84
>    o. The cc is at offset 80
>
> The flags data file contains records like:
>
>    10 18910
>
> This means:
>
>    Message with sequence number 10 has flag = 18910
>
> The data file has \0 delimited strings. The nice thing about this file
> is that the strings that are used most are put at the front of the
> file (the file is sorted on usage). The index file's offsets are the
> number of bytes from the start of this data file.
>
>
> Have fun reading code ...
>
>
> --
> Philip Van Hoof, freelance software developer
> home: me at pvanhoof dot be
> gnome: pvanhoof at gnome dot org
> http://pvanhoof.be/blog
> http://codeminded.be
>
> _______________________________________________
> tinymail-devel-list mailing list
> tinymail-devel-list gnome org
> http://mail.gnome.org/mailman/listinfo/tinymail-devel-list

--
Philip Van Hoof, freelance software developer
home: me at pvanhoof dot be
gnome: pvanhoof at gnome dot org
http://pvanhoof.be/blog
http://codeminded.be
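
For a quick feel of the layout before unpacking the tarball, here is a minimal Python sketch of one block. It is only an illustration under stated assumptions: the names build_block, read_string and build_lookups are hypothetical and do not come from the attached code, the index and flags are kept as in-memory tuples and dicts rather than on-disk records, and nothing is mmapped. It shows unique \0-delimited strings sorted on usage, index records that point into that data by byte offset, the separate flags table, and two hashtables for lookup by uid or by sequence.

```python
from collections import Counter

# Order of the string columns in an index record, as in the example
# record "4 uid0 10 2048 94 88 84 80" (subject, from, to, cc offsets).
FIELDS = ("subject", "from", "to", "cc")


def build_block(items):
    """Build (data, index, flags) for one block of summary items.

    Assumption: each item is a dict with uid, seq, size, flags and the
    four string fields. This is an in-memory stand-in for the 3 files.
    """
    usage = Counter(item[f] for item in items for f in FIELDS)
    # Most-used strings first, so they cluster at the start of the data.
    ordered = sorted(usage, key=lambda s: (-usage[s], s))

    data = bytearray()
    offsets = {}
    for s in ordered:
        offsets[s] = len(data)            # bytes from the start of data
        data += s.encode("utf-8") + b"\0"  # each unique string stored once

    index = [
        (item["uid"], item["seq"], item["size"])
        + tuple(offsets[item[f]] for f in FIELDS)
        for item in items
    ]
    flags = {item["seq"]: item["flags"] for item in items}
    return bytes(data), index, flags


def read_string(data, offset):
    """Return the \\0-terminated string that starts at offset."""
    end = data.index(b"\0", offset)
    return data[offset:end].decode("utf-8")


def build_lookups(index):
    """Hashtables over the index records, keyed by uid and by sequence."""
    by_uid = {rec[0]: rec for rec in index}
    by_seq = {rec[1]: rec for rec in index}
    return by_uid, by_seq


items = [
    {"uid": "uid0", "seq": 10, "size": 2048, "flags": 18910,
     "subject": "Hello", "from": "a@example.org",
     "to": "list@example.org", "cc": "cc@example.org"},
    {"uid": "uid1", "seq": 11, "size": 512, "flags": 0,
     "subject": "Re: Hello", "from": "b@example.org",
     "to": "list@example.org", "cc": "cc@example.org"},
]

data, index, flags = build_block(items)
by_uid, by_seq = build_lookups(index)

rec = by_seq[10]
print(rec[0], read_string(data, rec[3]), flags[rec[1]])
```

The two messages share their to and cc strings, so the data holds six strings instead of eight, and the shared ones sit at the lowest offsets. A flag change only touches the flags table, never the string data.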
Attachment:
mytest4.tar.gz
Description: application/compressed-tar