Re: Number 9 of the mytest summary store: writing things



On Mon, 2008-01-07 at 09:47 +0000, Dave Cridland wrote:

> Basically, this is handling the stuff that I think yours is currently  
> doing least optimally. ENVELOPEs don't change, so storing them in the  
> same place as FLAGS doesn't make sense. 

I'm not! :). The mytest experiments store flags in flags_0.idx.

But you are right, I haven't put a lot of thought into handling the
flags efficiently.

Mine simply defers writing that file: if the summary is frozen, it is
written when the summary is thawed; if writes are still pending at
finalise, it is written then (unlikely, as you'll have thawed already if
you use the API correctly); otherwise it is written immediately.

Immediately means that I write all of the flags for all items even if
only one item's flags changed (so you really want to freeze and thaw).
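
To make the freeze/thaw behaviour concrete, here is a minimal sketch in
Python. It is not the mytest code: the class name FlagStore, its method
names, and the on-disk layout (one little-endian 32-bit integer per item)
are assumptions made purely for illustration.

```python
import struct


class FlagStore:
    """Hypothetical sketch of deferred flag writes (not the mytest code)."""

    def __init__(self, path):
        self.path = path      # e.g. the path of flags_0.idx
        self.flags = []       # one integer of flag bits per item
        self.frozen = False
        self.dirty = False
        self.writes = 0       # number of full rewrites, for illustration

    def freeze(self):
        self.frozen = True

    def thaw(self):
        # Thawing flushes whatever accumulated while frozen.
        self.frozen = False
        self._flush()

    def finalise(self):
        # Safety net for writes that are still pending.
        self._flush()

    def set_flags(self, index, value):
        while len(self.flags) <= index:
            self.flags.append(0)
        self.flags[index] = value
        self.dirty = True
        if not self.frozen:
            self._flush()     # "immediately": rewrites the whole file

    def _flush(self):
        if not self.dirty:
            return
        # Assumed layout: every item's flags, rewritten in one go.
        data = struct.pack("<%dI" % len(self.flags), *self.flags)
        with open(self.path, "wb") as f:
            f.write(data)
        self.dirty = False
        self.writes += 1
```

With this sketch, a hundred set_flags() calls between freeze() and thaw()
cost one rewrite of the file; the same calls on an unfrozen store cost a
hundred.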

Loading the flags happens in the same loop as loading the summary's
index.

Note that I'm not storing MODSEQ. Once multiple fields are going to be
changed often, we might indeed want something more clever.


> Admittedly, if we want to  
> handle ANNOTATE, we need something more complex than this, but it'd  
> still be a bright idea to keep it all away from the more static data.

Agreed.

> That all said, it's working, and seems reasonably quick - faster than  
> the code I have, so I'll probably try to blend it in.
> 
> So... How it works.
> 
> I'm blocking the data in multiple mmap files - this needs more  
> cleverness, because really, I need to be using smaller blocks toward  
> the end of larger mailboxes. mmap files are called by the sequence  
> number they start at.
> 
> Finding data by sequence number is relatively easy, although less  
> than efficient as yet, I'm just running through all blocks in hash  
> order to find one that looks as if it might be right.
> 
> When a UID gets removed - currently not really implemented - I don't  
> rewrite all subsequent blocks. Instead, I rename them. This means  
> less I/O, which is always a good thing.

Clever ...
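
A minimal sketch of the rename idea as I read it, in Python. It assumes
each block is a file named by the decimal sequence number it starts at,
all in one directory; the function name shift_blocks_after is
hypothetical and this is not taken from newcache.py.

```python
import os


def shift_blocks_after(directory, removed_seq):
    """Rename blocks that start after removed_seq so they start one earlier.

    Assumption: block files are named by their starting sequence number.
    The block that contained removed_seq is not touched here; rewriting it
    (or deleting it if it became empty) is left to the caller.
    """
    starts = sorted(int(name) for name in os.listdir(directory)
                    if name.isdigit())
    renamed = []
    # Ascending order: each target name has already been vacated.
    for start in starts:
        if start <= removed_seq:
            continue
        source = os.path.join(directory, str(start))
        target = os.path.join(directory, str(start - 1))
        if os.path.exists(target):
            # Happens if an emptied block was not deleted first.
            raise FileExistsError(target)
        os.rename(source, target)
        renamed.append((start, start - 1))
    return renamed
```

Only directory entries change; the contents of the later blocks are never
read or rewritten.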

> I think that this is basically a sound design, albeit badly  
> implemented.
> 
> Some potential improvements:
> 
> 1) I suspect that having an index file containing block sequence  
> starts, lengths, and UID extents would be faster than using the  
> directory listing. Renaming files causes the directory "file" to be  
> rewritten, so it makes sense to avoid this, and give the blocks a  
> 64-bit ID instead.

Uhu (I think the kernel is quite good at caching/buffering the
directory's file, so you might get away with this and still have it
perform quite well).
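
For comparison, a rough sketch of what such an index file could look
like, in Python. The record layout (64-bit block ID, sequence start,
length, first UID, last UID) and the function names are assumptions for
illustration only, not a proposed format.

```python
import bisect
import struct

# Assumed record: block id (64-bit), seq start, length, first UID, last UID.
RECORD = struct.Struct("<QIIII")


def pack_index(blocks):
    """Serialise block records, sorted by starting sequence number."""
    ordered = sorted(blocks, key=lambda rec: rec[1])
    return b"".join(RECORD.pack(*rec) for rec in ordered)


def load_index(data):
    """Parse the bytes produced by pack_index back into tuples."""
    return [RECORD.unpack_from(data, offset)
            for offset in range(0, len(data), RECORD.size)]


def find_block(index, seq):
    """Return the ID of the block holding sequence number seq, or None."""
    starts = [rec[1] for rec in index]
    pos = bisect.bisect_right(starts, seq) - 1
    if pos < 0:
        return None
    block_id, start, length, _first_uid, _last_uid = index[pos]
    return block_id if seq < start + length else None
```

With an index like this, removing a message means adjusting the sequence
starts in the index rather than renaming files, and lookup becomes a
binary search instead of a walk over all blocks.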

> 2) Lots of bits of this are very badly done and it's filled with  
> bugs. That said, the basic "set" and "[]" operations work. I think.  
> Mostly.
> 
> I've attached both newcache.py, which contains the implementation,  
> and ct1.py, which contains a simple test driver.

I'll check it out this evening.


-- 
Philip Van Hoof, freelance software developer
home: me at pvanhoof dot be 
gnome: pvanhoof at gnome dot org 
http://pvanhoof.be/blog
http://codeminded.be





