RE: Early-posting a new idea for the summary format



On Fri, 2007-02-02 at 13:55 +0200, Dirk-Jan Binnema nokia com wrote:

> Mhwww... I am not so convinced about the duplicate string
> argument; say, in a mail folder I have 100 mails (and that's a lot!)
> with the subject "This is my subject"; by just storing it once,
> I will save 99x18 = 1782 bytes. That is not a lot, and of course
> it adds some complexity, and makes the summary files quite
> fragile.
> 
> Also note that embedded systems (like the 770/N800) often use 
> compressed file systems (jffs2), which make the savings even
> less.

That's true. However, the real purpose is to use less memory. By that I
mean that strings that recur will be more likely to be swapped in than
strings that don't. The idea is to have an intelligent store that puts
the aliases that are used most at the beginning of the file.

That way the block that holds the most-used strings will always be
swapped in, whereas the least-used strings will be swapped in on demand.
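
A minimal sketch of such a store, assuming a simple layout that is not
the actual summary format: every distinct string is stored once,
NUL-terminated, and the most frequently used strings come first. The
names build_string_table and read_string are hypothetical.

```python
from collections import Counter


def build_string_table(strings):
    """Store each distinct string once, most frequently used first.

    Returns (blob, offsets): blob holds the NUL-terminated strings,
    offsets maps each string to its byte position inside blob.
    """
    counts = Counter(strings)
    # Most-used first; ties are broken alphabetically so the layout is stable.
    ordered = sorted(counts, key=lambda s: (-counts[s], s))
    blob = bytearray()
    offsets = {}
    for s in ordered:
        offsets[s] = len(blob)
        blob += s.encode("utf-8") + b"\0"
    return bytes(blob), offsets


def read_string(blob, offset):
    """Read the NUL-terminated string that starts at offset."""
    end = blob.index(b"\0", offset)
    return blob[offset:end].decode("utf-8")


subjects = ["This is my subject"] * 100 + ["Rare subject"]
blob, offsets = build_string_table(subjects)
```

With this layout the subject shared by 100 mails sits at offset 0, so it
lands in the first block of the file, and each mail only needs to store
an offset into the table.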

It's therefore not just a memory improvement, but also a performance
one, as it decreases the number of reads and swap-ins needed.

In other words .. sorting AND searching would be a lot faster, because
the strings that are used most will be in real RAM, rather than only in
the mmap (which might mean that they're still only available in the
jffs2 file).

> >I would also take out the flags and put those in a different 
> >file, as a read-write mmap. Same reason: level wearing of 
> >flash devices, avoiding total rewrites, etc etc. 

> Flash wearing should not be such a big problem, I think. But
> taking out those flags might be interesting. Dunno.

Yes, the flags are something that I will take out of it regardless,
because right now changing a single flag can cause a full rewrite of the
entire mmaped file. That's just plain stupid in terms of time-to-write
and in terms of wear leveling too.

So that too would be a major performance improvement, especially when
storing or changing a lot of flags at the same time. Though I think
those "changes" are cached until the very last moment.

It's also about investigating things like this. A recent discussion with
you about setting the flags made me realise that I didn't understand
that part well enough myself. So I started digging and discovered some
interesting potential performance improvements.

Storing the flags in a different mmap will fix all of those.
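
A rough sketch of what a separate read-write flags mmap could look like.
The layout (one 32-bit flags word per message, indexed by message
position) and the names create_flags_file, set_flags and get_flags are
assumptions for illustration, not the existing tinymail format.

```python
import mmap
import os
import struct
import tempfile

FLAG_SIZE = 4  # assumption: one 32-bit flags word per message


def create_flags_file(path, count):
    """Create a zero-filled flags file with room for count messages."""
    with open(path, "wb") as f:
        f.write(b"\0" * (FLAG_SIZE * count))


def set_flags(path, index, flags):
    """Overwrite the flags word of one message in place."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as m:  # shared read-write mapping
            struct.pack_into("!I", m, index * FLAG_SIZE, flags)
            m.flush()


def get_flags(path, index):
    """Read the flags word of one message."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return struct.unpack_from("!I", m, index * FLAG_SIZE)[0]


flags_path = os.path.join(tempfile.mkdtemp(), "summary.flags")
create_flags_file(flags_path, 100)
set_flags(flags_path, 42, 0x1)
```

Changing the flag of message 42 only dirties the page that holds its
four bytes; the file keeps its size and the summary itself is untouched.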


> >> >"W" means "word length". On a 32 bit computer this is 4 bytes, on a 
> >> >64 bit computer this is 8 bytes.
> >> [...]
> >> 
> >> Would that mean that 32-bit roadmaps are not readable by a 64-bit 
> >> program? It's not uncommon to switch between 32 and 64 bit mode on, 
> >> say, an AMD64 - it would suck if the summaries would be unreadable.
> >
> >That is what it means, yes. Though the summary mmaps could be 
> >converted between different architectures, but not without a tool.
> 
> I guess it would be better to use the 32-bit 'words' on 64 bit platforms
> as well. Otherwise, people who share their homedir between different
> platforms will get screwed.

Well, I don't really see a reason for doing that on a mobile device. But
I can of course put the word length and the endianness in the filename
of the summary files. That solves it too.

It basically means that on another architecture, another file will be
created.
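
As an illustration only, a per-architecture filename could be derived
like this. The naming scheme ("summary.64le.mmap") and the function
summary_filename are made up for this sketch.

```python
import struct
import sys


def summary_filename(base="summary"):
    """Build a summary filename that encodes word length and byte order."""
    word_bits = struct.calcsize("P") * 8  # pointer size of this build
    endian = "le" if sys.byteorder == "little" else "be"
    return f"{base}.{word_bits}{endian}.mmap"


name = summary_filename()
```

A 32-bit and a 64-bit program sharing one homedir would then each open
their own file instead of misreading the other's.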

At this moment this ain't a problem because network byte order is always
used for storing integers.

Keeping that can also be a solution for this ... a slightly less
efficient one, but a solution nonetheless.
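
For reference, a small sketch of what storing integers in network byte
order with a fixed 32-bit width looks like. The helper names encode_u32
and decode_u32 are hypothetical.

```python
import struct


def encode_u32(value):
    """Pack an unsigned 32-bit integer in network (big-endian) byte order."""
    return struct.pack("!I", value)


def decode_u32(data, offset=0):
    """Unpack an unsigned 32-bit integer stored in network byte order."""
    return struct.unpack_from("!I", data, offset)[0]


encoded = encode_u32(1782)
```

The bytes on disk are the same on every architecture; the cost is a
byte swap on each access on little-endian machines.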


Anyway, I posted the idea early to get this type of reaction. So
please keep 'em coming. I know that the summary format can definitely be
improved. It's not for today, as what we have today works too.

But sooner or later, I will proceed and improve the summary format
drastically. Probably with the assistance of some jffs2 or LogFS
developer at some conference ;-)

Or by reading filesystem code, as I want to make it so good that it
meshes perfectly with the filesystem code. By that I mean using the
right cache size (or, thus, fwrite instead of write) so that entire
blocks are always written (instead of bytes). Things like that ..
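
A small sketch of the buffering idea, using Python's buffered file
object as a stand-in for fwrite. The function write_buffered and the
4096-byte fallback are assumptions; st_blksize is only the block size
the OS reports as preferred, which need not match the flash erase block.

```python
import os
import tempfile


def write_buffered(path, records):
    """Write many small records through a buffer of one filesystem block."""
    directory = os.path.dirname(path) or "."
    # st_blksize is Unix-only; 4096 is an assumed fallback.
    block_size = getattr(os.stat(directory), "st_blksize", 4096) or 4096
    with open(path, "wb", buffering=block_size) as f:
        for record in records:
            f.write(record)  # collected in the buffer, not one syscall each
    return block_size


out_path = os.path.join(tempfile.mkdtemp(), "summary.mmap")
block_size = write_buffered(out_path, [b"x" * 10] * 1000)
```

The small writes are gathered in memory and handed to the OS in larger
pieces, rather than ten bytes at a time.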

I really don't want tinymail to be a burden for the flash card.
Destroying hardware through excessive writing on a device that has to
deal with wear leveling, like flash, is not a nice thing to do ;-)

It'll be considered a benefit over other E-mail solutions for mobiles.

In other words: compete by being the best. Not just good.


-- 
Philip Van Hoof, software developer
home: me at pvanhoof dot be 
gnome: pvanhoof at gnome dot org 
http://www.pvanhoof.be/blog






