Re: Last version of the new summary storage engine



Dirk-Jan Binnema nokia com wrote:

I tried sending a reply, but it seems that it failed. If you received one, here's another :)

>  o. Duplicate strings are made unique in the mmap
Philip, I am a bit skeptical of this claim -- can you back that up with some numbers? I mean let's take some *extreme* case, where the
same 10-byte string is seen 1000x in a mailbox; by having only
one of them we save about 10k. Not very much compared to the normal
size, even in this extreme case. At least the mem saving is
rather easy to calculate :)
The problem that this new summary storage tries to solve is not the amount of (virtual) memory being used (we call this VmSize). If that were the problem, I wouldn't have done the current summary.mmap format the way I did (because it wastes a lot of virtual memory).

The real problem is the following:

Assume we have the string "Dirk-Jan C. Binnema <Dirk-Jan Binnema nokia com>" as the From address of 20 messages. I received some of these 20 messages a long time ago and some of them very recently. The strings are therefore located at the beginning, in the middle and at the end of the mapped summary file.

Those strings are, however, identical in content. Not in location, but in content, which is what matters.

The problem with that is that each and every one of those strings will be needed in real memory (imagine that I sort my list on "Dirk-Jan"). The kernel only knows about pages, and a page is four kilobytes in size. This means that every page containing one of those identical strings must be paged (swapped) into real memory (we call this VmRSS).

It would be better if the string was stored only once, close to other frequently needed strings. That way fewer pages would be needed in real memory (a smaller VmRSS).
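To make the page arithmetic concrete, here is a tiny stand-alone sketch (the offsets are made up): twenty scattered copies of a 10-byte string can each sit in a different 4 KB page, so reading all of them faults in twenty pages, while one shared copy costs a single page:

#include <stdio.h>

int main (void)
{
	const long page_size = 4096;                /* what getpagesize() reports on most systems */
	long offsets[] = { 1024, 500000, 9000000 }; /* made-up offsets of three copies */
	int i;

	/* Each copy lives in a different page, so touching all three
	 * forces three whole pages into real memory (VmRSS). */
	for (i = 0; i < 3; i++)
		printf ("copy at offset %ld -> page %ld must be resident\n",
		        offsets[i], offsets[i] / page_size);

	return 0;
}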

All of the memory, both real memory and just-mapped, is what shows up as VmSize if we cat /proc/$PID/status: the size of the virtual memory. This number is mostly worthless, as it contains the memory needed for the mappings (the libraries and the files we mapped ourselves), the stack and the heap allocations (like gslice and malloc). Valgrind's massif tool, by comparison, measures only heap and stack allocations. That is usually the most interesting part, as those allocations have a direct impact on VmRSS.

For Tinymail, though, what valgrind reports is not really the most interesting part. Although the heap has a direct impact on VmRSS, my summary's mapping has a larger impact on it. When I say that Tinymail uses 10 MB of heap, I don't mention that Tinymail also has a 15 MB mapped file that is frequently being thrashed. VmRSS (the amount of real memory being used) will therefore probably be around 15 or maybe 20 MB instead of the 10 MB of heap that valgrind reported.

VmRSS indicates the amount of memory that is actually being accessed, and that is therefore in real memory modules. Compared to VmSize this is the important number (VmSize is actually not interesting at all: it's polluted by the mappings of the libraries, and being part of the virtual memory doesn't mean a page is actually going to be used frequently).
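You can watch both numbers for a running process yourself; a minimal sketch that reads them for the current process, just like cat-ing /proc/$PID/status by hand:

#include <stdio.h>
#include <string.h>

int main (void)
{
	FILE *f = fopen ("/proc/self/status", "r");
	char line[256];

	if (!f)
		return 1;

	while (fgets (line, sizeof (line), f)) {
		/* VmSize: everything mapped (libraries, mmap()ed files, heap,
		 * stack). VmRSS: the subset of those pages that is resident
		 * in real memory right now. */
		if (strncmp (line, "VmSize:", 7) == 0 ||
		    strncmp (line, "VmRSS:", 6) == 0)
			fputs (line, stdout);
	}

	fclose (f);
	return 0;
}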

How do we further reduce VmRSS? The answer is simple: we make sure the memory of the mapping is localized efficiently. How do we do that? By putting memory that is going to be frequently needed together, in the same pages of the mapping. Which memory will that be?

In my example the string "Dirk-Jan C. Binnema <Dirk-Jan Binnema nokia com>" will be needed 20 times. The string "Tinne Hannes <tinne hannes gmail com>" will probably be needed even more often, in my personal situation. Sergio, however, will have about as many E-mails from you as I have, but probably only one from Tinne. We conclude that in both cases your From string is important, but that Tinne's From string is only important for me.

It's therefore quite likely that both your string and Tinne's will be needed in real memory on my tablet PC, while on Sergio's tablet PC only yours will be needed and Tinne's only perhaps.

That's why I sort the strings in the mapping by their frequency of use. That way the frequently needed strings all end up in the same pages, and the less frequently needed strings end up together in their own pages.
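In rough GLib-style C, the clustering idea looks something like this (hypothetical names and a toy main, not the actual Tinymail code): count each unique string's frequency in a GHashTable, sort descending on that count, and write the most used strings first so they share the first pages of the file:

#include <glib.h>
#include <stdio.h>
#include <string.h>

typedef struct {
	const gchar *str;
	guint count;
} Entry;

static void
collect_entry (gpointer key, gpointer value, gpointer user_data)
{
	GArray *entries = user_data;
	Entry e = { key, GPOINTER_TO_UINT (value) };

	g_array_append_val (entries, e);
}

static gint
by_count_desc (gconstpointer a, gconstpointer b)
{
	return ((const Entry *) b)->count - ((const Entry *) a)->count;
}

/* Write each unique string once, most frequently used first, so the
 * hot strings all land in the same few pages of the mapped file. */
static void
write_strings_by_frequency (GHashTable *counts, FILE *out)
{
	GArray *entries = g_array_new (FALSE, FALSE, sizeof (Entry));
	guint i;

	g_hash_table_foreach (counts, collect_entry, entries);
	g_array_sort (entries, by_count_desc);

	for (i = 0; i < entries->len; i++) {
		Entry *e = &g_array_index (entries, Entry, i);
		fwrite (e->str, 1, strlen (e->str) + 1, out);
	}

	g_array_free (entries, TRUE);
}

int main (void)
{
	GHashTable *counts = g_hash_table_new (g_str_hash, g_str_equal);

	/* made-up frequencies */
	g_hash_table_insert (counts, "Dirk-Jan C. Binnema <...>", GUINT_TO_POINTER (20));
	g_hash_table_insert (counts, "Tinne Hannes <...>", GUINT_TO_POINTER (35));
	g_hash_table_insert (counts, "Rare Sender <...>", GUINT_TO_POINTER (1));

	write_strings_by_frequency (counts, stdout);
	g_hash_table_destroy (counts);

	return 0;
}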

On speed: paging from the mapped file (part of VmSize) into real memory (VmRSS) is a slow operation. The fewer times we have to do it, the faster the application will be, especially when doing things like searching and sorting.

The mantra "using less memory results in a faster application" is very true in our case.

> What is much harder, there's also a (significant?) *cost* to scan for
> duplicate strings, to reconstruct them and so on, when showing a folder.
With a hashtable-based implementation you don't need to scan for duplicates: you just add items using the string as the key. Hashtables are fast at this because they hash the string key, which turns lookups into integer comparisons. Once all items are in the hashtable, you just walk the table, writing out each string and updating the index with its location (that's the "affected" list in the test).
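In sketch form (simplified, hypothetical C, not the real code) the whole dedup step is just an intern function around a GHashTable:

#include <glib.h>

static GHashTable *unique = NULL;

/* Return the single shared copy of str, creating it on first sight.
 * Duplicates are never scanned for: the hash lookup finds them. */
static const gchar *
intern_string (const gchar *str)
{
	gchar *existing;

	if (unique == NULL)
		unique = g_hash_table_new (g_str_hash, g_str_equal);

	existing = g_hash_table_lookup (unique, str);
	if (existing != NULL)
		return existing; /* seen before: reuse the one copy */

	existing = g_strdup (str);
	g_hash_table_insert (unique, existing, existing);

	return existing;
}

Every item's strings go through something like this once, and identical strings automatically collapse to a single copy.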

Another reason why this will be a lot faster is that each mapped file will contain only 1000 items, whereas the current mapped file simply contains all items. Rewriting a folder with 50,000 items currently means 50,000 x 4 string writes. With this summary storage we will only rarely have to uniq and sort a file of 1000 x 4 strings, plus a sequential index and a sequential flags file (writing the sequential ones is too fast to even mention, really).
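The chunk selection is plain arithmetic; a toy sketch (the file naming is made up):

#include <stdio.h>

#define ITEMS_PER_CHUNK 1000

/* Map an item's sequence number to the chunk file holding its strings. */
static void
chunk_filename_for_item (unsigned int seq, char *buf, size_t len)
{
	snprintf (buf, len, "summary-%u.mmap", seq / ITEMS_PER_CHUNK);
}

int main (void)
{
	char name[64];

	/* Changing item 49123 of a 50,000-item folder touches only
	 * chunk 49, never the other 49 chunk files. */
	chunk_filename_for_item (49123, name, sizeof (name));
	printf ("item 49123 lives in %s\n", name);

	return 0;
}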

Ironically we also write a smaller mapped file because we have fewer strings to write (we made them unique), and writing is slower than searching for duplicates, so the dedup pays for itself. And that was a test on a fast harddisk, not even on an ultra slow flash disk. Although, indeed, a tablet's memory access will be slower than a PC's too (so it's not a guarantee, but we'll test this of course).

I tested this with 100,000 items (with 50% duplicates, which is very much like the Linux mailing list's folder): this was actually faster than writing 50,000 items using the current summary format.

> To me at least it's not obvious that this optimization will not
> actually slow things down.
That's why I'm doing this test: to test exactly that :)

> So, before rolling this out, I'd recommend you to do some numbers of
> the difference with/without string clustering (this should include
> showing the items).
Of course.

Thanks for being concerned. I hope my long reply clarified certain things.

I fear I only added confusion, though. The important thing is the difference between VmSize and VmRSS.



