[evolution-patches] Re: [Evolution-hackers] mmap() for the summary file



A couple possible problems with this approach:

1. systems that don't have mmap, or where mmap is broken (I can't think
of any off the top of my head... except maybe Win32? Tor?)

2. NFS... how well would this work over NFS? What are the performance
implications?

3. It will keep a lot of fd's open... EMFILE anyone? :(

Jeff

On Sun, 2006-06-11 at 16:05 +0200, Philip Van Hoof wrote:
> Hi there,
> 
> I've been trying to replace the fread()/fopen() implementation in
> camel-folder-summary.c with an mmap() one.
> 
> I know camel-file-utils.c will put duplicate strings in a hashtable and
> that way reduce memory usage for the summary information, because a lot
> of mailboxes have duplicate strings for the From and To headers. I know
> why and how this is implemented, and I understand that this already
> reduces memory usage a lot.
> 
> However, on a small device with few memory resources, the kernel knows
> better when to allocate and when to deallocate uncachable data like this
> summary information.
> 
> Therefore I propose to replace the implementation with mmap(). Not only
> do I propose it, I have already tried it myself.
> 
> While trying this, I came to the conclusion that it *would* be possible
> if the strings had been terminated by '\0' instead of being stored
> Pascal-style, with an encoded unsigned 32-bit integer in front of the
> string data.
> 
> That decision makes this impossible with the current file format,
> unless the mmap'd memory (and therefore also the file on disk) is
> constantly rewritten (with '\0' characters), or unless the entire
> infrastructure that uses the summary strings is adapted to use this
> length information rather than using the strings directly from the
> mmap'ed memory *as* NUL-terminated C strings (char arrays with a '\0'
> terminator). The second solution implies that everything would have to
> be converted to GStrings.
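
A minimal sketch of what the second option (carrying the length around
instead of relying on a terminator) could look like. The StrView type and
read_prefixed() are hypothetical names, and the fixed 4-byte big-endian
length prefix is an assumption made only for illustration; the real
summary encoding may well differ.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical view type: points into the mapped file and owns nothing. */
typedef struct {
    const char *data;   /* NOT NUL-terminated */
    size_t      len;
} StrView;

/* Read one length-prefixed string at *off without copying it.
 * ASSUMPTION (illustration only): the prefix is a fixed 4-byte
 * big-endian integer. Returns 0 on success and advances *off,
 * or -1 if the record would run past the end of the buffer. */
static int read_prefixed(const unsigned char *buf, size_t size,
                         size_t *off, StrView *out)
{
    if (*off > size || size - *off < 4)
        return -1;

    const unsigned char *p = buf + *off;
    uint32_t len = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                   ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];

    if (size - *off - 4 < len)
        return -1;

    out->data = (const char *)(p + 4);
    out->len  = len;
    *off += 4 + (size_t)len;
    return 0;
}
```

Every consumer would then have to print with "%.*s" and compare with
memcmp() on (data, len) pairs rather than call strcmp(), which is the
infrastructure-wide change the paragraph above is worried about.
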
> 
> I think it would reduce the memory usage of Evolution by ~40 MB
> (depending on the total amount of summary information being loaded). It
> would make the sorting of the header summary view a little bit slower on
> certain machines (mainly machines that have very little memory left,
> so that the kernel will not keep much of this mmap'ed data in its
> buffers/cache).
> 
> The file format should be adapted in two ways:
> 
> - Duplicate strings will need to be stored at only one location *on the
> disk*. So the hashtable implementation wouldn't be memory-only; it
> would also exist in the summary file itself.
> 
> For example: A string-field can be a pointer to the first character of
> the string, or a pointer to another location in the file (in the mmap).
> 
> - Strings will need to be '\0' terminated *in the file* so that they are
> directly usable from the mmap() memory block. 
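
The two format changes above can be sketched roughly as follows, assuming
a hypothetical file that is nothing but '\0'-terminated strings, with
string fields stored as byte offsets into it. map_file() and string_at()
are illustrative names, not anything from Camel or from the attached
patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read-only. Returns NULL on failure or for an
 * empty file; on success *size holds the mapping length. */
static const char *map_file(const char *path, size_t *size)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  /* POSIX: the mapping stays valid after the fd is closed */
    if (p == MAP_FAILED)
        return NULL;

    *size = (size_t)st.st_size;
    return p;
}

/* Resolve a string field stored as a byte offset into the mapping.
 * Returns NULL if the offset is out of range or if no '\0' is found
 * before the end of the mapping (e.g. a truncated file). */
static const char *string_at(const char *base, size_t size, size_t off)
{
    if (off >= size)
        return NULL;
    if (memchr(base + off, '\0', size - off) == NULL)
        return NULL;
    return base + off;
}
```

Two records that share a From header would simply store the same offset,
and the pointer returned by string_at() can be handed to strcmp() or
printf("%s") with no copy. The mapping is released with munmap() when
the summary is unloaded.
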
> 
> 
> Who are the brave souls that want to join me with this brain-damaging
> idea? And would a change like this (which would mean that a migration
> procedure would need to run each time an old folder-summary is loaded)
> ever get upstream?
> 
> I measured (using valgrind) that most of Evolution's memory usage
> goes to storing an in-memory version of the summary files. I also
> measured that there's quite a lot of memory fragmentation going on (while
> loading the summary file) and that it (the memory for the file) consumes
> ~ twice as much as the on-filesystem size of the summary file.
> 
> Loading using mmap() would be faster and wouldn't consume as much real
> memory (it would consume a mmap, that is true, and that memory would
> most likely go to the buffers/cache which the kernel manages, that is
> also true). Sorting might become a little bit slower (but probably not
> noticeable on most desktop hardware).
> 
> I'm being serious: I would like to waste my time on this one, if the
> Camel team of Evolution likes the idea (and wouldn't mind wasting some
> of their time on it as well). If not ... I'd rather wait for the
> disk-summary branch or for libspruce than waste my time on it,
> because forking Camel would mean wasting huge amounts of time on
> maintaining a fork.
> 
> I attached a patch with my current tryout. I already load the header of
> the summary file using mmap, and that is already working. The difficult
> part, however, is making the strings themselves usable, because those
> aren't NUL-terminated. But please check the patch, you'll immediately
> see what I mean.
> 
> Copying the strings, and NUL-terminating the copy, is not a good option
> because that would make the entire mmap concept pointless (you still
> copy it to real memory, so the entire reason for mmap is then gone). Note
> that this is what the current implementation also does: it copies the
> string and NUL-terminates the copy, and then frees the malloc'd buffer
> that was allocated for reading it from the file.
> 
> In fact, that copy is unnecessary. Since fread() already produces a copy
> (unlike mmap, which exposes the real on-disk data), it wouldn't matter if
> you used the original malloc()'d memory. This memory copying is probably
> causing the memory fragmentation I mentioned above. If you implement it
> like this, you'd better at least use GSlice.
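
For the fread() path, the point about the redundant copy could be
sketched like this: allocate one byte more than the decoded length, read
straight into that buffer, and terminate it in place. read_string_nocopy()
is a hypothetical name, and plain malloc() is used only to keep the
sketch free of GLib; it is not the existing camel-file-utils.c code.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Read len bytes from fp into a fresh buffer and NUL-terminate it in
 * place, so no second allocation or memcpy is needed. The caller owns
 * the result and must free() it. Returns NULL on allocation failure
 * or on a short read. */
static char *read_string_nocopy(FILE *fp, size_t len)
{
    if (len == SIZE_MAX)        /* len + 1 would overflow */
        return NULL;

    char *s = malloc(len + 1);
    if (s == NULL)
        return NULL;

    if (len > 0 && fread(s, 1, len, fp) != len) {
        free(s);
        return NULL;
    }

    s[len] = '\0';
    return s;
}
```

This still costs one allocation per string, so it only removes the extra
copy; it does nothing about the memory the summary occupies once loaded.
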
> 
> But anyway ;)
> 
> 
> _______________________________________________
> Evolution-hackers mailing list
> Evolution-hackers gnome org
> http://mail.gnome.org/mailman/listinfo/evolution-hackers
-- 
Jeffrey Stedfast
Evolution Hacker - Novell, Inc.
fejj novell com  - www.novell.com



