Moving the struct instance heap space to mmap



Hi there hackers from all over the world,

I have an idea to move the heap space used by the CamelMessageInfoBase
struct instances in TnyCamelHeader into an mmap. It does happen that I
turn my ideas into implementations, and this is likely going to become
one of those. The problem is that it has been in my head for too long :)

The problem with these struct allocations is that each one consumes
around 120 bytes. Even with the mmap patch, for each summary item (which
is a lot like a header, as most people receiving this mail already know)
I still had to malloc a sizeof (CamelMessageInfoBase), assign its
pointers (from, subject, to) to locations in the mmapped memory area, and
so on (the mmap patch is indeed extremely simple, like most concepts and
ideas in programming).
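For comparison, here is roughly what the existing mmap patch does for
each summary item. This is only a sketch: `MappedInfo` and
`mapped_info_new` are hypothetical names, the struct is a much smaller
stand-in for CamelMessageInfoBase, and the mapped summary is assumed to
be reachable through a plain `const char *` base address. The strings
stay in the mapping; only the struct lands on the heap, once per message.

```c
#include <stdlib.h>

/* Hypothetical stand-in for CamelMessageInfoBase: the real struct has
 * many more fields (around 120 bytes in total). */
typedef struct {
    const char *from, *subject, *to;
} MappedInfo;

/* One heap allocation per summary item; the string pointers point
 * straight into the mapped summary data, so no string is copied. */
static MappedInfo *
mapped_info_new (const char *base, size_t from_o, size_t sub_o, size_t to_o)
{
    MappedInfo *mi = malloc (sizeof *mi);

    if (mi == NULL)
        return NULL;
    mi->from = base + from_o;
    mi->subject = base + sub_o;
    mi->to = base + to_o;
    return mi;
}
```

That per-message malloc is the cost discussed in the rest of this mail.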

The *new*/*extra* idea is to create a second index file that contains
offsets (instead of pointers) into the Camel summary file, and to mmap
that file as well. "Extra" because the idea builds on top of the existing
and working mmap idea (it's not yet in Evolution HEAD, but not because
the patch doesn't work. It does work: I've been running it for two
months without a single crash or regression).

I'm not yet planning to put this in a Camel that also works with
Evolution. My test bed will be tinymail. Evolution might not be designed
for this, but after some adaptations it will probably be possible to get
Evolution running with this new idea.

Changing Evolution once the idea is implemented would save a large
amount of memory. Feel free to calculate how much; a theoretical
calculation is easy to make. And if Evolution maintains a lot of
messages for you, the savings are indeed as significant as your
calculation will tell you.

Add to those theoretical calculations the heap bookkeeping overhead of
every malloc call of a given size (which gslice-like magazine allocators
might reduce somewhat), and you might get an idea of why Evolution is
eating your memory .. today.
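A back-of-the-envelope version of that calculation, as a sketch. The 120
bytes per struct is the figure mentioned above; the 8-byte bookkeeping
header and the rounding to 16-byte chunks are assumptions typical of a
64-bit glibc-style allocator, not measurements. `heap_estimate` is a
hypothetical helper.

```c
#include <stddef.h>

/* Rough estimate of the heap used by one malloc'd struct per message.
 * Assumptions (not measured): each chunk carries `overhead` bytes of
 * allocator bookkeeping and is rounded up to a multiple of `align`
 * bytes (align must be a power of two). */
static size_t
heap_estimate (size_t n_messages, size_t struct_size,
               size_t overhead, size_t align)
{
    size_t chunk = (struct_size + overhead + align - 1) & ~(align - 1);

    return n_messages * chunk;
}
```

With those assumptions, heap_estimate (100000, 120, 8, 16) gives
12,800,000 bytes, roughly 12 MiB for 100,000 messages, before counting a
single string.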


The reality of both this new/extra idea and the mmap patch is that the
memory is not really reduced. HOWEVER, the memory will be mmapped, and
thus managed by the kernel. This means the kernel pages it in when you
need it and evicts it when you don't (because the pages are backed by
the summary file, the kernel can simply drop them and read them back
later instead of writing them to swap). Yet the application developer
behind Evolution doesn't have to care about this. Cheap! But effective.
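To show how little the application has to do, a minimal sketch of
mapping a summary-like file read-only with plain POSIX calls
(`map_file_ro` is a hypothetical helper name, not Camel API):

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read-only.  No read() calls are needed: the kernel
 * faults pages in on first access and, because the mapping is backed by
 * the file, can drop clean pages under memory pressure and re-read them
 * later.  Returns NULL on failure or for an empty file. */
static const char *
map_file_ro (const char *path, size_t *len_out)
{
    struct stat st;
    void *p;
    int fd = open (path, O_RDONLY);

    if (fd < 0)
        return NULL;
    if (fstat (fd, &st) < 0 || st.st_size == 0) {
        close (fd);
        return NULL;
    }
    p = mmap (NULL, (size_t) st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close (fd); /* the mapping keeps its own reference to the file */
    if (p == MAP_FAILED)
        return NULL;
    *len_out = (size_t) st.st_size;
    return p;
}
```

After this, reading m[i] is all it takes; there is no read () call and
no buffer management in the application.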

Take sorting, for example: because quicksort doesn't take large steps
(it recursively partitions the list, scanning each part sequentially),
a kernel that maps memory in four-kilobyte pages will not have to page
in often while sorting, and the page-ins will mostly be sequential. Also
note that, by its very nature, a mailbox is ALMOST already sorted the
way most people want to view it, IF you append or prepend new messages
(rather than inserting them at random locations). Riddle me this: the
mbox format, for example, appends new messages, and so does Evolution's
summary format. No problem here.
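A sketch of what sorting over mapped data can look like, assuming the
index layout described later in this mail (four 32-bit cells per
message, the second one holding the subject's offset into the summary).
`sort_by_subject` and its helpers are hypothetical. Only a small array of
message numbers is rearranged; the mapped strings are merely read by the
comparator.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* qsort has no user-data argument, so the two mapping bases are kept in
 * file-scope variables for the comparator (not thread-safe). */
static const uint32_t *sort_istart;
static const char *sort_sstart;

static int
cmp_by_subject (const void *a, const void *b)
{
    uint32_t na = *(const uint32_t *) a, nb = *(const uint32_t *) b;
    const char *sa = sort_sstart + sort_istart[4 * na + 1];
    const char *sb = sort_sstart + sort_istart[4 * nb + 1];

    return strcmp (sa, sb);
}

/* Sort an array of message numbers by subject. */
static void
sort_by_subject (const uint32_t *istart, const char *sstart,
                 uint32_t *order, size_t count)
{
    sort_istart = istart;
    sort_sstart = sstart;
    qsort (order, count, sizeof *order, cmp_by_subject);
}
```

qsort_r would avoid the globals, but its signature differs between C
libraries, so the sketch sticks to plain qsort.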

The summary file of Camel is generated by the camel_folder_refresh
method, which still consumes way too much memory. I would need to
redesign Camel for it not to consume as much. I'm waiting for the crazy
fuckers who are going to implement the disksummary ideas (I repeat:
count me in once you ghost guys start on it, because I will join and put
some of my energy into it).


My *new*/*extra* idea (each [bracketed] cell is 32 bits, and 0 here means '\0'):

[istart] = mmap ("~/fs/index")   | [sstart] = mmap ("~/fs/summary")
---------------------------------+-----------------------------------
[from_o][sub_o][to_o][flgs]  ... |  ...
[from_o][sub_o][to_o][flgs]  ... |  ...
[from_o][sub_o][to_o][flgs]- + sstart -> [0x00 0x00 0x00 0x04]
|        |        `--------- + sstart -> [Piet][ <p ][pi c][om>0]
|        `------------------ + sstart -> [Yeah][ yea][h000]
`--------------------------- + sstart -> [Hans][ <h ][ha c][om>0]
[from_o][sub_o][to_o][flgs]- + sstart -> [0x00 0x00 0x00 0x03]
|        |        `--------- + sstart -> [Hans][ <h ][ha c][om>0]
|        `------------------ + sstart -> [Oehh][ooee][h000]
`--------------------------- + sstart -> [Piet][ <p ][pi c][om>0]
        ...                           ...


To read message n, treat istart as an array of 32-bit cells (four per
message) and do something like:

from    = sstart + istart[4 * n + 0]
subject = sstart + istart[4 * n + 1]
to      = sstart + istart[4 * n + 2]
flags   = sstart + istart[4 * n + 3]

Maybe I made a small mistake in calculating the offsets here; maybe I
need to think some more about the pointer arithmetic before posting
bullshit. But anyway, that's the idea.
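A compilable sketch of that lookup, assuming istart is typed as an array
of 32-bit cells (four per message, in the diagram's order) and sstart as
a `const char *`. The helper `summary_string` and the cell constants are
hypothetical names.

```c
#include <stdint.h>

/* Cell order within one index record, as in the diagram. */
enum { FROM_CELL = 0, SUBJECT_CELL = 1, TO_CELL = 2, FLAGS_CELL = 3, CELLS = 4 };

/* Return the string for one cell of message n.  istart is a uint32_t
 * pointer, so indexing already scales by sizeof (uint32_t); no explicit
 * sizeof is needed, and each cell is a byte offset relative to sstart. */
static const char *
summary_string (const uint32_t *istart, const char *sstart,
                uint32_t n, int cell)
{
    return sstart + istart[CELLS * n + cell];
}
```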

The idea is indeed to rewrite the index file as often as needed and to
keep the summary file as static as possible. For example, local message
flags (like "Important" and "Read"/"Unread") go in the index file, and
setting them will rewrite the entire index file, followed by an mremap
(or an munmap and mmap on platforms that don't support mremap, which is
not portable).
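Since mremap () is not available everywhere, the portable fallback is to
unmap and map again. A minimal sketch, with `remap_file` as a
hypothetical helper:

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Portable replacement for mremap (): drop the old mapping and map the
 * (rewritten) file again.  The new mapping may land at a different
 * address, so callers must keep offsets, never pointers, into it.
 * Returns MAP_FAILED on error; the old mapping is gone either way. */
static void *
remap_file (void *old, size_t old_len, const char *path, size_t *new_len)
{
    struct stat st;
    void *p;
    int fd;

    if (old != NULL)
        munmap (old, old_len);
    fd = open (path, O_RDONLY);
    if (fd < 0)
        return MAP_FAILED;
    if (fstat (fd, &st) < 0 || st.st_size == 0) {
        close (fd);
        return MAP_FAILED;
    }
    p = mmap (NULL, (size_t) st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close (fd);
    if (p != MAP_FAILED)
        *new_len = (size_t) st.st_size;
    return p;
}
```

Writing the new index to a temporary file and rename ()-ing it over the
old one is the safer variant: anyone still holding the old mapping keeps
seeing the old, complete file until they remap, instead of a truncated
one.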

When messages get added, the plan is to mremap both the summary and the
index file, BUT ONLY after a batch of them has been received AND at the
end of camel_folder_refresh (or whatever internal thing in Camel does
this). Because the accessor macros (the #defines) don't store pointers,
only the message number, this will indeed work as long as the "nth"
variables in the structs are correct (cool, eh? Jeffrey, this solves the
current problem of the mmap patch, the one about new messages).
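Why offsets instead of pointers matter here, as a small in-memory
sketch: a growable buffer (hypothetical `Summary` type and
`summary_append` helper) stands in for the summary mapping. Appending
may move the buffer, just like an mremap with MREMAP_MAYMOVE or an
munmap plus mmap can, yet offsets recorded earlier stay valid. For
brevity the sketch does not pad strings to four bytes.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable stand-in for the summary mapping. */
typedef struct {
    char *base;
    size_t len;
} Summary;

/* Append a NUL-terminated string and return its offset, or UINT32_MAX
 * on allocation failure.  realloc may move base; the returned offset is
 * relative to base, so it survives the move.  Assumes the summary stays
 * below 4 GiB so that offsets fit in 32 bits. */
static uint32_t
summary_append (Summary *s, const char *str)
{
    size_t n = strlen (str) + 1;
    char *nb = realloc (s->base, s->len + n);
    uint32_t off;

    if (nb == NULL)
        return UINT32_MAX;
    s->base = nb;
    memcpy (s->base + s->len, str, n);
    off = (uint32_t) s->len;
    s->len += n;
    return off;
}
```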

My plan is to make sure that both the istart and the sstart files are
written with four-byte-aligned data (strings are padded with NUL bytes
up to the next four-byte boundary). The index file will contain only
32-bit integers, so it is naturally aligned on four bytes.
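A sketch of that padding rule, with hypothetical helpers `padded_len`
and `write_padded`. Every string gets at least one NUL terminator, so a
string whose length is already a multiple of four still gets four extra
bytes.

```c
#include <stddef.h>
#include <string.h>

/* Space a string occupies in the summary: its bytes plus at least one
 * NUL terminator, rounded up to the next multiple of four. */
static size_t
padded_len (size_t len)
{
    return (len + 1 + 3) & ~(size_t) 3;
}

/* Copy str into buf at offset pos, fill the rest of its slot with NULs
 * and return the offset of the next (four-byte aligned) slot.  The
 * caller must make sure buf has room for padded_len (strlen (str)). */
static size_t
write_padded (char *buf, size_t pos, const char *str)
{
    size_t len = strlen (str);
    size_t slot = padded_len (len);

    memcpy (buf + pos, str, len);
    memset (buf + pos + len, 0, slot - len);
    return pos + slot;
}
```

With this rule "Yeah yeah" (9 bytes) occupies 12 bytes, matching the
[Yeah][ yea][h000] cells in the diagram.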


So:

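#include <stdint.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

typedef struct {
  const char *sstart;      /* mmap of the summary file (padded strings) */
  const uint32_t *istart;  /* mmap of the index file (4 cells per message) */
} CamelFolderSummary;

typedef struct {
  char *from, *subject, *to;
  int flags;
} MemoryMessageInfo;

typedef struct {
  CamelFolderSummary *fs;
  int nth;                 /* message number: record index in istart */
  MemoryMessageInfo *m;    /* if set, fields come from memory, not the mmap */
} CamelMessageInfo;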


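/* No space between the macro name and "(x)", or it is not a
   function-like macro.  Cell k of message nth is istart[4 * nth + k]. */
#define camel_message_info_from(x)                                  \
    ((x)->m ? (x)->m->from : (const char *) (x)->fs->sstart         \
      + ((const uint32_t *) (x)->fs->istart)[4 * (x)->nth + 0])
#define camel_message_info_subject(x)                               \
    ((x)->m ? (x)->m->subject : (const char *) (x)->fs->sstart      \
      + ((const uint32_t *) (x)->fs->istart)[4 * (x)->nth + 1])
#define camel_message_info_to(x)                                    \
    ((x)->m ? (x)->m->to : (const char *) (x)->fs->sstart           \
      + ((const uint32_t *) (x)->fs->istart)[4 * (x)->nth + 2])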


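CamelFolderSummary*
camel_folder_summary_new (const char *index_file, const char *summary_file)
{
  CamelFolderSummary *fs = malloc (sizeof *fs);
  struct stat st;
  int fd;

  /* Error checking omitted.  close () right after mmap () is fine: the
     mapping keeps its own reference to the file. */
  fd = open (index_file, O_RDONLY);
  fstat (fd, &st);
  fs->istart = mmap (NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
  close (fd);

  fd = open (summary_file, O_RDONLY);
  fstat (fd, &st);
  fs->sstart = mmap (NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
  close (fd);

  return fs;
}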

CamelMessageInfo*
camel_message_info_get (CamelFolderSummary *fs, int nth)
{
  CamelMessageInfo *i = malloc (sizeof *i);

  i->fs = fs;
  i->nth = nth;
  i->m = NULL;   /* read all fields from the mmapped files */

  return i;
}


-- 
Philip Van Hoof, software developer at x-tend 
home: me at pvanhoof dot be 
gnome: pvanhoof at gnome dot org 
work: vanhoof at x-tend dot be 
http://www.pvanhoof.be - http://www.x-tend.be



