Re: [Evolution-hackers] Camel mmap summary ideas, proposal for a meeting



On Wed, 2006-07-12 at 13:34 -0400, Jeffrey Stedfast wrote:

> There are a couple of problems that I can think of that will need to be
> solved in order for mmap to work (at least for Evolution, although at
> least a few of the problems also apply to tnymail)

> 1. (Evolution only?) fd usage will easily max out on systems like
> Solaris and/or installations where the user has a large number of
> folders. Each mmap'd file requires a persistently open file descriptor,
> which, especially combined with vFolders, will be difficult to keep
> under the system threshold.

Agreed. Each folder keeps one file descriptor open. On systems that run
a lot of Evolution instances, for a lot of users who each have a lot of
folders, this is not going to be nice.

However, a kernel that scales can probably handle a few thousand open
file descriptors easily. And the drawback of closing them (like the
current implementation of Camel does) is that you need to put all the
content in malloc()-ed memory, which consumes a lot more memory than an
open file descriptor costs in the kernel.
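
One detail worth checking against the patch: POSIX mmap() takes its own
reference to the file, and a later close() on the descriptor does not
remove the mapping. A minimal sketch, assuming a hypothetical
map_summary() helper (the name and signature are made up for
illustration and are not Camel API); whether Camel's summary code can
actually drop its descriptor this way is not something I have verified.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper, not part of Camel: map a summary file read-only
 * and close the descriptor as soon as the mapping exists. */
static void *map_summary(const char *path, size_t *len_out)
{
    struct stat st;
    void *base;
    int fd;

    assert(path != NULL && len_out != NULL);

    fd = open(path, O_RDONLY);
    if (fd == -1)
        return NULL;

    /* A zero-length mapping is an error, so treat empty files as failure. */
    if (fstat(fd, &st) == -1 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    base = mmap(NULL, (size_t) st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);

    /* The mapping holds its own reference to the file, so the descriptor
     * no longer counts against the per-process limit after this. */
    close(fd);

    if (base == MAP_FAILED)
        return NULL;

    *len_out = (size_t) st.st_size;
    return base;
}
```

The caller releases the mapping with munmap(base, len). If this holds up
for the summary files, the limit that matters becomes address space
rather than the number of descriptors.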

> tnymail solves this by never allowing more than a single folder to be
> opened, but Evolution can't enforce that quite as easily. Even with
> the ::open()/::close() idea I proposed a few years ago (or hacking
> Evolution to try and do what tnymail does even), vFolders will still be
> problematic because they keep all of their source folders opened
> (mmap'd) so that the summary info is available.

Indeed

> 2. As pvanhoof has discovered, when new messages arrive in an opened
> folder, the summary info is just added to the array as malloc'd objects
> (well, structs) and are not immediately written to disc.

> While, yes, the simple solution is to do what pvanhoof proposes: write
> those entries to disc and then re-mmap the file - this is kinda gross
> and as Varadhan has pointed out, is not as efficient because we'd have
> to reparse the entire summary file again.

That is, however, not as expensive as the current implementation. If you
call posix_madvise() with POSIX_MADV_WILLNEED, the kernel will probably
copy the content of the file from the filesystem into memory very
efficiently (and will most likely not page it out until you advise the
kernel to use a less gross mapping). I don't know this for sure: I
haven't checked the mmap implementation in the Linux kernel (nor the
Solaris one).

> We also would have the following problems:

>   a. remapping the file also means that all string pointers currently in
> use by the application become invalid.

Correct. This is a real problem indeed. 

>   b. to fix this, you'd need to emit a folder_changed signal letting the
> application know that all of the original objects were removed and
> replaced by a new set meaning that the message-list would be required to
> re-build each time anything changed which is less than ideal. (okay,
> yes, I believe it currently mostly does this already because of some
> problems with incrementally adding stuff to ETable but it means that
> this problem can NEVER be fixed if we go the mmap route). It would also
> mean that those MessageInfo objects wouldn't persist through
> folder_changed events which would suck.

Sounds like some refactoring/rethinking work. But imo not unsolvable.
Looking at the code in evolution/mail is going to be a good idea anyway.

Components that expect strings to stay around, while those strings were
never intended to be in memory forever, have a bug imo. If they want a
string to stay until they decide they no longer need it, the object
needs reference counting or they should copy the information.
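
To make the "reference counting or copy" option concrete, here is a
minimal sketch of a ref-counted object that owns a private copy of one
string. HeaderProxy and its functions are hypothetical names for
illustration only; this is not the actual TnyMsgHeader or
CamelMessageInfo code, and it is not thread-safe.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical proxy type: owns a heap copy of the subject, so it stays
 * valid even if the mapping the string came from is replaced. */
typedef struct {
    unsigned int ref_count;
    char *subject;
} HeaderProxy;

static HeaderProxy *header_proxy_new(const char *mapped_subject)
{
    HeaderProxy *h;
    size_t n;

    assert(mapped_subject != NULL);

    h = malloc(sizeof *h);
    if (h == NULL)
        return NULL;

    n = strlen(mapped_subject) + 1;
    h->subject = malloc(n);
    if (h->subject == NULL) {
        free(h);
        return NULL;
    }
    memcpy(h->subject, mapped_subject, n);
    h->ref_count = 1;
    return h;
}

static HeaderProxy *header_proxy_ref(HeaderProxy *h)
{
    assert(h != NULL);
    h->ref_count++;
    return h;
}

/* Returns 1 if this call dropped the last reference and freed the
 * object, 0 if other references remain. */
static int header_proxy_unref(HeaderProxy *h)
{
    assert(h != NULL && h->ref_count > 0);
    if (--h->ref_count > 0)
        return 0;
    free(h->subject);
    free(h);
    return 1;
}
```

Copying costs memory, of course, so this only makes sense for the
objects a component really holds on to (for example the rows that are
visible), not for the whole summary.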

Afaics the CamelMessageInfo type doesn't have reference counting. So ..

ps. This is why I create proxy types like TnyMsgHeader in tinymail. They
are a proxy for CamelMessageInfo, and they have reference counting.
There's also an implementation that plays proxy for a normal
CamelMimeMessage. I had to do this because the POP provider of Camel
doesn't implement summaries.

So in tinymail, when using POP, it will use full CamelMimeMessage
instances. This isn't a good idea, I know. I still have to fix this. But
it's also a proof that it would work if ... ;)

>   c. you'd also need to sync any changes to flags/tags to disc before
> you started appending the new message info's so that you don't lose any
> data (this isn't really a problem, just listing it here so people don't
> overlook this).

ok

> It also seems to me that if we were really going to be serious about
> using mmap as a real solution (and not just a hack), we'd have to
> redesign the summary files to group all the string data together to try
> and keep strings in contiguous pages to keep page swapping to a minimum.
> The current file layout is terribly inefficient for this.

I very much agree with this. The file format can indeed be improved a
lot. For example, it could be compacted (not compressed in the ZIP
sense, but stored without any redundant information).

-- Oh by the way. There's a huge amount of "Re: " strings in my memory
at this moment. We can probably improve that too ;-) --
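
As a rough illustration of both points (strings kept together, no
duplicates), here is a sketch of a string pool that stores each distinct
string once in one contiguous buffer and hands out offsets. StringPool,
pool_intern() and strip_re() are hypothetical names, nothing like this
exists in the current summary format, and the linear scan is only there
to keep the example short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string pool: strings live back to back in one buffer and
 * records refer to them by offset rather than by pointer. */
typedef struct {
    char  *data;
    size_t used;
    size_t cap;
} StringPool;

/* Return the offset of s in the pool, appending it only if it is not
 * stored yet. Returns (size_t) -1 if memory allocation fails. */
static size_t pool_intern(StringPool *p, const char *s)
{
    size_t off = 0;
    size_t n;

    assert(p != NULL && s != NULL);
    n = strlen(s) + 1;

    /* Linear search over the NUL-terminated entries already stored. */
    while (off < p->used) {
        if (strcmp(p->data + off, s) == 0)
            return off;
        off += strlen(p->data + off) + 1;
    }

    if (p->used + n > p->cap) {
        size_t cap = p->cap ? p->cap : 64;
        char *d;

        while (cap < p->used + n)
            cap *= 2;
        d = realloc(p->data, cap);
        if (d == NULL)
            return (size_t) -1;
        p->data = d;
        p->cap = cap;
    }

    memcpy(p->data + p->used, s, n);
    off = p->used;
    p->used += n;
    return off;
}

/* Split a leading "Re: " off a subject so it can be kept as a flag
 * instead of being stored once per reply. */
static const char *strip_re(const char *subject, int *is_reply)
{
    assert(subject != NULL && is_reply != NULL);
    *is_reply = strncmp(subject, "Re: ", 4) == 0;
    return *is_reply ? subject + 4 : subject;
}
```

With something like this, "Re: mmap summary" and "mmap summary" end up
sharing one stored string plus a reply flag. Offsets also have the nice
property that they do not change just because the file gets mapped at a
different address.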

From the beginning, this hack was about making a point (that mmap could
be used, to convince myself of that funny idea I had, etc.).

For tinymail it's definitely very useful already. For Evolution it
works, but it does indeed have some issues (mostly the ones Fejj lists
in this e-mail). As Microsoft once asked us: where do you want to go?

ps. I'm definitely going to use this mmap implementation for some Camel
packages that I'll be building for certain mobile devices. Without it,
it's not really possible to use tinymail with folders larger than 14,000
headers. My promise with tinymail is that I'm going to support such
amounts on devices like the Nokia 770. So I will.

-- 
Philip Van Hoof, software developer at x-tend 
home: me at pvanhoof dot be 
gnome: pvanhoof at gnome dot org 
work: vanhoof at x-tend dot be 
http://www.pvanhoof.be - http://www.x-tend.be



