Re: [Evolution-hackers] Reviewing imap_update_summary



On Sun, 2006-10-22 at 15:41 +0200, Philip Van Hoof wrote:
> Greetings,
> 
> imap_update_summary is implemented in three or four steps:
> 
>   o. Getting (all) the headers/uids
>   o. Finding the ones that we must still fetch
>   o. Fetching those (x)
>   o. Writing out the summary
> 
> The steps each consume memory and reuse some of the memory of the
> previous step. Pointers to that memory are stored in GPtrArrays like
> fetch_data and messages.
> 
> In the code I have found no real reason why this was done in separate
> loops (steps) rather than in one step that frees the data at the end of
> each loop iteration. Especially for the third step (x), which seems to
> consume the most memory while it's happening.

I rewrote this behavior in camel-GroupWise to fetch data in iterations,
so that the memory requirement stays constant instead of growing as
O(n), n being the number of messages. I expected it to work better. (The
GW and IMAP code are similar in this respect.)

However, when I tested it, the memory requirement came down as
expected, but the number of disk accesses increased and the sync became
slower. So I reverted it to the old behavior.
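
The trade-off can be sketched with a small simulation. This is only an
illustration under stated assumptions, not the camel-GroupWise code:
sync_stats and simulate_sync are hypothetical names, and a "flush"
stands in for one write of buffered records to the summary on disk.

```c
#include <stddef.h>

/* Hypothetical result of one simulated sync; not a Camel type. */
typedef struct {
    size_t peak_in_memory;  /* most records buffered at any one time */
    size_t flushes;         /* number of writes to disk */
} sync_stats;

/* Simulate syncing n messages, writing to disk whenever `batch`
 * records are buffered. batch == 0 means "buffer everything and
 * write once at the end" (the old behavior described above). */
static sync_stats simulate_sync(size_t n, size_t batch)
{
    sync_stats s = {0, 0};
    size_t buffered = 0;

    for (size_t i = 0; i < n; i++) {
        buffered++;                          /* one header fetched */
        if (buffered > s.peak_in_memory)
            s.peak_in_memory = buffered;
        if (batch != 0 && buffered == batch) {
            s.flushes++;                     /* write the batch, then free it */
            buffered = 0;
        }
    }
    if (buffered > 0)
        s.flushes++;                         /* final (possibly partial) write */
    return s;
}
```

With n = 50000, simulate_sync(n, 0) peaks at 50000 buffered records
with a single flush, while simulate_sync(n, 100) peaks at 100 records
but needs 500 flushes. The batch size of 100 is an arbitrary value for
illustration.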

You can try rewriting this in IMAP, and you will find that the time
taken to complete the sync increases for large folders.


> 
> The current implementation requires the data being received from the
> IMAP service to be kept in memory until the entire folder has been
> received and all steps done. This consumes more than one entire kilobyte
> per message. Multiply that by, for example, 5,000 headers and you'll get
> 5 MB memory consumption for fetching the new messages of a very small
> IMAP folder (in case no other messages had been received before you
> first started the procedure).
> 
> Multiply that by 50,000 headers and you'll get 50 - 60 MB memory
> consumption for a not extremely big but relatively big IMAP folder.
> 
> Which will be freed, yes, but nevertheless it's a slowly growing peak
> (the speed depends on the connection with your IMAP server) that only
> gets deallocated either when pressing cancel or when all messages are
> received (which can take a significant amount of time).
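
The figures quoted above follow from a simple product. A minimal
sketch, assuming every fetched header stays in memory until the whole
folder is done; estimate_peak_bytes is a hypothetical helper, and the
per-message sizes are assumed averages, not measured Camel values.

```c
#include <stdint.h>

/* Rough peak-memory estimate: messages held in memory at once times
 * an assumed average number of bytes per message. */
static uint64_t estimate_peak_bytes(uint64_t messages,
                                    uint64_t bytes_per_message)
{
    return messages * bytes_per_message;
}
```

For example, estimate_peak_bytes(5000, 1024) gives 5,120,000 bytes
(about 5 MB), and estimate_peak_bytes(50000, 1200) gives 60,000,000
bytes, the upper end of the quoted 50 - 60 MB range.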

I tested the changes that I made (in camel-GW) and found that in actual
deployment scenarios, people prefer an occasional memory spike (which
comes back down as soon as the mail fetch is complete) to having so many
disk accesses that operations eventually take longer to complete.


> 
> The strange part is that if I measure the number of bytes that I receive
> from the IMAP service, I measure far fewer bytes being transferred than
> bytes being consumed in memory. It not only stores all the received
> data, it also stores a lot more in memory (probably mostly 4-byte
> pointers and GData stuff).
> 
> I wonder whether there was a reason why it was implemented this way? If
> not, I'm planning to rewrite imap_update_summary in a different way. For
> example by immediately creating a CamelMessageInfo struct or burning it
> to the summary file instantly and freeing the GData created from the
> stream in-loop.

If you want the memory usage to be constant, the best solution is to
make the folder_sync function fetch summaries iteratively from a
database back-end (not from flat files or mmap). Perhaps this should be
done at a higher (Camel) level so that all the providers can make use of
it; that means rewriting parts of the Camel folder, summary, etc.

Anyway, all this is already covered under "On-Disk-Summaries". If you
are so obsessed with memory usage, go ahead and give it a try.

Sometimes the customer's needs are different from what the hacker
perceives to be the most important thing. Customers consider Evolution's
periodic memory spike (and subsequent drop) more normal than things like
Proxy-authentication-failure (libsoup), EDS crashes, etc., which we have
been working on.

It is interesting work, but we (the Evolution team) have other
priorities, driven by actual customer deployment needs. These are the
most important things that Evolution (and indirectly Camel as well)
should address to become a stable enterprise-level GNOME app.

Sankar




