Re: [Evolution-hackers] Moving the struct instance heap space to mmap




> Hi Guys,
>
> I have this naive idea that phones of tomorrow will have much more
> Internet-related functionality than today's modern desktops. In a much
> more integrated way. I believe people hate desktops but enjoy the
> applications on top of them.

Maybe.  It's just a tool, and for some people, being able to leave it in its place is just as important as it is for others to have it grafted to their right hand. 

> Like how iPods are designed for playing music, car navigation designed
> for bringing you to your destination, DVD players for ..., cameras
> for ..., etc.

> If that naive idea happens, there will be a huge demand for mail clients
> that can show 100K+ messages yet not use a lot of memory.

Sure?  Most people manage 100-1000 messages at most, and just delete the rest.  I would suggest that is even normal behaviour - you've gotta have something wrong with you if you've got 600 folders and millions of messages in your personal mail account - that's what archives, newsgroups and discussion boards are for.  Especially if you're going to try and manage it on your mobile phone.

I suppose it depends on what problem you're solving - remote access to lots of mail with a lightweight client, complete local access to mail, or some combination thereof.  There's some overlap, but not a huge amount.  E.g. why not just use HULA for remote mail?  Evolution was obviously the heavyweight, local model.  It works very well for laptop users, who just lug it around everywhere (which was pretty well everyone on the Evolution team).  But now I find Gmail does me - having to be connected can be a drawback, but I can easily and quickly access it anywhere.
 
> My reason is the technical challenge and the enjoyment of trying to make
> a difference. The path to it, not the result. It might make it more
> difficult for me to view things as wasted effort (but I'm also young and
> naive).

 
> Perfect is anti-perfect because then it lacks the possibility of
> improvement.

Naah, then you just change requirements if you want to 'improve' something.

> It's fun and fuelling to watch people use it. That is true.

But it's decidedly unfun and draining to put up with their complaints!

> > Having extra indices on disk - you're really just writing a database
> > table - and all of the associated complexity, consistency and
> > coherency issues that implies.


> I wouldn't be against using one of those 'embedded' database engines
> like sqlite3*, mysql-embedded or db* and asking for a cursor in the
> treeview's model implementation.
>
> * sqlite3_prepare(), sqlite3_step(), sqlite3_reset(), etc.

> I fear copying the entire query result into the memory of the application
> (how most developers use databases in their apps) wouldn't reduce memory
> consumption at all. Launching a full new SQL query each time a row
> becomes visible is going to cost a lot in performance. Since sorting
> would be done in the db, it might indeed be 'fast enough'. On a mobile
> device, screens are small, and so is the number of visible headers. 20
> queries aren't a lot.
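To make the "one query per screenful" idea concrete, here is a minimal sketch in C of a row window that keeps roughly one screen of headers cached and only queries again when the view scrolls past it. `fetch_rows_fn` and `fake_fetch()` are hypothetical: with SQLite the callback would bind new LIMIT/OFFSET values to a statement prepared once with `sqlite3_prepare_v2()`, step through it with `sqlite3_step()` and rewind it with `sqlite3_reset()`; here it just fabricates subjects.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define WINDOW 20 /* roughly one small screen of headers */

/* Hypothetical query callback: fill up to `limit` subjects starting at
 * sorted position `offset` and return how many rows were filled. In a
 * real client this would re-run a prepared statement with LIMIT/OFFSET. */
typedef int (*fetch_rows_fn)(void *db, int offset, int limit,
                             char rows[][64]);

typedef struct {
    void *db;
    fetch_rows_fn fetch;
    int first;              /* sorted position of rows[0], -1 if empty */
    int count;              /* number of valid cached rows */
    int queries;            /* queries issued so far, for illustration */
    char rows[WINDOW][64];  /* the only headers held in memory */
} RowWindow;

void row_window_init(RowWindow *w, void *db, fetch_rows_fn fetch)
{
    memset(w, 0, sizeof *w);
    w->db = db;
    w->fetch = fetch;
    w->first = -1;
}

/* Return the subject at sorted position `row`, issuing a new query only
 * when the row falls outside the cached window; NULL past the end. */
const char *row_window_get(RowWindow *w, int row)
{
    assert(row >= 0);
    if (w->first < 0 || row < w->first || row >= w->first + w->count) {
        w->first = row;
        w->count = w->fetch(w->db, row, WINDOW, w->rows);
        w->queries++;
    }
    return row - w->first < w->count ? w->rows[row - w->first] : NULL;
}

/* Illustrative backend: a "table" of *(int *)db messages whose subjects
 * are "Message <n>" in sorted order. */
int fake_fetch(void *db, int offset, int limit, char rows[][64])
{
    int total = *(int *)db, n = 0;
    for (int i = offset; i < total && n < limit; i++, n++)
        snprintf(rows[n], sizeof rows[n], "Message %d", i);
    return n;
}
```

Scrolling within the window costs nothing and jumping 500 rows costs one query. Note that OFFSET-based paging still makes the engine skip rows internally, so on a large folder a real implementation would more likely page from the last sort key it saw.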

Well, the "disksummary" code used libdb to step through indices.  It only loads message infos that are in use; once they are unreffed they get unloaded from memory.  E.g. you can iterate the whole folder to perform some operation, and only the current item has to be in memory.
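The load-on-ref, unload-on-last-unref behaviour can be sketched as below. This is not the disksummary code: `load_info()` is a hypothetical stand-in for reading a record through a libdb cursor, and the resident set is a plain linked list, since it only ever holds what callers are actively using.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct MessageInfo {
    unsigned uid;
    int refcount;
    char subject[64];
    struct MessageInfo *next;   /* link in the cache's resident list */
} MessageInfo;

typedef struct {
    MessageInfo *resident;      /* only infos somebody holds a ref on */
    int count;                  /* how many infos are in memory */
} InfoCache;

/* Hypothetical loader: a real summary would read the record from its
 * on-disk index here instead of fabricating it. */
static MessageInfo *load_info(unsigned uid)
{
    MessageInfo *mi = calloc(1, sizeof *mi);
    if (!mi)
        abort();
    mi->uid = uid;
    snprintf(mi->subject, sizeof mi->subject, "Subject of %u", uid);
    return mi;
}

/* Return a referenced info, loading it only if it is not already resident. */
MessageInfo *info_get(InfoCache *c, unsigned uid)
{
    for (MessageInfo *mi = c->resident; mi; mi = mi->next)
        if (mi->uid == uid) {
            mi->refcount++;
            return mi;
        }
    MessageInfo *mi = load_info(uid);
    mi->refcount = 1;
    mi->next = c->resident;
    c->resident = mi;
    c->count++;
    return mi;
}

/* Drop a reference; the last unref unloads the info from memory. */
void info_unref(InfoCache *c, MessageInfo *mi)
{
    assert(mi->refcount > 0);
    if (--mi->refcount > 0)
        return;
    for (MessageInfo **p = &c->resident; *p; p = &(*p)->next)
        if (*p == mi) {
            *p = mi->next;
            break;
        }
    c->count--;
    free(mi);
}
```

Iterating a folder as get, use, unref keeps one info resident at a time, while an info that two users both hold stays shared rather than being loaded twice.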

The only real problem is that the interfaces require you to list everything in a folder(*), but that was needed for the thread view algorithm and the treeview widget anyway (particularly for sorting), so there's no 'extra' overhead in that model.  The interfaces don't support fetching a subset of rows, e.g. to drive a display, but to do that they'd have to do their own sorting, threading, etc.

(*) of course, it also supports 'views', cached and atomically maintained sub-queries of a given folder, so you can also just list those messages which match the view.
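A view like that can be kept up to date incrementally rather than by re-running the search. A minimal single-threaded sketch, assuming the view's criteria can be evaluated against one message at a time (`match_unread()` is just an example predicate): the folder calls `view_update()` for every added or changed message, and the view keeps a sorted array of matching uids. Making that update atomic with the folder change itself is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define FLAG_SEEN 0x1u

typedef struct {
    unsigned uid;
    unsigned flags;
} Msg;

/* The view's criteria; a hypothetical stand-in for a stored search. */
typedef bool (*view_match_fn)(const Msg *m);

typedef struct {
    view_match_fn match;
    unsigned *uids;   /* uids that currently match, sorted ascending */
    size_t len, cap;
} View;

/* Binary search: position of `uid`, or where it would be inserted. */
static size_t view_find(const View *v, unsigned uid, bool *found)
{
    size_t lo = 0, hi = v->len;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (v->uids[mid] < uid)
            lo = mid + 1;
        else
            hi = mid;
    }
    *found = lo < v->len && v->uids[lo] == uid;
    return lo;
}

/* Call whenever a message is added or its flags change, so the view
 * never has to be recomputed from scratch. */
void view_update(View *v, const Msg *m)
{
    assert(v->match != NULL);
    bool found;
    size_t pos = view_find(v, m->uid, &found);
    bool matches = v->match(m);

    if (matches && !found) {
        if (v->len == v->cap) {
            size_t cap = v->cap ? v->cap * 2 : 16;
            unsigned *grown = realloc(v->uids, cap * sizeof *grown);
            if (!grown)
                abort();
            v->uids = grown;
            v->cap = cap;
        }
        memmove(v->uids + pos + 1, v->uids + pos,
                (v->len - pos) * sizeof *v->uids);
        v->uids[pos] = m->uid;
        v->len++;
    } else if (!matches && found) {
        memmove(v->uids + pos, v->uids + pos + 1,
                (v->len - pos - 1) * sizeof *v->uids);
        v->len--;
    }
}

/* Example criteria: unread messages. */
bool match_unread(const Msg *m)
{
    return !(m->flags & FLAG_SEEN);
}
```

Expunged messages would need a matching removal call, which follows the same binary-search-and-memmove pattern.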

> > Maybe gtktreeview changed since I last looked at it.  It was rather
> > slow too, iirc.  There's only so much virtualisation you can do with
> > trees anyway; for some of the data you need all of the other data to
> > calculate it - unless you keep another on-disk index in 'tree sort'
> > order, and virtualise even that.
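For the sort side, the in-memory equivalent of a 'sort order' index is just an array of (key, position) pairs. A sketch, assuming date is the sort key: each treeview row maps to a message through the array, so only the rows actually displayed need their full info loaded. Threading and the on-disk form of the index, the hard parts mentioned above, are not handled.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

typedef struct {
    time_t date;       /* sort key copied out of the summary */
    unsigned index;    /* where the message lives in folder storage */
} SortEntry;

/* Newest first; ties broken by storage position so the order is stable. */
static int cmp_date_desc(const void *a, const void *b)
{
    const SortEntry *x = a, *y = b;
    if (x->date != y->date)
        return x->date < y->date ? 1 : -1;
    return (x->index > y->index) - (x->index < y->index);
}

/* Build a display-order index: entry[row].index is the message to show at
 * `row`. Only keys and positions are held, not whole message infos. */
SortEntry *build_sort_index(const time_t *dates, unsigned n)
{
    assert(dates != NULL || n == 0);
    SortEntry *idx = malloc((n ? n : 1) * sizeof *idx);
    if (!idx)
        abort();
    for (unsigned i = 0; i < n; i++) {
        idx[i].date = dates[i];
        idx[i].index = i;
    }
    qsort(idx, n, sizeof *idx, cmp_date_desc);
    return idx;
}
```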

> Sorting is indeed still a pain. Showing the model once it is sorted isn't
> a problem anymore with the current gtktreeview and fixed row height
> turned on.
>
> The cursor idea and a db engine might be a quick solution for this.
> Databases are often very good at sorting stuff.

Depends on the db engine a bit.  libdb cursors let you do this sort of thing, but then things get a bit messy if you're updating while they're active, and the like - iirc anyway.  Particularly in a threaded environment.

What you really want to do is create a view which matches the current display criteria (e.g. folder + search + sort - search is vital), and then be able to access that by row.  But then you have to somehow do threading too, which a normal db won't do, and embedded dbs probably won't do views either.  And all this disk-based, transaction-safe stuff is much slower than just loading it into memory ...
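Threading is the part a plain database won't do for you. As a rough illustration only, this sketch links each message to its parent through the `In-Reply-To` header, with Message-IDs sorted once so each parent lookup is a `bsearch()`. A full threading algorithm (e.g. Jamie Zawinski's, which walks `References`, creates placeholders for missing parents and groups by subject) does considerably more.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    const char *msgid;        /* Message-ID header */
    const char *in_reply_to;  /* In-Reply-To header, NULL if none */
    int parent;               /* computed: index of parent, -1 for a root */
} ThreadMsg;

typedef struct {
    const char *msgid;
    int index;
} IdEntry;

static int cmp_id(const void *a, const void *b)
{
    return strcmp(((const IdEntry *)a)->msgid, ((const IdEntry *)b)->msgid);
}

/* Link each message to its parent by Message-ID. Messages whose parent is
 * not in the folder become thread roots. */
void thread_link(ThreadMsg *msgs, int n)
{
    IdEntry *ids = malloc((size_t)(n ? n : 1) * sizeof *ids);
    if (!ids)
        abort();
    for (int i = 0; i < n; i++) {
        ids[i].msgid = msgs[i].msgid;
        ids[i].index = i;
    }
    qsort(ids, (size_t)n, sizeof *ids, cmp_id);
    for (int i = 0; i < n; i++) {
        msgs[i].parent = -1;
        if (!msgs[i].in_reply_to)
            continue;
        IdEntry key = { msgs[i].in_reply_to, 0 };
        IdEntry *hit = bsearch(&key, ids, (size_t)n, sizeof *ids, cmp_id);
        if (hit && hit->index != i)
            msgs[i].parent = hit->index;
    }
    free(ids);
}

/* Depth in the thread tree, i.e. the indentation level in the display. */
int thread_depth(const ThreadMsg *msgs, int i, int n)
{
    assert(i >= 0 && i < n);
    int depth = 0;
    while (msgs[i].parent >= 0 && depth < n) {  /* depth < n: cycle guard */
        i = msgs[i].parent;
        depth++;
    }
    return depth;
}
```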

> I'm experienced enough to realize that ;). I didn't start tinymail as a
> specialist in e-mail. That probably explains why I'm doing so many
> refactor iterations. But redesigning is imo okay. I just called it
> "Agile" as an explanation of why redesigning and rethinking is important.
> Not being afraid of changing things is in my opinion important.

OO requires you to do a lot of refactoring - it's impossible to get it right the first time.  Either you do it at the design stage or the coding stage, but you're always going to have to do it, at least 2-3 times.

> It's a problem the Evolution project is, I think, experiencing. But I'm
> powerless when it comes to Evolution. I look to Harish, Novell's technical
> decision makers and Novell management to solve this.

They've got different priorities anyway; it's a commercial product that needs to care more about existing customers than about doing anything interesting ...



Hi Till - I'll just drop the reply in the same message.

> Having attempted various ways of implementing on-disk and in-memory
> indexing, the last and current being mmap'd binary on-disk structures
> (with a lovely collection of data integrity, robustness and performance
> issues, (NFS, anyone?)), and having long battled huge memory footprints
> in our applications, we (the KDEPIM developers) have decided that a
> database approach to the indexing (and the caching) problem is probably
> our best bet.

Doing disk-based things is just hard, or I always found it that way - having
to worry about every failure case and recovery, performance, etc.  That's
what middleware is for, the hard bits :)
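One of those failure cases is a crash halfway through rewriting an index. The usual POSIX pattern is to write a temporary file, `fsync()` it, then `rename()` it over the old one. A sketch (the `.tmp` suffix is an arbitrary choice); these guarantees are for a local POSIX filesystem, and network filesystems such as NFS can weaken them (client-side caching, for one), which is part of what makes this hard.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Replace the file at `path` with `len` bytes from `buf` so that a reader
 * sees either the complete old index or the complete new one, never a
 * half-written file. Returns 0 on success, -1 on failure. */
int write_index_atomically(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && (buf != NULL || len == 0));
    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t w = write(fd, p, len);
        if (w < 0) {
            if (errno == EINTR)
                continue;            /* interrupted: just retry */
            goto fail;
        }
        p += w;
        len -= (size_t)w;
    }
    /* Flush the new contents to disk before they become visible. */
    if (fsync(fd) < 0)
        goto fail;
    if (close(fd) < 0) {
        unlink(tmp);
        return -1;
    }
    /* rename() replaces `path` atomically on a POSIX filesystem. For full
     * durability the containing directory would also need an fsync. */
    if (rename(tmp, path) < 0) {
        unlink(tmp);
        return -1;
    }
    return 0;

fail:
    close(fd);
    unlink(tmp);
    return -1;
}
```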

> I've talked to a few of the evolution hackers in Bangalore about this at
> foss.in/05, when it was still a wild idea in the back of my mind, but
> it has since come to a state where the initial implementation is
> delivering stuff and most of the main pieces are in place. This would
> be an ideal time for any interested party to have a closer look, find

If it's just using IMAP, how is it different from, say, HULA or some other
IMAP server?

> In short, this discussion reminds me a lot of similar ones we had over
> in K-town which eventually led to Akonadi, so I couldn't resist
> pitching it to you a bit. My apologies for hijacking the thread. If you
> have more concrete questions as to the kinds of problems we are facing
> with memory mapped index files, feel free to ask in PM or on
> evolution-hackers, which I'm subscribed to.

Yeah we had these discussions before when I was still working there.
Always hamstrung by the need to support a lumbering application, and
even some conflicting ideas, e.g. there's no reason the addressbook
data couldn't have just been a camelfolder with a different item type,
but, well, it was already written as a corba service.

I'm not sure I'd go with an IMAP layer myself, but I suppose it
depends on what problem you're solving.  I guess as far as protocols
go it isn't really that complex at the heart of it.  Most of the
problems with it are dealing with incompatibilities, extension
support you can't know about in advance, and bugs.
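Handling extensions you can't know about in advance means checking what the server advertises before relying on it, e.g. the SORT and THREAD extensions (RFC 5256) that would allow server-side sorting and threading. A minimal sketch that checks an untagged CAPABILITY response line token by token; it assumes the whole line is already in memory and ignores capabilities sent in the greeting's response code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <strings.h>

/* Return true if an untagged CAPABILITY line such as
 * "* CAPABILITY IMAP4rev1 SORT IDLE\r\n" advertises `cap`. Names are
 * compared case-insensitively and as whole tokens, so asking for "THREAD"
 * does not match "THREAD=REFERENCES". */
bool imap_has_capability(const char *line, const char *cap)
{
    static const char prefix[] = "* CAPABILITY ";
    size_t plen = sizeof prefix - 1, clen;

    assert(line != NULL && cap != NULL);
    clen = strlen(cap);
    if (clen == 0 || strncasecmp(line, prefix, plen) != 0)
        return false;

    for (const char *p = line + plen; *p && *p != '\r' && *p != '\n';) {
        while (*p == ' ')
            p++;
        const char *start = p;
        while (*p && *p != ' ' && *p != '\r' && *p != '\n')
            p++;
        if ((size_t)(p - start) == clen && strncasecmp(start, cap, clen) == 0)
            return true;
    }
    return false;
}
```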

Well, for example, I was looking at this IDL for 'message' services:

http://users.on.net/~notzed/src/Evolution-DataServer-Mail.idl

e.g. look at Folder at the bottom.  That basically does everything you
can do with CamelFolder, but is an awful lot simpler, and much easier
to implement server-side.  i.e. supports searching, retrieving, and
modifying meta-data on messages.  It could even support non-mail
message types.

Passing back a stream for the message content lets you do things
like retrieving ranges (pipelined) from an IMAP service, so you
can still multiplex the connection and don't end up hogging it
for large messages in a multi-threaded environment.  It's a much easier
way than partially retrieving compound objects the way
camel-imap does.  Sure, you can't 'skip this attachment' - but we don't
anyway, since most of the time you either convert it to an icon or snoop
its content in the client (see Philip, complexity isn't always
warranted even if it seems the right choice on the face of it :).
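As a sketch of the ranged retrieval, assuming IMAP4rev1 partial fetches (`BODY.PEEK[]<offset.count>`, RFC 3501): a stream reading a large message can issue one small tagged command per chunk, and other commands can be queued between chunks on the same connection. The tag format and chunk size are arbitrary choices here; the server truncates the final chunk at the end of the message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Number of partial fetches needed for a message of `size` octets. */
unsigned long imap_chunk_count(unsigned long size, unsigned long chunk)
{
    assert(chunk > 0);
    return (size + chunk - 1) / chunk;
}

/* Format the command fetching chunk `i` of message `uid`. BODY.PEEK
 * avoids setting \Seen as a side effect. Returns the command length,
 * or -1 if `out` is too small. */
int imap_chunk_command(char *out, size_t outsz, unsigned tag,
                       unsigned uid, unsigned long i, unsigned long chunk)
{
    int n = snprintf(out, outsz, "A%04u UID FETCH %u BODY.PEEK[]<%lu.%lu>\r\n",
                     tag, uid, i * chunk, chunk);
    return (n < 0 || (size_t)n >= outsz) ? -1 : n;
}
```

A 10000-octet message in 4096-octet chunks takes three commands, at offsets 0, 4096 and 8192, and nothing stops another thread's command from going out between them.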

(I'm not suggesting this IDL is anything other than an idea I had
a long time ago.  It is also too simple if you wanted to do remote
sorting/threading, and the like).

 Michael


