Re: Beagle roadmap.



On Tue, 2004-10-05 at 11:48 -0400, Nat Friedman wrote:
> On Tue, 2004-10-05 at 11:45 -0400, Dan Winship wrote:
>
> [ Snip the bulk of the argument that we should make Camel lock indexes
> and just use Camel instead of indexing stuff ourselves. ]
>
> > Wouldn't you rather not have to care about all this and just have a
> > nice API?
>
> Yes, absolutely.  We sort of had the impression that this wasn't going
> to be doable, because it would require adding tons of locks to Camel.
>
> Maybe an Evo mail person can provide an idea as to how we would go about
> fixing Camel up?

Well, there's not a lot of context here ... there have been a few other pretty random mails asking unspecific questions, so I'm not sure how you came to your conclusions.  I don't know what Beagle is; it wouldn't work or install on a couple of machines I tried.

I presume you want indexes that can be accessed in parallel, or something like that.  There are a ton of problems with this if you don't go through a single management process (as in Unix process); this is why DBMSs exist to manage database indices, after all.  That's why EDS exists for calendar and contact data.  That's even why GConf exists to badly manage registry data.

  1. It's harder to implement.  I don't really know how much harder, since I never considered it.  We might get away with a single filesystem lock on the index and a slight restructure of the code's internal caching; but then you run into portability and filesystem issues (e.g. if you use fcntl locks on NFS).  And then you have issues with interrupting one access while something else is busy.
  2. You get lots of data consistency issues - I don't mean internal structure consistency; I mean, for example, that you do a search for 'foo', and then foo's record gets updated after (or while) you did the search, and you have no way of knowing your data is now stale.  There's no really usable/scalable mechanism to handle notifications of data changes.
  3. libdb has lots of tools to do this sort of thing (parallel access, transactions, persistent queues), but everyone hates it since they don't seem to understand it, and, well, it is very, very slow compared to Evolution's already pretty slow indexing.  Orders of magnitude slower, and much, much bulkier.
  4. If you want to talk to IMAP/whatever, you can't open multiple connections on some servers anyway, and then dealing with IMAP's private data from multiple processes is another problem entirely.
  5. Memory overheads, copying stuff around, etc.
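
The "single filesystem lock on the index" idea from point 1 could look roughly like the sketch below.  It is only an illustration in Python, not Camel code: index_lock, add_entry, search and the one-line-per-entry index file are all made up for the example.  It relies on POSIX fcntl locks, so the NFS and portability caveats from point 1 still apply.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def index_lock(lock_path, exclusive=True):
    """Hold one advisory fcntl lock that covers the whole index.

    Hypothetical helper: writers take it exclusively, readers share it.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # lockf() uses fcntl() record locks and blocks until granted.
        fcntl.lockf(fd, fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH)
        yield fd
    finally:
        fcntl.lockf(fd, fcntl.LOCK_UN)
        os.close(fd)


def add_entry(index_path, word, uid):
    """Append a (word, uid) pair to a toy index under the exclusive lock."""
    with index_lock(index_path + ".lock", exclusive=True):
        with open(index_path, "a", encoding="utf-8") as f:
            f.write(f"{word}\t{uid}\n")


def search(index_path, word):
    """Return the uids recorded for word, reading under the shared lock."""
    with index_lock(index_path + ".lock", exclusive=False):
        if not os.path.exists(index_path):
            return []
        hits = []
        with open(index_path, encoding="utf-8") as f:
            for line in f:
                key, uid = line.rstrip("\n").split("\t")
                if key == word:
                    hits.append(uid)
        return hits
```

Note that this only serialises access between processes.  It does nothing about point 2: a result returned by search() can be stale as soon as the lock is released, and nothing tells the caller so.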

I also have no idea whether you mean body indexing or 'message summary' indexing, which are almost entirely different issues, each with its own complications.  And just accessing either directly gives you no useful access to remote mail anyway, which I presume is important.

The way to access mail data from everywhere would be to put it into EDS or a similar server.  I presume that is the ultimate goal.

But to do that in a scalable and useful way from inside the mailer would require some fairly significant architectural changes.  E.g. EDS would manage all of the filters, vfolders, accounts, etc.  You'd have to wrap all those APIs somehow.  The message-list view would ideally be driven from a 'remote view' managed inside EDS but displayed in Evolution (or whoever else wanted one).  This would require an entirely different approach to the treeview (since the widget would only have indirect access to the data).  Otherwise you'd need to copy scads of data around, which doesn't scale (just as the calendar/addressbook don't scale), and mail already has scaling issues.  Message content isn't really that big a problem, but message-list data is: you can easily have hundreds of thousands of individual message 'information items' flying around when a user changes the folder they're looking at, and you can't effectively cache them if you have to load them all to do a sort or thread-view calculation anyway.
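
To make the 'remote view' idea a bit more concrete, here is a rough sketch in Python.  It is purely illustrative: MessageInfo, FolderView and fetch() are invented names, not an existing EDS or Camel interface.  The assumption is that the server keeps and sorts the full message list, and the client only asks for the rows it is about to display.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MessageInfo:
    """Hypothetical stand-in for one message 'information item'."""
    uid: str
    subject: str
    date: int  # seconds since the epoch


class FolderView:
    """Server-side view: holds and sorts every item, serves small windows."""

    def __init__(self, messages, sort_key="date"):
        # The full sort happens once, on the server side of the boundary.
        self._rows = sorted(messages, key=lambda m: getattr(m, sort_key))

    def __len__(self):
        # The client needs the row count to size its scrollbar.
        return len(self._rows)

    def fetch(self, start, count):
        """Return only the rows the client is about to draw."""
        return self._rows[start:start + count]


# Usage: 100,000 items live in the view, but only 3 cross to the client.
messages = [MessageInfo(str(i), f"msg {i}", 100_000 - i) for i in range(100_000)]
view = FolderView(messages)
visible = view.fetch(0, 3)
```

The point of the design is that the treeview never holds the whole list, which is why the widget would need a different approach: it can only ask the view for rows by position.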

--
Michael Zucchi <notzed ximian com>
"born to die, live to work, it's all downhill from here"
Novell's Evolution and Free Software Developer

