RE: [Evolution] bad separation



On Wed, 14 Jun 2000, Ray Lee wrote:

Just because you're using libxml doesn't mean you're not paying the CPU time
to parse the data. Remember, the *common* case will be to load and unload
things from disk. There's no way in hell I can afford enough memory to
maintain my email in a fully parsed XML tree in memory. And there's also no
reason.

The Principle of Locality greatly reduces the penalty caused by this. Now,
while the 386, 486, and the earlier 586s will suffer greatly from this
<dodges rotten tomatoes being thrown>, the mainstream CPUs will not suffer
nearly as much as you think because of the way the hardware handles
caching.

In these cases the best design decision you can make is to make sure the
on-disk format and the in-memory format are the same. That way, you pay no
overhead for leaving the data on-disk other than the swap time.
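A minimal sketch of the "same format on disk and in memory" idea, written in Python for illustration only: mapping an mbox file with the standard `mmap` module lets a program work on the raw bytes in place, with the OS paging data in on demand, instead of building a separate parsed copy. The function name and the sample file are hypothetical; this is not how Evolution itself reads mail.

```python
import mmap
import os
import tempfile


def first_message_headers(path):
    """Return the header block of the first message in an mbox file.

    The file is mapped rather than read, so the OS pages in only the bytes
    we actually touch and nothing is copied into a parsed tree.
    Note: mmap raises ValueError on an empty file.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            end = mm.find(b"\n\n")  # headers end at the first blank line
            if end == -1:
                end = len(mm)
            return mm[:end].decode("ascii", errors="replace")


if __name__ == "__main__":
    # Write a tiny hypothetical one-message mbox, then map it.
    fd, path = tempfile.mkstemp(suffix=".mbox")
    with os.fdopen(fd, "wb") as f:
        f.write(b"From ray@example.org Wed Jun 14 2000\nSubject: hi\n\nbody\n")
    try:
        print(first_message_headers(path))
    finally:
        os.remove(path)
```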

I agree that it is best to have the same in-memory format as on-disk format,
but from what I can tell, they already have the structs set up so that
the in-memory format could resemble an XML on-disk format.

This is actually somewhat faster because it loads the data into an XML
tree that can be traversed fairly quickly. Even searching for a string in
the body would be easier, because you would not have to go through the
headers each time to find the body before searching it.

This is what an index is for, and that's why Evolution generates one.
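A minimal sketch of what such an index buys, assuming a simple hypothetical index of (body_start, body_end) byte offsets per message (Evolution's actual index format is not described in this thread): the offsets are computed in one pass over the mbox, and a body search then jumps straight to each body without rescanning headers.

```python
def build_body_index(data: bytes):
    """Return (body_start, body_end) byte offsets for each message in an mbox.

    Assumes each message begins with a "From " line at the start of the
    buffer or right after a blank line, and that the body starts after the
    first blank line ending the headers.
    """
    starts = [0] if data.startswith(b"From ") else []
    pos = 0
    while True:
        pos = data.find(b"\n\nFrom ", pos)
        if pos == -1:
            break
        starts.append(pos + 2)  # skip the blank line, land on "From "
        pos += 2
    index = []
    for i, start in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else len(data)
        hdr_end = data.find(b"\n\n", start, end)
        body_start = end if hdr_end == -1 else hdr_end + 2
        index.append((body_start, end))
    return index


def search_bodies(data: bytes, index, needle: bytes):
    """Return the numbers of messages whose body contains needle."""
    # find() with start/end only matches entirely inside the body range,
    # so header text is never searched.
    return [n for n, (s, e) in enumerate(index) if data.find(needle, s, e) != -1]


if __name__ == "__main__":
    mbox = (b"From a x\nSubject: apple\n\nhello world\n\n"
            b"From b y\nSubject: world\n\nbanana\n")
    idx = build_body_index(mbox)
    print(search_bodies(mbox, idx, b"world"))
```

Here "world" appears in the second message's Subject header but only the first message's body, so only message 0 is reported.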

ok.


I'm thinking that using XML will allow you to easily expand on the
things that you can do and allow you to modularize the code much more.

Repeat after me: XML is not a magic bullet. Why do you want Yet Another Mail
Format? The mbox and mh/Maildir styles, while not perfect, are common. Your
proposal places burden on the email client to manage an internal (XML) and
external (mbox) format simultaneously, without any gain. (Remember, many
people will be running Evolution on systems that already have local mail
delivery set up. Not everyone does pure POP/IMAP, where an email client
would have the luxury of inventing whatever storage format it wants. Some of
us run fetchmail and procmail.)

XML is not a magic bullet. Quick question, though: does your copy of
Evolution actually maintain the external mbox? I ask because my copy
simply removes everything from my mbox and appends it to its own mbox.
Since it was removing the mbox anyway, I was just curious as to why the
format they were keeping had to be mbox and not something a little
better, because, as you said, it isn't perfect. If it is actually
maintaining the original mbox, then I agree that using XML would greatly
increase the complexity of the software needlessly. I also use fetchmail
and procmail, so I definitely would not want to do anything that would
damage that style of mail reading.

Unless you can come up with a concrete example of how XML would benefit
email storage, whatever gains you may see are outweighed by the rather
tangible losses.

I was throwing out an idea because I haven't yet worked out all of its
benefits and downsides. I've been goofing around with some sample code
that does this, and I found it remarkably easier to work with while still
being fast.

No. It's only faster if you have XML in a parsed tree in memory. And that's
not going to happen for my 1/2 a gig of email. And I'd lay odds that libxml
is spending quite a few more CPU cycles parsing documents than Evolution is
searching for "\n\nFrom ".

Well, the question is whether it is half a gig of big messages or lots of
little ones. For the former it would be faster, but the latter would
definitely be a problem, I guess. Also, I'm quite tempted to do some
actual speed comparisons between scanning for "\n\nFrom " (along with the
special-case checks you occasionally have to do on the messages) and
libxml, but I'd guess they would be roughly the same. Just a guess, mind
you.
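A rough sketch of the speed comparison proposed above, assuming synthetic messages and using Python's standard `xml.etree.ElementTree` as a stand-in for libxml (it is a different parser, so its timings say nothing definitive about libxml). The helper names are hypothetical. The two sides do not do identical work: the mbox side only scans for the separator, while the XML side builds a full tree, so a fair test would also time header parsing on the mbox side. No results are claimed here.

```python
import time
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def make_samples(n, body_lines=20):
    """Build the same n synthetic messages as an mbox and as an XML document."""
    body = "\n".join(f"line {i} of the body" for i in range(body_lines))
    mbox_parts, xml_parts = [], ["<mailbox>"]
    for i in range(n):
        mbox_parts.append(f"From user{i}@example.org Wed Jun 14 12:00:00 2000\n"
                          f"Subject: message {i}\n\n{body}\n")
        xml_parts.append(f"<message><subject>message {i}</subject>"
                         f"<body>{escape(body)}</body></message>")
    xml_parts.append("</mailbox>")
    # Joining with "\n" puts a blank line before every "From " separator.
    return "\n".join(mbox_parts).encode(), "".join(xml_parts).encode()


def count_mbox(data: bytes) -> int:
    """Count messages by scanning for a blank line followed by "From "."""
    if not data.startswith(b"From "):
        return 0
    return 1 + data.count(b"\n\nFrom ")


def count_xml(data: bytes) -> int:
    """Count messages by parsing the whole document into a tree."""
    return len(ET.fromstring(data).findall("message"))


def bench(fn, data, repeat=5):
    """Return the best wall-clock time of fn(data) over several runs."""
    best = float("inf")
    for _ in range(repeat):
        t0 = time.perf_counter()
        fn(data)
        best = min(best, time.perf_counter() - t0)
    return best


if __name__ == "__main__":
    mbox, xml = make_samples(10_000)
    assert count_mbox(mbox) == count_xml(xml) == 10_000
    print(f"mbox scan: {bench(count_mbox, mbox):.4f}s")
    print(f"XML parse: {bench(count_xml, xml):.4f}s")
```

Varying `body_lines` against `n` would address the "few big messages vs. many little ones" question directly.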

This is a thought that I've been working on, and I find that it is
especially advantageous when you have large mailboxes (like 100+
messages); it is also very useful for contacts and calendaring.

Hmm, my definition of a large mailbox and yours are slightly different.
'Large' to me means 50,000 messages.

Well, I usually delete the messages after I read them, so my definition of
a large mailbox is significantly smaller than yours. Thus I can see your
point about keeping all that in memory, but I still don't quite see why
XML has to be slower than mbox for handling this quantity of messages.

-- 
Sejal Patel - CS 1312 STA     "I can call spirits from the vasty deep.
   sejal cc gatech edu        Why so can I, or so can any man; but will
    Georgia Institute         they come when you do call for them?"
      of Technology           Shakespeare, King Henry IV, Part I




