RE: [Evolution] bad separation



(Warning: This message is going to sound harsh. Please don't take it
personally.)

Sejal Patel wrote (with many snippages):

Searching can be reduced simply because you don't have to
do long complicated instructions to parse out the mbox
in comparison to a well designed XML

Just because you're using libxml doesn't mean you're not paying the CPU time
to parse the data. Remember, the *common* case will be to load and unload
things from disk. There's no way in hell I can afford enough memory to
maintain my email in a fully parsed XML tree in memory. And there's also no
reason.

In these cases the best design decision you can make is to make sure the
on-disk format and the in-memory format are the same. That way, you pay no
overhead for leaving the data on-disk other than the swap time.
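As a rough illustration of that design (not Evolution's actual code), the
sketch below stores fixed-size binary records and reads one back through a
memory map, so nothing is parsed on load. The record layout, the RECORD name,
and both helper functions are hypothetical.

```python
import mmap
import struct

# Hypothetical fixed-size record: message offset, message length, flags.
# "<" means little-endian with no padding, so the bytes on disk are
# exactly the bytes decoded in memory.
RECORD = struct.Struct("<QQI")


def write_records(path, records):
    """Write each (offset, length, flags) tuple as one packed record."""
    with open(path, "wb") as f:
        for rec in records:
            f.write(RECORD.pack(*rec))


def read_record(path, i):
    """Map the file and decode record i in place, touching no other record."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return RECORD.unpack_from(m, i * RECORD.size)
```

Reading record i costs one seek into the mapping; the OS pages in only the
part of the file that is actually touched.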

This is actually somewhat faster because it loads up the
stuff into an XML tree and can be traversed fairly quickly.  Even
searching for a string in the body would be easier because you would not
have to go through the headers each time looking for the body and stuff
before searching through the body.

This is what an index is for, and that's why Evolution generates one.
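A minimal sketch of the idea, assuming a simple in-memory index of body
offsets (this is not Evolution's real index format, and build_body_index and
search_bodies are made-up names). The headers are located once, when the
index is built, and a body search never looks at them again.

```python
import re

# A message starts with "From " at the top of the file or after a blank line.
_FROM_LINE = re.compile(rb"(?:\A|\n\n)(From )")


def build_body_index(mbox):
    """Return a list of (body_start, body_end) byte offsets, one per message."""
    starts = [m.start(1) for m in _FROM_LINE.finditer(mbox)]
    index = []
    for i, start in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else len(mbox)
        # The first blank line inside the message separates headers from body.
        blank = mbox.find(b"\n\n", start, end)
        body_start = blank + 2 if blank != -1 else end
        index.append((body_start, end))
    return index


def search_bodies(mbox, index, needle):
    """Return the numbers of the messages whose body contains needle."""
    return [i for i, (s, e) in enumerate(index)
            if mbox.find(needle, s, e) != -1]
```

A real index would be saved to disk and updated as mail arrives, rather than
rebuilt from the mbox on every search.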

I'm thinking that using XML will allow you to easily expand on the
things that you can do and allow you to modularize the code much more.

Repeat after me: XML is not a magic bullet. Why do you want Yet Another Mail
Format? The mbox and mh/Maildir styles, while not perfect, are common. Your
proposal places a burden on the email client to manage an internal (XML) and
external (mbox) format simultaneously, without any gain. (Remember, many
people will be running Evolution on systems that already have local mail
delivery set up. Not everyone does pure POP/IMAP, where an email client
would have the luxury of inventing whatever storage format it wants. Some of
us run fetchmail and procmail.)

Unless you can come up with a concrete example of how XML would benefit
email storage, whatever gains you may see are outweighed by the rather
tangible losses.

But looking for the first blank line after the headers and then figuring
out which blank lines are part of this body and which ones are part of
the next body might not sound that complex, but you are WASTING several CPU
instructions to do this, which seems inefficient to me since there are
better, more efficient ways of doing this, such as XML.

No. It's only faster if you have XML in a parsed tree in memory. And that's
not going to happen for my 1/2 a gig of email. And I'd lay odds that libxml
is spending quite a few more CPU cycles parsing documents than Evolution is
searching for "\n\nFrom ".
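For a sense of how little work that search is, here is a sketch of the
delimiter scan (message_starts is a hypothetical name, not an Evolution
function). It assumes the usual mbox convention that body lines beginning
with "From " were escaped as ">From " on delivery, so the delimiter cannot
appear inside a body.

```python
def message_starts(mbox):
    """Yield the byte offset of each message's "From " line."""
    if mbox.startswith(b"From "):
        yield 0
    pos = mbox.find(b"\n\nFrom ")
    while pos != -1:
        # Skip the blank line; the message begins at its "From " line.
        yield pos + 2
        pos = mbox.find(b"\n\nFrom ", pos + 2)
```

The whole scan is a plain substring search over the raw bytes: no tokenizing,
no tree building, and no per-message allocation.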

This is a thought that I've been working on and I find that it is
especially advantageous when you have large mailboxes (like 100+ messages)
and is also very useful for contacts and calendaring.

Hmm, my definition of a large mailbox and yours are slightly different.
'Large' to me means 50,000 messages.

Ray
--
rblee impulse net  ~  ray madrabbit org




