Re: [Evolution] bad separation

From: Sejal Patel <sejal iname com>
To: Michael Poole <poole troilus org>
Cc: evolution helixcode com
Subject: Re: [Evolution] bad separation
Date: Wed, 14 Jun 2000 17:05:23 -0400 (EDT)

On 14 Jun 2000, Michael Poole wrote:

An increased parse time is without a doubt going to be there just by the
general nature of building an XML tree.  One of the major things I was
thinking about is having an XML structure to separate out messages,
separate the headers and body within each message, separate out the field
types in the headers, and perhaps even separate the signature from the
body (I've been playing around with a rough draft of doing this just to
see the feasibility and found it really powerful).
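A minimal Python sketch of the separation described above, assuming a hypothetical schema (`<message>`, `<headers>`, `<field name=...>`, `<body>`, `<signature>`) that the thread never actually fixes.  The signature split relies on the conventional "-- " delimiter line, which is exactly the kind of heuristic that can misclassify part of a message.

```python
import email
import xml.etree.ElementTree as ET

# Hypothetical element names; the thread does not define a schema.
SIG_DELIMITER = "-- "  # conventional "dash dash space" signature separator line


def message_to_xml(raw: str) -> ET.Element:
    """Split one RFC 822 message into headers, body and (optionally) signature."""
    msg = email.message_from_string(raw)
    root = ET.Element("message")
    headers = ET.SubElement(root, "headers")
    for name, value in msg.items():
        ET.SubElement(headers, "field", name=name).text = value
    # Multipart messages are not split further in this sketch.
    payload = "" if msg.is_multipart() else msg.get_payload()
    lines = payload.splitlines()
    if SIG_DELIMITER in lines:
        # Split at the last delimiter so an earlier "-- " line (for example
        # inside quoted text) is kept in the body.
        cut = len(lines) - 1 - lines[::-1].index(SIG_DELIMITER)
        body, sig = lines[:cut], lines[cut + 1:]
    else:
        body, sig = lines, []
    ET.SubElement(root, "body").text = "\n".join(body)
    if sig:
        ET.SubElement(root, "signature").text = "\n".join(sig)
    return root
```

`ET.tostring(message_to_xml(raw), encoding="unicode")` gives the serialized form.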


It's really powerful until you separate too much and treat part of
somebody's message as a signature.


Separating the message body from the signature is only a possibility.  I
didn't say that these were absolute separations.  I'm just throwing out
ideas here so that we can all decide together on the best way to do this.

Because what are broken up into a "simplier formating style"?  You
don't need to keep a flat representation of the mbox file in memory
even if you use the (standard!) mbox format on disk.
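To illustrate keeping mbox on disk while holding something structured in memory, here is a sketch that records the byte range of each message in an mbox buffer and parses an individual message only when asked.  It assumes every body line beginning with "From " was quoted by the writer, so any line starting with "From " is a separator; the function names are made up for this example.

```python
import email
from email.message import Message


def index_mbox(data: bytes) -> list[tuple[int, int]]:
    """Return the (start, end) byte range of every message in an mbox buffer.

    Assumes the writer quoted body lines starting with "From " (mboxo/mboxrd
    style), so every line beginning with b"From " separates two messages.
    """
    starts, pos = [], 0
    for line in data.splitlines(keepends=True):
        if line.startswith(b"From "):
            starts.append(pos)
        pos += len(line)
    ends = starts[1:] + [len(data)]
    return list(zip(starts, ends))


def load_message(data: bytes, span: tuple[int, int]) -> Message:
    """Parse one message on demand, dropping its "From " envelope line."""
    start, end = span
    _, _, rest = data[start:end].partition(b"\n")
    return email.message_from_bytes(rest)
```

With the spans in hand, `load_message(data, spans[41])` reaches the 42nd message without parsing the 41 before it.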


So why is it so important (other than direct compatibility without
exporting) to have a flat mbox representation?  This whole idea started
off as a way of making a more maintainable and manageable system than
mbox.  Also, just because it is standard and has been used for many
years (something like 22 years, I think) does not mean it is the
absolute best way of doing things.  It just means that it is an accepted
way of doing things.  (I know I've set myself up for a huge bashing here,
but I didn't mean it in a bad way.)

This goes back to the indexing I mentioned: in this, XML doesn't give
you anything that having an index per header field does not give you.
In fact, if you rely on the XML parsers' search functions for this,
you will miss valid aliases.  For example, mail sent to
"john(likes)@(to)annoy . you" would go to the same place as mail sent
to "Johnny Annoyance <john@annoy.you>", and relying on an XML search
fails to capture that.
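A sketch of the normalization an index would need before comparing addresses, assuming only RFC 822 comments, angle brackets and insignificant whitespace have to be handled (quoted strings and backslash escapes are ignored for brevity).  Both spellings reduce to the same key, while a plain substring search on the raw header text does not find one from the other.

```python
def strip_comments(text: str) -> str:
    """Remove RFC 822 parenthesized comments, including nested ones.

    Simplification: quoted strings and backslash escapes are not handled,
    so a "(" inside a quoted local part would be mishandled.
    """
    out = []
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)
    return "".join(out)


def canonical_address(field: str) -> str:
    """Reduce an address field to a comparable addr-spec such as john@annoy.you."""
    text = strip_comments(field)
    lt, gt = text.find("<"), text.rfind(">")
    if lt != -1 and gt > lt:
        text = text[lt + 1:gt]        # keep only the angle-addr
    text = "".join(text.split())       # whitespace around "." and "@" is not significant
    local, at, domain = text.rpartition("@")
    # Domains are case-insensitive; the local part is left untouched.
    return f"{local}@{domain.lower()}" if at else text
```

An index keyed on `canonical_address(...)` puts both forms of John's address under the same entry.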


To tell you the truth, I didn't even know that you could have
"john(likes)@(to)annoy . you" as an address field and have it resolve to
the same mailbox as "Johnny Annoyance <john@annoy.you>", so it was a bit
difficult for me to see that this was a problem.  However, I'm still not
100% sure what you're talking about here, so there may or may not be a
clean way of doing this in XML.  Would you mind explaining that a bit more?

No matter how fast your XML parser or traversal is, a full search will
lose by having to chase pointers around your XML tree.  In addition,
just because the disk format is one way does not mean the in-memory
format must be the same.  I've written an IMAP server that does
indexing and searches on text and headers.  Its mail storage format is
mbox (plus extra index files).  I'd be willing to bet it has much
higher performance for searches than something that uses XML
internally.
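A minimal sketch of a per-header inverted index over messages, in the spirit of "mbox plus extra index files".  `HeaderIndex` and its tokenizing rule are hypothetical; writing the postings to side files next to the mbox is left out.

```python
import email
import re
from collections import defaultdict
from email.message import Message

TOKEN = re.compile(r"[a-z0-9]+")


class HeaderIndex:
    """Per-header inverted index: header name -> token -> set of message numbers.

    An in-memory stand-in for index files kept alongside an mbox.
    """

    def __init__(self, fields=("from", "to", "subject")):
        self.fields = fields
        self.postings = defaultdict(lambda: defaultdict(set))

    def add(self, msgno: int, msg: Message) -> None:
        for field in self.fields:
            for value in msg.get_all(field, []):
                for tok in TOKEN.findall(value.lower()):
                    self.postings[field][tok].add(msgno)

    def search(self, field: str, query: str) -> set[int]:
        """Message numbers whose `field` header contains every token of `query`."""
        tokens = TOKEN.findall(query.lower())
        if not tokens:
            return set()
        table = self.postings[field.lower()]
        return set.intersection(*(table.get(t, set()) for t in tokens))
```

A search costs a few dictionary lookups and a set intersection, without reading any message bodies.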


The general nature of having an XML tree (at least all the good ones I've
seen) is that it is a tree, so in a worst-case scenario you have an
O(log N) search time when "chasing" pointers around everywhere.  Of course
your IMAP server uses mbox rather than XML right now, because the idea of
using XML for mail storage is not exactly old news.  XML did not even
exist when IMAP was created.  I doubt that it performs much better than
a quality internal XML system.
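For reference, O(log N) lookups come from a tree (or sorted index) that is ordered on the key being searched; a document tree such as a parsed message store is ordered by document structure, so an arbitrary text query still has to visit every node.  A small sketch contrasting the two, with made-up `Node` and helper names:

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Node:
    """A generic document-tree node: element name, text, children."""
    tag: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)


def tree_search(root: Node, needle: str) -> tuple[list[Node], int]:
    """Depth-first search for nodes whose text contains `needle`.

    Returns the matches and the number of nodes visited, which is every
    node, because the tree is not ordered by the text being searched.
    """
    matches, visited, stack = [], 0, [root]
    while stack:
        node = stack.pop()
        visited += 1
        if needle in node.text:
            matches.append(node)
        stack.extend(node.children)
    return matches, visited


def sorted_key_lookup(keys: list[str], key: str) -> bool:
    """Exact-match lookup in a sorted key list with O(log N) comparisons."""
    i = bisect.bisect_left(keys, key)
    return i < len(keys) and keys[i] == key
```

With 100 messages of one subject node each, `tree_search` visits all 201 nodes, while `sorted_key_lookup` on the sorted subjects needs about seven comparisons.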

It's not hard to devise an architecture that gives you the same
modularity for searching and processing mbox-stored files.  I don't
think you need much more modularity than per-header and per-MIME-type
operations.
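A sketch of per-MIME-type processing over a message parsed from mbox, using the standard `email` package's `walk()` to visit each part.  The handler table and handler functions are hypothetical placeholders for whatever a client would actually do with each part.

```python
import email
from email.message import Message
from typing import Callable


def handle_text(part: Message) -> str:
    """Decode a text/plain part to str using its declared charset."""
    raw = part.get_payload(decode=True)
    return raw.decode(part.get_content_charset() or "us-ascii", errors="replace")


def handle_other(part: Message) -> str:
    """Fallback for non-text parts: just describe them."""
    size = len(part.get_payload(decode=True) or b"")
    return f"[{part.get_content_type()} part, {size} bytes]"


# Hypothetical dispatch table keyed by MIME type.
HANDLERS: dict[str, Callable[[Message], str]] = {"text/plain": handle_text}


def process_parts(msg: Message) -> list[str]:
    """Send every leaf MIME part to the handler registered for its type."""
    results = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        handler = HANDLERS.get(part.get_content_type(), handle_other)
        results.append(handler(part))
    return results
```

Adding support for another type means registering one more entry in `HANDLERS`; the storage format is not involved.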


So what you're saying is that the mere idea of using XML is completely
ludicrous even if it would end up making things simpler?  If you keep
adding little bits and pieces to twist one thing into doing another, that
doesn't make it better than something that was designed to handle that
in the first place.

You have to do that lookahead one (1) time with mbox.  With XML you
need to special-case &, <, and > everywhere they occur in your code
for reading.  Random access (say, by a message number in the store)
is also harder if you're using an XML tree to represent the file.
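Both formats need some escaping of body text.  The sketch below assumes the mboxrd convention (the thread does not say which mbox variant is meant): a writer prefixes `>` to any line matching `^>*From `, and a reader strips one `>` back off.  XML instead escapes `&`, `<` and `>`, shown here with `xml.sax.saxutils.escape`.

```python
import re
from xml.sax.saxutils import escape, unescape

# mboxrd: on write add one ">" to lines matching ^>*From ; on read remove one.
# Other mbox variants (mboxo, mboxcl) handle this differently.
_QUOTE = re.compile(r"^(>*From )", re.MULTILINE)
_UNQUOTE = re.compile(r"^>(>*From )", re.MULTILINE)


def mboxrd_quote(body: str) -> str:
    return _QUOTE.sub(r">\1", body)


def mboxrd_unquote(body: str) -> str:
    return _UNQUOTE.sub(r"\1", body)


def xml_text_encode(body: str) -> str:
    """Escape &, < and > so the body can sit inside an XML element."""
    return escape(body)


def xml_text_decode(text: str) -> str:
    return unescape(text)
```

Both transformations are line- or character-local and reversible; they differ in which characters they touch.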


No, you don't.  What are you talking about?  You never have to do that
lookahead.  Also, random access is not harder at all.  I fear that you do
not quite understand what I was referring to when I suggested the XML
routine.

I think it's a fair assumption that any Unix mail reader must be able
to read mbox files (since many, if not most, Unix mail users have that
as their primary mail spool).  In my opinion, unless there are serious
flaws in that format, there's not much reason to switch to another
format.  (And yes, reader/writer conflicts being able to hose an mbox
store is a good reason to use accessories like file locking with
standard mbox.)
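A sketch of appending to an mbox under an exclusive advisory lock with `fcntl.flock`, POSIX only.  The helper name is hypothetical, and real mail spools often also require dot-locking (`<mbox>.lock`), which this does not attempt.

```python
import fcntl
import os
import time


def append_to_mbox(path: str, sender: str, message: str) -> None:
    """Append one message to an mbox file while holding an exclusive lock.

    flock(2) locks are advisory: they only exclude other programs that take
    the same kind of lock.  Assumes `message` is already From-quoted and
    ends with a newline.
    """
    with open(path, "a", encoding="utf-8") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # block until no one else holds the lock
        try:
            stamp = time.asctime(time.gmtime())
            # Envelope line, the message, then a blank line before the next one.
            f.write(f"From {sender} {stamp}\n{message}\n")
            f.flush()
            os.fsync(f.fileno())
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```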


I never said that we should take away Evolution's ability to read mboxes.
Heck, I'm using mboxes and fetchmail right now and will keep doing so
regardless of what happens.  What I'm saying is that since Evolution is
already storing all the messages it receives, why not store them in an
XML-style format?  If users want, it would be a simple matter to export
them to mbox, .eml, or Outlook's format (I'm assuming that Outlook has
its own format, but I don't know for sure).
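A sketch of the export step, assuming a hypothetical XML schema with `<field name=...>` elements under `<headers>`, plus `<body>` and an optional `<signature>`.  It rebuilds the message with the standard `email` package and writes one mboxrd-style entry; the envelope sender and date are passed in because the schema does not define them.

```python
import re
import xml.etree.ElementTree as ET
from email.message import Message

_FROM_LINE = re.compile(r"^(>*From )", re.MULTILINE)


def xml_message_to_mbox(xml_text: str, envelope_sender: str, date: str) -> str:
    """Convert one <message> element (hypothetical schema) to an mbox entry."""
    root = ET.fromstring(xml_text)
    msg = Message()
    for field in root.iter("field"):
        msg[field.get("name")] = field.text or ""
    body = root.findtext("body", default="")
    sig = root.findtext("signature")
    if sig is not None:
        body += "\n-- \n" + sig  # restore the conventional signature delimiter
    # mboxrd-quote body lines that would otherwise look like separators.
    msg.set_payload(_FROM_LINE.sub(r">\1", body) + "\n")
    return f"From {envelope_sender} {date}\n{msg.as_string()}\n"
```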

I guess I still haven't given you an argument for why it is better, but I
also fear that this is because either you're not completely understanding
what I'm saying or I'm completely misunderstanding what you are saying.
You haven't told me anything that shows it is bad; instead you are making
lots of statements like "if you don't do it right ..." and "if you do
this to the mbox ..."  I'm not trying to be an ass here; I'm just trying
to figure out why, as you seem to believe, XML would totally blow.

-- 
Sejal Patel - CS 1312 STA     "I can call spirits from the vasty deep.
   sejal cc gatech edu        Why so can I, or so can any man; but will
    Georgia Institute         they come when you do call for them?"
      of Technology           Shakespeare, King Henry IV, Part I

Follow-Ups:
- Re: [Evolution] bad separation
  - From: Michael Poole

References:
- Re: [Evolution] bad separation
  - From: Michael Poole
