Re: [Evolution] bad separation



On 14 Jun 2000, Michael Poole wrote:

Can you elaborate on these?  XML really doesn't get you anything over
mbox format besides increased parse time.  XML also has a long track
record of dealing very poorly with MIME (in terms of how much data you
need to read and escape or un-escape, and check for literals).  In
particular, I think you're mistaken for these reasons:

An increased parse time is without a doubt going to be there, just by the
general nature of building an XML tree.  One of the major things I was
thinking about is having an XML structure that separates out the messages,
separates the headers from the body in each message, separates out the
field types in the headers, and perhaps even separates the signature out
of the body (I've been playing around with a rough draft of this just to
see the feasibility and found it really powerful).  Because everything is
broken up into a simpler format (I've also been playing with making
everything XML, including contacts, calendar and todo list), it was a lot
easier to code up interaction between all 3 things.  Searching gets
simpler too: normal searches and filters look ONLY at the subject, from,
date or things like that, and with a well-designed XML format you don't
need long, complicated code to parse those out of the mbox.  They are a
breeze to do because the XML has already broken the message down into
these fields, and the XML parsers out there are very efficient at finding
them.
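
As a rough illustration of the kind of layout described above, here is a
minimal Python sketch using the standard xml.etree.ElementTree module.
The element names (mailbox, message, headers, body, text, signature) and
the helper find_by_subject are made up for this example; they are not
taken from the actual draft or from Evolution.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: every element name here is an assumption made
# for illustration, not the format from the draft discussed above.
MAILBOX_XML = """\
<mailbox>
  <message>
    <headers>
      <from>alice@example.org</from>
      <subject>bad separation</subject>
      <date>14 Jun 2000</date>
    </headers>
    <body>
      <text>XML vs mbox, round two.</text>
      <signature>Alice</signature>
    </body>
  </message>
  <message>
    <headers>
      <from>bob@example.org</from>
      <subject>lunch</subject>
      <date>15 Jun 2000</date>
    </headers>
    <body>
      <text>Noon?</text>
      <signature>Bob</signature>
    </body>
  </message>
</mailbox>
"""


def find_by_subject(root, needle):
    """Return the <message> elements whose subject contains needle."""
    needle = needle.lower()
    return [
        msg
        for msg in root.iter("message")
        if needle in (msg.findtext("headers/subject") or "").lower()
    ]


root = ET.fromstring(MAILBOX_XML)
matches = find_by_subject(root, "separation")
senders = [msg.findtext("headers/from") for msg in matches]
print(senders)
```

The search only touches the subject element of each message; the body and
signature are never inspected, which is the convenience being argued for.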

Search time doesn't depend as much on disk format as how you store
things in memory and how you index them.  In fact, you could even
argue that it's harder and slower to search an XML tree for a given
string than a flat file.  (Especially once it's been loaded into
memory.)
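
To make the indexing point concrete, here is a small Python sketch of an
in-memory index over one header field.  It assumes the messages have
already been parsed into dictionaries by some earlier step (from mbox,
XML or anything else); build_index and the sample data are hypothetical.

```python
from collections import defaultdict


def build_index(messages, field):
    """Map each lower-cased word of a header field to the set of
    message positions that contain it."""
    index = defaultdict(set)
    for msg_id, msg in enumerate(messages):
        for word in msg.get(field, "").lower().split():
            index[word].add(msg_id)
    return index


# Sample data; how these dictionaries were produced does not matter.
messages = [
    {"subject": "bad separation", "from": "alice@example.org"},
    {"subject": "Re: bad separation", "from": "bob@example.org"},
    {"subject": "lunch plans", "from": "alice@example.org"},
]

index = build_index(messages, "subject")
hits = sorted(index["separation"])
print(hits)
```

Once the index exists, a subject lookup is a dictionary access and never
goes back to the on-disk format at all.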

I realise this, and I didn't mean to imply that the disk format mattered.
My point was the compatibility between things, and the fact that XML
parsing is much easier to test and debug, plus the parsers are already
written and efficient.  This is actually somewhat faster because the data
is loaded into an XML tree, which can be traversed fairly quickly.  Even
searching for a string in the body would be easier, because you would not
have to go through the headers each time to find where the body starts
before searching through it.  Bad explanation, I know, but I'm hoping
you understand the concept behind it.

Search criteria also don't depend on serialization format as much as
how you examine the contents of the mail.  Using XML won't gain you
anything, since you still need to write code to grok the things you're
indexing.

I'm thinking that using XML will allow you to easily expand on the
things that you can do and allow you to modularize the code much more.

Parsing of separate messages is already pretty easy with an mbox.
Read lines; until the first blank line, you're reading headers.  Skip
the blank line.  Read more lines; until you read another blank line
followed by a line starting with "From ", you're reading the message
body.  MIME part support is harder, but it needs to be added somewhere
regardless.
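
The procedure described above can be sketched in a few lines of Python.
This is only an illustration of those steps: split_mbox and the sample
text are made up for the example, and it assumes that body lines starting
with "From " have already been escaped by whatever wrote the mbox (the
usual convention is ">From ").

```python
def split_mbox(text):
    """Split mbox text into (header_lines, body_lines) pairs.

    The "From " envelope line is kept as the first header line.
    """
    lines = text.splitlines()
    messages = []
    i, n = 0, len(lines)
    while i < n:
        if lines[i] == "":
            i += 1  # stray blank line between messages
            continue
        # Until the first blank line, we are reading headers.
        headers = []
        while i < n and lines[i] != "":
            headers.append(lines[i])
            i += 1
        i += 1  # skip the blank line that ends the headers
        # Until a blank line followed by "From ", we are reading the body.
        body = []
        while i < n:
            if lines[i] == "" and i + 1 < n and lines[i + 1].startswith("From "):
                i += 1  # consume the blank line; next message starts here
                break
            body.append(lines[i])
            i += 1
        messages.append((headers, body))
    return messages


SAMPLE = (
    "From alice@example.org Wed Jun 14 10:00:00 2000\n"
    "From: alice@example.org\n"
    "Subject: first\n"
    "\n"
    "Hello,\n"
    "\n"
    "two paragraphs here.\n"
    "\n"
    "From bob@example.org Wed Jun 14 11:00:00 2000\n"
    "From: bob@example.org\n"
    "Subject: second\n"
    "\n"
    "Bye.\n"
)

msgs = split_mbox(SAMPLE)
print(len(msgs))
```

The blank line inside the first body is kept as part of that body, because
the line after it does not start with "From ".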

But looking for the first blank line after the headers, and then figuring
out which blank lines are part of this body and which ones belong to the
next message, might not sound that complex, but you are WASTING several
CPU instructions doing it.  That seems inefficient to me, since there are
better, more efficient ways of doing this, such as XML.

As for the amount of code, XML is Very Hard to get right.  That's why
libxml is popular -- because you'd have to be crazy to write your own
XML parser for a small (<10KLine) application.  But if you compare the
amount of work you need to do for an XML-based reader and compare it
to the small additional effort for a mail-oriented encoding, you'll
find that the extra effort for mbox is pretty small compared to the
time you (and other people) will spend waiting for libxml (or the XML
mail code) to escape and un-escape special characters.  And that is
even IF you find a good XML storage format -- you can very easily lose
any of the advantages you list if you use a poor schema.
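
The escaping round trip mentioned above can be seen with Python's
standard xml.sax.saxutils helpers.  This is only a sketch of the general
idea; the sample body is invented, and it says nothing about how libxml
or Evolution actually handle it.

```python
from xml.sax.saxutils import escape, unescape

# A message body containing characters that are special in XML.
body = 'if (a < b && b > c) { reply_to("<alice@example.org>"); }'

# Every &, < and > has to be rewritten before the text can be stored
# as XML character data...
stored = escape(body)

# ...and rewritten back again every time the message is read.
restored = unescape(stored)

print(stored)
print(restored == body)
```

Both passes walk the whole body, so the cost grows with message size, and
it is paid on every write and every read.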

I wasn't talking about writing your own XML parser.  In fact, I'm using
libxml for the random XML parsing stuff that I am doing.  If I'm not
mistaken, Evolution is already using libxml in its code.  This is an
idea that I've been working on, and I find that it is especially
advantageous when you have large mailboxes (like 100+ messages), and it
is also very useful for contacts and calendaring.  It wasn't that much
better for the todos, but might as well keep it consistent.

-- 
Sejal Patel - CS 1312 STA     "I can call spirits from the vasty deep.
   sejal cc gatech edu        Why so can I, or so can any man; but will
    Georgia Institute         they come when you do call for them?"
      of Technology           Shakespeare, King Henry IV, Part I




