Re: [Evolution] bad separation



Sejal Patel <sejal iname com> writes:

So I'm curious about why Evolution email is not being stored in an
XML-style format instead of the mbox style it seems to be using.  Storing
it in XML could allow for much improved search time, more versatile
search criteria, easier parsing of separate messages, and could have a
lot of potential with a lot less code than the current style of mail
storage y'all are using.

Can you elaborate on these?  XML really doesn't get you anything over
mbox format besides increased parse time.  XML also has a long track
record of dealing very poorly with MIME (in terms of how much data you
need to read and escape or un-escape, and check for literals).  In
particular, I think you're mistaken for these reasons:

Search time doesn't depend as much on the disk format as on how you
store things in memory and how you index them.  In fact, you could
even argue that it's harder and slower to search an XML tree for a
given string than a flat file.  (Especially once it's been loaded
into memory.)
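
As a rough illustration of the indexing point (a sketch only, not
anything Evolution actually does): once the message text is in memory,
a simple inverted index answers word searches the same way whether the
bytes came from an mbox or from XML.  The names build_index and search
are made up for this example.

```python
import re
from collections import defaultdict


def build_index(messages):
    """Map each lowercase word to the set of message numbers containing it.

    `messages` is a list of message texts that have already been read
    from disk, in whatever format they were stored.
    """
    index = defaultdict(set)
    for msg_id, text in enumerate(messages):
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(msg_id)
    return index


def search(index, word):
    """Return the sorted message numbers that contain `word`."""
    return sorted(index.get(word.lower(), ()))
```

The lookup cost here depends on the index, not on how the messages
were serialized.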

Search criteria also don't depend on the serialization format as much
as on how you examine the contents of the mail.  Using XML won't gain
you anything, since you still need to write code to grok the things
you're indexing.

Parsing of separate messages is already pretty easy with an mbox.
Each message starts with a line beginning with "From ".  Read lines;
until the first blank line, you're reading headers.  Skip the blank
line.  Read more lines; until you read a blank line followed by a
line starting with "From " (or hit the end of the file), you're
reading the message body.  MIME part support is harder, but it needs
to be added somewhere regardless.
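
The steps above can be sketched in a few lines of Python.  This is a
minimal sketch, not Evolution's parser: split_mbox is a hypothetical
name, the "From " envelope line is simply discarded, and both MIME and
"From " quoting inside bodies are ignored.

```python
def split_mbox(lines):
    """Yield (headers, body) pairs, each a list of lines without newlines.

    A new message begins at a line starting with "From " that follows a
    blank line (the start of the input counts as a blank line).
    """
    headers, body = [], []
    started = False      # have we seen the first "From " line yet?
    in_headers = False   # are we still before the first blank line?
    prev_blank = True

    def finish():
        # The blank line before the next "From " is a separator, not body.
        if body and body[-1] == "":
            body.pop()
        return headers, body

    for raw in lines:
        line = raw.rstrip("\n")
        if prev_blank and line.startswith("From "):
            if started:
                yield finish()
            headers, body = [], []
            started, in_headers = True, True
        elif not started:
            pass                      # junk before the first message
        elif in_headers:
            if line == "":
                in_headers = False    # skip the blank line after headers
            else:
                headers.append(line)
        else:
            body.append(line)
        prev_blank = (line == "")

    if started:
        yield finish()
```

With an open file this would be used as
`for headers, body in split_mbox(open(path)): ...`, where `path` is
whatever mbox file you have on hand.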

As for the amount of code, XML is Very Hard to get right.  That's why
libxml is popular -- because you'd have to be crazy to write your own
XML parser for a small (<10K-line) application.  But if you compare
the work needed for an XML-based reader with the small additional
effort for a mail-oriented encoding, you'll find that the extra effort
for mbox is pretty small next to the time you (and other people) will
spend waiting for libxml (or the XML mail code) to escape and
un-escape special characters.  And that's even IF you find a good XML
storage format -- you can very easily lose any of the advantages you
list if you use a poor schema.
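
To give a feel for the escaping work, here is a small Python sketch.
It uses the standard library's xml.sax.saxutils rather than libxml,
and it assumes the common mbox convention of prefixing body lines
that start with "From " with ">" (the post above doesn't say which
quoting rule is in use).  The function names are made up for the
example.

```python
from xml.sax.saxutils import escape, unescape


def to_xml_text(body):
    """Escape &, < and > so the body can sit inside an XML element."""
    return escape(body)


def from_xml_text(stored):
    """Undo to_xml_text when reading the message back."""
    return unescape(stored)


def to_mbox_text(body):
    """Quote only the lines that would look like a message separator."""
    return "\n".join(
        ">" + line if line.startswith("From ") else line
        for line in body.split("\n")
    )
```

In the XML case every occurrence of the special characters is
rewritten on the way in and out; in the mbox case only lines that
start with "From " are changed.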

Michael



