Re: [Tracker] index evolution mail is broken

From: Laurent Aguerreche <laurent aguerreche free fr>
To: Philip Van Hoof <spam pvanhoof be>
Cc: Jerry Tan <Jerry Tan Sun COM>, tracker-list gnome org
Subject: Re: [Tracker] index evolution mail is broken
Date: Sun, 30 Nov 2008 23:29:39 +0100

On Sunday, 30 November 2008 at 22:47 +0100, Philip Van Hoof wrote:

On Sun, 2008-11-30 at 19:13 +0100, Laurent Aguerreche wrote:

On Friday, 28 November 2008 at 18:19 +0100, Philip Van Hoof wrote:

On Fri, 2008-11-28 at 22:12 +0800, Jerry Tan wrote:

Evolution 2.24 has migrated to using SQLite to store mail summaries.

So the parser, which is based on parsing the summary file, is broken.


What I do not understand here is why Evo hackers have only replaced
summary files without including e-mail contents...


Because storing the content of E-mail in a database, for example as a
BLOB, makes relatively little sense (as in: no sense whatsoever). Storing
the metadata about the content of E-mail in a database does make some
sense, though.

A lot of E-mail servers (like Dovecot, Cyrus, Isode's MBox, etc) store
E-mail content in files too. An IMAP server is actually just a database
with a frontend that happens to talk a specific RFC. Yet these groups of
IMAP server developers still don't store things in relational databases.


For an E-mail client:

IMHO the ideal would be to store E-mail messages as directories, with each
MIME part of the E-mail being a separate file within that directory, and
all other data stored in a database (headers could be stored as triples
in an RDF triple store, like 3store).

rename() on the folder name can be used for the flags, just like Maildir
does, which makes the format easy to reuse and back up. Instead of having
to parse the entire message, and having to download the entire message,
you can store individual parts of interest as individual files, which
reduces format complexity. Having to MIME-parse Maildir and nearly all of
the other local formats is among the reasons why I dislike most of the
local formats.
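
A minimal Python sketch of the layout described above, for illustration only; it is not something Evolution, Camel or Tracker implements. The function names, the "N-filename" naming of part files and the reuse of Maildir's ":2,FLAGS" suffix on the directory name are assumptions made for this example, and it assumes a POSIX filesystem.

```python
import os
from email import message_from_bytes


def store_message_as_directory(raw: bytes, root: str, name: str) -> str:
    """Write every leaf MIME part of `raw` to its own file in root/name."""
    msg = message_from_bytes(raw)
    msg_dir = os.path.join(root, name)
    os.makedirs(msg_dir)
    index = 0
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no payload of their own
        index += 1
        # decode=True undoes base64 / quoted-printable transfer encoding,
        # so the file on disk holds the original bytes.
        payload = part.get_payload(decode=True) or b""
        # Hypothetical naming scheme: "<index>-<filename>", "part" if unnamed.
        leaf = os.path.basename(part.get_filename() or "") or "part"
        with open(os.path.join(msg_dir, f"{index}-{leaf}"), "wb") as fh:
            fh.write(payload)
    return msg_dir


def set_flags(msg_dir: str, flags: str) -> str:
    """Record flags in the directory name with a single rename()."""
    parent, leaf = os.path.split(msg_dir)
    base = leaf.split(":2,")[0]  # strip any previous flag suffix
    suffix = "".join(sorted(set(flags)))
    new_dir = os.path.join(parent, f"{base}:2,{suffix}")
    os.rename(msg_dir, new_dir)
    return new_dir
```

An indexer could then read the part files directly, without a MIME parser, and a flag change costs one rename() rather than a rewrite of the message.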

If a client wants to reduce disk-space, it can remove attachments of
E-mails that are cached locally and available remotely (like with IMAP)
easily: the client would just have to unlink() a bunch of files in a
bunch of directories.
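
As a rough sketch of that clean-up step, assuming a cache made of one directory per message with one file per MIME part: the function name and the "body.txt" file name are hypothetical, chosen only for this example.

```python
import os


def prune_cached_parts(cache_root, keep_names=frozenset({"body.txt"})):
    """unlink() every part file under cache_root whose name is not in
    keep_names, and return the number of bytes freed."""
    freed = 0
    for dirpath, _dirnames, filenames in os.walk(cache_root):
        for name in filenames:
            if name in keep_names:
                continue  # keep the parts we still want locally
            path = os.path.join(dirpath, name)
            freed += os.path.getsize(path)
            os.unlink(path)
    return freed
```

No message has to be parsed or rewritten; the parts that are still available remotely simply disappear from the local cache.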

Meanwhile you want to store the BODYSTRUCT and the ENVELOPE of E-mails
and of MIME parts that are stored inside of E-mails (RFC822 forwarded
messages) in a database or in an easily accessible format.

The reason for that is that ENVELOPE and BODYSTRUCT are precisely what a
piece of software like Tracker needs to 'index' E-mail folders.
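
To give an idea of what such a summary holds, here is a small Python sketch using the standard email package. It is not the IMAP wire format of ENVELOPE or BODYSTRUCTURE, only a comparable summary; the function names and the choice of header fields are assumptions made for this example.

```python
from email import message_from_bytes
from email.message import Message

# Header fields loosely modelled on what an IMAP ENVELOPE carries.
ENVELOPE_HEADERS = ("Date", "Subject", "From", "To", "Cc",
                    "Message-ID", "In-Reply-To")


def envelope(msg: Message) -> dict:
    """Return the envelope-like headers; missing ones map to None."""
    return {name: msg.get(name) for name in ENVELOPE_HEADERS}


def body_structure(msg: Message):
    """Return a nested description of the MIME tree without payloads.

    Containers (multipart/*, and message/rfc822 for forwarded mail)
    become (content_type, [children]); leaves become
    (content_type, transfer_encoding, filename).
    """
    if msg.is_multipart():
        children = [body_structure(part) for part in msg.get_payload()]
        return (msg.get_content_type(), children)
    return (msg.get_content_type(),
            msg.get("Content-Transfer-Encoding", "7bit"),
            msg.get_filename())


def summarize(raw: bytes) -> dict:
    """Parse a raw message and return both summaries for an indexer."""
    msg = message_from_bytes(raw)
    return {"envelope": envelope(msg), "structure": body_structure(msg)}
```

With these two summaries an indexer knows who wrote what to whom and which parts exist, without reading any part's content.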

You might also want to decode the MIME parts (base64, for example) before
storing them, as this makes it easier for indexers to index/scan the
files. Which encoding is used depends on the e-mail, and you don't want
to needlessly require indexer software to understand your format and to
find out in what encoding each MIME-part file is encoded; you instead
want to store the original data as-is. This also makes sense from the POV
of how you read (and store) XMP/Exif/etc. data: if it is encoded or
compressed you need to use more memory before you can read (or write)
that data. While hard-disk space is close to unlimited nowadays, I/O
access speed ain't.

A better idea would be that, instead of trying to parse Evolution's
files ourselves, we make an Evolution plugin that pumps its data over to
us via IPC, shmem, pipe() or whatever.
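
A minimal sketch of the pipe() variant in Python, assuming newline-delimited JSON as the record format. The record fields and function names are hypothetical, and no such Evolution plugin API is implied; in a real setup the writer would live in the plugin's process and the reader in the indexer's.

```python
import json
import os


def push_records(write_fd, records):
    """Plugin side: write one JSON object per line, then close the pipe."""
    with os.fdopen(write_fd, "w", encoding="utf-8") as out:
        for record in records:
            out.write(json.dumps(record) + "\n")


def read_records(read_fd):
    """Indexer side: yield records until the writer closes its end."""
    with os.fdopen(read_fd, "r", encoding="utf-8") as src:
        for line in src:
            yield json.loads(line)


# Single-process demonstration; this only works because the data is
# small enough to fit in the pipe buffer before anyone reads it.
read_fd, write_fd = os.pipe()
push_records(write_fd, [
    {"uid": "1", "subject": "index evolution mail is broken"},
    {"uid": "2", "subject": "Re: index evolution mail is broken"},
])
received = list(read_records(read_fd))
```

The point of the design is that the indexer only depends on the record format, not on whatever files the mail client keeps on disk.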


The last time I read Beagle's code, I found out it was also trying to
parse Evo's internal caches. Abstract access to Evo's caches is a good
idea but it can't be just a Tracker plugin, it has to be something used
by any program that wants to access Evo's data (and Evolution itself).


Sure

It seems that Evo hackers are trying to replace Bonobo code in favour
of DBus. Would it be possible for us to also use DBus there, or is it only
for Evo internal things? See:
http://mail.gnome.org/archives/evolution-hackers/2008-November/msg00009.html


The remote IPC (CORBA and now increasingly DBus) APIs that Evolution
exposes have nothing to do with its E-mail functions, only with
calendaring, contacts and TODO lists.

Evolution Data Server is not what I call "E-mail as a service".
Everything about E-mail, in Evolution, happens in-process of the
"evolution" process (that's the UI, the shell if you prefer that name).

It's just a marketing stunt that "camel" is included in the EDS
(Evolution Data Server) package. The "camel" library is technically not
part of EDS. The Evolution shell dynamically links with it, and runs its
code completely in-process of itself. That's unlike the services that
provide the Calendar, Tasks, Todo and Contact data, which are provided
by the 'actual' Evolution Data Server.

Evolution Data Server "does not" serve E-mail. Don't let anybody tell
you that, because it doesn't. (it's also not really a secret, just a
misconception that a lot of people seem to have about Evolution).

Camel is indeed the library that Evolution uses for its E-mail
abstraction, but Camel is a normal shared library that runs in-process,
not a service that gets communicated with from another process (like the
shell).

Even worse: you can't have two processes use Camel on top of the same
"cache dir", as that would make Camel write the same summary files, and
the same E-mail content cache files, in parallel from both processes
consuming the Camel library.

That would result in data corruption. There are also no fcntl or flock
locks placed on the files in question. Camel's design probably wouldn't
cope with such locks either, unless you rewrite quite a bit of it first,
or invent a recursive file-lock-ish thing.

Evolution's use of Camel would probably make the UI hang each time Camel
would hit such a flock() lock, as not everything in Camel happens in a
thread created by Evolution.

UIs that 'hang' (don't get redrawn) for some seconds, because another
process is doing something, ain't the kind of behaviour people like in
an E-mail client.
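
For readers unfamiliar with these locks, here is a small Python sketch of a non-blocking flock() attempt, the variant that returns immediately instead of hanging while another holder has the file. Camel does not do this; the function name is hypothetical, and advisory locks only help if every process touching the file takes them.

```python
import fcntl


def try_lock(path):
    """Try to take an exclusive advisory lock on `path`.

    Returns the open file object holding the lock, or None when some
    other open file description already holds it. Closing the returned
    file releases the lock.
    """
    fh = open(path, "a+b")
    try:
        # LOCK_NB makes flock() fail at once instead of blocking.
        fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        fh.close()
        return None
    return fh
```

A caller that gets None can retry later from a worker thread, whereas a plain blocking flock() in the UI thread would stall redraws for as long as the other process holds the lock.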


Thank you for this very interesting answer!

If I understand your message, Evo would need to stop using Camel directly
and use an "e-mail service" instead, which would be able to deal with
concurrent reads and writes. More precisely, this service should allow Evo
to read or write anything ASAP while a "spy program" (like Tracker)
would be able to read data without blocking Evo accesses...

It seems to me that this is a task for Evo hackers and that it will take
a long time to implement. Any opinions?


Laurent.

Right now trying to parse Evolution's hideous file formats is quite
crazy, and each time they change their format we will have to fix our
code too.

It's also "not correct" to read Evolution's internal cache files.
Evolution is designed neither to cope with another process tampering
with its caches, nor to care about the other process at all.

If you guys at Sun want to join the fun, such an Evolution plugin would be
an excellent contribution indeed. Perhaps also one for Thunderbird and
some other E-mail clients ... and we can safely enter 2009!

Attachment: signature.asc
Description: This is a digitally signed message part

