Re: [Evolution-hackers] compression for folders?



On Tue, 2004-05-11 at 13:35, Todd T. Fries wrote:
> Trust me, I know what you object to when I say clearly:  I am not
> proposing we compress mbox'es and call it a day.
>
> I agree this is an objectionable mechanism, too io and cpu intensive,
> etc.

<snip text regarding compression of just message bodies>

I disagree with this completely. The compression step is the same
whether you do just the message bodies or the entire message with
headers as well. For most messages, the vast majority of time is going
to be spent in overhead invoking the (de-)compression mechanism. An
extra 1k of plaintext isn't going to make a blip on the radar at that
point.

> This would allow seek'ing to be equal to that of
> mbox'es and/or Maildir etc, but permit the biggest part of large emails
> to be compressed.

If this is what you want, then compress each message individually, as in
the gzip -9 < message >> mbox.gz example, and maintain a frickin' index
of the restart boundaries in another file.

The benefits of this are immense. You get to inherit two known file
formats (.gz and mbox), which have oodles of code and tools for dealing
with them on *nix systems, and you can still seek into the file to an
individual message in constant time.
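For illustration, here is a minimal Python sketch of that per-message scheme. It assumes the mailbox and the index are ordinary seekable file objects, and that the index is a plain text file with one decimal byte offset per line. The names append_message, load_index and read_message are made up for this example; they are not anything that exists in Evolution. Each message (including its "From " separator line) becomes its own gzip member, so zcat on the whole file still yields an ordinary mbox.

```python
import gzip
import io
import zlib


def append_message(mbox, index, message):
    """Append one message as its own gzip member (like
    `gzip -9 < message >> mbox.gz`) and record where that member starts."""
    mbox.seek(0, io.SEEK_END)          # the new member begins at the current end of file
    offset = mbox.tell()
    mbox.write(gzip.compress(message, compresslevel=9))
    index.write(f"{offset}\n")         # hypothetical index format: one offset per line
    return offset


def load_index(index):
    """Read the side file back into a list of member start offsets."""
    index.seek(0)
    return [int(line) for line in index if line.strip()]


def read_message(mbox, offset, bufsize=64 * 1024):
    """Seek straight to one member and inflate only that message."""
    mbox.seek(offset)
    d = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)  # expect a gzip header
    out = bytearray()
    while not d.eof:                   # stop at the end of this member
        chunk = mbox.read(bufsize)
        if not chunk:
            raise ValueError(f"truncated gzip member at offset {offset}")
        out += d.decompress(chunk)
    # Bytes belonging to later members land in d.unused_data and are ignored.
    return bytes(out)
```

Because gzip decoders treat concatenated members as one stream, the same file still works with zcat, zgrep and friends; only the side index is new. Reading a message costs one seek plus inflating that single member, no matter how many messages precede it.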

Whether this is implemented as a new back-end is a separate topic.

(And, for the record, most of the attachments I get are already
compressed. So for me, 90% of the time the biggest uncompressed part of
a message is the headers.)

As a later enhancement, allow multiple messages to be compressed and
appended at once, and have a notation in the index to let the code know
about that.
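The batched variant could look something like the following sketch. The index line format "offset count" is a hypothetical notation chosen only for this example, as are the function names. It assumes each batch is well-formed mbox text in which body lines starting with "From " have been escaped as ">From ", so that after inflating a member the messages can be split on their separator lines.

```python
import bisect
import gzip
import io
import zlib


def append_batch(mbox, index, messages):
    """Compress several messages into ONE gzip member; the (hypothetical)
    index line records the member offset plus how many messages it holds."""
    mbox.seek(0, io.SEEK_END)
    offset = mbox.tell()
    mbox.write(gzip.compress(b"".join(messages), compresslevel=9))
    index.write(f"{offset} {len(messages)}\n")
    return offset


def load_index(index):
    """Return (member offsets, number of the first message in each member)."""
    index.seek(0)
    offsets, starts, total = [], [], 0
    for line in index:
        if not line.strip():
            continue
        off, count = (int(x) for x in line.split())
        offsets.append(off)
        starts.append(total)
        total += count
    return offsets, starts


def split_mbox(data):
    """Split mbox text at lines beginning with 'From ' (assumes body
    'From ' lines were escaped as '>From ' when the mbox was written)."""
    parts, start = [], 0
    pos = data.find(b"\nFrom ", start)
    while pos != -1:
        parts.append(data[start:pos + 1])   # keep the message's trailing newline
        start = pos + 1
        pos = data.find(b"\nFrom ", start)
    parts.append(data[start:])
    return parts


def read_message(mbox, offsets, starts, n):
    """Fetch message number n: locate its member, inflate only that member."""
    i = bisect.bisect_right(starts, n) - 1
    mbox.seek(offsets[i])
    d = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)  # gzip header expected
    out = bytearray()
    while not d.eof:
        chunk = mbox.read(64 * 1024)
        if not chunk:
            raise ValueError(f"truncated gzip member at offset {offsets[i]}")
        out += d.decompress(chunk)
    return split_mbox(bytes(out))[n - starts[i]]
```

The trade-off is that fetching one message out of a batch inflates the whole batch. Batching typically buys a better compression ratio for small messages (shared header text, less per-member overhead) at the cost of some extra CPU per read.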

> Personally, it makes sense to me to utilize something like the .zip
> format, where you can have an index to the different chunks of data, and
> can seek to that offset and decompress only what you need to, instead of
> the whole folder.

You can do this with .gz if you maintain an external index. If you lose
the index, it isn't difficult to recreate it, either.
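Rebuilding the index needs only one pass over the file, noting where each gzip member ends. A sketch, assuming the whole .gz fits in memory as a bytes object and holds nothing but gzip members (no trailing padding); rebuild_index is a hypothetical name. It relies on zlib leaving any bytes past the end of a member in unused_data.

```python
import gzip
import zlib


def rebuild_index(data):
    """Recover the start offset of every gzip member in a concatenated .gz."""
    offsets, pos = [], 0
    while pos < len(data):
        offsets.append(pos)
        d = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)  # one gzip member
        d.decompress(data[pos:])
        if not d.eof:
            raise ValueError(f"truncated gzip member at offset {pos}")
        # Whatever zlib did not consume is the start of the next member.
        pos = len(data) - len(d.unused_data)
    return offsets


if __name__ == "__main__":
    msgs = [b"From a@example.org Tue May 11 13:35:00 2004\n\nfirst\n",
            b"From b@example.org Tue May 11 13:36:00 2004\n\nsecond\n"]
    blob = b"".join(gzip.compress(m, compresslevel=9) for m in msgs)
    print(rebuild_index(blob))
```

This only recovers member boundaries; if members hold batches of messages, the per-member counts would also have to be recovered by counting "From " separator lines in each inflated member.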

What am I missing here?

Ray



