Re: [Evolution-hackers] compression for folders?



On Tue, 2004-05-11 at 08:28, Todd T. Fries wrote:
> I wish for everyone involved in this discussion to read my comments on
> bug 23621.

There's nothing all that surprising in that bug #/feature request. Well,
the creation of a new mailbox format was a bit surprising, but that was
about it.

> - compression could be made (in my mind) completely transparent

Well, of course.

> if a new folder that had compression in mind were given life

Hmm. This is an implementation-level discussion point. How about we hold
off on those until the rest is sorted out.

> - compression could be optional (you don't use it until you opt to
> convert a folder to a compressed format folder)

Again, of course.

> - archiving is separate from compression

Yes, but. Once you have the concept of archives worked into the product,
having them compressed is one of those obvious features that go along
with it. Taking it a step further, once you have archives, that implies
that the *rest* of your email isn't archives, yes? And if it isn't
archives, then it is, effectively, active. And if it's active, it's
probably getting flags rewritten for status changes or whatnot.

Archives are an obvious thing to target for compression. Live mail
spools aren't. That doesn't mean it wouldn't eventually prove useful;
I'm just saying that it's not nearly as obvious.

> - yes, it's fast to append to gzip and even bzip2 data streams, and yes
> it takes some cpu time to recrunch them; for this reason in my proposed
> new folder type I suggested grouping messages to allow for a fair
> tradeoff between too big of an mbox as a single gzip stream vs every
> message compressed individually,

Well, of course. Chances are Evolution wouldn't archive every time a new
message came in, right? It'd probably do it once your local mail files
passed some threshold (size- or time-based), in which case there would be
a chunk of mail streaming towards the archives all at once, which implies
it'd all get compressed at once.

But again, this is one of those implementation detail things.
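
Just to make the chunking tradeoff concrete, here's a minimal sketch
(Python, purely illustrative -- the chunk layout, file names, and the
CHUNK_SIZE knob are all made up, and none of this reflects Camel's
actual storage code): archive an mbox as a series of gzip streams, a
fixed number of messages per stream, so compression spans message
boundaries without everything ending up in one giant stream.

    import gzip
    import mailbox
    import os

    CHUNK_SIZE = 100    # messages per compressed stream -- a tuning knob

    def archive_mbox(src_path, archive_dir):
        os.makedirs(archive_dir, exist_ok=True)
        box = mailbox.mbox(src_path)
        chunk, chunk_no = [], 0
        for _, msg in box.iteritems():
            chunk.append(msg.as_string(unixfrom=True) + "\n")
            if len(chunk) == CHUNK_SIZE:
                _flush(archive_dir, chunk_no, chunk)
                chunk, chunk_no = [], chunk_no + 1
        if chunk:
            _flush(archive_dir, chunk_no, chunk)

    def _flush(archive_dir, chunk_no, messages):
        # one gzip stream per chunk: compression crosses message
        # boundaries, but rewriting or expiring a chunk only ever
        # touches CHUNK_SIZE messages, not the whole archive
        path = os.path.join(archive_dir, "chunk-%04d.mbox.gz" % chunk_no)
        with gzip.open(path, "wt") as out:
            out.writelines(messages)
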

> recompressing can reclaim more diskspace especially if one opts to allow
> the recompressing program to attempt multiple algorithms to determine
> the tightest packing algorithm for a given dataset

This is easy to say, but it makes for a much more difficult
implementation. For marginal space savings (over just gzip'ing in the
first place), I'd have a hard time believing any cost-benefit analysis
would show that this would be worth the effort of coding, debugging, and
maintaining.

Remember, perfect is the enemy of the good.
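
For what it's worth, the mechanics of the quoted idea are simple enough;
a rough sketch (the function name and codec choices are mine, not
anything from the bug) would look like the following. The real cost
isn't this function, it's having to record and honor a per-chunk codec
everywhere the archive gets read back.

    import bz2
    import gzip

    def tightest(data):
        # compress the same chunk with each candidate codec and keep
        # whichever result is smallest
        candidates = {
            "gzip": gzip.compress(data, compresslevel=9),
            "bzip2": bz2.compress(data, compresslevel=9),
        }
        name = min(candidates, key=lambda k: len(candidates[k]))
        return name, candidates[name]
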

Doing some numbers, a sample week of the Linux kernel mailing list (that
I picked at random) is about 4.3meg. gzip -9 takes it down to 1.1meg,
about 25% of its original size. bzip2 -9 takes it to 0.88meg, or about
20% of its original size.

Is 5% noticeable? Sure. Is it worth it? That's a harder question to
answer.
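
If anyone wants to repeat the measurement on their own mail, this is
roughly all it takes (the file name is just a placeholder):

    import bz2
    import gzip

    with open("lkml-week.mbox", "rb") as f:
        data = f.read()

    # compare both codecs at their highest setting, as above
    for name, packed in (("gzip -9", gzip.compress(data, compresslevel=9)),
                         ("bzip2 -9", bz2.compress(data, compresslevel=9))):
        print("%-8s %5.1f%% of original"
              % (name, 100.0 * len(packed) / len(data)))
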

> Hopefully this will make it clear that, in my mind, short of manpower,

I don't think anyone is *really* dead-set against the concept of
compression. But if it's going to be done, it has to be approached in a
way that can be done incrementally (labor-wise and feature-wise), so
that it doesn't bog down the development team, the release schedule,
etc. Just getting the functional and UI details of a (non-compressing!)
archive system worked out is a decent chunk of work. Once that's done,
compression could be introduced there.

And from that point, compression could be evaluated (in an incremental
fashion) for use in the rest of the system.

I guess my point is that I (a random bloke on the list) think it's a
good idea, but it's also a rather large amount of work to do it right.

Ray



