Re: [Evolution-hackers] compression for folders?



On Tue, 2004-05-11 at 10:28 -0500, Todd T. Fries wrote:
I wish for everyone involved in this discussion to read my comments on
bug 23621.  It's obvious from some of the comments that it has not been
read.  I see no point in re-creating it here, as I was told that is the
proper medium for appending constructive advice like comments.
I'm not online at the moment so I can't read it, but in general bug reports are good for tracking features and summarising (and for bugs).  They aren't very convenient for general discussion, particularly in preliminary design phases or for nutting out details, which is precisely what this list is for.
That said, please understand:

- compression could be made (in my mind) completely transparent if a new
folder that had compression in mind were given life
- compression could be optional (you don't use it until you opt to
convert a folder to a compressed format folder)
So you're saying this could apply to any arbitrary folder?  Any arbitrary local folder?

The main reason I'm against this is that it complicates the code somewhat.  Rather more than somewhat.  Each backend is pretty specialised in what it does.  It operates on a specific storage format.  This is by design and on purpose: it means each backend can be relatively simple, and only needs to map the API onto that storage format.  Adding extra layers on top of some folders will complicate development and maintenance.
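
To make the "one backend per storage format" point concrete, here is a heavily simplified sketch.  The names (Folder, PlainFolder, CompressedFolder, copy_all) are hypothetical and are not the real backend API, which is much larger.  It only shows that a compressed store can be a sibling type behind the same interface rather than a layer on top of an existing backend.

```python
import zlib
from abc import ABC, abstractmethod


class Folder(ABC):
    """Hypothetical minimal folder API; a real backend API has far more to it."""

    @abstractmethod
    def append(self, message: bytes) -> int:
        """Store a message and return its id."""

    @abstractmethod
    def get(self, msg_id: int) -> bytes:
        """Return the stored message."""


class PlainFolder(Folder):
    """Stand-in for an uncompressed backend: stores messages as they are."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        self._messages.append(message)
        return len(self._messages) - 1

    def get(self, msg_id):
        return self._messages[msg_id]


class CompressedFolder(Folder):
    """A separate backend with its own storage format, not a wrapper."""

    def __init__(self):
        self._blobs = []

    def append(self, message):
        self._blobs.append(zlib.compress(message))
        return len(self._blobs) - 1

    def get(self, msg_id):
        return zlib.decompress(self._blobs[msg_id])


def copy_all(src, ids, dst):
    """Frontend-style code: talks to the API only, unaware of the format."""
    return [dst.append(src.get(i)) for i in ids]


working = PlainFolder()
ids = [working.append(b"Subject: test %d\r\n\r\nhello\r\n" % n) for n in range(3)]
archive = CompressedFolder()
new_ids = copy_all(working, ids, archive)
```

Neither class knows about the other, so either can change without touching the other's storage code.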
- archiving is separate from compression
Well, yes and no.  Why do you want to compress folders?  Only because you have a shitload of mail, that you basically never delete.  That sounds a whole lot like an online archive, to me.  You can play with words and semantics, but at the end of the day, there's little difference.

Let's just drop the archive bit and call it a separate compressed backend then?  Archiving would mostly be a function of the frontend anyway.  But having a known efficient backend storage for it would make it easier to write in the frontend.
- yes, it's fast to append to gzip and even bzip2 data streams, and yes
it takes some cpu time to recrunch them; for this reason in my proposed
new folder type I suggested grouping messages to allow for a fair
tradeoff between too big of an mbox as a single gzip stream vs every
message compressed individually, both of which have obvious
objectionable qualities (time vs space, respectively)
If you're using a different mailbox format, then you need another backend, end of story.
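
The grouping tradeoff in the quoted proposal can be sketched roughly as follows.  This is an illustration only: the block layout, the index tuple and the function names are assumptions, the messages are synthetic, and zlib stands in for whatever compressor would really be used.

```python
import zlib


def make_message(i):
    # Synthetic stand-in for a mail message; real mail varies far more.
    return (
        f"From: user{i % 7}@example.org\r\n"
        f"To: list@example.org\r\n"
        f"Subject: message number {i}\r\n"
        "\r\n"
        f"Body of message {i}. " + "The quick brown fox jumps over the lazy dog. " * 5
    ).encode("ascii")


def pack_grouped(messages, group_size):
    """Compress messages in groups of group_size; return (blocks, index).

    index[i] is (block_number, offset, length) inside the uncompressed block.
    """
    blocks, index = [], []
    for start in range(0, len(messages), group_size):
        group = messages[start:start + group_size]
        offset = 0
        for msg in group:
            index.append((len(blocks), offset, len(msg)))
            offset += len(msg)
        blocks.append(zlib.compress(b"".join(group)))
    return blocks, index


def read_message(blocks, index, i):
    """Fetch one message by decompressing only the block that holds it."""
    block_no, offset, length = index[i]
    data = zlib.decompress(blocks[block_no])
    return data[offset:offset + length]


def packed_size(blocks):
    return sum(len(b) for b in blocks)


messages = [make_message(i) for i in range(200)]
single, _ = pack_grouped(messages, len(messages))    # one big stream
per_msg, _ = pack_grouped(messages, 1)                # every message alone
grouped, grouped_index = pack_grouped(messages, 20)   # the middle ground

print("single stream:", packed_size(single))
print("groups of 20: ", packed_size(grouped))
print("per message:  ", packed_size(per_msg))
```

Reading one message from the grouped layout costs one block's worth of decompression, and an append only has to recompress the last block.  Note that nothing here resembles an mbox file any more, which is the point about needing another backend.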

Maybe we're just talking about different things here.  Backend (where the code that actually does the work lives) vs frontend functionality, like selecting an alternate mailbox format.

In 1.5.x we no longer have the option to modify the mailbox format, although at some point in the future this may be doable - for all local folders though, not on a per-folder basis.  But certainly, if you had a compressed-option backend, then you could have it per-folder based on its functionality.

On the other hand, with the way backends are plugged in, it really makes little difference.  If you don't use the stuff under "On this computer", then no mail goes there (apart from the outgoing spool).

All you do is close that tree down, and use your compressed-local backend.

What's the big deal?
- one could even allow for a background thread or a manually invoked
thread that recompresses things in the background for a tighter fit;
access time doesn't suffer, quick writes don't suffer, but recompressing
can reclaim more diskspace especially if one opts to allow the
recompressing program to attempt multiple algorithms to determine the
tightest packing algorithm for a given dataset
Sure, there's a ton of things you can do.  I just don't want you doing it in the mbox code.  The mbox code is for writing to mboxes.  Once you compress it, especially if it isn't just a single stream, then you're no longer a Berkeley mailbox.
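
For what it's worth, the "try several algorithms and keep the tightest" idea from the quoted text is simple enough on its own.  A minimal sketch, assuming the candidates are just the codecs in the Python standard library and that each stored block records which codec was used (both are assumptions, as are the names):

```python
import bz2
import lzma
import zlib

# Candidate codecs as (compress, decompress) pairs; the set is an assumption.
CODECS = {
    "zlib": (lambda d: zlib.compress(d, 9), zlib.decompress),
    "bz2": (lambda d: bz2.compress(d, 9), bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def recompress_tightest(raw):
    """Try every codec and return (name, payload) for the smallest result."""
    results = {name: enc(raw) for name, (enc, _) in CODECS.items()}
    best = min(results, key=lambda name: len(results[name]))
    return best, results[best]


def decompress(name, payload):
    """Undo recompress_tightest; the codec name must be stored with the block."""
    return CODECS[name][1](payload)


block = b"Subject: weekly report\r\n\r\nNothing new to report this week.\r\n" * 500
name, payload = recompress_tightest(block)
print(name, len(block), "->", len(payload))
```

A background job could run this over cold blocks only, so normal reads and appends never wait on it.  It still belongs in its own backend, not in the mbox code.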
Hopefully this will make it clear that, in my mind, short of manpower,
the concepts of compression could be done in such a way that would not
be objectionable to anyone.
The reasons I suggest doing it as a separate backend are manifold:
- makes little real difference to the user.  Users cope with IMAP pretty easily; it would show up the same way.
- most users don't need this for normal working folders; I would argue nobody does in that case.
- it doesn't belong in the others.  Especially since it would presumably be a different storage format entirely, and not even merely a compression of otherwise identical objects.
- it's a backend.  It has to go in the backends.  Backends aren't the frontend, and can be/are hidden from the user anyway.
- it can be developed in parallel, independently.  No objection to it going into the main CVS (I would encourage it; in fact it could probably fit in the local provider, but it has to be a different type, not a layer on top), but it needn't even do that.  This also lowers the risk: adding major new features to an existing backend in which people have gigabytes of 'mission critical' email isn't low risk.

This last point is reinforced by the other facts:
- the APIs aren't that simple and there's a lot of stuff to learn and to implement
- almost none of the existing code in a given backend will be re-usable as soon as you change the storage format.  That's all they're for after all, abstracting the storage format.

Again, maybe we're just misunderstanding what each other is talking about (again I apologise for not being able to read the bug report; I was busy this week fixing bugs, and am offline for a few hours).

The strongest point is really the parallel development angle.  You can provide all the functionality without interfering with any of the core in any way.  I mean you could potentially just take the mailbox one, and develop a compressed one in parallel.  Once it is stable, then merge the code, or not, as appropriate.  Since it's all abstracted, you'll have to do all of this work anyway.

Michael

Michael Zucchi <notzed ximian com>

Ximian Evolution and Free Software Developer


Novell, Inc.

