GMime improvements / mailbox backends / PGP



Okay, so a while ago I had mentioned GMime on this list and Pavel had
asked whether or not GMime took care of charset issues.

Well, it does now (actually, it has for a few weeks).

I have created a g_mime_init() function that takes a flags argument.
Currently there is only one flag, GMIME_INIT_FLAG_UTF8, which tells
gmime to use UTF-8 interfaces. This means that when you request, for
example, the Subject of a message, it is given to you in UTF-8 rather
than as decoded text in some unknown charset.

If you initialize gmime to use UTF-8 interfaces, it will choose the best
charset for encoding a message header using g_mime_charset_best().
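
To give a rough idea of what "choosing the best charset" means, here is
a minimal sketch in plain C. It is NOT how g_mime_charset_best() is
implemented: sketch_charset_best() is a hypothetical helper made up for
illustration, it assumes valid UTF-8 input, and it only knows three
candidate charsets.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, NOT part of GMime: pick the narrowest charset
 * that can represent a NUL-terminated, valid UTF-8 header value.
 * Only three candidates are considered here; a real implementation
 * would know about many more charsets. */
static const char *
sketch_charset_best (const char *utf8)
{
	const unsigned char *p = (const unsigned char *) utf8;
	const unsigned char *end;
	int latin1 = 0;

	assert (utf8 != NULL);
	end = p + strlen (utf8);

	while (p < end) {
		if (*p < 0x80) {
			/* plain 7-bit ASCII */
			p++;
		} else if ((*p == 0xC2 || *p == 0xC3)
			   && p + 1 < end && (p[1] & 0xC0) == 0x80) {
			/* 2-byte sequence encoding U+0080..U+00FF */
			latin1 = 1;
			p += 2;
		} else {
			/* anything above U+00FF needs a Unicode charset */
			return "UTF-8";
		}
	}

	return latin1 ? "iso-8859-1" : "us-ascii";
}
```

The point of picking the narrowest charset is that a pure-ASCII header
does not need to be encoded at all, and a Latin-1 header stays readable
in older mailers that do not understand UTF-8.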

There are also plenty of other charset utilities...


There's also been discussion on this list, although it hasn't seemed too
urgent, about writing a new mailbox backend.

I seem to recall the new mailbox backend discussion saying that MIME
should not be handled in the Folder abstraction due to the slowness of
parsing MIME. Well...

I have recently been working on porting GMime to glib2 and GObject and
thus had to slightly restructure my classes. Doing this forced me to
write a brand spanking new parser. Since I now had to write a new one, I
also cleaned it up so that I didn't have to break abstractions. Yay,
code cleanliness! Anyways, I'll talk more about GMime2 later...

Why did I mention all that? Well, I guess it gave me an excuse to write
a new parser.

One of my goals was also speed. So far, I had 2 good parsers in the
gmime-1 branch. gmime-parser.c was wickedly fast but required loading
the entire message stream into RAM before parsing. This was a pretty
nasty hack. pan-mime-parser.c had been written with the goal of keeping
all message/mime-part content on disk so as to keep from using up
massive amounts of RAM once the message was parsed. Pan had needed this
feature because some users were downloading ISO images and so on.

In fact, pan-mime-parser.c is the whole reason why GMime uses streams
(which can be file streams or memory streams or anything you need, btw).
I think you Balsa programmers will immensely enjoy GMimeStreams ;-)

Anyways, pan-mime-parser.c was unfortunately pretty slow in comparison
to gmime-parser.c. One of my hacks had been to implement a
GMimeStreamBuffer class which would buffer another stream (i.e.,
read-ahead buffering). This sped things up quite a bit, but
pan-mime-parser.c was still around 4x slower than gmime-parser.c for the
same message. Yikes!

Now, about my new parser... the new parser is actually *faster* than the
original gmime-parser.c implementation but, like pan-mime-parser.c, it
does not have to load the message into RAM before parsing. It instead
parses incrementally off disk. Rather than forcing our caller to pass us
a buffered stream for performance, we keep our own read-ahead buffer of
4k (easy to change if you need to, it's just a #define).

Keeping our own read-ahead buffer actually helped increase performance a
LOT.
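
For those who want to picture the read-ahead idea, here is a minimal
sketch in plain C using stdio. It is not the GMime2 parser code: the
SketchReader type, the sketch_reader_* functions and the
SKETCH_READ_AHEAD name are all made up for illustration, and a FILE*
stands in for a GMimeStream. The underlying stream is only touched once
per 4k block; line scanning happens inside the buffer with memchr().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_READ_AHEAD 4096	/* the "4k" buffer; easy to change */

/* Hypothetical type, NOT GMime's parser internals. */
typedef struct {
	FILE *fp;
	char buf[SKETCH_READ_AHEAD];
	size_t pos;	/* next unread byte in buf */
	size_t len;	/* number of valid bytes in buf */
} SketchReader;

static void
sketch_reader_init (SketchReader *r, FILE *fp)
{
	assert (r != NULL && fp != NULL);
	r->fp = fp;
	r->pos = r->len = 0;
}

/* Copy the next line (including its '\n', if present) into out and
 * return its length; returns 0 at end of input. Lines that do not fit
 * in outsize - 1 bytes are returned in pieces. */
static size_t
sketch_reader_getline (SketchReader *r, char *out, size_t outsize)
{
	size_t n = 0;

	assert (outsize > 0);
	while (n + 1 < outsize) {
		size_t avail, want;
		char *start, *nl;

		if (r->pos == r->len) {
			/* buffer exhausted: refill it with one big read */
			r->len = fread (r->buf, 1, sizeof (r->buf), r->fp);
			r->pos = 0;
			if (r->len == 0)
				break;
		}

		start = r->buf + r->pos;
		avail = r->len - r->pos;
		want = outsize - 1 - n;
		if (avail > want)
			avail = want;

		/* scan the buffered bytes for the end of the line */
		nl = memchr (start, '\n', avail);
		if (nl != NULL)
			avail = (size_t) (nl - start) + 1;

		memcpy (out + n, start, avail);
		n += avail;
		r->pos += avail;
		if (nl != NULL)
			break;
	}

	out[n] = '\0';
	return n;
}
```

A caller would call sketch_reader_init() once on an open file and then
loop on sketch_reader_getline() until it returns 0, which is roughly the
shape of a header-parsing loop.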

Okay, so how fast is it?

One of the test messages that I created for profiling purposes is 38
MB. The new mime parser can parse that in 0.612s. For comparison, it
takes 0.540s just to read that file at 4k per loop iteration and shove
it into a GByteArray.

That means that parsing MIME with my new parser is nearly as fast as
reading it off disk (0.612s vs 0.540s, so only about 13% slower)!

This means that those who are working on a new mailbox backend for
Balsa have NO EXCUSE for not handling MIME at the Folder abstraction.

Also, for your convenience - I am adding interfaces to GMime2's parser
to allow it to parse mbox streams. I've already got it recording the
offset of each header so that one could request, say, the Status:
header's offset and set a new value on it without having to rewrite the
mbox file.
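
To illustrate why recorded header offsets are handy, here is a rough
sketch in plain C of updating a Status: header in place. None of this is
GMime API: sketch_find_header() and sketch_set_header_in_place() are
hypothetical names, the header match is case-sensitive, folded headers
are ignored, and the new value must not be longer than the old one
(otherwise the rest of the file would have to move).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, NOT GMime API: scan the header block starting at
 * the current position of fp for header `name`. On success, store the
 * file offset of the first byte of its value and the value's length
 * (without the line ending) and return 0. Return -1 if the blank line
 * ending the headers, or end of file, is reached first. */
static int
sketch_find_header (FILE *fp, const char *name,
		    long *value_offset, size_t *value_len)
{
	char line[1024];
	size_t name_len;
	long start;

	assert (fp != NULL && name != NULL);
	name_len = strlen (name);

	for (;;) {
		start = ftell (fp);
		if (start < 0 || fgets (line, sizeof (line), fp) == NULL)
			return -1;
		if (line[0] == '\n' || line[0] == '\r')
			return -1;	/* end of headers */

		if (strncmp (line, name, name_len) == 0
		    && line[name_len] == ':') {
			size_t skip = name_len + 1;

			while (line[skip] == ' ' || line[skip] == '\t')
				skip++;
			*value_offset = start + (long) skip;
			*value_len = strcspn (line + skip, "\r\n");
			return 0;
		}
	}
}

/* Overwrite the value at value_offset, padding with spaces up to the
 * old length so that nothing after it has to move. Returns -1 if the
 * new value does not fit or on I/O error. */
static int
sketch_set_header_in_place (FILE *fp, long value_offset,
			    size_t value_len, const char *new_value)
{
	size_t new_len = strlen (new_value);
	size_t i;

	if (new_len > value_len)
		return -1;
	if (fseek (fp, value_offset, SEEK_SET) != 0)
		return -1;
	if (fwrite (new_value, 1, new_len, fp) != new_len)
		return -1;
	for (i = new_len; i < value_len; i++) {
		if (fputc (' ', fp) == EOF)
			return -1;
	}

	return fflush (fp) == 0 ? 0 : -1;
}
```

With the file opened for update ("r+b"), a backend could mark a message
as read by looking up "Status" once and then writing "RO" over the old
value, touching only a couple of bytes of the mbox.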

I have also backported GMime2's parser to the gmime-1 branch.
Unfortunately, so that I don't break gmime-1's API too badly (and partly
because I'd have to implement another object class for GMimeParser), I
have not backported the mbox stuff, but I think that if you guys were to
work on writing a new mailbox backend, it should be targeted for the
gtk2 port of Balsa anyway. So that shouldn't matter.


Now, about PGP. I know chbm knows about this, but for those that don't -
I have a branch of GMime called GMIME_PGP_MIME that is kept in sync with
the gmime-1 branch but has the added niceties of some PGP/MIME
utilities. Some of these may also be useful for writing in-line PGP
support, since the branch has a simple wrapper around pgp2, pgp5, pgp6
and gpg.



If you guys take a look at the gmime-1 branch and have any requests,
please make them. I will try to please. I also plan on making a 0.9.0
release fairly soon, although I'll wait a good week or more for any
requests/suggestions that you guys may have. Ideally I would like 0.9.0
to contain the last API changes before a final 1.0.0 release. I want
0.9.0->1.0.0 to be nothing but bug fixes (assuming there are any bugs -
I'm sure there will be).

Btw, same goes for cvs HEAD of GMime... which is the GMime2 port. I
especially want feedback on this branch since I've already broken API a
bit. Just so you don't get worried, porting from gmime-1 to gmime-2 is
not that difficult.


Comments/Suggestions welcome,

Jeff




