Re: outgoing mail encoding preference



Hi Johan!

On 17.03.05 02:54, Johan Braennlund wrote:
Hi. I don't know whether the balsa list accepts e-mail from
non-subscribers so I'm sending this to you personally. Feel free to
forward to the list if you feel like it.

The list accepts mail from subscribers only, so I'll send a copy of my reply to the list.

> as 7bit would collide with all national chars,

What do you mean?

*All* chars which are not in the US-ASCII set (e.g. äöüáô²³€ etc.) are encoded either as a single byte > 127 (ISO-8859-x sets, [spit] Winbloze CP something, etc.) or as a multi-byte sequence in which every byte is > 127 (Unicode as UTF-8). Thus restricting text parts to 7-bit makes sending such parts with national chars impossible (remember that we're only talking about sending messages here).
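To make the byte values concrete, here is a minimal Python sketch (my own illustration, not Balsa code; the helper name is_7bit_clean is made up for this example). It encodes a few national chars in ISO-8859-15 and in UTF-8 and checks whether any byte has the high bit set:

```python
def is_7bit_clean(data: bytes) -> bool:
    """Return True if no byte in data has the high bit set (all bytes < 128)."""
    return all(b < 128 for b in data)

samples = "äöü€"

for charset in ("iso-8859-15", "utf-8"):
    for ch in samples:
        raw = ch.encode(charset)
        # Show the character, its encoded bytes, and whether it would fit in 7bit.
        print(charset, repr(ch), raw.hex(), is_7bit_clean(raw))

# Pure US-ASCII text is the only kind that passes the check.
print(is_7bit_clean("plain ASCII text".encode("ascii")))
```

Under these assumptions none of the sample chars survives a 7-bit restriction unless the part is encoded (quoted-printable or base64) first.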

> and 8bit is not safe for every MTA.

In this day and age I think you'd be hard pressed to find a reasonably
recent MTA that is not 8-bit clean.

While this is true, any good application should be designed for maximum compatibility. In RFC terms this means that whenever they state we SHOULD do something (and RFC 2821 and 2822 say we SHOULD stick to US-ASCII, SHOULD NOT send lines longer than 78 chars, etc.), we *will* implement it to ensure compatibility. This is the "good practice" recommended by the IETF.

Remember that using 8-bit may or may not work for some recipients. I agree that most of them will receive proper messages, but some may get garbled crap, and some MTA's may even refuse to accept such messages (as they obviously violate the RFC's) at all.

A couple of points:

1. I find it very convenient to be able to grep through my (sent)
e-mail.

To be honest, I don't see why this should be a show-stopper... Balsa has nice built-in searching and filtering capabilities to do exactly this. BTW, I don't think that the grep method will work reliably, as (a) headers MUST NOT be 8-bit (see below), and (b) 8-bit bodies may be encoded in some ISO charset or in utf-8 (unicode). For example, I usually use iso-8859-15 (west european w/€), but sometimes I write messages with mixed west and east-european chars, where utf-8 is the only choice. I don't know of *any* (e)grep implementation which can handle that.
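A small Python sketch of the charset problem with byte-oriented searching (an illustration only; the sample text and variable names are made up). The same words stored in two charsets produce different bytes, so a pattern typed in one charset misses the other:

```python
# The same text as it would sit on disk in two differently encoded messages.
text = "Viele Grüße"
stored_latin9 = text.encode("iso-8859-15")
stored_utf8 = text.encode("utf-8")

# A byte-oriented search pattern as typed in an ISO-8859-15 terminal.
pattern = "Grüße".encode("iso-8859-15")

found_in_latin9 = pattern in stored_latin9
found_in_utf8 = pattern in stored_utf8

print("match in iso-8859-15 message:", found_in_latin9)
print("match in utf-8 message:", found_in_utf8)
```

A search that decodes each message with its declared charset before matching does not have this problem.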

Please remember also that as long as you don't look at national chars (which may have different encodings as explained above) the pure US-ASCII stuff will *not* be touched by the qp encoding, with the exception of some (trailing) spaces and over-long lines. However, the latter is again a restriction of RFC 2821/22.
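This can be checked with Python's standard quopri module. The sketch below is only an illustration (the helper name qp and the sample strings are made up, and Balsa's own encoder may differ in details such as line endings):

```python
import quopri

def qp(text: str, charset: str = "iso-8859-15") -> bytes:
    """Encode text in the given charset, then apply quoted-printable."""
    return quopri.encodestring(text.encode(charset))

ascii_line = "Hello world, pure US-ASCII.\n"
national = "Grüße\n"
trailing = "trailing space \n"
long_line = "x" * 100 + "\n"

print(qp(ascii_line))   # pure US-ASCII text
print(qp(national))     # non-ASCII bytes become =XX escapes
print(qp(trailing))     # a space before the line end gets protected
print(qp(long_line))    # over-long lines get soft line breaks

# Decoding restores the original bytes in every case.
for sample in (ascii_line, national, trailing, long_line):
    assert quopri.decodestring(qp(sample)) == sample.encode("iso-8859-15")
```

So a plain grep for US-ASCII words still works on the encoded copy, as long as the match does not span a soft line break.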

If that were no longer easily possible with Balsa, I would start
looking for another MUA.

As I said above, this feature is already built into balsa... Not with the *full* power of egrep, though.

Or would the encoding only take place for the mail that's handed off to the MTA and the local copy would still be in whatever format it was to begin with?

The local copy contains the same data as passed to the MTA, i.e. it's qp encoded, encrypted etc., whatever you selected upon sending.

2. Does the encoding apply to 8-bit characters in headers too? If so, I
think you're getting into a big mess. It's common to end up with
headers having things like "=?ISO-8859-1?" stuffed in them, since some
MUA's don't decode the headers properly.

Headers MUST NOT contain any 8-bit chars according to RFC 2047. The encoding above is the only way to go. Any MUA which does not decode them properly is simply broken or *very* old (RFC 2045 and friends were released in Nov. 1996).
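For illustration, this is roughly what the RFC 2047 encoding and decoding look like with Python's standard email.header module. It is a sketch, not Balsa's implementation; the helper names encode_subject and decode_subject and the sample subject are made up:

```python
from email.header import Header, decode_header

def encode_subject(text: str, charset: str = "iso-8859-1") -> str:
    """Return text as an RFC 2047 encoded-word header value."""
    return Header(text, charset=charset).encode()

def decode_subject(wire: str) -> str:
    """Decode the encoded-words in a header value back to text."""
    parts = []
    for data, charset in decode_header(wire):
        if isinstance(data, bytes):
            parts.append(data.decode(charset or "ascii"))
        else:
            parts.append(data)
    return "".join(parts)

wire = encode_subject("Grüße aus Bonn")
print(wire)                   # what goes over the wire: US-ASCII only
print(decode_subject(wire))   # what a MIME-aware MUA should display
```

The encoded form names its charset explicitly, which is exactly what raw 8-bit header bytes cannot do.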

While I think 8-bit headers are technically a violation of some RFC, they tend to work better than MIME.

Sorry, but IMHO this is completely wrong! Using 8-bit chars in headers is a clear violation of RFC 2047 and therefore an absolute show stopper. Balsa (and any other non-broken MUA) will NEVER use anything else than this encoding. The big mess would start if Balsa breaks the rules. Full stop.

It is true that some MUA's still *try* to interpret such broken headers. Please keep in mind that the result is completely unpredictable, though. What if the 8-bit header chars come from iso-8859-7 (Greek), but are displayed as -15 (west european w/ €)? You get the picture of why the clever dicks introduced this standard almost ten years ago...
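The Greek example is easy to reproduce. A minimal Python sketch (the sample word is my own choice) that takes header bytes written in iso-8859-7 and decodes them with the wrong charset:

```python
# Raw 8-bit header bytes as a sender using iso-8859-7 would produce them.
original = "Ελλάδα"
raw = original.encode("iso-8859-7")

# The recipient's MUA has no charset label and has to guess.
guessed_right = raw.decode("iso-8859-7")
guessed_wrong = raw.decode("iso-8859-15")

print("sent:        ", original)
print("right guess: ", guessed_right)
print("wrong guess: ", guessed_wrong)
```

Both decodes succeed without any error, so the receiving side cannot even detect that it guessed wrong.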

3. When checking some of the e-mail servers through which I regularly
receive mail, they add things like "X-MIME-Autoconverted: from
quoted-printable to 8bit". This is admittedly a small sample, but it
seems like all you'd accomplish for such servers is to add a couple of
encoding-decoding cycles.

The "X-" indicates that this is something non-standard. MTA's are supposed to pass the content "as is" without *any* modification, with the exception of adding "Received:" headers (and probably the exception of virus scanners, but here we *expect* the contents to be modified if necessary). So I think this MTA is simply broken from the RFC pov, if it *really* performs such a conversion on text parts.

Please note that automatically converting bodies from QP to 8-bit will almost certainly break *all* signed messages! RFC 3156 (PGP/MIME) explicitly requires text parts in signed messages to be QP encoded. RFC 2633 (S/MIME) states that they should be either QP or base64 encoded (the latter would make them completely unreadable for non MIME-aware MUA's). As the Message Integrity Check hash must be calculated over the QP encoded data, the signature will *always* be broken, thus making crypto unusable.
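A simplified Python sketch of why the conversion breaks the signature. It hashes only a body line, whereas a real PGP/MIME or S/MIME signature covers the whole canonicalized MIME part including its headers, and the choice of SHA-256 and the sample text are assumptions made for the example:

```python
import hashlib
import quopri

# The sender QP-encodes the text part and computes the hash over the encoded bytes.
body_8bit = "Grüße aus Bonn\n".encode("iso-8859-15")
body_signed = quopri.encodestring(body_8bit)
digest_at_signing = hashlib.sha256(body_signed).hexdigest()

# A hypothetical MTA "autoconverts" the part from quoted-printable to 8bit.
body_received = quopri.decodestring(body_signed)
digest_at_receipt = hashlib.sha256(body_received).hexdigest()

print("signed bytes:  ", body_signed)
print("received bytes:", body_received)
print("hashes match:  ", digest_at_signing == digest_at_receipt)
```

The text a human reads is the same in both cases, but the verifier hashes bytes, not meaning, so the check fails.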

> Otherwise I would
> prepare a patch to remove this option completely from the project.
>
> Opinions?

Please don't. I wouldn't object to base64 being made the default if
that's not already the case but I'd really like to still have the 8-bit
option.

Well, the patch has already been in CVS for some time... Base64 is the default for any non-text part (and not a serious option for text, btw., see above).

I am of course open to a new discussion about whether we should roll back the patch (which is in the end the decision of the maintainers - Pawel? Peter?).

My very personal opinion is that the *only* interesting argument for keeping 8-bit is the grep problem, but, as I said before, Balsa provides a search feature, so I am not convinced that this is the only way to go. <ot>We might want to extend the search tool to accept regular expressions in the future?</ot>

Just my € 0.01...

Cheers, Albrecht.


--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Albrecht Dreß  -  Johanna-Kirchner-Straße 13  -  D-53123 Bonn (Germany)
       Phone (+49) 228 6199571  -  mailto:albrecht dress arcor de
   GnuPG public key:  http://home.arcor.de/dralbrecht.dress/pubkey.asc
_________________________________________________________________________



