Re: [Evolution] title encoding

From: Not Zed <notzed ximian com>
To: Xavier Bestel <n0made free fr>
Cc: evolution lists ximian com
Subject: Re: [Evolution] title encoding
Date: Wed, 16 Jun 2004 18:12:22 +0800

On Wed, 2004-06-16 at 09:13 +0200, Xavier Bestel wrote:

Hi,

I often receive mails with non-ascii titles, and sometimes Evo doesn't
display them right. For example, this subject (in the mail source):

Subject:
        =?iso-8859-1?q?www.amen.fr_-_Actualit=E9_Technique_:_Surcharge_des_serveurs_de_messagerie_=
        /_Disponibilit=E9_du_service?=

displays in the mail view as:

                           Subject: 
=?iso-8859-1?q?www.amen.fr_-_Actualit=E9_Technique_:_Surcharge_des_serveurs_de_messagerie_= /_Disponibilit=E9_du_service?=


For some reason, it seems the subject isn't decoded. Can anyone spot if
there's a problem with the syntax, or if it's an Evo bug ? (in which
case I'll do a proper bugreport).

The syntax is wrong. It looks like the mailer that sent it tried to apply the soft line wrapping used for quoted-printable body content, which is totally incorrect in this context!

An encoded-word can't contain any spaces, including linear white space. The whole token is supposed to be no longer than 75 characters for this purpose, so that any folding (wrapping on multiple lines) falls outside the token boundaries.
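To make that concrete, here is a rough Python sketch of what a sending mailer could do instead: split the text into several short encoded-words and fold only between them. It is an illustration, not any real mailer's code. The names `q_encode_byte` and `encode_subject` are made up for this example, the set of characters left unencoded is deliberately conservative, and splitting between bytes is only safe for a single-byte charset such as iso-8859-1.

```python
import string

# Characters left as-is in the "Q" encoding (a conservative choice).
SAFE = set(string.ascii_letters + string.digits + "!*+-/")

def q_encode_byte(b: int) -> str:
    """Q-encode one octet: space becomes "_", unsafe octets become =XX."""
    if b == 0x20:
        return "_"
    ch = chr(b)
    return ch if ch in SAFE else f"={b:02X}"

def encode_subject(text: str, charset: str = "iso-8859-1", max_len: int = 75) -> str:
    """Return a folded header body made of encoded-words of at most max_len chars."""
    prefix, suffix = f"=?{charset}?q?", "?="
    budget = max_len - len(prefix) - len(suffix)
    words, current = [], ""
    for b in text.encode(charset):
        piece = q_encode_byte(b)
        if len(current) + len(piece) > budget:
            # Close this encoded-word; the fold goes between words, never inside one.
            words.append(prefix + current + suffix)
            current = ""
        current += piece
    if current:
        words.append(prefix + current + suffix)
    return "\r\n ".join(words)

subject = ("www.amen.fr - Actualit\u00e9 Technique : Surcharge des serveurs "
           "de messagerie / Disponibilit\u00e9 du service")
folded = encode_subject(subject)
```

Each line of `folded` is then a complete encoded-word on its own, and a reader ignores the white space between adjacent encoded-words when displaying them.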

RFC 2047, the relevant RFC, is pretty clear and quite explicit about how the headers should be interpreted. Subject is a text field.

6.1. Recognition of 'encoded-word's in message headers

   A mail reader must parse the message and body part headers according
   to the rules in RFC 822 to correctly recognize 'encoded-word's.

   'encoded-word's are to be recognized as follows:

   (1) Any message or body part header field defined as '*text', or any
       user-defined header field, should be parsed as follows: Beginning
       at the start of the field-body and immediately following each
       occurrence of 'linear-white-space', each sequence of up to 75
       printable characters (not containing any 'linear-white-space')
       should be examined to see if it is an 'encoded-word' according to
       the syntax rules in section 2.  Any other sequence of printable
       characters should be treated as ordinary ASCII text.

We do more or less exactly this, but relax it slightly: we do no length checking.
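As a rough illustration of that rule (a Python sketch, not Evolution's actual code), the field body can be split on white space and each token tested against the encoded-word syntax. The pattern below is a simplified form of the section 2 grammar, and `classify_tokens` and its `check_length` flag are names invented for this example.

```python
import re

# Simplified encoded-word syntax: =?charset?encoding?encoded-text?=
# None of the three parts may contain white space or "?".
ENCODED_WORD = re.compile(r"=\?[^\s?]+\?[^\s?]+\?[^\s?]+\?=")

def classify_tokens(field_body: str, check_length: bool = False):
    """Split on linear white space and flag which tokens are encoded-words."""
    result = []
    for token in field_body.split():
        ok = ENCODED_WORD.fullmatch(token) is not None
        if check_length and len(token) > 75:
            ok = False  # strict RFC 2047 behaviour; off by default here
        result.append((token, ok))
    return result

broken = ("=?iso-8859-1?q?www.amen.fr_-_Actualit=E9_Technique_:_"
          "Surcharge_des_serveurs_de_messagerie_=\n"
          "        /_Disponibilit=E9_du_service?=")
tokens = classify_tokens(broken)
```

For the subject from the original mail, the white space inside the token splits it into two pieces, neither of which matches, so both are treated as ordinary ASCII text.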

And while I'm at it:


6.3. Mail reader handling of incorrectly formed 'encoded-word's

   It is possible that an 'encoded-word' that is legal according to the
   syntax defined in section 2, is incorrectly formed according to the
   rules for the encoding being used.   For example:

   (1) An 'encoded-word' which contains characters which are not legal
       for a particular encoding (for example, a "-" in the "B"
       encoding, or a SPACE or HTAB in either the "B" or "Q" encoding),
       is incorrectly formed.

   (2) Any 'encoded-word' which encodes a non-integral number of
       characters or octets is incorrectly formed.

   A mail reader need not attempt to display the text associated with an
   'encoded-word' that is incorrectly formed.  However, a mail reader
   MUST NOT prevent the display or handling of a message because an
   'encoded-word' is incorrectly formed.

Which, again, is exactly what Evolution does.
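A minimal Python sketch of that fallback behaviour, again only an illustration and not Evolution's code: try to decode a token, and if it is not an encoded-word or is incorrectly formed, show it unchanged rather than fail. `decode_q` and `decode_token` are hypothetical helper names.

```python
import base64
import re

ENCODED_WORD = re.compile(r"=\?([^\s?]+)\?([bBqQ])\?([^\s?]+)\?=")
HEX_PAIR = re.compile(r"[0-9A-Fa-f]{2}")

def decode_q(text: str) -> bytes:
    """Decode "Q" encoded-text; raise ValueError if it is incorrectly formed."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "_":
            out.append(0x20)          # "_" stands for a space
            i += 1
        elif text[i] == "=":
            pair = text[i + 1:i + 3]
            if not HEX_PAIR.fullmatch(pair):
                raise ValueError("incomplete =XX escape")
            out.append(int(pair, 16))
            i += 3
        else:
            out.append(ord(text[i]))  # ValueError for non-byte characters
            i += 1
    return bytes(out)

def decode_token(token: str) -> str:
    """Return the decoded text, or the raw token if it cannot be decoded."""
    m = ENCODED_WORD.fullmatch(token)
    if m is None:
        return token                  # not an encoded-word: ordinary text
    charset, encoding, payload = m.groups()
    try:
        if encoding.lower() == "q":
            raw = decode_q(payload)
        else:
            raw = base64.b64decode(payload, validate=True)
        return raw.decode(charset)
    except (ValueError, LookupError):
        return token                  # incorrectly formed: display as-is
```

With this, `decode_token("=?iso-8859-1?q?Actualit=E9?=")` gives the decoded word, while either half of the broken subject comes back untouched, which matches the raw display reported in the original mail.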

Michael Zucchi <notzed ximian com>

Ximian Evolution and Free Software Developer

