Re: [gmime-devel] Decoding UTF-8 headers

From: Jeffrey Stedfast <fejj gnome org>
To: Andrew Victor <avictor za gmail com>
Cc: gmime-devel-list gnome org
Subject: Re: [gmime-devel] Decoding UTF-8 headers
Date: Sat, 16 Jul 2011 17:00:34 -0400

On 07/16/2011 04:01 PM, Andrew Victor wrote:

[snip]

> Ok.
>
> Is it not also an issue that 1 byte of a 2-byte UTF-8 "character"
> (=C3=AD) is in one encoded-word, and the other byte in the next
> encoded-word (next line)?

Ah, yes, I missed that. That's also "illegal". In this particular case, however, it's the period that broke things. The fact that those words are marked as being UTF-8 is probably the only reason it works if you enable the WORKAROUND flag, since there is this optimization in the decoder:


    /* slight optimization? */
    if (!g_ascii_strcasecmp (charset, "UTF-8")) {
        p = (char *) decoded;
        len = declen;

        //while (!g_utf8_validate (p, len, (const char **) &p)) {
        //    len = declen - (p - (char *) decoded);
        //    *p = '?';
        //}

        return g_strndup ((char *) decoded, declen);
    }

I believe that if I were to uncomment that while-loop, it would insert question marks into the decoded text, one for each of those split bytes.
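
To illustrate the effect, here is a minimal standalone sketch in plain C. It is not GMime code: utf8_seq_len, utf8_valid_prefix and utf8_sanitize are hypothetical helpers written for this example, and the validator is deliberately simplified (it checks only lead and continuation bytes, which is a weaker test than g_utf8_validate performs). It mimics what the commented-out loop would do when each encoded-word is validated on its own.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the UTF-8 sequence introduced by lead
 * byte c, or 0 if c cannot start a sequence (e.g. a stray continuation
 * byte such as 0xAD). */
static size_t utf8_seq_len(unsigned char c)
{
    if (c < 0x80) return 1;
    if (c >= 0xC2 && c <= 0xDF) return 2;
    if (c >= 0xE0 && c <= 0xEF) return 3;
    if (c >= 0xF0 && c <= 0xF4) return 4;
    return 0;
}

/* Hypothetical helper: number of leading bytes of s that form complete,
 * structurally valid sequences. Simplified on purpose: it does not reject
 * surrogates or overlong 3- and 4-byte forms. */
static size_t utf8_valid_prefix(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        size_t n = utf8_seq_len(s[i]);
        size_t k;

        if (n == 0 || n > len - i)
            break;              /* bad lead byte or truncated sequence */
        for (k = 1; k < n; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                break;          /* expected a continuation byte */
        if (k < n)
            break;
        i += n;
    }
    return i;
}

/* Replace every offending byte with '?' in place, in the spirit of the
 * commented-out loop quoted above. Returns the number of bytes replaced. */
static size_t utf8_sanitize(unsigned char *s, size_t len)
{
    size_t replaced = 0;
    size_t pos = utf8_valid_prefix(s, len);

    while (pos < len) {
        s[pos] = '?';
        replaced++;
        /* '?' is valid ASCII, so the prefix from pos is at least 1 byte
         * long and the loop always makes progress. */
        pos += utf8_valid_prefix(s + pos, len - pos);
    }
    return replaced;
}
```

Under those assumptions, sanitizing the single byte 0xC3 (the tail of the first encoded-word) and the single byte 0xAD (the head of the next) turns each into '?', while sanitizing the joined pair 0xC3 0xAD changes nothing. That matches the behaviour described above: returning the raw bytes unvalidated lets the two halves meet again when the decoded words are concatenated.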


Jeff

Follow-Ups:
- Re: [gmime-devel] Decoding UTF-8 headers
  - From: Dirk-Jan C . Binnema

References:
- [gmime-devel] Decoding UTF-8 headers
  - From: Andrew Victor
- Re: [gmime-devel] Decoding UTF-8 headers
  - From: Jeffrey Stedfast
- Re: [gmime-devel] Decoding UTF-8 headers
  - From: Andrew Victor
