Re: [gmime-devel] Decoding UTF-8 headers



On 07/16/2011 04:01 PM, Andrew Victor wrote:

[snip]
> Ok.
>
> Is it not also an issue that 1 byte of a 2-byte UTF-8 "character"
> (=C3=AD) is in one encoded-word, and the other byte in the next
> encoded-word (next line)?

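Ah, yes, I missed that. That's also "illegal": RFC 2047 requires each encoded-word to represent a whole number of characters, so a multi-octet character may not be split across adjacent encoded-words. In this particular case, however, it's the period that broke things. The fact that those words are marked as UTF-8 is probably the only reason it works when you enable the WORKAROUND flag, because the decoder has this optimization: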

    /* slight optimization? */
    if (!g_ascii_strcasecmp (charset, "UTF-8")) {
        p = (char *) decoded;
        len = declen;

        //while (!g_utf8_validate (p, len, (const char **) &p)) {
        //    len = declen - (p - (char *) decoded);
        //    *p = '?';
        //}

        return g_strndup ((char *) decoded, declen);
    }

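I believe that if I were to uncomment that while-loop, it would replace each of those split bytes with a question mark, since neither half is valid UTF-8 on its own. With the loop commented out, the bytes pass through untouched, so the two halves presumably rejoin into a valid "í" once the decoded words are concatenated.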

Jeff


