Re: [gmime-devel] Issue with header decoding (continues)
From: evil legacy <evil legacy gmail com>
To: Jeffrey Stedfast <fejj gnome org>
Cc: gmime-devel-list gnome org
Subject: Re: [gmime-devel] Issue with header decoding (continues)
Date: Wed, 21 Dec 2011 18:22:18 +0200
Hey, thanks for the quick patch!
I think I found a small problem with the new tokenized decoding.
When trying to decode a broken header that declares no encoding, e.g.:
Subject: ×××ע× ××ש× ××ס××× ×××× ש× ××× ×ע×ר
tokenize_rfc2047_text() does
if ((token = rfc2047_token_new_encoded_word (word, n)))
If the word is not encoded, token->encoding is not set, but nothing checks whether the word is ASCII either, so is_8bit is not set.
With neither token->encoding nor token->is_8bit set, rfc2047_decode_tokens() falls back to copying the data as is,
which in some cases makes g_mime_utils_header_decode_text() return a string that is not valid UTF-8.
A quick fix I found is to do the ASCII check when the word isn't encoded, and to set is_8bit if it isn't ASCII:
in tokenize_rfc2047_text() line 2136:
	} else {
		/* append the lwsp and atom tokens */
		if (lwsp != NULL) {
			tail->next = lwsp;
			tail = lwsp;
		}

		token = rfc2047_token_new (word, n);
		tail->next = token;
		tail = token;

		ascii = TRUE;
		while (n--)
			ascii = ascii && is_ascii (*word++);
		if (!ascii)
			token->is_8bit = 1;

		encoded = FALSE;
	}
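
For illustration only, the same check can be written as a small standalone helper. This is a minimal sketch, not GMime code: the name word_has_8bit() is hypothetical, and it assumes "not ASCII" simply means a byte with the high bit set. Unlike the loop above, it leaves word and n untouched and stops at the first 8-bit byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of GMime): returns true if any byte in
 * word[0..len) has its high bit set, i.e. the atom is not pure ASCII. */
static bool
word_has_8bit (const char *word, size_t len)
{
	size_t i;

	assert (word != NULL || len == 0);

	for (i = 0; i < len; i++) {
		if ((unsigned char) word[i] > 127)
			return true;	/* first 8-bit byte found, no need to scan further */
	}

	return false;
}
```

With a helper like this, the branch above would reduce to something like token->is_8bit = word_has_8bit (word, n) ? 1 : 0; placed right after the token is created.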
Regards,
Eddie
On Mon, Dec 19, 2011 at 1:00 AM, Jeffrey Stedfast <fejj gnome org> wrote:
The attached patch should fix it if applied to the latest gmime from
git master.
A quick test has it passing all of the unit tests (e.g. test-mime),
so it's probably good to go.
I don't normally like to land such massive patches in a stable
cycle, so if you could test this out on your messages and see how it
works in the wild, that'd be great.
This patch should also handle cases where base64 and/or
quoted-printable data was split between encoded-word tokens (which
addresses another feature request I've gotten a few times now).
Jeff
On 12/18/2011 09:02 AM, evil legacy wrote:
Hi,
I came across another header decoding problem when dealing
with badly split UTF-8 headers, i.e.:
It looks like someone split a UTF-8 string incorrectly,
leaving "half" a character in each part.
g_mime_utils_header_decode_text/phrase splits the header
into words and decodes each word separately. Since the charset is
UTF-8, iconv isn't used and each word is validated with this
loop:
	while (!g_utf8_validate (p, len, (const char **) &p)) {
		len = declen - (p - (char *) decoded);
		*p = '?';
	}
Because the original string was split incorrectly,
the "half" characters are replaced with '?'.
I'm attaching a patch that moves the UTF-8 validation to
the end of g_mime_utils_header_decode_text/phrase, where the
decoded words have already been combined.
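
To make the difference concrete, here is a rough, self-contained sketch of what happens when validation runs per word versus on the combined string. It is not GMime code and not the attached patch: valid_prefix() is a simplified stand-in for g_utf8_validate() that only checks lead and continuation bytes (no overlong or surrogate checks), and all function names are hypothetical. The test data is "é" (0xC3 0xA9) split across two words.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for g_utf8_validate(): returns how many leading bytes
 * of buf[0..len) form complete, structurally well-formed UTF-8 sequences. */
static size_t
valid_prefix (const unsigned char *buf, size_t len)
{
	size_t i = 0, k, need;

	while (i < len) {
		unsigned char c = buf[i];

		if (c < 0x80)
			need = 0;
		else if ((c & 0xE0) == 0xC0)
			need = 1;
		else if ((c & 0xF0) == 0xE0)
			need = 2;
		else if ((c & 0xF8) == 0xF0)
			need = 3;
		else
			break;		/* stray continuation or invalid lead byte */

		if (need > len - i - 1)
			break;		/* sequence is cut off by the end of the buffer */

		for (k = 1; k <= need; k++) {
			if ((buf[i + k] & 0xC0) != 0x80)
				break;
		}
		if (k <= need)
			break;		/* a continuation byte was missing */

		i += need + 1;
	}

	return i;
}

/* Replace every offending byte with '?', as the quoted loop does.
 * Returns the number of bytes that were replaced. */
static size_t
sanitize (unsigned char *buf, size_t len)
{
	size_t pos = 0, replaced = 0;

	for (;;) {
		pos += valid_prefix (buf + pos, len - pos);
		if (pos >= len)
			break;
		buf[pos] = '?';
		replaced++;
	}

	return replaced;
}

static const unsigned char first_word[]  = { 'c', 'a', 'f', 0xC3 };
static const unsigned char second_word[] = { 0xA9, '!' };

/* Validate each decoded word on its own: both halves of the char are lost. */
static size_t
replaced_per_word (void)
{
	unsigned char a[sizeof first_word], b[sizeof second_word];

	memcpy (a, first_word, sizeof a);
	memcpy (b, second_word, sizeof b);

	return sanitize (a, sizeof a) + sanitize (b, sizeof b);
}

/* Combine the decoded words first, then validate once: nothing is lost. */
static size_t
replaced_combined (void)
{
	unsigned char all[sizeof first_word + sizeof second_word];

	memcpy (all, first_word, sizeof first_word);
	memcpy (all + sizeof first_word, second_word, sizeof second_word);

	return sanitize (all, sizeof all);
}
```

In this sketch the per-word path replaces two bytes (one in each word), while the combined path replaces none, which is the behaviour the patch is after.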
Best Regards
On Sat, Dec 17, 2011 at 6:49 PM, Jeffrey Stedfast <fejj gnome org> wrote:
Hi,
I've just released GMime 2.4.29 and 2.6.2 with your
fix (and other similar fixes).
Jeff
On 12/14/2011 01:26 PM, evil legacy wrote:
Hi,
After more debugging, I found that the
problem is when iconv (cd, NULL, NULL,
&outbuf, &outleft) tries to flush
the buffer to outbuf, but outbuf isn't big
enough to hold it.
This little patch to the charset_convert
function seems to fix this problem (works
for me):