Re: [gmime-devel] Issue with header decoding (continues)
From: Jeffrey Stedfast <fejj gnome org>
To: evil legacy <evil legacy gmail com>
Cc: gmime-devel-list gnome org
Subject: Re: [gmime-devel] Issue with header decoding (continues)
Date: Wed, 21 Dec 2011 19:46:40 -0500
Thanks for pointing this out!
You actually don't need to check if it's ascii in a loop, it's
already known by that point in the code (since the above code
already checked for it).
all that was needed was:
token->is_8bit = ascii ? 0 : 1;
Anyway, I've fixed this in git master.
Jeff
On 12/21/2011 11:22 AM, evil legacy wrote:
Hey, thanks for the quick patch!
I think I found a small problem with the new tokenized
decoding.
When trying to decode a broken header with no encoding, e.g.:
Subject: ×××ע× ××ש× ××ס××× ×××× ש× ××× ×ע×ר
tokenize_rfc2047_text() does
    if ((token = rfc2047_token_new_encoded_word (word, n)))
and if the word isn't encoded, it doesn't set
token->encoding, but it also doesn't check whether the word
is non-ASCII, so is_8bit is never set.
Without token->encoding or token->is_8bit,
rfc2047_decode_tokens() falls back to copying the
data as is,
which in some cases makes g_mime_header_decode_text()
return a string that isn't valid UTF-8.
A quick fix I found is to do the ASCII check when the word
isn't encoded and set is_8bit if it isn't ASCII.
In tokenize_rfc2047_text(), line 2136:
    } else {
        /* append the lwsp and atom tokens */
        if (lwsp != NULL) {
            tail->next = lwsp;
            tail = lwsp;
        }

        token = rfc2047_token_new (word, n);
        tail->next = token;
        tail = token;

        ascii = TRUE;
        while (n--)
            ascii = ascii && is_ascii (*word++);
        if (!ascii)
            token->is_8bit = 1;

        encoded = FALSE;
    }
Regards,
Eddie
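The check proposed above can also be written as a standalone helper. This is only a minimal sketch: `word_has_8bit` is a hypothetical name, not a GMime function, and unlike the loop in the patch it leaves `word` and `n` untouched and stops at the first non-ASCII byte.

```c
#include <stddef.h>

/* Hypothetical helper (not part of GMime): returns 1 if any of the
 * n bytes starting at word has its high bit set, 0 otherwise. */
int word_has_8bit(const char *word, size_t n)
{
    const unsigned char *p = (const unsigned char *) word;
    size_t i;

    for (i = 0; i < n; i++) {
        if (p[i] & 0x80)
            return 1;   /* found a non-ASCII byte; no need to scan further */
    }

    return 0;
}
```

A caller could then set the flag with something like `token->is_8bit = word_has_8bit (word, n);`, assuming the token structure shown in the patch.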
On Mon, Dec 19, 2011 at 1:00 AM, Jeffrey Stedfast <fejj gnome org> wrote:
The attached
patch should fix it if applied to the latest gmime
from git master.
A quick test has it passing all of the unit tests
(e.g. test-mime), so it's probably good to go.
I don't normally like to land such massive patches in
a stable cycle, so if you could test this out on your
messages and see how it works in the wild, that'd be
great.
This patch should also handle cases where base64
and/or quoted-printable data was split between
encoded-word tokens (which addresses another feature
request I've gotten a few times now).
Jeff
On 12/18/2011 09:02 AM, evil legacy wrote:
Hi,
I came across another header decoding
problem when dealing with badly split UTF-8
headers, i.e.:
It looks like someone split a UTF-8
string in the wrong place, leaving "half" a character in
each part.
g_mime_utils_header_decode_text/phrase
splits the header into words and decodes each
word separately. Since the charset is UTF-8, iconv
isn't used and each decoded word is validated with
this loop:
    while (!g_utf8_validate (p, len, (const char **) &p)) {
        len = declen - (p - (char *) decoded);
        *p = '?';
    }
Because the original string was split in the
wrong place, the "half" characters are
replaced with '?'.
I'm attaching a patch that moves the
UTF-8 validation to the end of
g_mime_utils_header_decode_text/phrase,
where the decoded words have already been
combined.
Best Regards
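The effect described above can be reproduced without GMime. The following is a minimal sketch, not the attached patch: `utf8_seq_len` and `sanitize_utf8` are hypothetical helpers, and the validator is deliberately simplified (it checks lead and continuation bytes only, and does not reject overlong forms or surrogates the way g_utf8_validate does).

```c
#include <stddef.h>

/* Hypothetical, simplified validator: returns the length of the UTF-8
 * sequence starting at p, or 0 if the sequence is invalid or truncated. */
size_t utf8_seq_len(const unsigned char *p, size_t avail)
{
    size_t need, i;

    if (avail == 0)
        return 0;

    if (p[0] < 0x80)
        need = 1;
    else if ((p[0] & 0xE0) == 0xC0)
        need = 2;
    else if ((p[0] & 0xF0) == 0xE0)
        need = 3;
    else if ((p[0] & 0xF8) == 0xF0)
        need = 4;
    else
        return 0;               /* stray continuation or invalid lead byte */

    if (need > avail)
        return 0;               /* sequence cut off at the end of the buffer */

    for (i = 1; i < need; i++) {
        if ((p[i] & 0xC0) != 0x80)
            return 0;           /* expected a continuation byte */
    }

    return need;
}

/* Replace every byte that does not start a valid sequence with '?'.
 * Returns the number of bytes replaced. */
size_t sanitize_utf8(char *buf, size_t len)
{
    size_t i = 0, replaced = 0;

    while (i < len) {
        size_t n = utf8_seq_len((const unsigned char *) buf + i, len - i);

        if (n == 0) {
            buf[i++] = '?';
            replaced++;
        } else {
            i += n;
        }
    }

    return replaced;
}
```

If the two bytes of a single character (for example 0xC3 0xA9) end up in different words, sanitizing each word separately replaces both bytes, while sanitizing the combined buffer replaces nothing. That is the reason for validating only after the decoded words have been joined.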
On Sat, Dec 17, 2011 at 6:49 PM, Jeffrey Stedfast <fejj gnome org> wrote:
Hi,
I've just released GMime 2.4.29 and
2.6.2 with your fix (and other
similar fixes).
Jeff
On 12/14/2011 01:26 PM, evil legacy wrote:
Hi,
After more debugging, I
found that the problem is
when iconv (cd, NULL,
NULL, &outbuf,
&outleft) tries to
flush the buffer to
outbuf, but outbuf isn't
big enough to hold it.
This little patch to
the charset_convert
function seems to fix this
problem (works for me):
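Independently of that patch, the failure mode itself can be sketched as follows. This is an illustration only, under stated assumptions: `convert_with_flush` is a hypothetical helper, not GMime's charset_convert and not the patch referred to above. It grows the output buffer whenever iconv() reports E2BIG, both during the conversion and during the final flush call with NULL input.

```c
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>

/* Hypothetical helper: converts inlen bytes from in using cd and returns
 * a malloc'd buffer terminated with a single NUL byte (so it suits
 * byte-oriented target charsets such as UTF-8), or NULL on failure. */
char *convert_with_flush(iconv_t cd, const char *in, size_t inlen)
{
    size_t cap = inlen > 0 ? inlen : 1;   /* deliberately small first guess */
    char *out = malloc(cap + 1);          /* +1 for the trailing NUL */
    char *inbuf = (char *) in;            /* iconv() wants char **, input is not modified */
    size_t inleft = inlen, used = 0;
    int flushing = 0;

    if (out == NULL)
        return NULL;

    for (;;) {
        char *outbuf = out + used;
        size_t outleft = cap - used;
        size_t rc;

        if (!flushing)
            rc = iconv(cd, &inbuf, &inleft, &outbuf, &outleft);
        else
            rc = iconv(cd, NULL, NULL, &outbuf, &outleft);   /* flush shift state */

        used = (size_t) (outbuf - out);

        if (rc == (size_t) -1) {
            char *tmp;

            if (errno != E2BIG) {         /* EILSEQ, EINVAL, ...: give up */
                free(out);
                return NULL;
            }

            /* Output buffer too small: double it and retry the same step. */
            tmp = realloc(out, cap * 2 + 1);
            if (tmp == NULL) {
                free(out);
                return NULL;
            }
            out = tmp;
            cap *= 2;
            continue;
        }

        if (flushing)
            break;                        /* flush succeeded, we are done */
        flushing = 1;                     /* all input consumed, now flush */
    }

    out[used] = '\0';
    return out;
}
```

The point of the sketch is that the flush call is treated exactly like a normal conversion call: it can also fail with E2BIG, so it needs the same retry-after-growing logic.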