Re: [gmime-devel] Issue with header decoding (continues)

From: Jeffrey Stedfast <fejj gnome org>
To: evil legacy <evil legacy gmail com>
Cc: gmime-devel-list gnome org
Subject: Re: [gmime-devel] Issue with header decoding (continues)
Date: Wed, 21 Dec 2011 19:46:40 -0500

Thanks for pointing this out!

You actually don't need to check whether it's ASCII in a loop; that is already known by that point in the code (the code above has already checked for it).

All that was needed was:

token->is_8bit = ascii ? 0 : 1;

Anyway, I've fixed this in git master.
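
To illustrate the point about the flag already being known, here is a minimal sketch of a word scanner that records ASCII-ness while it scans, so no second pass over the word is needed. The struct and function names (sketch_token, sketch_scan_word) are hypothetical stand-ins and are not GMime's actual tokenizer code; only the final assignment mirrors the one-line fix above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a tokenizer token; NOT GMime's real struct. */
struct sketch_token {
    const char *text;
    size_t length;
    unsigned int is_8bit : 1;
};

/* Scan one whitespace-delimited word starting at `in`, noting while we
 * scan whether every byte is 7-bit. Returns a pointer just past the word. */
static const char *
sketch_scan_word (const char *in, struct sketch_token *token)
{
    const char *start = in;
    int ascii = 1;

    assert (in != NULL && token != NULL);

    while (*in != '\0' && *in != ' ' && *in != '\t' && *in != '\r' && *in != '\n') {
        if ((unsigned char) *in > 127)
            ascii = 0;   /* saw a byte outside the 7-bit range */
        in++;
    }

    token->text = start;
    token->length = (size_t) (in - start);
    token->is_8bit = ascii ? 0 : 1;   /* same shape as the one-line fix */

    return in;
}
```

Because the flag is computed during the scan that finds the end of the word, the token can be marked without touching the word again.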

Jeff

On 12/21/2011 11:22 AM, evil legacy wrote:

Hey, thanks for the quick patch!

I think I found a little problem with the new tokenized decoding.
When trying to decode a broken, encoding-less header, e.g.:

Subject: ×××ע× ××ש× ××ס××× ×××× ש× ××× ×ע×ר

tokenize_rfc2047_text() does
   if ((token = rfc2047_token_new_encoded_word (word, n)))

and if the word isn't encoded, it doesn't set token->encoding, but it also doesn't check whether the word is non-ASCII, so is_8bit is never set.

So without token->encoding or token->is_8bit, rfc2047_decode_tokens() falls back to copying the data as-is,

which in some cases makes g_mime_utils_header_decode_text() return a non-UTF-8 string.

A quick fix I found is to do the ASCII check when the word isn't encoded and to set is_8bit if it isn't ASCII:

in tokenize_rfc2047_text() line 2136:

  } else {
      /* append the lwsp and atom tokens */
      if (lwsp != NULL) {
          tail->next = lwsp;
          tail = lwsp;
      }

      token = rfc2047_token_new (word, n);
      tail->next = token;
      tail = token;

      ascii = TRUE;
      while (n--)
          ascii = ascii && is_ascii (*word++);
      if (!ascii)
          token->is_8bit = 1;

      encoded = FALSE;
  }

Regards,

Eddie
On Mon, Dec 19, 2011 at 1:00 AM, Jeffrey Stedfast <fejj gnome org> wrote:
The attached patch should fix it if applied to the latest gmime from git master.

A quick test has it passing all of the unit tests (e.g. test-mime), so it's probably good to go.

I don't normally like to land such massive patches in a stable cycle, so if you could test this out on your messages and see how it works in the wild, that'd be great.

This patch should also handle cases where base64 and/or quoted-printable data was split between encoded-word tokens (which addresses another feature request I've gotten a few times now).

Jeff

On 12/18/2011 09:02 AM, evil legacy wrote:
Hi,

I came across another header decoding problem when dealing with badly split UTF-8 headers, e.g.:

=?utf-8?B?16nXoteV158g15PXldek16cgR0FSTUlOINei150gR1BTINee15XXkdeg15Qg15XXktc=?= =?utf-8?B?nSDXkNeo16DXpyDXkNeV16TXoNeq15kg157XoteV16gg157XqdeV15HXlyE=?='

It looks like someone split a UTF-8 string incorrectly, leaving "half" a character in each part.

g_mime_utils_header_decode_text/phrase splits the header into words and decodes each word separately. Since the charset is UTF-8, iconv isn't used, and each decoded word is validated with this loop:

while (!g_utf8_validate (p, len, (const char **) &p)) {
    len = declen - (p - (char *) decoded);
    *p = '?';
}

Because the original string was split incorrectly, the "half" characters are replaced with '?'.
I'm attaching a patch that moves the UTF-8 validation to the end of g_mime_utils_header_decode_text/phrase, where the decoded words have already been combined.
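
The effect of the bad split can be sketched with the bytes at the boundary of the two encoded-words above: the first base64 payload decodes to bytes ending in 0xD7 0x92 0xD7, and the second begins with 0x9D 0x20 0xD7 0x90, so the two-byte character 0xD7 0x9D straddles the boundary. The helper names below (sketch_utf8_ok, sketch_joined_ok) are hypothetical, and the validator is deliberately simplified (it checks only lead/continuation structure, unlike g_utf8_validate); this is not the attached patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified structural UTF-8 check (hypothetical helper). It does not
 * reject overlong forms or surrogates the way a full validator would. */
static int
sketch_utf8_ok (const unsigned char *s, size_t len)
{
    size_t i = 0, need, k;

    assert (s != NULL);

    while (i < len) {
        if (s[i] < 0x80)
            need = 0;
        else if ((s[i] & 0xE0) == 0xC0)
            need = 1;
        else if ((s[i] & 0xF0) == 0xE0)
            need = 2;
        else if ((s[i] & 0xF8) == 0xF0)
            need = 3;
        else
            return 0;   /* stray continuation byte or invalid lead byte */

        i++;
        for (k = 0; k < need; k++, i++) {
            if (i >= len || (s[i] & 0xC0) != 0x80)
                return 0;   /* sequence truncated or malformed */
        }
    }

    return 1;
}

/* Tail of the first decoded word and head of the second. */
static const unsigned char sketch_part1[] = { 0xD7, 0x92, 0xD7 };
static const unsigned char sketch_part2[] = { 0x9D, 0x20, 0xD7, 0x90 };

/* Validate only after the two decoded pieces have been joined. */
static int
sketch_joined_ok (const unsigned char *a, size_t alen,
                  const unsigned char *b, size_t blen)
{
    unsigned char buf[64];

    if (alen + blen > sizeof (buf))
        return 0;

    memcpy (buf, a, alen);
    memcpy (buf + alen, b, blen);

    return sketch_utf8_ok (buf, alen + blen);
}
```

Checked separately, both pieces fail (one ends in a dangling lead byte, the other starts with a continuation byte); checked after joining, the same bytes pass, which is why validating at the end avoids the spurious '?' replacements.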

Best Regards
On Sat, Dec 17, 2011 at 6:49 PM, Jeffrey Stedfast <fejj gnome org> wrote:
Hi,

I've just released GMime 2.4.29 and 2.6.2 with your fix (and other similar fixes).

Jeff

On 12/14/2011 01:26 PM, evil legacy wrote:
Hi,

After more debugging, I found that the problem occurs when iconv (cd, NULL, NULL, &outbuf, &outleft) tries to flush its pending output to outbuf, but outbuf isn't big enough to hold it.

This little patch to the charset_convert function seems to fix this problem (works for me):

<patch>
diff --git a/gmime/gmime-utils.c b/gmime/gmime-utils.c
index ca32b61..093deee 100644
--- a/gmime/gmime-utils.c
+++ b/gmime/gmime-utils.c
@@ -1553,7 +1553,15 @@ charset_convert (iconv_t cd, const char *inbuf, size_t inleft, char **outp, size
 		}
 	} while (inleft > 0);
 	
-	iconv (cd, NULL, NULL, &outbuf, &outleft);
+	while (iconv (cd, NULL, NULL, &outbuf, &outleft) == (size_t) -1)
+		if (errno == E2BIG) {
+			outlen += 16;
+			rc = (size_t) (outbuf - out);
+			out = g_realloc (out, outlen + 1);
+			outleft = outlen - rc;
+			outbuf = out + rc;
+		}
+
 	*outbuf++ = '\0';
 	
 	*outlenp = outlen;
</patch>
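
For readers who want to try the flush-and-grow idea outside GMime, here is a standalone sketch using plain realloc instead of g_realloc. The function name sketch_flush and its parameters are hypothetical. One deliberate difference from the loop as quoted: this version gives up on any error other than E2BIG, since retrying a flush that failed for another reason would presumably never succeed.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>

/* Hypothetical helper: flush any pending shift sequence from `cd` into the
 * heap buffer *outp, which has room for *outlenp bytes plus a terminator
 * and already holds `used` bytes. Grows the buffer by 16 bytes on E2BIG.
 * Returns the new byte count, or (size_t) -1 on failure. */
static size_t
sketch_flush (iconv_t cd, char **outp, size_t *outlenp, size_t used)
{
    char *out = *outp;
    size_t outlen = *outlenp;
    char *outbuf;
    size_t outleft;

    assert (out != NULL && used <= outlen);

    outbuf = out + used;
    outleft = outlen - used;

    while (iconv (cd, NULL, NULL, &outbuf, &outleft) == (size_t) -1) {
        char *grown;

        if (errno != E2BIG)
            return (size_t) -1;   /* not a space problem; do not retry */

        used = (size_t) (outbuf - out);
        grown = realloc (out, outlen + 16 + 1);
        if (grown == NULL)
            return (size_t) -1;   /* old buffer is still valid in *outp */

        out = grown;
        outlen += 16;
        *outp = out;              /* keep the caller's view up to date */
        *outlenp = outlen;

        outbuf = out + used;
        outleft = outlen - used;
    }

    used = (size_t) (outbuf - out);
    out[used] = '\0';

    return used;
}
```

A caller would invoke this once the input has been fully consumed, passing the number of bytes already written; with a stateful target charset the flush may append a reset sequence, while with a stateless one it should leave the count unchanged.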

Best Regards
_______________________________________________
gmime-devel-list mailing list
gmime-devel-list gnome org
http://mail.gnome.org/mailman/listinfo/gmime-devel-list
--
map{map{$a=unpack"C",$_;map{$c=$a-ord;print$_ x$c and goto"a"if$c>0}("Z",
" ");a:}split//;print"\n"}(q{&[%[%`#[%["},q{&[$[![$[%["[%["},q{&[#[#[#[%[
"[%["},q{&["[%["`#a"},q{[%["a"[([%["},q{[%["[%["[([%["},q{!_#[%["[([%["})
