Re: [Evolution-hackers] [patch] fixed incorrect rfc2047 decode for CJKheader
- From: jacky <gtkdict yahoo com cn>
- To: Jeff Stedfast <fejj novell com>, evolution-hackers gnome org
- Subject: Re: [Evolution-hackers] [patch] fixed incorrect rfc2047 decode for CJKheader
- Date: Mon, 24 Dec 2007 13:21:44 +0800 (CST)
--- Jeff Stedfast <fejj novell com> wrote:
> Hi Jacky,
>
> I've looked over your patch, but unfortunately it is
> unusable. The patch
> is riddled with buffer overflows and incorrect
> logic.
>
Yes, I used fixed-length strings to store some values,
so they may overflow. I wrote another version that uses
the heap instead of the stack. I thought the stack
version was simple and sufficient, so I sent only that
one. Both versions of rfc2047_decode_word() are in the
attachment.
Can you explain the incorrect logic in my patch?
> What types of bugs are you actually trying to fix?
> What is it about CJK
> messages in particular that are not getting decoded
> properly? Your email
> was overly vague.
>
Maybe I used the wrong word. I think the patch just
enhances CJK header support. It improves three points:
1) As you know, encoded-words must be separated by CRLF
SPACE, but some email clients do not do that.
2) The encoded bytes of one CJK character must stay
within a single encoded-word, but some email clients
split them across two encoded-words.
3) Some CJK characters can only be encoded in the GBK
charset, but the charset name in the encoded-word is
GB2312.
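
To illustrate points 1) and 2): a minimal sketch, not the
code from the patch, of collecting the decoded bytes of a
whole run of encoded-words into one buffer before any
charset conversion, so that a character split across two
words is put back together. The names b64_value(),
b64_decode() and collect_payloads() are hypothetical, only
the "B" encoding is handled, and the charset is ignored.

```c
#include <stddef.h>
#include <string.h>

/* Value of one base64 character, or -1 if it is not in the alphabet. */
static int b64_value(char c)
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    const char *p = (c != '\0') ? strchr(alphabet, c) : NULL;
    return p ? (int)(p - alphabet) : -1;
}

/* Decode one base64 payload, stopping at '=' padding. Returns the number
 * of bytes written, or (size_t)-1 on a bad character or a full buffer. */
static size_t b64_decode(const char *in, size_t len,
                         unsigned char *out, size_t outsz)
{
    unsigned int acc = 0;
    int bits = 0;
    size_t n = 0, i;

    for (i = 0; i < len && in[i] != '='; i++) {
        int v = b64_value(in[i]);
        if (v < 0)
            return (size_t)-1;
        acc = (acc << 6) | (unsigned int)v;
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            if (n >= outsz)
                return (size_t)-1;
            out[n++] = (unsigned char)((acc >> bits) & 0xff);
        }
    }
    return n;
}

/* Walk a run of "=?charset?B?text?=" words, with or without whitespace
 * between them, and append every decoded payload to out. */
static size_t collect_payloads(const char *in, unsigned char *out, size_t outsz)
{
    size_t total = 0;

    while (*in != '\0') {
        const char *q1, *text, *end;
        size_t n;

        in += strspn(in, " \t\r\n");      /* separators are optional */
        if (*in == '\0')
            break;
        if (strncmp(in, "=?", 2) != 0)
            return (size_t)-1;
        q1 = strchr(in + 2, '?');          /* '?' that ends the charset */
        if (q1 == NULL || (q1[1] != 'B' && q1[1] != 'b') || q1[2] != '?')
            return (size_t)-1;
        text = q1 + 3;                     /* start of the encoded text */
        end = strstr(text, "?=");
        if (end == NULL)
            return (size_t)-1;
        n = b64_decode(text, (size_t)(end - text), out + total, outsz - total);
        if (n == (size_t)-1)
            return (size_t)-1;
        total += n;
        in = end + 2;                      /* skip "?=" */
    }
    return total;
}
```

For example, "=?UTF-8?B?5Lg=?==?UTF-8?B?rQ==?=" carries the
three UTF-8 bytes of one character split two-plus-one, with
no whitespace between the words. Neither half converts on
its own, but the collected buffer does.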
There are two kinds of email that need to be supported:
1) An encoded-word is split across two lines. This was
sent by dotProject v2.0.1.
2) GB2312 is used to encode CJK characters directly.
Some of these emails are supported by Evolution, but
some are not.
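
For the charset-name problem (GBK text labelled as
GB2312), the idea is only to treat the declared name as
GBK when decoding, since GBK is a superset of GB2312. A
minimal sketch, not the actual e-iconv change;
charset_for_decoding() is a hypothetical helper name:

```c
#include <string.h>
#include <strings.h>   /* strcasecmp() (POSIX) */

/* Return the charset name to hand to the converter for a declared name.
 * Assumption: decoding "gb2312" as GBK is safe because GBK contains
 * every GB2312 character at the same code point. */
static const char *charset_for_decoding(const char *declared)
{
    if (strcasecmp(declared, "gb2312") == 0)
        return "GBK";
    return declared;
}
```

The result would then be used as the source charset, for
example iconv_open ("UTF-8", charset_for_decoding (name)).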
> Your changes to e-iconv can probably be taken if I
> understand correctly
> that GBK is a superset of gb2312 (
> http://en.wikipedia.org/wiki/GBK ),
> altho it would have been nice to have gotten some
> sort of link
> explaining that with your original email (or via a
> ChangeLog entry) :)
>
> Thanks,
>
> Jeff
>
> >>> jacky <gtkdict yahoo com cn> 12/23/07 10:09 AM
> >>>
> Hi, all.
>
> The rfc2047 decoder in libcamel cannot decode some
> CJK headers correctly. Although some of them do not
> conform to the RFC, I need to decode them correctly,
> and I think that if Evolution can display these emails
> correctly, more people will like it.
>
> So I wrote a new rfc2047 decoder, and it's in the
> patch. With the patch, libcamel can decode CJK
> headers correctly and Evolution can now display them
> correctly. I have tested it on my mailbox, which
> has 2000 emails sent by Evolution, Thunderbird,
> Outlook, Outlook Express, Foxmail, Open Webmail,
> Yahoo, Gmail, Lotus Notes, etc. Without this patch,
> almost 20% of these emails can't be decoded and
> displayed correctly; with this patch, 99% of them
> can be.
>
> I also found that attachments with CJK names can't be
> recognised and displayed by Outlook / Outlook Express
> / Foxmail. This is because these email clients do not
> support RFC2184. Evolution always uses the RFC2184
> encoding method for attachment names, so an email with
> a CJK-named attachment can't be displayed in Outlook /
> Outlook Express / Foxmail. In Thunderbird, you can set
> the option "mail.strictly_mime.parm_folding" to 0 or 1
> to use the RFC2047 encoding method for attachment
> names. Can we add a similar option?
>
> Best regards.
>
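
To make the attachment-name difference concrete: RFC2184
(now RFC2231) writes the name as an extended parameter of
the form filename*=charset'language'percent-encoded-bytes,
while the RFC2047 style puts an encoded-word inside an
ordinary quoted filename="..." value. Below is a minimal
sketch of the first form only; rfc2231_filename() is a
hypothetical name, not a libcamel function, and it assumes
a UTF-8 name, an empty language tag and no splitting into
continuations.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Write  filename*=UTF-8''<percent-encoded name>  into out.
 * Returns 0 on success, -1 if out is too small. */
static int rfc2231_filename(const char *utf8_name, char *out, size_t outsz)
{
    static const char prefix[] = "filename*=UTF-8''";
    static const char hex[] = "0123456789ABCDEF";
    size_t n = sizeof(prefix) - 1;
    const unsigned char *p;

    if (outsz < n + 1)
        return -1;
    memcpy(out, prefix, n);
    for (p = (const unsigned char *)utf8_name; *p != '\0'; p++) {
        if (*p < 0x80 && (isalnum(*p) || strchr("-._", *p) != NULL)) {
            if (n + 2 > outsz)          /* this byte plus the final NUL */
                return -1;
            out[n++] = (char)*p;
        } else {
            if (n + 4 > outsz)          /* "%XX" plus the final NUL */
                return -1;
            out[n++] = '%';
            out[n++] = hex[*p >> 4];
            out[n++] = hex[*p & 0x0f];
        }
    }
    out[n] = '\0';
    return 0;
}
```

For a name whose UTF-8 bytes are E4 B8 AD followed by
".txt", this gives filename*=UTF-8''%E4%B8%AD.txt, whereas
the RFC2047 style for the same name would be
filename="=?UTF-8?B?5LitLnR4dA==?=".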
/* decode rfc 2047 encoded string segment */
#define DECWORD_LEN 1024
#define UTF8_DECWORD_LEN 2048
#if 1 /* USE_STACK */
static char *
rfc2047_decode_word(const char *in, size_t len)
{
	char prev_charset[32], curr_charset[32];
	char encode;
	char *start, *inptr, *inend;
	char decword[DECWORD_LEN], utf8_decword[UTF8_DECWORD_LEN];
	char *decword_ptr, *utf8_decword_ptr;
	size_t inlen, outlen, ret;

	prev_charset[0] = curr_charset[0] = '\0';
	decword_ptr = decword;
	utf8_decword_ptr = utf8_decword;

	/* quick check to see if this could possibly be a real encoded word */
	if (len < 8
	    || !(in[0] == '=' && in[1] == '?'
	         && in[len-1] == '=' && in[len-2] == '?')) {
		return NULL;
	}

	inptr = (char *) in;
	inend = inptr + len;
	/* keep one byte for the trailing NUL; this assumes conv_to_utf8()
	 * writes at most outlen bytes and returns the number written */
	outlen = sizeof(utf8_decword) - 1;

	while (inptr < inend) {
		/* begin */
		inptr = memchr (inptr, '?', inend-inptr);
		if (!inptr || *(inptr-1) != '=') {
			return NULL;
		}
		inptr++;

		/* charset */
		start = inptr;
		inptr = memchr (inptr, '?', inend-inptr);
		if (!inptr || (size_t)(inptr-start) >= sizeof(curr_charset)) {
			/* no charset, or its name would overflow curr_charset */
			return NULL;
		}
		memcpy (curr_charset, start, inptr-start);
		curr_charset[inptr-start] = '\0';
		if (prev_charset[0] == '\0') { /* first charset in multi encode words */
			strcpy (prev_charset, curr_charset);
		}
		d(printf ("curr_charset = %s\n", curr_charset));

		/* if (charset.prev != charset.curr) iconv prev to utf8 */
		if (prev_charset[0] != '\0' && strcmp(prev_charset, curr_charset)) {
			inlen = decword_ptr - decword;
			ret = conv_to_utf8 (prev_charset, decword, inlen, utf8_decword_ptr, outlen);
			if (ret == (size_t)-1) {
				d(printf ("conv_to_utf8() error!\n"));
				return NULL;
			}
			utf8_decword_ptr += ret;
			outlen = outlen - ret;
			decword_ptr = decword; /* reset decword_ptr */
			strcpy (prev_charset, curr_charset);
		}

		/* encode: inptr is on the '?' that ends the charset and must be
		 * followed by the encoding letter and another '?' */
		if (inend - inptr < 3 || inptr[2] != '?') {
			return NULL;
		}
		encode = inptr[1];

		/* text */
		inptr += 3;
		start = inptr;
		inptr = memchr (inptr, '?', inend-inptr);
		if (!inptr || *(inptr+1) != '=') {
			return NULL;
		}

		/* decode; the decoded text is never longer than the encoded text */
		if ((size_t)(inptr-start) > DECWORD_LEN - (size_t)(decword_ptr-decword)) {
			return NULL; /* would overflow decword */
		}
		switch(encode) {
		case 'Q':
		case 'q':
			inlen = quoted_decode(start, inptr-start, decword_ptr);
			break;
		case 'B':
		case 'b':
		{
			int state = 0;
			unsigned int save = 0;

			inlen = camel_base64_decode_step(start, inptr-start, decword_ptr, &state, &save);
			/* if state != 0 then error? */
		}
			break;
		default:
			/* uhhh, unknown encoding type - probably an invalid encoded word string */
			return NULL;
		}
		if (inlen == 0 || inlen == (size_t)-1) {
			/* nothing decoded, or a -1 error code stored in the unsigned inlen */
			return NULL;
		}
		decword_ptr += inlen;
		inptr += 2; /* skip '?=' */
	} /* end of "while (inptr < inend)" */

	/* at last, iconv to utf8 */
	inlen = decword_ptr - decword;
	ret = conv_to_utf8 (curr_charset, decword, inlen, utf8_decword_ptr, outlen);
	if (ret == (size_t)-1) {
		d(printf ("conv_to_utf8() error!\n"));
		return NULL;
	}
	utf8_decword_ptr += ret;
	*utf8_decword_ptr = '\0';

	/* g_strdup() so that both versions return memory to be released with g_free() */
	return g_strdup (utf8_decword);
}
#else /* USE HEAP */
static char *
rfc2047_decode_word(const char *in, size_t len)
{
	char *prev_charset, *curr_charset;
	char encode;
	char *start, *inptr, *inend;
	char *decword, *decword_ptr;
	char *utf8_decword, *utf8_decword_ptr;
	size_t inlen, outlen, ret;

	prev_charset = curr_charset = NULL;
	/* g_malloc() aborts on failure, so no NULL checks are needed */
	decword = g_malloc (DECWORD_LEN);
	decword_ptr = decword;
	utf8_decword = g_malloc (UTF8_DECWORD_LEN);
	utf8_decword_ptr = utf8_decword;

	/* quick check to see if this could possibly be a real encoded word */
	if (len < 8
	    || !(in[0] == '=' && in[1] == '?'
	         && in[len-1] == '=' && in[len-2] == '?')) {
		goto _error_return;
	}

	inptr = (char *) in;
	inend = inptr + len;
	/* keep one byte for the trailing NUL; this assumes conv_to_utf8()
	 * writes at most outlen bytes and returns the number written */
	outlen = UTF8_DECWORD_LEN - 1;

	while (inptr < inend) {
		/* begin */
		inptr = memchr (inptr, '?', inend-inptr);
		if (!inptr || *(inptr-1) != '=') {
			goto _error_return;
		}
		inptr++;

		/* charset */
		start = inptr;
		inptr = memchr (inptr, '?', inend-inptr);
		if (!inptr) {
			goto _error_return;
		}
		g_free (curr_charset); /* g_free (NULL) is a no-op */
		curr_charset = g_strndup (start, inptr-start);
		d(printf ("curr_charset = %s\n", curr_charset));
		if (prev_charset == NULL) {
			prev_charset = g_strdup (curr_charset);
		}

		/* if (charset.prev != charset.curr) iconv prev to utf8 */
		if (prev_charset && strcmp(prev_charset, curr_charset)) {
			inlen = decword_ptr - decword;
			ret = conv_to_utf8 (prev_charset, decword, inlen, utf8_decword_ptr, outlen);
			if (ret == (size_t)-1) {
				d(printf ("conv_to_utf8() error!\n"));
				/* or maybe we should grow 'utf8_decword' */
				goto _error_return;
			}
			utf8_decword_ptr += ret;
			outlen = outlen - ret;
			decword_ptr = decword; /* decword_ptr reset */
			g_free (prev_charset);
			prev_charset = g_strdup (curr_charset);
		}

		/* encode: inptr is on the '?' that ends the charset and must be
		 * followed by the encoding letter and another '?' */
		if (inend - inptr < 3 || inptr[2] != '?') {
			goto _error_return;
		}
		encode = inptr[1];

		/* text */
		inptr += 3;
		start = inptr;
		inptr = memchr (inptr, '?', inend-inptr);
		if (!inptr || *(inptr+1) != '=') {
			goto _error_return;
		}

		/* decode; the decoded text is never longer than the encoded text */
		if ((size_t)(inptr-start) > DECWORD_LEN - (size_t)(decword_ptr-decword)) {
			/* would overflow decword, or maybe we should grow 'decword' */
			goto _error_return;
		}
		switch(encode) {
		case 'Q':
		case 'q':
			inlen = quoted_decode(start, inptr-start, decword_ptr);
			break;
		case 'B':
		case 'b':
		{
			int state = 0;
			unsigned int save = 0;

			inlen = camel_base64_decode_step(start, inptr-start, decword_ptr, &state, &save);
			/* if state != 0 then error? */
		}
			break;
		default:
			/* uhhh, unknown encoding type - probably an invalid encoded word string */
			goto _error_return;
		}
		if (inlen == 0 || inlen == (size_t)-1) {
			/* nothing decoded, or a -1 error code stored in the unsigned inlen */
			goto _error_return;
		}
		decword_ptr += inlen;
		inptr += 2; /* skip "?=" */
	} /* end of "while (inptr < inend)" */

	/* at last, iconv to utf8 */
	inlen = decword_ptr - decword;
	ret = conv_to_utf8 (curr_charset, decword, inlen, utf8_decword_ptr, outlen);
	if (ret == (size_t)-1) {
		d(printf ("conv_to_utf8() error!\n"));
		/* or maybe we should grow 'utf8_decword' */
		goto _error_return;
	}
	utf8_decword_ptr += ret;
	*utf8_decword_ptr = '\0';

	g_free (decword);
	g_free (prev_charset);
	g_free (curr_charset);
	return utf8_decword;

_error_return:
	g_free (decword);
	g_free (utf8_decword);
	g_free (prev_charset);
	g_free (curr_charset);
	return NULL;
}
#endif