Re: [Evolution-hackers] improved rfc2047 decode patch

From: jacky <gtkdict yahoo com cn>
To: Jeffrey Stedfast <fejj novell com>
Cc: evolution-hackers gnome org
Subject: Re: [Evolution-hackers] improved rfc2047 decode patch
Date: Fri, 4 Jan 2008 00:21:14 +0800 (CST)

I used evolution-data-server r8327 (I know your patch
was committed in r8322; I used trunk ;) to check all
the emails in my mbox. The results are:
evolution-data-server 2.21.4: the CJK headers of 20%
of the emails can't be decoded.
evolution-data-server r8327: the CJK headers of 5%
of the emails can't be decoded.
evolution-data-server with my patch: the CJK headers
of 1% of the emails can't be decoded.
I also used gmime (by calling the
g_mime_utils_header_decode_text() function) to decode
all the encoded-words in my mbox. The result is the same
as for evolution-data-server r8327.
What's the difference between your patch and my patch?
I think that when your decoder handles an email that
breaks a single multi-byte character across multiple
encoded-word tokens, it ignores the byte(s) that can't
be converted to UTF-8. In my patch,
header_decode_text() parses as many consecutive
encoded-words as possible that have the same charset
and encoding, then calls rfc2047_decode_word() to decode
these encoded-words together. In fact, you can't simply
ignore the byte(s): the first byte of an encoded-word
can combine with the last byte of the previous
encoded-word to form a multi-byte character, but it can
also combine with the byte after it to form a different
multi-byte character. Here are some examples from my mbox:
=?GB2312?Q?=C4=D4=B1=D8=D0=EB=BC=E6=C8=DDLinux=B2=D9=D7?==?GB2312?Q?=F7=CF=B5=CD=B3?=
=?GB2312?Q?_=B0=B2=D7=B0=D2=BB=D0=A9=B3=A3=D3=C3=B5=C4=C8=ED=BC?==?GB2312?Q?=FE=B0=FC=B5=BDubuntu=CF=B5=CD=B3=A3=A1?=
=?GB2312?B?ILj3zrvH8tPRo6zPyNbCx7ijrNPJ09rKsbzkudjPtaOswLbH8sqxvA==?==?GB2312?B?5Lao1NrQx8bazuU0?=:00-6:00
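
To make the difference concrete, here is a minimal Python sketch of the merging idea. It is not the Camel code and not my patch itself; the names ENCODED_WORD, payload_bytes(), decode_merged() and decode_separately() are made up for this illustration. It assumes the header consists only of encoded-words (any text between them is ignored), and it joins the raw bytes of consecutive encoded-words that share a charset and encoding before converting them, using the first example above as input.

```python
import base64
import quopri
import re

# Hypothetical helper names; this only illustrates the merging idea.
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([bBqQ])\?([^?]*)\?=")

# First example header from this mail.
SUBJECT = ("=?GB2312?Q?=C4=D4=B1=D8=D0=EB=BC=E6=C8=DDLinux=B2=D9=D7?="
           "=?GB2312?Q?=F7=CF=B5=CD=B3?=")


def payload_bytes(encoding, text):
    """Undo the B (base64) or Q (quoted-printable-like) transfer encoding."""
    if encoding.upper() == "B":
        return base64.b64decode(text)
    # header=True also turns "_" into a space, as RFC 2047 requires for Q.
    return quopri.decodestring(text.encode("ascii"), header=True)


def decode_separately(header):
    """Convert every encoded-word on its own (strict charset conversion)."""
    return "".join(
        payload_bytes(encoding, text).decode(charset)
        for charset, encoding, text in ENCODED_WORD.findall(header)
    )


def decode_merged(header):
    """Join the bytes of consecutive words with the same charset and
    encoding, then convert each joined run once."""
    runs = []  # each entry: [(charset, encoding), raw bytes]
    for charset, encoding, text in ENCODED_WORD.findall(header):
        key = (charset.lower(), encoding.upper())
        raw = payload_bytes(encoding, text)
        if runs and runs[-1][0] == key:
            runs[-1][1] += raw
        else:
            runs.append([key, raw])
    return "".join(raw.decode(key[0]) for key, raw in runs)
```

With SUBJECT, decode_separately() raises UnicodeDecodeError, because the first encoded-word ends with the lead byte 0xD7 of a two-byte character whose second byte 0xF7 starts the next encoded-word. decode_merged() converts the joined bytes without error.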


--- Jeffrey Stedfast <fejj novell com> wrote:

> 
> On Thu, 2007-12-27 at 08:46 +0800, jacky wrote:
> > --- Jeffrey Stedfast <fejj novell com> wrote:
> > 
> > > 
> > > On Thu, 2007-12-27 at 00:20 +0800, jacky wrote:
> > > > It seems that your patch doesn't support this kind of
> > > > encoded string:
> > > >
> > > > =?gb2312?b?<any-encoded-text>?==?gb2312?b?<any-encoded-text>?=
> > > >
> > > > The two encoded-words are not separated by any
> > > > character.
> > > 
> > > Are you sure? I wrote the code to be able to handle
> > > this case and I just tested it again (noticed that I
> > > didn't have a test case like this in my test suite so
> > > added one) and it works fine.
> > > 
> > > Do you have an example subject/whatever header for
> > > me to test against?
> > > 
> > 
> > I drew my conclusion too hastily. Yes, your patch
> > supports this kind of email,
> 
> ok ;-)
> 
> >  but it doesn't support emails that break a single
> > multi-byte character across multiple encoded-word
> > tokens, and when it decodes a header that breaks an
> > encoded-word token across two lines, no result is
> > displayed in Evolution; for example, the Subject is
> > empty.
> 
> ok, just fixed this in svn. I had tested a broken
> UTF-8 header earlier
> and so didn't see a slight bug in my code.
> 
> > I'll use Camel with your patch to check all the email
> > in my mbox, and use gmime to decode all the email
> > headers, to find out how capable it is.
> 
> 
> Ok, awesome.
> 
> Jeff
> 
> 


