Re: [evolution-patches] newest patch for fix non rfc2047 compliant i18n mailer



On Wed, 2004-06-02 at 23:36 +0800, cantona wrote:
On Wed, 2004-06-02 at 19:49 +0800, Not Zed wrote:
> 
> 
> are you sure this is right?  why are the q and b cases different?
> shouldn't they be the same?  they should really be the same code
> instance too, they both do basically the same thing.
> 
in q case can be =?UTF-7?Q?=E9=98=BF=E9=98=BF=E9=98=BF?=

so it is suppose to be "inptr = strstr(start+1, "=?")" instead of scan
the "?="

in b case it is only can be =?blah?b?blah=?
so "inptr = strstr(start+1, "?=") + 2"  is the way

Uh, this is still not going to work.  Since you wont always get it.

You need to check for ?=

You KNOW that you just got ?B? so you should start scanning AFTER that ?, so you can always search for ?=.

You keep scanning from the start of the string even tho you know you've already found parts in the string.


> and you do a lot of strstr - its fairly expensive to keep doing that
> over and over the same string.
> 

ok..but it wont save much since there are not much same strstr within
the string. pls tell me if there is good method

> this also will look for broken strings across whitespace, which is an
> even more broken case, but this will break plain strings which don't
> otherwise need encoding.
> 
> e.g. what will this do?
> 
>    =? foo bar
> It will go into the else case, then wont match any of the subcases, it
> will then start scanning backwards until it finds =? -> which might
> clearly go beyond the start of the string!  crash!
> 

ok, the new patch solved it

> You should definitly not be scanning backwards at all, if you can't
> find ?= anywhere going forward then you certainly dont care if you
> find it goind backwards.
> 
> You should also skip things you've already scanned.
> 

well since the q case "inptr = strstr(start+1, "=?")"
scan the start of new encode-word, so it is suppose to scan backwords
eg. inptr = "=?UTF-7?Q?=E9=98=BF=E9=98=BF=E9=98=BF?=XXXXXXXX"
and i have no idea about any better method
See above.  The q and b cases should be rolled into one.  They can easily be so.  You cannot get ?= in the encoded part of the text, you're scanning too much (from too early).

> And
> 
> =?  foo bar  baz .... ?b? 
> will also do weird shit, and
> 
> =? foo bar baz ... ?b?  balh blah ?=

ok both solved in new patch

> Will also break.  I dont think this should be treated as an encoded
> word, even if some mailers write similar crap out to that (i.e.
> embedded whitespace).
> 
> So still some issues, its more on the right track though.

I try my best to test all cause that can break
attached the new patch that might solve about the problem of NotZed
represent. If there is still a problem correct me please.
You might want to consider creating a regression test case in camel/tests/misc as well.  e.g. see how test2.c tests rfc2184 encoding.

Also make sure you run 'make check' anyway, there are tests in tests/message that will hit this code, for addressing.

Unfortunately your new patch is making things worse, the code is far too complex now.

You're still scanning backwards.  Don't.  There's no reason to ever scan backwards.

It will still find encoded words across spaces.  Maybe you want to do this, but i don't think we do.  The code already breaks the string into space separated tokens, it should probably start with that as the basis.

And FWIW  inptr[0] == '?' && inptr[1] == '=' is a lot more readable than *inptr == '?' && *(inptr+1) == '='.

And you can flip the sense of logic like
  (!(a==x && b == y))
to
(a !=b || b != y)
which may be more readable too.

You're also doing things like
inptr < inend && inptr[2] == x

which uh, will access past the end of the array.  These are not guaranteed to be nul terminated strings.

Infact this means you can't call strstr at all, anywhere with this code (i missed this earlier).  You need to so something like (but you can't do exactly this, if you don't have at least n characters left):

if ( (p = memchr(inptr, '=', inend-inptr-1)
    && p[1] == '?') {
}

Infact you probably only need to search till inend-inptr-8 since the minimum (invalid?) sequence will take at least 8 characters to define i.e. '=?x?b??='.  But all the length testing gets messy.

This is a fairly critical bit of reasonably tricky code, you've put a fair effort into it, but there are still things to consider, do you still wish to perservere with this?  I don't mean to discourage you, but its not as easy as it looks perhaps.

--
Michael Zucchi <notzed ximian com>

Ximian Evolution and Free Software Developer


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]