Re: [evolution-patches] fix for bug #24026 (try harder not to sed in UTF-8)

From: Not Zed <notzed ximian com>
To: Jeffrey Stedfast <fejj ximian com>
Cc: evolution-patches ximian com
Subject: Re: [evolution-patches] fix for bug #24026 (try harder not to sed in UTF-8)
Date: Tue, 27 Jul 2004 11:30:54 +0800

On Mon, 2004-07-26 at 13:52 -0400, Jeffrey Stedfast wrote:

> On Mon, 2004-07-26 at 13:41 +0800, Not Zed wrote:
> >
> > Well, we can easily check if it is utf8; actually we don't really need
> > to - the locale is entirely under the user's control, even if the
> > default is set up a certain way.  So if the user needs to change it to
> > work in their normal environment, at least we'll honour it.
>
> sure, but the locale charset is becoming less and less meaningful (as
> distros push for UTF-8)

Yes, but it's exactly what it's for, and it's entirely under user control.

> >
> > It's just that in hindsight, I can't see how we can get the 'is it in
> > the charset' thing to really work as soon as you move outside of the 8
> > bit charset range.
>
> nod
>
> > But even if we can identify a range of charsets which can represent
> > a given text, there may be cases where you need to override it based
> > on the user's environment.
>
> possibly. I don't really know.

Well yes, your patch does it, for example by using the language hint. That's part of the user's environment.

> >
> > At least with CJK it has to be based on the environment since the 16
> > bit unicode mappings have so much overlap.
>
> sure, but that was one of the reasons I added the lang matching logic
> (which was also needed for other reasons...such as not all cyrillic
> users want koi8-r to have priority over, say, iso-8859-5).
>
> >
> > The reverse-iconv thing doesn't really work that well either since
> > there can be multiple ways of encoding the same thing, even if you
> > canonicalise and recompose the unicode too (do we even do that?).
> > e.g. e' (e with ' on top) can be encoded as e' or as ' + e.
>
> yea :-\
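To make the reverse-iconv problem concrete, here is a minimal C sketch of a round-trip test: UTF-8 to charset and back to UTF-8, then a byte comparison. It uses the POSIX iconv_open(), iconv() and iconv_close() calls. The helpers convert() and roundtrips() are hypothetical names for this sketch and are not Camel functions. The fixed 256-byte buffers are a simplification.

The precomposed e-acute (C3 A9 in UTF-8) survives a trip through ISO-8859-1. The canonically equivalent sequence, e followed by U+0301 COMBINING ACUTE ACCENT (65 CC 81), does not, because ISO-8859-1 has no combining acute. So the same visible text gets two different answers.

```c
#include <iconv.h>
#include <string.h>

/* Hypothetical helper: convert `inlen` bytes of `in` from charset `from`
 * to charset `to`. Returns 0 and sets *outlen on success, -1 on failure. */
static int convert(const char *to, const char *from,
                   const char *in, size_t inlen,
                   char *out, size_t outsize, size_t *outlen)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return -1;              /* charset pair not supported */

    char *inbuf = (char *)in;   /* iconv() takes a non-const char ** */
    size_t inleft = inlen;
    char *outbuf = out;
    size_t outleft = outsize;

    size_t rc = iconv(cd, &inbuf, &inleft, &outbuf, &outleft);
    if (rc != (size_t)-1)
        /* flush any pending shift state (stateful charsets such as ISO-2022-JP) */
        rc = iconv(cd, NULL, NULL, &outbuf, &outleft);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return -1;              /* unrepresentable input or buffer too small */
    *outlen = outsize - outleft;
    return 0;
}

/* Hypothetical helper: 1 if `utf8` converts to `charset` and back to
 * byte-identical UTF-8, 0 otherwise. No Unicode normalisation is done. */
static int roundtrips(const char *utf8, const char *charset)
{
    char mid[256], back[256];
    size_t len = strlen(utf8), midlen, backlen;

    if (convert(charset, "UTF-8", utf8, len, mid, sizeof mid, &midlen) < 0)
        return 0;
    if (convert("UTF-8", charset, mid, midlen, back, sizeof back, &backlen) < 0)
        return 0;
    return backlen == len && memcmp(back, utf8, len) == 0;
}
```

For example, roundtrips("\xC3\xA9", "ISO-8859-1") gives 1 while roundtrips("e\xCC\x81", "ISO-8859-1") gives 0. Normalising the input to NFC first would fix this particular case. It does nothing about the other problems raised here.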

> >
> > So I don't think it's worth putting back the >8 bit checks.  The tables
> > can get big and aren't complete anyway.
>
> ok.
>
> >
> > So the language check looks good; that covers a lot of the problem.
> > Perhaps we can extend the language check to map to preferred charsets
> > for that language too, at least to cut it down a little bit.
>
> I've separated the lang patch from the rest. shall I commit that?

Where is it? I guess so, though whilst in review mode I should probably see it first.

> >
> > God, I'm rambling here.  We really need proper tables if we want to do
> > the conversion incrementally.
>
> agreed.
>
> >
> > What about:
> > if the charset_best tells us it needs unicode, then try the locale
> > setting.  At least the user can then override it.
> > If not, then choose an 8 bit charset based on the language hint.
>
> why 8bit? the bug report is mostly people complaining that evo uses
> UTF-8 rather than some Chinese/Japanese/Korean charset. The 8bit charset
> handling seems to please most everyone afaict.

Because we can't get that info. The description above just says 'try something different first, then fall back to what we do now'.

The locale setting is the override which is tried first. We could also use the language hint to try specific charsets instead of or as well as the charset setting.
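The ordering discussed here can be sketched in C as a function that only builds the list of charsets to try for one header. The caller would then test each candidate in turn, for instance with an iconv round trip, and use the first one that fits. The ordering is: the locale charset first as the user's override, then a charset suggested by the language hint, then the current behaviour.

Everything in this sketch is hypothetical. That includes header_charset_candidates(), the lang_pref table and its language/charset pairings, and the convention that `best` is NULL when charset_best reports that the text needs Unicode. The locale charset would typically come from nl_langinfo(CODESET) after setlocale(LC_ALL, "").

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Illustrative language -> preferred charset table; these pairings are
 * assumptions for the sketch, not Evolution's actual data. */
static const struct {
    const char *lang;
    const char *charset;
} lang_pref[] = {
    { "ja", "ISO-2022-JP" },
    { "ko", "EUC-KR" },
    { "ru", "KOI8-R" },
    { "uk", "KOI8-U" },
};

/* Does a hint like "ru", "ru_RU" or "ru-RU" name language `lang`? */
static int lang_matches(const char *hint, const char *lang)
{
    size_t n = strlen(lang);
    return strncmp(hint, lang, n) == 0 &&
           (hint[n] == '\0' || hint[n] == '_' || hint[n] == '-');
}

/* Preferred charset for the language hint, or NULL if none is known. */
static const char *lang_charset(const char *hint)
{
    if (hint == NULL)
        return NULL;
    for (size_t i = 0; i < sizeof lang_pref / sizeof lang_pref[0]; i++)
        if (lang_matches(hint, lang_pref[i].lang))
            return lang_pref[i].charset;
    return NULL;
}

/* Append `cs` unless it is NULL, already queued, or the list is full. */
static void push(const char **out, size_t *n, size_t max, const char *cs)
{
    if (cs == NULL)
        return;
    for (size_t i = 0; i < *n; i++)
        if (strcasecmp(out[i], cs) == 0)
            return;
    if (*n < max)
        out[(*n)++] = cs;
}

/* Fill `out` with the charsets to try, in order, for one header.
 *   best      - what charset_best chose, or NULL if it says Unicode is needed
 *   lang_hint - e.g. "ja_JP", may be NULL
 *   locale_cs - e.g. nl_langinfo(CODESET), may be NULL
 * Returns the number of candidates written. */
size_t header_charset_candidates(const char *best, const char *lang_hint,
                                 const char *locale_cs,
                                 const char **out, size_t max)
{
    size_t n = 0;

    if (best == NULL) {
        /* Needs Unicode: locale override, then language hint, then UTF-8. */
        push(out, &n, max, locale_cs);
        push(out, &n, max, lang_charset(lang_hint));
        push(out, &n, max, "UTF-8");
    } else {
        /* Fits an 8-bit charset: language preference first, then
         * fall back to what charset_best picked. */
        push(out, &n, max, lang_charset(lang_hint));
        push(out, &n, max, best);
    }
    return n;
}
```

Keeping the selection separate from the conversion means the language table can grow, or be replaced by a composer preference, without touching the encoding code.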

> >
> > This only covers the situation 'fully' for small strings; message
> > content is harder with the incremental code, but generally you use
> > utf8 for html anyway.
>
> right, and body content has different logic anyway - in the composer,
> where the user can select the proper charset[1]
>
> this bug report is 100% about header encoding.

Ok. Well, like the decoding thing, we could also pass around a preferred charset encoding based on the composer setting, although we'd have to pass it to a lot of places (encode to stream and down to encode_word).
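Whichever charset is picked, it ends up named in an RFC 2047 encoded-word in the header. Below is a minimal sketch of the 'B' (base64) form. It assumes the bytes are already in the chosen charset.

rfc2047_b_encode() is a hypothetical name and is not Camel's encode_word. The sketch also ignores the 75-character limit on a single encoded-word and header folding, both of which a real encoder must handle.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "=?charset?B?<base64>?=" for `len` bytes of
 * `text` (already in `charset`) into `out`. Returns the length written,
 * or -1 if `out` is too small. Does not split long words. */
static int rfc2047_b_encode(const char *charset, const unsigned char *text,
                            size_t len, char *out, size_t outsize)
{
    static const char b64[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    int n = snprintf(out, outsize, "=?%s?B?", charset);
    if (n < 0 || (size_t)n >= outsize)
        return -1;
    size_t pos = (size_t)n;

    for (size_t i = 0; i < len; i += 3) {
        /* pack up to three input bytes into one 24-bit group */
        unsigned long v = (unsigned long)text[i] << 16;
        if (i + 1 < len) v |= (unsigned long)text[i + 1] << 8;
        if (i + 2 < len) v |= text[i + 2];

        if (pos + 4 >= outsize)
            return -1;
        out[pos++] = b64[(v >> 18) & 0x3F];
        out[pos++] = b64[(v >> 12) & 0x3F];
        out[pos++] = i + 1 < len ? b64[(v >> 6) & 0x3F] : '=';  /* pad */
        out[pos++] = i + 2 < len ? b64[v & 0x3F] : '=';         /* pad */
    }

    if (pos + 2 >= outsize)
        return -1;
    out[pos++] = '?';
    out[pos++] = '=';
    out[pos] = '\0';
    return (int)pos;
}
```

For the UTF-8 bytes of e-acute (C3 A9) this produces =?UTF-8?B?w6k=?=. A header sent in a CJK charset would carry that charset's name in the same slot.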

> anyways, I tested out moz-mail as jpr asked earlier and it sent as UTF-8
> too so...I'm thinking we should just punt this for 1.5 (except perhaps
> the lang portion of my patch). besides, this "bug" has been around for
> years so it's not like it's a regression or anything.
>
> ...and like you said, it's the other mailer's problem if it can't handle
> standards ;)

*nod*.

> Jeff
>
> 1. perhaps we can add some interface to suggest a charset to encode the
> headers in that uses the composer's charset pref?

possibly, see above.

> the problem with using locale is that it's becoming useless since
> everyone is moving toward UTF-8 locales on Linux. Sure, they can change
> it, but then a lot of shit in GNOME breaks (including evo).

If they break, they are broken themselves.

What breaks in evolution? There was a bug that was fixed last week due to gettext not working in a utf8 locale, but that was a bug in evolution.

Michael Zucchi <notzed ximian com>
"born to die, live to work, it's all downhill from here"
Novell's Evolution and Free Software Developer

Follow-Ups:
- Re: [evolution-patches] fix for bug #24026 (try harder not to sed in UTF-8)
  - From: Jeffrey Stedfast

References:
- [evolution-patches] fix for bug #24026 (try harder not to sed in UTF-8)
  - From: Jeffrey Stedfast
- Re: [evolution-patches] fix for bug #24026 (try harder not to sed in UTF-8)
  - From: Not Zed
- Re: [evolution-patches] fix for bug #24026 (try harder not to sed in UTF-8)
  - From: Jeffrey Stedfast
- Re: [evolution-patches] fix for bug #24026 (try harder not to sed in UTF-8)
  - From: Not Zed
- Re: [evolution-patches] fix for bug #24026 (try harder not to sed in UTF-8)
  - From: Jeffrey Stedfast
