Re: [evolution-patches] fix for bug #24026 (try harder not to sed in UTF-8)

From: Jeffrey Stedfast <fejj ximian com>
To: Not Zed <notzed ximian com>
Cc: evolution-patches ximian com
Subject: Re: [evolution-patches] fix for bug #24026 (try harder not to sed in UTF-8)
Date: Tue, 27 Jul 2004 00:38:37 -0400

On Mon, 2004-07-26 at 23:30, Not Zed wrote:
> On Mon, 2004-07-26 at 13:52 -0400, Jeffrey Stedfast wrote: 
> > On Mon, 2004-07-26 at 13:41 +0800, Not Zed wrote:
> > > 
> > > Well, we can easily check if it is UTF-8; actually we don't really
> > > need to - the locale is entirely under the user's control, even if
> > > the default is set up a certain way.  So if the user needs to change
> > > it to work in their normal environment, at least we'll honour it.
> > 
> > sure, but locale charset is becoming less and less meaningful (as
> > distros push for UTF-8)
> Yes, but it's exactly what it's for, and entirely under user control.
> > > 
> > > It's just that in hindsight, I can't see how we can get the 'is it in
> > > the charset' thing to really work as soon as you move outside of the
> > > 8-bit charset range.
> > 
> > nod
> > 
> > >   But even if we can identify a range of charsets which can represent
> > > a given text, there may be cases where you need to override it based
> > > on the user's environment.
> > 
> > possibly. I don't really know.
> 
> Well yes, your patch does it, for example, by using the language hint.
> That's part of the user's environment.
> > > 
> > > At least with CJK it has to be based on the environment since the
> > > 16-bit Unicode mappings have so much overlap.
> > 
> > sure, but that was one of the reasons I added the lang matching logic
> > (which was also needed for other reasons...such as not all cyrillic
> > users want koi8-r to have priority over, say, iso-8859-5).
> > 
> > > 
> > > The reverse-iconv thing doesn't really work that well either since
> > > there can be multiple ways of encoding the same thing, even if you
> > > canonicalise and recompose the unicode too (do we even do that?).
> > > e.g. e' (e with ' on top) can be encoded e' or it can be encoded ' +
> > > e.
> > 
> > yea :-\
> > 
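
(Illustration, not Camel code: a minimal Python sketch of the point above
that the same character can be carried by more than one code point
sequence. The helper name fits() is made up for this example. It shows
that a "does this text fit in the charset" check can give different
answers for the precomposed and decomposed forms unless the text is
normalised first.)

```python
import unicodedata

composed = "\u00e9"      # e-acute as a single precomposed code point
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

# The two strings render the same but are different code point sequences.
assert composed != decomposed

# Normalising to NFC recomposes the pair into the single code point.
assert unicodedata.normalize("NFC", decomposed) == composed


def fits(text: str, charset: str) -> bool:
    """Return True if text can be encoded in charset without error."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


# The precomposed form fits Latin-1; the combining accent U+0301 does not.
print(fits(composed, "iso-8859-1"))
print(fits(decomposed, "iso-8859-1"))
print(fits(unicodedata.normalize("NFC", decomposed), "iso-8859-1"))
```
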
> > > 
> > > So I don't think it's worth putting back the >8 bit checks.  The tables
> > > can get big and aren't complete anyway.
> > 
> > ok.
> > 
> > > 
> > > So the language check looks good, that covers a lot of the problem.
> > > Perhaps we can extend the language check to map to preferred charsets
> > > for that language too, at least to cut it down a little bit.
> > 
> > I've separated the lang patch from the rest. shall I commit that?
> Where is it?  I guess so, though whilst in review mode I should
> probably see it first.

yea, meant to attach it. apparently I forgot.

it's just the changes to best_mask()

> > > 
> > > God I'm rambling here.  We really need proper tables if we want to do
> > > the conversion incrementally.
> > 
> > agreed.
> > 
> > > 
> > > What about:
> > > if the charset_best tells us it needs unicode, then try locale
> > > setting.  at least the user can then override it.
> > > if not, then choose an 8 bit charset based on the language hint.
> > 
> > why 8bit? the bug report is mostly people complaining that evo uses
> > UTF-8 rather than some Chinese/Japanese/Korean charset. The 8bit charset
> > handling seems to please most everyone afaict.
> Because we can't get that info.  The description above just says 'try
> something different first, then fall back to what we do now'.
> 
> The locale setting is the override which is tried first.  We could
> also use the language hint to try specific charsets instead of or as
> well as the charset setting. 

*nod*
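
(Illustration only: a rough Python sketch of the ordering described above,
i.e. try the locale charset first, then charsets suggested by the language
hint, then fall back to UTF-8. This is not the Camel implementation; the
function names and the LANG_CHARSETS table are hypothetical, and the table
contents are just plausible examples.)

```python
# Hypothetical language -> preferred charsets table, for illustration only.
LANG_CHARSETS = {
    "ru": ["koi8-r", "iso-8859-5"],
    "ja": ["iso-2022-jp"],
    "ko": ["euc-kr"],
    "zh_CN": ["gb2312"],
    "zh_TW": ["big5"],
}


def can_encode(text, charset):
    """Return True if text is representable in charset."""
    try:
        text.encode(charset)
    except (UnicodeEncodeError, LookupError):
        return False
    return True


def pick_header_charset(text, locale_charset=None, lang=None):
    """Pick a charset for a short header string (assumed ordering)."""
    if can_encode(text, "us-ascii"):
        return "us-ascii"          # nothing to encode at all
    candidates = []
    if locale_charset:
        candidates.append(locale_charset)         # the user's override
    candidates.extend(LANG_CHARSETS.get(lang, []))  # language hint
    for charset in candidates:
        if can_encode(text, charset):
            return charset
    return "utf-8"                 # what happens today


print(pick_header_charset("hello"))
print(pick_header_charset("\u041f\u0440\u0438\u0432\u0435\u0442", "koi8-r"))
print(pick_header_charset("\u65e5\u672c\u8a9e", "iso-8859-1", "ja"))
```

Note that with a UTF-8 locale the first candidate always succeeds, which
is the "locale is becoming less meaningful" concern raised earlier; in
that case only the language hint can steer the result away from UTF-8.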

> > > 
> > > This only covers the situation 'fully' for small strings, message
> > > content is harder with the incremental code, but generally you use
> > > utf8 for html anyway.
> > 
> > right, and body content has different logic anyway - in the composer
> > where the user can select the proper charset[1]
> > 
> > this bug report is 100% about header encoding.
> Ok.  Well like the decoding thing, we could also pass around a
> preferred charset encoding based on the composer setting too. 
> Although we have to pass it around a lot of places (encode to stream
> and down to encode_word). 

yea, and I don't really like having to do that...makes the APIs kinda
icky.
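
(For readers unfamiliar with what "header encoding" means here: non-ASCII
header text goes out as RFC 2047 encoded-words, and the charset is named
inside each word. The sketch below uses Python's standard email.header
module, not Camel's encoder, purely to show the same subject sent in two
charsets. The decode_words() helper is made up for this example.)

```python
from email.header import Header, decode_header

subject = "\u65e5\u672c\u8a9e"  # a short Japanese string

# The same text as an encoded-word in two different charsets.
as_utf8 = Header(subject, charset="utf-8").encode()
as_jis = Header(subject, charset="iso-2022-jp").encode()

print(as_utf8)
print(as_jis)


def decode_words(raw):
    """Decode RFC 2047 encoded-words back into a str."""
    parts = []
    for data, charset in decode_header(raw):
        if isinstance(data, bytes):
            parts.append(data.decode(charset or "ascii"))
        else:
            parts.append(data)
    return "".join(parts)


# A standards-compliant reader recovers the same text from either form.
print(decode_words(as_utf8) == decode_words(as_jis))
```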

> > anyways, I tested out moz-mail as jpr asked earlier and it sent as UTF-8
> > too so...I'm thinking we should just punt this for 1.5 (except perhaps
> > the lang portion of my patch). besides, this "bug" has been around for
> > years so it's not like it's a regression or anything.
> > 
> > ...and like you said, it's the other mailer's problem if it can't handle
> > standards ;)
> *nod*.
> 
> > Jeff
> > 
> > 1. perhaps we can add some interface to suggest a charset to encode the
> > headers in that uses the composer's charset pref? the problem with using
> possibly, see above. 
> > locale is that it's becoming useless since everyone is moving toward
> > UTF-8 locales on Linux. Sure, they can change it, but then a lot of shit
> > in GNOME breaks (including evo).
> If they break, they are broken themselves.
> 
> What breaks in evolution?  There was a bug that was fixed last week
> due to gettext not working in a utf8 locale, but that was a bug in
> evolution.

filenames not being in UTF-8 is the big one for all GNOME apps. if for
example a user points evo at a mbox spool with a non-UTF-8 filename, it
won't display "correctly" in the UI.
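
(A small Python illustration of the filename problem, not evo code: bytes
that are not valid UTF-8 cannot be shown as-is, so a UI has to guess a
charset or substitute characters. The display_name() helper and its
Latin-1 fallback are assumptions made for this example.)

```python
def display_name(raw: bytes, fallback_charset: str = "iso-8859-1") -> str:
    """Turn on-disk filename bytes into text suitable for display."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not UTF-8: guess a legacy charset, replacing anything unmappable.
        return raw.decode(fallback_charset, errors="replace")


utf8_name = "caf\u00e9".encode("utf-8")   # b'caf\xc3\xa9'
latin1_name = b"caf\xe9"                  # same name, Latin-1 bytes

print(display_name(utf8_name))
print(display_name(latin1_name))
```

The guess is only right if the fallback happens to match the charset the
file was created under, which is why such names may not display
"correctly".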

Jeff
