Re: [Debian BTS] ru_RU.UTF-8 locale

From: Jakub Jelinek <jakub redhat com>
To: Pavel Roskin <proski gnu org>
Cc: mc-devel gnome org
Subject: Re: [Debian BTS] ru_RU.UTF-8 locale
Date: Tue, 25 Feb 2003 18:04:26 +0100

On Tue, Feb 25, 2003 at 11:13:54AM -0500, Pavel Roskin wrote:
> > On Mon, Feb 24, 2003 at 11:49:14AM +0100, Adam Byrtek / alpha wrote:
> > > Hi, I know there are several people here who use the Russian locale.
> > > Could you please try to reproduce this bug report or tell me whether I
> > > can close it? Maybe this guy just doesn't know how to configure a
> > > UTF-8 terminal properly? Unfortunately I can't contact him...
> 
> The problem is that gettext returns strings in UTF-8, and they are passed
> to the screen library (S-Lang or ncurses), that is supposed to show those
> strings correctly.  Part of the problem is the need to measure the actual

It is not just about gettext strings, but about filenames too.

> I also don't feel it's such a good idea to use locale to figure out the
> properties of the terminal.  The locale is meant to define locale-specific
> preferences of the user, not the properties of any software.

Well, the locale tells you which charset gettext strings are in, which
characters are printable, etc. Running a UTF-8 terminal with a non-UTF-8
locale is a bad idea, as is running a non-UTF-8 terminal with a UTF-8 locale.
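
As a rough sketch of how a program could ask the locale for its charset:
nl_langinfo (CODESET) is the standard POSIX call, while the two helper names
below are made up for illustration and are not anything in mc.

```c
#include <langinfo.h>
#include <locale.h>
#include <strings.h>

/* Hypothetical helper: does this codeset name denote UTF-8?
   Accepts the common spellings "UTF-8" and "utf8", case-insensitively. */
int codeset_is_utf8(const char *codeset)
{
    return strcasecmp(codeset, "UTF-8") == 0
        || strcasecmp(codeset, "utf8") == 0;
}

/* Hypothetical helper: is the user's locale a UTF-8 one?
   Note that this switches LC_CTYPE to the environment's locale. */
int locale_is_utf8(void)
{
    setlocale(LC_CTYPE, "");
    return codeset_is_utf8(nl_langinfo(CODESET));
}
```

This only says what the locale claims; it cannot verify that the terminal
really is in UTF-8 mode.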

The mc changes needed to support UTF-8 are at least:
a) stop assuming strlen () gives both the byte length of a string and its
   width on the screen
b) when truncating/etc. strings intended for display, the visible length
   has to be taken into account, and the result must never contain just
   part of a multibyte character
c) view/edit should be able to iconv from the selected data charset to the
   display charset
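
To illustrate a) and b), here is a minimal sketch of truncating for display
without splitting a character. The function name is hypothetical, and it
assumes valid UTF-8 input and one screen column per character, which is not
true for wide or combining characters.

```c
#include <stddef.h>

/* Hypothetical helper, not from mc: return how many bytes of `s` fit
   into `max_cols` screen columns without cutting a multibyte character
   in half.  Assumes valid UTF-8 and one column per code point. */
size_t utf8_fit_bytes(const char *s, size_t max_cols)
{
    size_t i = 0, cols = 0;

    while (s[i] != '\0') {
        /* Every byte that is not a continuation byte (10xxxxxx)
           starts a new character. */
        if (((unsigned char) s[i] & 0xc0) != 0x80) {
            if (cols == max_cols)
                break;          /* the next character would not fit */
            cols++;
        }
        i++;
    }
    return i;
}
```

For a two-character Cyrillic string of 4 bytes, strlen () reports 4, while
utf8_fit_bytes (s, 1) returns 2, i.e. exactly the first character.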

When dealing with strings returned by gettext, mbstrlen could be made much
faster by assuming that all strings are valid UTF-8 in a UTF-8 locale:
basically, loop over the string and count only the bytes for which
(char & 0xc0) != 0x80.
Unfortunately, that assumption does not necessarily hold for filenames and
file content.
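
The counting loop could look roughly like this (the function name is made up
for the example; it counts characters, not screen columns, and gives
meaningless results on invalid UTF-8):

```c
#include <stddef.h>

/* Hypothetical fast path: count the characters in a string that is
   assumed to be valid UTF-8.  Continuation bytes have the form
   10xxxxxx, so every byte with (byte & 0xc0) != 0x80 starts a new
   character. */
size_t utf8_strlen_fast(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (((unsigned char) *s & 0xc0) != 0x80)
            n++;
    return n;
}
```

For filenames and file content, a slower path that validates each sequence
would still be needed.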

> Maybe it's better to use ncurses instead of S-Lang for the build with
> UTF-8 support?  ncurses has a longer history of supporting Unicode.
> Also, it is developed by Thomas Dickey, who also maintains xterm and
> terminfo.  If there are any issues with the standards (like the one I just
> mentioned), he is the person who can do something.

I don't know; I have never looked at the UTF-8 support in ncurses.
Ideally we should support both S-Lang and ncurses, each with
both non-UTF-8 and UTF-8.

> > mc doesn't work in UTF-8 locales. A few days ago I hacked mc up so that at
> > least the things I use often in mc sort-of work with UTF-8, you can find
> > the patch in ftp://people.redhat.com/jakub/mc/
> 
> First of all, to override the check for UTF-8 S-Lang, use
> --with-screen=slang instead of hacking configure.

If I remember correctly, that did not work: even when I specified it,
configure overrode it.

> It would be nice if you commented your patches.  I may consider applying
> some of them.  I really don't understand why /bin/rm is better than rm.

Most of them aren't mine; I just forward-ported them from an older mc rpm.

> > But view is not done at all and there are still a lot of places that
> > need changing. The first thing to decide is which locales mc wants to
> > support. E.g. supporting just ASCII compatible charsets (like UTF-8) is
> > easier than supporting ASCII incompatible ones.
> 
> I think we can limit ourselves to the ASCII compatible charsets for now.

This simplifies things.

	Jakub
