Re: Request for discussion - how to make MC unicode capable

On Sunday 25 February 2007 14:41, Leonard den Ottolander wrote:
> Hello Pavel,
> On Sat, 2007-02-24 at 14:57 +0200, Pavel Tsekov wrote:
> > I'd like to initiate a discussion on how to make MC
> > unicode deal with multibyte character sets.
> >

The current utf-8 patches are based on utf-8 support in glibc.
I don't know if utf-8 is needed on other systems.

> Just a few thoughts:
> - Because multibyte is rather more memory hungry I think the user should
> still have the option to toggle the use of an 8bit path either in the
> interface or at compile time. This means where the UTF-8 patches replace
> paths we should preferably implement two paths.

The situation with the utf-8 patches is as follows:
In the editor, utf-8 text is converted to wchar. This requires up to 4 times
as much memory, but allows the code to stay almost the same.
In the rest of mc, utf-8 is used directly and the memory
requirements are more or less the same as with 8bit charsets.
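
To illustrate the memory trade-off in the editor, here is a minimal sketch, not
the code from the patches. The helper name utf8_to_wide is hypothetical, and
uint32_t stands in for wchar_t (which is 4 bytes on glibc). For plain ASCII
input, every 1-byte character becomes a 4-byte element, which is where the
factor of 4 comes from.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: decode a NUL-terminated UTF-8 string into a newly
 * allocated, 0-terminated array of 32-bit code points.
 * Returns NULL on allocation failure or on a malformed sequence.
 * Only the lead/continuation byte structure is checked; overlong forms
 * and surrogates are not rejected in this sketch. */
uint32_t *utf8_to_wide(const char *s, size_t *out_len)
{
    const unsigned char *p = (const unsigned char *) s;
    size_t bytes = strlen(s);
    size_t n = 0;
    /* Worst case: every input byte is a character of its own. */
    uint32_t *buf = malloc((bytes + 1) * sizeof *buf);

    if (buf == NULL)
        return NULL;

    while (*p != '\0') {
        uint32_t cp;
        int extra;                      /* continuation bytes expected */

        if (*p < 0x80) {
            cp = *p;
            extra = 0;
        } else if ((*p & 0xE0) == 0xC0) {
            cp = *p & 0x1F;
            extra = 1;
        } else if ((*p & 0xF0) == 0xE0) {
            cp = *p & 0x0F;
            extra = 2;
        } else if ((*p & 0xF8) == 0xF0) {
            cp = *p & 0x07;
            extra = 3;
        } else {
            free(buf);
            return NULL;
        }
        p++;

        while (extra-- > 0) {
            /* A NUL byte also fails this check, so we never run past the end. */
            if ((*p & 0xC0) != 0x80) {
                free(buf);
                return NULL;
            }
            cp = (cp << 6) | (uint32_t) (*p & 0x3F);
            p++;
        }
        buf[n++] = cp;
    }

    buf[n] = 0;
    *out_len = n;
    return buf;
}
```

With this layout the editor can index characters directly (buf[i]) instead of
scanning for sequence boundaries, which is why the rest of the editor code can
stay almost the same.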

> - I suppose a lot of the code of the UTF-8 patch can be reused, only we
> will need to add iconv() calls in the appropriate places. libiconv is
> already expected so not much trouble with the make files there. Iconv
> should only be used for the multibyte path, not the 8bit path. Using the
> multibyte path would still enable users to translate from one 8bit
> charset to another.
> - Unsupported character substitution character should be an ini option
> (and define some defaults for all/many character sets). (I'm not sure
> question mark is supported in all character sets.)
> - Users should be able to set character set per directory (mount). Of
> course there should be a system wide default taken from the environment
> (but also overridable).
> - Copy/move dialogs should have a toggle to iconv the file name or do a
> binary name copy.
> - Maybe copy/move dialogs should also have a toggle to iconv file
> content, which could be quite usable for text files. A warning dialog on
> every copy/move (that the user explicitly has to disable) might be a
> good addition then, to help uninformed users avoid screwing up their
> data.

The code in charsets.c is not compatible with utf-8 and needs to be completely
rewritten. For example, the function convert_to_display(char *str), which
converts the string in place, can't be used for converting to utf-8, where the
string actually grows.
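
A minimal sketch of the problem, assuming ISO-8859-1 as the source charset.
The function latin1_to_utf8_dup is hypothetical and not part of charsets.c.
It only shows that a conversion to utf-8 has to write into a separate, larger
buffer, because each 8bit character above 0x7F becomes two bytes.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated UTF-8 copy of an
 * ISO-8859-1 string, or NULL if allocation fails. The caller frees it. */
char *latin1_to_utf8_dup(const char *src)
{
    const unsigned char *p = (const unsigned char *) src;
    size_t len = strlen(src);
    /* Each Latin-1 byte expands to at most two UTF-8 bytes. */
    char *dst = malloc(2 * len + 1);
    char *out = dst;

    if (dst == NULL)
        return NULL;

    for (; *p != '\0'; p++) {
        if (*p < 0x80) {
            /* ASCII is unchanged. */
            *out++ = (char) *p;
        } else {
            /* U+0080..U+00FF: two-byte sequence 110xxxxx 10xxxxxx. */
            *out++ = (char) (0xC0 | (*p >> 6));
            *out++ = (char) (0x80 | (*p & 0x3F));
        }
    }
    *out = '\0';
    return dst;
}
```

For example, the 4-byte Latin-1 string "caf\xE9" becomes the 5-byte utf-8
string "caf\xC3\xA9", so writing the result back into the original buffer
would overrun it.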

With the current utf-8 patches charsets can't be used in utf-8 locales.

Vladimir Nadvornik
SUSE LINUX, s. r. o.                        e-mail: nadvornik suse cz
Lihovarská 1060/12                          tel:+420 284 028 967
190 00 Praha 9                              fax:+420 284 028 951
Czech Republic                        
