Re: UTF-8 vs. current locale charset



Tor Lillqvist <tml@iki.fi> writes:

> Pavel Machek writes:
>  > On linux, all filenames should be passed in utf-8 to kernel (that's
>  > only sane solution I know of). Otherwise you get severe problems
>  > trying to move ext2 disk to other system.
> 
> Umm, but surely the Linux kernel itself doesn't care if file names are
> in UTF-8, ISO-8859-2, EUC-JP or whatever, as long as they don't
> contain slashes (other than as path component separators) or embedded
> NULs?
> 
> How common is it that real Linux sites (I am not talking about
> personal home machines) use UTF-8 for file names? Wouldn't this cause
> interoperability problems if the same files are shared via NFS with
> other Unix systems that don't support UTF-8 locales?

Any system with filenames in a locale-dependent charset is 
badly broken. This is especially true for multi-user machines.

(You don't need to move the disk to a different system - just
have a different user who has exported LANG=some_other_language.)

Now, that being said, there are probably considerable numbers
of machines that are badly broken in this fashion, and GTK+
needs to support them.

What we probably should do is use the locale's charset unless
some environment variable is set (G_UTF8_FILENAMES
perhaps), or they are using the C/POSIX locale.

(The latter will no doubt annoy some Europeans, but they
can adjust pretty easily with "export LC_CTYPE=de_DE", and
I think giving ISO-8859-1 special status for this would be bad.)
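
To make the rule concrete, here is a rough sketch in plain C of how
the decision might look. It assumes that "set" means the variable is
present in the environment with any value, and that the C/POSIX case
falls back to UTF-8. The function names pick_filename_charset and
filename_charset are made up for illustration and are not existing
GTK+ or GLib API.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: pure decision logic, kept separate from the
 * environment so it is easy to test.
 *   utf8_env       - value of G_UTF8_FILENAMES, or NULL if unset
 *   ctype_locale   - name of the current LC_CTYPE locale
 *   locale_codeset - charset of that locale, e.g. "ISO-8859-1" */
const char *
pick_filename_charset (const char *utf8_env,
                       const char *ctype_locale,
                       const char *locale_codeset)
{
  /* Assumption: any value, even an empty one, counts as "set". */
  if (utf8_env != NULL)
    return "UTF-8";

  /* C/POSIX locale: no special status for ISO-8859-1, use UTF-8. */
  if (ctype_locale == NULL
      || strcmp (ctype_locale, "C") == 0
      || strcmp (ctype_locale, "POSIX") == 0)
    return "UTF-8";

  /* Otherwise keep the (locale-dependent) charset of the locale. */
  return locale_codeset;
}

/* Hypothetical wrapper that consults the real environment. */
const char *
filename_charset (void)
{
  return pick_filename_charset (getenv ("G_UTF8_FILENAMES"),
                                setlocale (LC_CTYPE, NULL),
                                nl_langinfo (CODESET));
}
```

The wrapper only sees the user's locale if the program has already
called setlocale (LC_CTYPE, "") or setlocale (LC_ALL, ""); without
that, every program runs in the C locale and would get UTF-8.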

This gives people a couple of migration paths to non-broken
filenames:

 - Switch to a UTF-8 locale (they are uncommon now on Linux,
   but should be widely available in another year or so) 

 - Set G_UTF8_FILENAMES, and let legacy apps show such filenames
   as broken.

(GtkFilesel should put up a useful warning dialog when the user
tries to use a filename that cannot be represented in the
current locale's character set.)
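
The check behind such a warning could be sketched roughly as below.
This uses POSIX iconv() rather than anything in GTK+ or GLib, and
name_fits_charset is a made-up name. It assumes the filename is held
as UTF-8 internally and that the target charset would normally come
from nl_langinfo (CODESET).

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the UTF-8 string `name` converts
 * cleanly to `charset`, 0 if it does not, and -1 if no converter for
 * that charset is available. */
int
name_fits_charset (const char *name, const char *charset)
{
  iconv_t cd = iconv_open (charset, "UTF-8");
  if (cd == (iconv_t) -1)
    return -1;

  char *in = (char *) name;       /* iconv() does not modify the input */
  size_t in_left = strlen (name);
  int fits = 1;

  while (in_left > 0 && fits)
    {
      char buf[256];              /* converted bytes are discarded */
      char *out = buf;
      size_t out_left = sizeof buf;
      size_t r = iconv (cd, &in, &in_left, &out, &out_left);

      if (r == (size_t) -1)
        {
          /* E2BIG only means buf is full; anything else is a
           * character that is invalid or cannot be converted. */
          if (errno != E2BIG)
            fits = 0;
        }
      else if (r > 0)
        {
          /* Some implementations substitute a replacement character
           * and report it as a non-reversible conversion. */
          fits = 0;
        }
    }

  iconv_close (cd);
  return fits;
}
```

A file selector could call this with the locale's charset before
accepting a name, and put up the dialog when the result is 0.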

I think this works as a fairly practical solution, though it
disturbs me that GTK+ will continue to create locale-dependent
filesystem names.

Regards,
                                        Owen




