Re: UTF-8 sequences in filenames

On 5 Mar 2003, Dmitry G. Mastrukov wrote:

> Hi!
> I'd like to raise a problem that we (in the Russian community) have
> encountered with filenames.
> The problem exists in non-Unicode locales (ru_RU.KOI8-R, ru_RU.CP1251)
> with many GNOME apps (gedit, glade, nautilus at least). Files with
> national filenames are viewed through the file selector (G_BROKEN_FILENAMES
> is set) and can be opened. But newly created files look on the filesystem
> (in mc, for example) like UTF-8 sequences.
> We have some preliminary patches to gnome-vfs, but we are not confident
> that gnome-vfs is the only source of the problem. It would be good to find
> out the thoughts of the gnome-vfs developers about that.
> We see two possibilities:
> 1. Gnome-vfs is broken and needs patching. It seems it is not using the
> g_filename_from_utf8/uri() functions from glib at all. Then where is the
> control point for possible conversion? In other words, which public
> functions work with "locale-dependent" filenames and which definitely work
> with UTF-8? A brief look at the gedit sources shows that
> gnome_vfs_get_local_path_from_uri is used to get the filename to feed to
> fopen(). So it should produce a locale filename on output. Also, maybe the
> do_action() functions in method/file-method.c and some other methods
> should use g_filename_from_utf8/uri().

This is not right. gnome-vfs URIs cannot be defined to be in a specific
encoding, just as unix pathnames cannot be. You must be able to use
gnome-vfs to access a file that has an invalid UTF-8 name, for instance
(maybe someone created it with G_BROKEN_FILENAMES set in another locale);
otherwise you can't rename or delete that file. Similarly, URIs may
reference remote systems that use neither UTF-8-encoded filenames nor your
local locale encoding.
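
To illustrate why a file name has to be treated as raw bytes, here is a
small Python sketch. It is not gnome-vfs code; is_valid_utf8 is a
hypothetical helper and the sample name is made up. It only shows that the
same name written under a KOI8-R locale is not valid UTF-8:

```python
# Sketch only: a file name on disk is a byte string, and nothing
# guarantees that those bytes are valid UTF-8.

def is_valid_utf8(name: bytes) -> bool:
    """Return True if the raw file name bytes decode as UTF-8."""
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# The Russian word for "file", written with escapes so that this
# snippet does not depend on the encoding of its own source file.
word = "\u0444\u0430\u0439\u043b.txt"

utf8_name = word.encode("utf-8")    # what a UTF-8 setup would write
koi8_name = word.encode("koi8_r")   # what a KOI8-R setup would write

print(is_valid_utf8(utf8_name))
print(is_valid_utf8(koi8_name))
```

A layer that insisted on valid UTF-8 could not even name the second file,
so it could not rename or delete it either.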

Basically, all URIs in gnome-vfs are in a filesystem-dependent, unknown
encoding (maybe even none or several encodings, such as on a system using
G_BROKEN_FILENAMES with users in different locales), and it has to be that
way for gnome-vfs to be able to access all files. However, the goal of
Gnome filename handling is that all files created are either in UTF-8
(the default) or, at the user's choice (G_BROKEN_FILENAMES is set), in the
current locale encoding.

In general, the filename encoding issue is a hard problem, and a full,
real solution will not be available for a long time, until the whole world
has switched over to a common encoding. Many sources of filenames just
don't have a corresponding filename encoding specified, so until everyone
uses the same one we have to guess. Take an ftp site, for instance. How are
you supposed to know the encoding it uses for filenames?
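
One common way to guess, for display purposes only, is to try UTF-8 first
and fall back to some other charset when that fails. The Python sketch
below assumes KOI8-R as the fallback; guess_display_name is a hypothetical
helper, not anything gnome-vfs provides, and the guess can be wrong:

```python
# Sketch only: a heuristic for turning raw name bytes into text to show
# the user. The raw bytes must still be kept for the actual file access.

def guess_display_name(raw: bytes, fallback_charset: str = "koi8_r") -> str:
    """Decode raw as UTF-8 if possible, else with the fallback charset."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: the fallback is whatever legacy charset we expect.
        # Undecodable bytes become U+FFFD instead of raising an error.
        return raw.decode(fallback_charset, errors="replace")

word = "\u0444\u0430\u0439\u043b.txt"   # Russian for "file", plus ".txt"

from_utf8_site = guess_display_name(word.encode("utf-8"))
from_koi8_site = guess_display_name(word.encode("koi8_r"))

print(from_utf8_site == word)
print(from_koi8_site == word)
```

If the remote site actually used CP1251, the fallback would produce
readable-looking but wrong text, which is exactly the guessing problem.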

> 2. Gnome-vfs is OK but developers do not use its power properly. Then
> there should exist some document describing the right method of dealing
> with filenames. That document should be widely known to developers.

Yes, this is true; in fact I've fixed some nautilus bugs relating to this
recently. For Gtk+ apps, when creating a file you're supposed to use
g_filename_from_utf8() to get the actual file name to use from what the
user entered (which is always UTF-8).
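
The rule for newly created files can be sketched in Python as follows.
This is only an analogue of what g_filename_from_utf8() is described as
doing above, not glib's implementation; filename_from_utf8, its
locale_charset parameter and its env parameter are hypothetical names
introduced for the example:

```python
import os

# Sketch only: choose the on-disk bytes for a name the user typed.
# UTF-8 by default; the locale charset if G_BROKEN_FILENAMES is set.

def filename_from_utf8(text: str, locale_charset: str, env=os.environ) -> bytes:
    """Return the bytes to use as the file name on disk."""
    if "G_BROKEN_FILENAMES" in env:
        # User opted in: names are stored in the current locale charset.
        return text.encode(locale_charset)
    # Default: names are stored as UTF-8.
    return text.encode("utf-8")

word = "\u0444\u0430\u0439\u043b.txt"   # Russian for "file", plus ".txt"

default_bytes = filename_from_utf8(word, "koi8_r", env={})
broken_bytes = filename_from_utf8(word, "koi8_r", env={"G_BROKEN_FILENAMES": "1"})

print(default_bytes == word.encode("utf-8"))
print(broken_bytes == word.encode("koi8_r"))
```

An app that skips this step and passes the UTF-8 text straight to the
filesystem is what produces the UTF-8 sequences seen in KOI8-R locales.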

 Alexander Larsson                                            Red Hat, Inc 
                   alexl redhat com    alla lysator liu se 
He's a notorious vegetarian stage actor trapped in a world he never made. 
She's a strong-willed red-headed mermaid with an incredible destiny. They 
fight crime! 
