Re: Filename encodings and GLib

On Tue, 2003-10-14 at 16:40, Owen Taylor wrote:
> On Tue, 2003-10-14 at 04:30, Alexander Larsson wrote:
> > On Mon, 2003-10-13 at 17:14, Owen Taylor wrote:
> > > This is in reference to:
> > > 
> > >
> > >  
> > > Right now, the GLib model is that there are three forms for a filename:
> > > 
> > >  A) "System filename form" ... NUL terminated byte sequence,
> > >     no interpretation for user display
> > >  B) UTF-8 form. 
> > >  C) URI form
> > ...
> > >  - We still haven't figured out whether URI's encode UTF-8
> > >    filenames or system filenames. Nautilus and GLib, I believe,
> > >    are inconsistent about this.
> > 
> > Yeah. Which is a bit unfortunate. The problem with URIs is of course
> > that we can't rely on the encoding of the filename in general, since the
> > file could be on e.g. a remote ftp server with unknown encoding. Another
> > issue is also that nautilus must be able to handle misencoded filenames
> > so they can be renamed to something correct.
> Note that for ftp, http, etc, it's a non-issue. The encoding is whatever
> the remote server picks; the extent to which we can interpret the
> octets as a human-readable string will depend on the relevant RFC's.
>
> Really, the only question is for URI's we are generating ourselves,
> and in particular, for file:// URIs.
> I don't think nautilus is particularly unique in needing to handle 
> misencoded filenames. And if it *is* unique, that still doesn't mean
> it can use a different file:// scheme than the rest of the desktop.
> If we believe that the straight octet encoding is correct, then we
> should:
>  A) Fix GLib to do the same
>  B) Push this as a mini-spec on

I haven't given this a lot of thought, but off the cuff I think straight
octet encoding is the right way. So maybe we should do this.

> The main problem with straight octet-encoding of filenames is that
> at best you can only guess how to display them to the user as anything
> other than the literal URI.

The same is true for system-encoded filenames. That's why we have
g_filename_to_utf8(). In fact, the basic reason we need octet-encoded
file: URIs is to have a one-to-one mapping between "system filename
form" and file: URIs, since some apps use file: URIs instead of
filenames as their base format.
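
A rough sketch of that one-to-one mapping, in Python rather than C to
keep it short. The helper names are made up for illustration and are
not GLib API; the sketch assumes absolute POSIX paths and an empty host
part, and the Latin-1 fallback for display is only one possible guess:

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes


def filename_to_file_uri(raw: bytes) -> str:
    """Percent-encode the raw octets of an absolute path (hypothetical helper)."""
    if not raw.startswith(b"/"):
        raise ValueError("expected an absolute path")
    # No charset conversion happens here: every octet outside the
    # unreserved set (and '/') becomes %XX, whatever encoding it was in.
    return "file://" + quote_from_bytes(raw, safe="/")


def file_uri_to_filename(uri: str) -> bytes:
    """Recover exactly the octets that went in (hypothetical helper)."""
    prefix = "file://"
    if not uri.startswith(prefix):
        raise ValueError("not a file: URI")
    return unquote_to_bytes(uri[len(prefix):])


def display_name(raw: bytes) -> str:
    """Guess a displayable string; this step is lossy guesswork by nature."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption for the sketch: fall back to Latin-1.
        return raw.decode("latin-1")


# The same visible name stored in two different encodings.
latin1_name = b"/home/alex/caf\xe9.txt"
utf8_name = "/home/alex/caf\u00e9.txt".encode("utf-8")

for name in (latin1_name, utf8_name):
    uri = filename_to_file_uri(name)
    # The round trip is exact even for the misencoded name.
    assert file_uri_to_filename(uri) == name
    print(uri, "->", display_name(name))
```

The point is that the URI layer never has to know the encoding. Only
the display step guesses, which is the same situation as with plain
system filenames.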

Of course, this leads to the question: what does a file: URI look like
on Windows? If we choose UTF-8 as the system filename encoding on win32,
will the file: URIs be compatible with IE?

 Alexander Larsson                                            Red Hat, Inc 
                   alexl redhat com    alla lysator liu se 
He's an immortal guerilla card sharp on the edge. She's a manipulative 
extravagant barmaid with a flame-thrower. They fight crime! 
