Filename encodings and GLib
- From: Owen Taylor <otaylor redhat com>
- To: gtk-devel-list gnome org
- Cc: tml iki fi, alexl redhat com
- Subject: Filename encodings and GLib
- Date: Mon, 13 Oct 2003 11:14:29 -0400
This is in reference to:
 http://bugzilla.gnome.org/show_bug.cgi?id=101792
 
Right now, the GLib model is that there are three forms for a filename:
 A) "System filename form" ... NUL terminated byte sequence,
    no interpretation for user display
 B) UTF-8 form. 
 C) URI form
All GLib functions take form A, except for g_filename_to/from_utf8()
which do conversions between A<=>B and g_filename_to/from_uri()
which do conversions between A<=>C.
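
To make the A<=>C direction concrete, here is a minimal sketch in plain C, assuming an absolute Unix-style path. The name sketch_filename_to_uri() is hypothetical and this is not how g_filename_to_uri() is implemented; it only shows that a byte-wise conversion escapes whatever bytes the system form contains, without deciding which charset they are in.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not a GLib function. Builds a file:// URI from an
 * absolute Unix-style path by percent-escaping every byte outside a small
 * safe set. Bytes are escaped as-is, with no charset conversion.
 * Returns a malloc()ed string the caller must free(), or NULL on failure. */
char *sketch_filename_to_uri(const char *path)
{
    static const char hex[] = "0123456789ABCDEF";
    static const char prefix[] = "file://";
    size_t prefix_len = sizeof prefix - 1;

    assert(path != NULL);
    if (path[0] != '/')
        return NULL;                    /* only absolute paths in this sketch */

    size_t len = strlen(path);
    /* Worst case: every input byte becomes "%XX". */
    char *uri = malloc(prefix_len + 3 * len + 1);
    if (uri == NULL)
        return NULL;

    char *out = uri;
    memcpy(out, prefix, prefix_len);
    out += prefix_len;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)path[i];
        int safe = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                   (c >= '0' && c <= '9') || strchr("/-._~", c) != NULL;
        if (safe) {
            *out++ = (char)c;
        } else {
            *out++ = '%';
            *out++ = hex[c >> 4];
            *out++ = hex[c & 0x0F];
        }
    }
    *out = '\0';
    return uri;
}
```

With this sketch, the same visible name yields a different URI depending on whether the system form is UTF-8 or Latin-1 (%C3%A9 versus %E9 for an e-acute).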
There is quite a range of GLib functions that take or return filenames
in system form:
g_file_test(), g_file_get_contents(), g_mkstemp(), g_file_open_tmp(),
g_io_channel_new_file(), g_dir_open(), g_dir_read_name(), and
g_spawn_*().
This model basically works on Unix, with several issues:
 - When you add misencoded filenames, there is no 1-1 A <=> B
   conversion.
   What you want, instead of g_filename_to_utf8() is something
   like g_filename_get_display_name() that returns the form
   to display to the user.
 - We still haven't figured out whether URIs encode UTF-8
   filenames or system filenames. Nautilus and GLib, I believe,
   are inconsistent about this.
 - The A<=>B conversion depends on the locale, whether 
   G_BROKEN_FILENAMES is set, etc.
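
The display-name idea from the first point can be sketched as follows, in plain C. The name sketch_display_name() is hypothetical, and the sketch assumes the system form is meant to be UTF-8; a real implementation would also have to consult the locale and G_BROKEN_FILENAMES. Invalid bytes are replaced with U+FFFD, so the result is always displayable, but the conversion is lossy and cannot be reversed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Length of the valid UTF-8 sequence starting at s (n >= 1 bytes
 * available), or 0 if the bytes there are not valid UTF-8. */
static size_t valid_utf8_len(const unsigned char *s, size_t n)
{
    unsigned char c = s[0];
    size_t len;
    unsigned long cp, min;

    if (c < 0x80)
        return 1;
    else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; min = 0x80; }
    else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; min = 0x800; }
    else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; min = 0x10000; }
    else
        return 0;                       /* stray continuation or bad lead */

    if (len > n)
        return 0;                       /* truncated sequence */
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates, and out-of-range values. */
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    return len;
}

/* Hypothetical helper: returns a malloc()ed, always-valid UTF-8 string
 * for display, or NULL if allocation fails. */
char *sketch_display_name(const char *filename)
{
    assert(filename != NULL);
    const unsigned char *in = (const unsigned char *)filename;
    size_t n = strlen(filename);
    /* Worst case: each byte becomes the 3-byte encoding of U+FFFD. */
    char *out = malloc(3 * n + 1);
    if (out == NULL)
        return NULL;

    size_t o = 0;
    for (size_t i = 0; i < n; ) {
        size_t len = valid_utf8_len(in + i, n - i);
        if (len > 0) {
            memcpy(out + o, in + i, len);
            o += len;
            i += len;
        } else {
            memcpy(out + o, "\xEF\xBF\xBD", 3);
            o += 3;
            i += 1;
        }
    }
    out[o] = '\0';
    return out;
}
```

Two different misencoded names can map to the same display string, which is why the display form cannot stand in for the system form when opening the file.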
On Windows, the situation is more problematic. As I understand it:
 - The canonical name in modern Win32 is a 16-bit Unicode
   name; this name is taken by Unicode (-W) functions in the 
   windows API
 - Every filename (and directory name) has a short 8.3 form, ASCII only
   for compatibility purposes, but the correspondence between this 
   name and the long name can only be determined by accessing the
   filesystem.
 - C library functions (fopen, etc) and the non-W forms of the windows
   API functions either take:
   
   - Filenames in the current codepage
   - The short 8.3 form of the filenames
So, what do we use for the system representation on Windows:
 1) The Unicode name, converted to UTF-8
  
    Advantages:    All filenames are representable 
                   Conversion to display form without disk access
    Disadvantages: Not accepted natively by any Windows API or C library
                   functions; people couldn't use fopen() in GTK+
                   programs ported to Win32. 
 2) The current codepage form of the name:
    Advantages:    Accepted by C library functions
                   Conversion to display form without disk access
    Disadvantages: Not all filenames are representable
 3) The short-form for non-ASCII filenames, otherwise the ASCII
    only name.
    Advantages:    Accepted by C library functions 
                   All files are representable
    Disadvantage:  Conversion to/from display form takes disk access
                   Can't represent files that aren't on the disk!
Of these, I think only 1) is really workable. The fact that not all
filenames are representable is a killer objection for 2). The fact
that you can't create non-ASCII filenames with a "Save As..." dialog 
is a killer objection for 3). 
So, we'd need to provide wrapper functions for the C library functions
that take our system filename form. The C standard functions that take
filenames are, to my knowledge, remove(), rename(), tmpnam(),
fopen(), freopen().
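
As a rough illustration of what such a wrapper might look like under option 1), here is a minimal sketch. The name sketch_fopen_utf8() is hypothetical, not an existing GLib function. It assumes the filename is UTF-8: on Windows it converts to UTF-16 with MultiByteToWideChar() and calls _wfopen(), and elsewhere it passes the bytes straight to fopen().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#ifdef _WIN32
#include <windows.h>

/* Convert a NUL-terminated UTF-8 string to a malloc()ed UTF-16 string.
 * Returns NULL on invalid input or allocation failure. */
static wchar_t *utf8_to_wide(const char *s)
{
    int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, s, -1, NULL, 0);
    if (n <= 0)
        return NULL;
    wchar_t *w = malloc((size_t)n * sizeof *w);
    if (w != NULL &&
        MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, s, -1, w, n) <= 0) {
        free(w);
        w = NULL;
    }
    return w;
}
#endif

/* Hypothetical wrapper: like fopen(), but the filename is always UTF-8. */
FILE *sketch_fopen_utf8(const char *filename, const char *mode)
{
    assert(filename != NULL && mode != NULL);
#ifdef _WIN32
    wchar_t *wname = utf8_to_wide(filename);
    wchar_t *wmode = utf8_to_wide(mode);
    FILE *fp = NULL;
    if (wname != NULL && wmode != NULL)
        fp = _wfopen(wname, wmode);     /* Unicode (-W style) CRT entry point */
    free(wname);
    free(wmode);
    return fp;
#else
    /* On Unix the system form is just bytes, so no conversion is done. */
    return fopen(filename, mode);
#endif
}
```

Wrappers for remove(), rename(), and freopen() would follow the same pattern. The sketch does not try to preserve errno across the free() calls, which a production version would probably want to do.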
Am I missing some other possibility here?
Regards,
						Owen