Filename encodings and GLib



This is in reference to:

 http://bugzilla.gnome.org/show_bug.cgi?id=101792
 
Right now, the GLib model is that there are three forms for a filename:

 A) "System filename form" ... NUL terminated byte sequence,
    no interpretation for user display
 B) UTF-8 form. 
 C) URI form

All GLib functions take form A, except for g_filename_to/from_utf8()
which do conversions between A<=>B and g_filename_to/from_uri()
which do conversions between A<=>C.
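
To make the A<=>C direction concrete, here is a minimal sketch of
turning a form A name into a file: URI by percent-encoding its bytes.
The helper name path_to_file_uri() and its buffer-based signature are
made up for illustration; this is not how g_filename_to_uri() is
implemented, and it assumes an absolute Unix-style path with no
hostname part. It encodes whatever bytes form A happens to contain.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not a GLib function: write "file://" followed by
 * the percent-encoded bytes of an absolute Unix path into out.
 * Returns 0 on success, -1 if out is too small. */
static int path_to_file_uri(const char *path, char *out, size_t out_size)
{
    static const char hex[] = "0123456789ABCDEF";
    static const char prefix[] = "file://";
    const unsigned char *p;
    size_t n;

    assert(path != NULL && out != NULL);

    /* Need room for the prefix and a terminating NUL at minimum. */
    if (out_size < sizeof prefix)
        return -1;
    memcpy(out, prefix, sizeof prefix - 1);
    n = sizeof prefix - 1;

    for (p = (const unsigned char *)path; *p != 0; p++) {
        /* Unreserved URI characters plus '/' are copied through as-is. */
        int keep = (*p >= 'A' && *p <= 'Z') || (*p >= 'a' && *p <= 'z') ||
                   (*p >= '0' && *p <= '9') || strchr("-._~/", *p) != NULL;
        size_t need = keep ? 1 : 3;

        if (n + need + 1 > out_size)
            return -1;
        if (keep) {
            out[n++] = (char)*p;
        } else {
            /* Every other byte, including all non-ASCII bytes, becomes %XX. */
            out[n++] = '%';
            out[n++] = hex[*p >> 4];
            out[n++] = hex[*p & 0x0F];
        }
    }
    out[n] = '\0';
    return 0;
}
```

With this sketch, the same user-visible name yields a different URI
depending on the bytes on disk: a UTF-8 "café" ends in %C3%A9, while
an ISO-8859-1 one ends in %E9.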

There is quite a range of GLib functions that take or return filenames
in system form:

g_file_test(), g_file_get_contents(), g_mkstemp(), g_file_open_tmp(),
g_io_channel_new_file(), g_dir_open(), g_dir_read_name(),
g_spawn_*().

This model basically works on Unix, with several issues:

 - Once you allow for misencoded filenames, there is no one-to-one
   A <=> B conversion.

   What you want, instead of g_filename_to_utf8() is something
   like g_filename_get_display_name() that returns the form
   to display to the user.

 - We still haven't figured out whether URIs encode UTF-8
   filenames or system filenames. Nautilus and GLib, I believe,
   are inconsistent about this.

 - The A<=>B conversion depends on the locale, whether 
   G_BROKEN_FILENAMES is set, etc.
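
For the first issue, a display-name function could look roughly like
the sketch below: it never fails on bad input, and instead replaces
each byte that is not part of a valid UTF-8 sequence with U+FFFD. The
function names and the buffer-based signature are hypothetical, and
the sketch assumes the on-disk encoding is meant to be UTF-8; a real
implementation would presumably try the locale encoding first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of the valid UTF-8 sequence starting at s, or 0 if the bytes
 * there are not valid UTF-8. s must be NUL-terminated. */
static size_t utf8_seq_len(const unsigned char *s)
{
    unsigned long cp;
    size_t len, i;

    if (s[0] < 0x80)
        return 1;
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; cp = s[0] & 0x07; }
    else
        return 0;   /* stray continuation byte or invalid lead byte */

    for (i = 1; i < len; i++) {
        /* A NUL fails this check too, so we never read past the end. */
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        cp = (cp << 6) | (s[i] & 0x3F);
    }

    /* Reject overlong forms, surrogates and values above U+10FFFF. */
    if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) ||
        (len == 4 && cp < 0x10000))
        return 0;
    if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        return 0;
    return len;
}

/* Hypothetical display-name helper: copy name into out, replacing every
 * invalid byte with U+FFFD (EF BF BD). Returns 0 on success, -1 if out
 * is too small. The result is for display only and cannot be converted
 * back to the original name. */
static int filename_display_name(const char *name, char *out, size_t out_size)
{
    const unsigned char *p = (const unsigned char *)name;
    size_t n = 0;

    assert(name != NULL && out != NULL);

    while (*p != 0) {
        size_t len = utf8_seq_len(p);

        if (len > 0) {
            if (n + len + 1 > out_size)
                return -1;
            memcpy(out + n, p, len);
            n += len;
            p += len;
        } else {
            if (n + 3 + 1 > out_size)
                return -1;
            memcpy(out + n, "\xEF\xBF\xBD", 3);
            n += 3;
            p += 1;
        }
    }
    if (n + 1 > out_size)
        return -1;
    out[n] = '\0';
    return 0;
}
```

The point of the one-way design is that the caller keeps form A around
for actually opening the file and uses the returned string only for
showing it to the user.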

On Windows, the situation is more problematic. As I understand it:

 - The canonical name in modern Win32 is a 16-bit Unicode
   name; this is the name taken by the Unicode (-W) functions in the
   Windows API.

 - Every filename (and directory name) has a short, ASCII-only 8.3 form
   for compatibility purposes, but the correspondence between this
   name and the long name can only be determined by accessing the
   filesystem.

 - C library functions (fopen(), etc.) and the non-W forms of the Windows
   API functions take either:
   
   - Filenames in the current codepage
   - The short 8.3 form of the filenames

So, what do we use for the system representation on Windows?

 1) The Unicode name, converted to UTF-8
  
    Advantages:    All filenames are representable 
                   Conversion to display form without disk access
    Disadvantages: Not accepted natively by any Windows API or C library
                   functions; people couldn't use fopen() in GTK+
                   programs ported to Win32. 

 2) The current codepage form of the name:

    Advantages:    Accepted by C library functions
                   Conversion to display form without disk access
    Disadvantages: Not all filenames are representable

 3) The short 8.3 form for non-ASCII filenames; otherwise the long
    name, which is then ASCII-only.

    Advantages:    Accepted by C library functions 
                   All files are representable
    Disadvantage:  Conversion to/from display form takes disk access
                   Can't represent files that aren't on the disk!

Of these, I think only 1) is really workable. The fact that not all
filenames are representable is a killer objection for 2). The fact
that you can't create non-ASCII filenames with a "Save As..." dialog
is a killer objection for 3).

So, we'd need to provide wrapper functions for the C library functions
that take our system filename form. The C standard functions that take
filenames are, to my knowledge, remove(), rename(), tmpnam(),
fopen(), freopen().
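
As a rough idea of what such a wrapper might look like under option 1),
here is a sketch for fopen(). The name utf8_fopen() is hypothetical. On
Windows it converts the UTF-8 name to the 16-bit form with
MultiByteToWideChar() and calls _wfopen(); elsewhere it just passes the
name through. The MAX_PATH limit is a simplification for the sketch.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#include <wchar.h>
#endif

/* Hypothetical wrapper: open a file whose name is given in the proposed
 * system form (UTF-8 on Windows, raw bytes elsewhere).
 * Returns NULL on failure, like fopen(). */
static FILE *utf8_fopen(const char *utf8_name, const char *mode)
{
    assert(utf8_name != NULL && mode != NULL);
#ifdef _WIN32
    {
        wchar_t wname[MAX_PATH];
        wchar_t wmode[16];

        /* Passing -1 as the length converts the terminating NUL as well.
         * A return value of 0 means invalid UTF-8 or a too-long name. */
        if (MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                utf8_name, -1, wname, MAX_PATH) == 0)
            return NULL;
        if (MultiByteToWideChar(CP_UTF8, 0, mode, -1, wmode, 16) == 0)
            return NULL;
        return _wfopen(wname, wmode);
    }
#else
    /* On Unix the system form is already what the C library expects. */
    return fopen(utf8_name, mode);
#endif
}
```

Wrappers for remove(), rename() and freopen() would follow the same
pattern: convert, then call the wide-character variant.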

Am I missing some other possibility here?

Regards,
						Owen




