Filename encodings and GLib
- From: Owen Taylor <otaylor redhat com>
- To: gtk-devel-list gnome org
- Cc: tml iki fi, alexl redhat com
- Subject: Filename encodings and GLib
- Date: Mon, 13 Oct 2003 11:14:29 -0400
This is in reference to:
http://bugzilla.gnome.org/show_bug.cgi?id=101792
Right now, the GLib model is that there are three forms for a filename:
A) "System filename form" ... NUL terminated byte sequence,
no interpretation for user display
B) UTF-8 form.
C) URI form
All GLib functions take form A, except for g_filename_to/from_utf8()
which do conversions between A<=>B and g_filename_to/from_uri()
which do conversions between A<=>C.
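To make the A=>C direction concrete, here is a minimal sketch in plain C. It assumes the URI carries the system-form bytes verbatim, which is only one of the two conventions in question. sketch_filename_to_uri() is a hypothetical name, not a GLib function, and the real g_filename_to_uri() may treat hostnames and reserved characters differently.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of GLib: percent-encode the raw bytes of
 * an absolute Unix path into a file:// URI. Bytes are escaped as they
 * are, so the URI carries system-form bytes whatever their encoding.
 * The caller frees the result. */
static char *sketch_filename_to_uri(const char *path)
{
    static const char hex[] = "0123456789ABCDEF";
    static const char safe[] =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789-._~/";
    size_t len = strlen(path);
    size_t i;
    /* Worst case: every byte becomes "%XX", plus "file://" and the NUL. */
    char *uri = malloc(7 + 3 * len + 1);
    char *out;

    if (uri == NULL)
        return NULL;
    memcpy(uri, "file://", 7);
    out = uri + 7;
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)path[i];
        if (strchr(safe, c) != NULL) {
            *out++ = (char)c;           /* unreserved byte: copy as-is */
        } else {
            *out++ = '%';               /* anything else: %XX escape */
            *out++ = hex[c >> 4];
            *out++ = hex[c & 0x0F];
        }
    }
    *out = '\0';
    return uri;
}
```

Because the bytes are escaped without interpretation, a Latin-1 filename and a UTF-8 filename that look identical to the user produce different URIs.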
There is quite a range of GLib functions that take or return filenames
in system form:
g_file_test(), g_file_get_contents(), g_mkstemp(), g_file_open_tmp(),
g_io_channel_new_file(), g_dir_open(), g_dir_read_name(),
g_spawn_*().
This model basically works on Unix, with several issues:
- Once you add misencoded filenames, there is no 1-1 A <=> B
conversion.
What you want, instead of g_filename_to_utf8(), is something
like g_filename_get_display_name() that returns the form
to display to the user.
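A display-name function of this kind could be sketched as follows, in plain C without GLib. The sketch assumes the system form is meant to be UTF-8 and replaces every byte that is not part of a well-formed sequence with '?'. The names utf8_seq_len() and sketch_display_name() are hypothetical, and a real implementation would probably try the locale encoding before giving up on a byte.

```c
#include <stdlib.h>
#include <string.h>

/* Length (1-4) of the well-formed UTF-8 sequence starting at s, or 0 if
 * the bytes at s are not valid UTF-8. avail is the number of bytes left. */
static size_t utf8_seq_len(const unsigned char *s, size_t avail)
{
    size_t n, i;
    unsigned long cp;

    if (s[0] < 0x80)
        return 1;
    else if ((s[0] & 0xE0) == 0xC0) { n = 2; cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { n = 3; cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { n = 4; cp = s[0] & 0x07; }
    else
        return 0;
    if (n > avail)
        return 0;
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates and out-of-range values. */
    if ((n == 2 && cp < 0x80) || (n == 3 && cp < 0x800) ||
        (n == 4 && cp < 0x10000) || cp > 0x10FFFF ||
        (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    return n;
}

/* Hypothetical helper: always returns valid UTF-8 for display. The
 * conversion is lossy, so the result must never be converted back and
 * used to open the file. The caller frees the result. */
static char *sketch_display_name(const char *filename)
{
    size_t len = strlen(filename), i = 0;
    char *out = malloc(len + 1);    /* output is never longer than input */
    char *p = out;

    if (out == NULL)
        return NULL;
    while (i < len) {
        size_t n = utf8_seq_len((const unsigned char *)filename + i, len - i);
        if (n == 0) {
            *p++ = '?';             /* invalid byte: substitute */
            i++;
        } else {
            memcpy(p, filename + i, n);
            p += n;
            i += n;
        }
    }
    *p = '\0';
    return out;
}
```

The point of the separate function is that it is one-way: it always succeeds and always yields something displayable, which a strict A=>B conversion cannot promise.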
- We still haven't figured out whether URIs encode UTF-8
filenames or system filenames. Nautilus and GLib, I believe,
are inconsistent about this.
- The A<=>B conversion depends on the locale, whether
G_BROKEN_FILENAMES is set, etc.
On Windows, the situation is more problematic. As I understand it:
- The canonical name in modern Win32 is a 16-bit Unicode
name; this name is taken by the Unicode (-W) functions in the
Windows API
- Every filename (and directory name) has a short 8.3 form, ASCII only
for compatibility purposes, but the correspondence between this
name and the long name can only be determined by accessing the
filesystem.
- C library functions (fopen(), etc.) and the non-W forms of the Windows
API functions take either:
- Filenames in the current codepage
- The short 8.3 form of the filenames
So, what do we use for the system representation on Windows?
1) The Unicode name, converted to UTF-8
Advantages: All filenames are representable
Conversion to display form without disk access
Disadvantages: Not accepted natively by any Windows API or C library
functions; people couldn't use fopen() in GTK+
programs ported to Win32.
2) The current codepage form of the name:
Advantages: Accepted by C library functions
Conversion to display form without disk access
Disadvantages: Not all filenames are representable
3) The short 8.3 form for non-ASCII filenames, otherwise the
(ASCII-only) long name.
Advantages: Accepted by C library functions
All files are representable
Disadvantages: Conversion to/from display form takes disk access
Can't represent files that aren't on the disk!
Of these, I think only 1) is really workable. The fact that not all
filenames are representable is a killer objection for 2). The fact
that you can't create non-ASCII filenames with a "Save As..." dialog
is a killer objection for 3).
So, we'd need to provide wrappers for the C library functions, taking
filenames in our system filename form. The C standard functions that
deal with filenames are, to my knowledge, remove(), rename(), tmpnam(),
fopen(), freopen().
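The core of such a wrapper under option 1) would be a UTF-8 to UTF-16 conversion, sketched below in portable C. sketch_utf8_to_utf16() is a hypothetical name. The assumption is that, on Windows, a wrapper would pass the converted name to a wide-character call such as _wfopen() from the Microsoft C runtime; that call is left out here so the sketch compiles anywhere.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert a NUL-terminated UTF-8 string into a
 * NUL-terminated UTF-16 string. Returns NULL on malformed input or
 * allocation failure. For brevity this sketch does not reject overlong
 * encodings. The caller frees the result. */
static uint16_t *sketch_utf8_to_utf16(const char *utf8)
{
    const unsigned char *s = (const unsigned char *)utf8;
    size_t len = strlen(utf8);
    size_t i = 0, o = 0;
    /* Each UTF-8 byte yields at most one UTF-16 code unit. */
    uint16_t *out = malloc((len + 1) * sizeof *out);

    if (out == NULL)
        return NULL;
    while (i < len) {
        uint32_t cp;
        size_t n, k;

        if (s[i] < 0x80)                { cp = s[i];        n = 1; }
        else if ((s[i] & 0xE0) == 0xC0) { cp = s[i] & 0x1F; n = 2; }
        else if ((s[i] & 0xF0) == 0xE0) { cp = s[i] & 0x0F; n = 3; }
        else if ((s[i] & 0xF8) == 0xF0) { cp = s[i] & 0x07; n = 4; }
        else
            goto fail;
        if (n > len - i)
            goto fail;
        for (k = 1; k < n; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                goto fail;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            goto fail;
        if (cp >= 0x10000) {
            /* Outside the BMP: emit a surrogate pair. */
            cp -= 0x10000;
            out[o++] = (uint16_t)(0xD800 | (cp >> 10));
            out[o++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            out[o++] = (uint16_t)cp;
        }
        i += n;
    }
    out[o] = 0;
    return out;

fail:
    free(out);
    return NULL;
}
```

A wrapper built on this never touches the disk to convert a name, and it fails only on malformed input, which matches the two advantages listed for option 1).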
Am I missing some other possibility here?
Regards,
Owen