UTF-8 vs. current locale charset



GTK+ 1.3 (2.0) uses UTF-8, while the file system related C runtime
calls like stat(), open() and opendir() use the "current codepage" (the
Windows term; on Unix you want to use whatever encoding/charset the
user's locale uses).

This causes some problems in applications; my main concern is GIMP on
Windows. The code in GIMP doesn't take into consideration that strings
obtained from the user and strings that are passed to GTK+ are in
UTF-8, while pathnames used in stat()/open() etc. should be in the
current locale charset.

In GTK+ 1.3 the gtk_file_selection_get_filename() function currently
returns a string in the current locale charset, not UTF-8. Is this a
good solution? I assume it is not too late to change this, as GTK+ 1.3
is a developer version, and is really used for "production" only on
Windows, mainly for GIMP. (Whatever other apps use GTK+ on Windows
probably don't even try to be i18n-correct anyway.)

GLib 1.3 has the functions g_filename_from_utf8() and
g_filename_to_utf8() to convert back and forth. I assume these are not
carved in stone yet, either.

Come to think of it, maybe it would be a good idea to introduce an
opaque type to GLib (or GTK+?) called "GSystemCharsetString" or
something, to be used for all strings that are in the locale-dependent
charset. This type's representation would actually be just gchar[],
but the compiler wouldn't know that, and thus it would be easier to
keep track of which strings are in which encoding. When passing
GSystemCharsetStrings to stat()/open() etc., a cast would have to be
used.
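
A minimal sketch of what such an opaque type could look like in plain
C. Nothing like this exists in GLib; the type name follows the
suggestion above and all function names are hypothetical. It uses char
rather than gchar only to stay free of GLib headers, and it relies on
casting between char pointers and pointers to an incomplete struct,
which works on mainstream compilers but is an assumption of this sketch.

```c
#include <sys/stat.h>

/* Hypothetical opaque type. The struct is never defined, so the
 * compiler cannot dereference it, and passing a plain char * where a
 * GSystemCharsetString * is expected produces a diagnostic. */
typedef struct GSystemCharsetString GSystemCharsetString;

/* Tag a buffer that is known to be in the locale charset. */
static const GSystemCharsetString *sys_string_from_chars(const char *s)
{
    return (const GSystemCharsetString *) s;
}

/* The explicit cast needed before calling the C runtime. */
static const char *sys_string_chars(const GSystemCharsetString *s)
{
    return (const char *) s;
}

/* Accepts only locale-charset strings; returns 1 if the path exists. */
static int sys_path_exists(const GSystemCharsetString *path)
{
    struct stat st;
    return stat(sys_string_chars(path), &st) == 0;
}
```

With this, sys_path_exists(sys_string_from_chars(name)) compiles
cleanly, while sys_path_exists(some_utf8_string) draws an
incompatible-pointer complaint from the compiler, which is the whole
point of the exercise.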

What thoughts do other people have about these issues?

--tml




