Re: [gtkmm] UTF8 filenames? (Was One thing we didn't really discuss: ustring vs string)



Murray Cumming wrote:
On Thu, 2002-08-08 at 18:09, Michael Babcock wrote:

Murray Cumming wrote:
[snip]

1. We currently use Glib::ustring whenever we are not _certain_ that the
string could never be UTF8. Filenames can probably never be UTF8, so they
should be std::string. Others might have more expertise.

Why can filenames never be UTF8? I thought one of the reasons for creating UTF8 was to be able to use Unicode in Unix filenames, avoiding the problems that a 16-bit wide-character Unicode encoding causes for NUL-terminated strings.


At the moment we use std::string in FileSelection::get_filename().
I seem to remember Daniel dealing with this thoroughly and making some
appropriate decision, but this doesn't mean much to me:
http://developer.gnome.org/doc/API/2.0/gtk/gtkfileselection.html#gtk-file-selection-get-filename

Please discuss.


Okay, as long as std::string can contain a UTF8 string (and I see no reason why it can't: it can hold any 8-bit byte), this should be fine. That page seems to imply that the filename is in some other encoding (perhaps based on the global system locale), which you should then convert to UTF8 with g_filename_to_utf8. However, the filename could of course already be in UTF8, in which case that conversion should still work and do nothing. In my opinion, global Unix locales are to be worked around, not used, and the only sane ones use UTF8, but that's another subject.

We could automatically call g_filename_to_utf8() in gtkmm and return a ustring, but that would not be good if people wanted to access the raw, on-disk encoding a byte at a time (perhaps to move a file without going through two, possibly lossy, character set conversions).

--
Michael Babcock
Jim Henson's Creature Shop




