Re: [gtkmm] One thing we didn't really discuss: ustring vs string



Murray Cumming wrote:
> On Thu, 2002-08-08 at 18:09, Michael Babcock wrote:
>
>> Murray Cumming wrote:
>> [snip]
>>
>>> 1. We currently use Glib::ustring whenever we are not _certain_ that the
>>> string could never be UTF8. Filenames can probably never be UTF8 so they
>>> should be std::string. Others might have more expertise.
>>
>> Why can filenames never be UTF8? I thought one of the reasons for creating UTF8 was to be able to use Unicode in Unix filenames, avoiding the null-terminated string problems of using a 16-bit wide character Unicode encoding.
>
>
> Maybe because other systems can't cope with UTF8? I think, for instance,
> that UTF8 can have null bytes in the middle.

No, I believe that was the exact problem with 16-bit encoded Unicode that the UTF8 encoding was created to solve.

> I'm not 100% sure whether
> filenames can be UTF8. Other people seem to be sure that they can't, so
> I defer to them.


It's been several years since I was involved in this stuff, so I'm a bit rusty, but I believe Unix filenames definitely can be UTF8. In fact, UTF8 was created so that we could use Unicode on existing Unix systems in a backwards-compatible way. Whether user programs (such as ls and gtk) can present such names nicely depends on whether they support the encoding, but it is transparent to the Unix system calls.
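
To make the null-byte point concrete, here is a minimal sketch in plain C++ (no glibmm involved). The helper names encode_utf8, encode_utf16be_bmp and has_zero_byte are made up for this illustration and are not part of any library. It assumes the input is a valid code point and does not handle UTF-16 surrogates. The idea is that every byte of a multi-byte UTF8 sequence has its high bit set, so a zero byte (or a '/' byte) only ever appears when the character really is U+0000 (or '/'), whereas a 16-bit encoding puts a zero byte into every ASCII character.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
// Assumes cp is a valid code point (<= 0x10FFFF, not a surrogate).
std::string encode_utf8(std::uint32_t cp) {
    std::string out;
    if (cp < 0x80) {
        // ASCII range: a single byte, identical to the ASCII value.
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // Two bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: encode one 16-bit unit as two big-endian bytes,
// the way a 16-bit wide-character encoding would store it.
std::string encode_utf16be_bmp(std::uint16_t unit) {
    std::string out;
    out += static_cast<char>(unit >> 8);    // high byte, zero for ASCII
    out += static_cast<char>(unit & 0xFF);  // low byte
    return out;
}

// True if the byte string contains a zero byte anywhere.
bool has_zero_byte(const std::string& bytes) {
    return bytes.find('\0') != std::string::npos;
}
```

With these, has_zero_byte(encode_utf8(0x20AC)) is false for the euro sign, while has_zero_byte(encode_utf16be_bmp(0x41)) is true for a plain 'A'. That is why a UTF8 filename can pass through null-terminated char* interfaces untouched and a 16-bit encoded one cannot.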


--
Michael Babcock
Jim Henson's Creature Shop




