Re: Fwd: wide char string literals to Glib ustring



On Sat, 2007-12-08 at 12:24 -0500, Onur Tugcu wrote:

> To me, easiest would be to be able to write unicode directly
> into code and to not worry about the codes. Also, I imagine
> multi-byte glyphs will suffer from endianness.

No, UTF-8 is encoded as a series of single bytes (chars, i.e. a narrow
codeset), so there are no endian issues.
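
To illustrate the point, here is a minimal sketch in plain standard C++ (no glib). The function names are made up for this example. It compares the UTF-8 bytes for U+00FC with the in-memory bytes of the same value held in one 16-bit unit: the UTF-8 byte order is fixed by the encoding, whereas the 16-bit unit's layout depends on the machine.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// UTF-8 for U+00FC (u-umlaut), written as explicit byte escapes so the
// result does not depend on the encoding the editor saves the file in.
inline std::string utf8_u_umlaut() { return "\xC3\xBC"; }

// Hypothetical helper: the bytes of one 16-bit code unit exactly as they
// sit in memory on the machine running the program.
inline std::vector<unsigned char> raw_bytes(std::uint16_t unit)
{
  std::vector<unsigned char> out(sizeof unit);
  std::memcpy(out.data(), &unit, sizeof unit);
  return out;
}

inline void endian_demo()
{
  // UTF-8: always C3 then BC, on any machine.
  const std::string u8 = utf8_u_umlaut();
  assert(u8.size() == 2);
  assert(static_cast<unsigned char>(u8[0]) == 0xC3);
  assert(static_cast<unsigned char>(u8[1]) == 0xBC);

  // 16-bit unit: FC 00 on a little-endian machine, 00 FC on a big-endian one.
  const std::vector<unsigned char> u16 = raw_bytes(0x00FC);
  assert(u16.size() == 2);
  assert((u16[0] == 0xFC && u16[1] == 0x00) ||
         (u16[0] == 0x00 && u16[1] == 0xFC));
}
```

Byte order only becomes a question once the code unit is wider than one byte, which is why it arises for UTF-16 and UCS-4 data but not for UTF-8.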

> >
> > > When I use gtkmm with vc++ 2005, sizeof(wchar_t) is 2.
> > > So I assumed utf-16 encoding and wrote:
> > >
> > > Glib::ustring w2ustring(std::wstring const &w)
> > > {
> > >   gunichar2 const* utf16= reinterpret_cast<gunichar2 const*>(w.c_str());
> > >   gchar* utf8= g_utf16_to_utf8(utf16, -1, 0, 0, 0);
> > >   Glib::ustring u(utf8); g_free(utf8);
> > >   return u;
> > > }
> > >
> > > Which seems to work great like
> > > Glib::ustring u(w2ustring(L"üö"));
> > >
> > > But on linux with a unicode terminal,
> > >
> > > I can just set
> > > std::locale::global(std::locale("en_US.UTF-8"));
> > > Glib::ustring u(Glib::locale_to_utf8("üö"));
> > >
> > > And the code up there doesn't work (wchar_t is actually 4 bytes)
> > > And even the ucs4 output warnings and the resulting ustring is garbage
> > > or I get a segfault.

It is not clear what it is that your "up there" refers to as not
working, but if it is the last code sequence, this may be because your
editor is not writing in UTF-8.  The string literal "üö" will be
embedded in your source code by the editor in whatever codeset it
happens to use (which might well be ISO-8859-1).  The conversion is also
pointless - since you have set your locale to a UTF-8 codeset
programmatically, the conversion does nothing.

Were it to do something (i.e. were you not to have set the locale
programmatically by reference to a particular codeset), calling a
conversion function on a string literal which depends on the user's
locale would be non-portable as you do not know what locale your users
may be using.  If you want to hard code UTF-8 into your code, do so
directly.
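
One way to hard-code UTF-8 without depending on the editor's codeset is to spell out the bytes as escapes. The sketch below is plain standard C++ with names made up for this example, and the code point counter assumes valid UTF-8 input. A string built this way can be handed to the Glib::ustring constructor as it stands, with no locale conversion.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// The two characters from the example as UTF-8 bytes:
// U+00FC -> C3 BC, U+00F6 -> C3 B6.
inline std::string umlauts_utf8() { return "\xC3\xBC\xC3\xB6"; }

// Hypothetical helper: count code points by skipping UTF-8 continuation
// bytes (those of the form 10xxxxxx). Assumes the input is valid UTF-8.
inline std::size_t utf8_length(const std::string& s)
{
  std::size_t n = 0;
  for (std::size_t i = 0; i < s.size(); ++i) {
    const unsigned char c = static_cast<unsigned char>(s[i]);
    if ((c & 0xC0) != 0x80)
      ++n;
  }
  return n;
}

inline void hardcode_demo()
{
  const std::string s = umlauts_utf8();
  assert(s.size() == 4);         // four bytes ...
  assert(utf8_length(s) == 2);   // ... encoding two characters
}
```

Writing the literal with escapes keeps the source file pure ASCII, so it compiles to the same bytes whether the editor saves in ISO-8859-1 or UTF-8.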

I do not understand your comment about UCS4 because your last code
sequence uses UTF-8 rather than wide characters (and your preceding
code sequence converts to UTF-8 from UTF-16).

Chris



