Re: Fwd: wide char string literals to Glib ustring



On Sat, 2007-12-08 at 18:20 -0500, Onur Tugcu wrote:
> On Dec 8, 2007 5:55 PM, Chris Vine <chris cvine freeserve co uk> wrote:
> > On Sat, 2007-12-08 at 12:24 -0500, Onur Tugcu wrote:
> >
> > > To me, easiest would be to be able to write unicode directly
> > > into code and to not worry about the codes. Also, I imagine
> > > multi-byte glyphs will suffer from endianness.
> >
> > No, UTF-8 is composed of a series of single bytes (a narrow codeset), so
> > there are no endian issues.
> >
> > > >
> > > > > When I use gtkmm with vc++ 2005, sizeof(wchar_t) is 2.
> > > > > So I assumed utf-16 encoding and wrote:
> > > > >
> > > > > Glib::ustring w2ustring(std::wstring const &w)
> > > > > {
> > > > >   gunichar2 const* utf16= reinterpret_cast<gunichar2 const*>(w.c_str());
> > > > >   gchar* utf8= g_utf16_to_utf8(utf16, -1, 0, 0, 0);
> > > > >   Glib::ustring u(utf8); g_free(utf8);
> > > > >   return u;
> > > > > }
> > > > >
> > > > > Which seems to work great like
> > > > > Glib::ustring u(w2ustring(L"üö"));
> > > > >
> > > > > But on linux with a unicode terminal,
> > > > >
> > > > > I can just set
> > > > > std::locale::global(std::locale("en_US.UTF-8"));
> > > > > Glib::ustring u(Glib::locale_to_utf8("üö"));
> > > > >
> > > > > And the code up there doesn't work (wchar_t is actually 4 bytes)
> > > > > And even the ucs4 output warnings and the resulting ustring is garbage
> > > > > or I get a segfault.
> >
> > It is not clear what it is that your "up there" refers to as not
> > working, but if it is the last code sequence, this may be because your
> > editor is not writing in UTF-8.  The string literal "üö" will be
> > embedded in your source code by the editor in whatever codeset it
> > happens to use (which might well be ISO-8859-1).  The conversion is also
> > pointless - since you have set your locale to a UTF-8 codeset
> > programmatically, the conversion does nothing.
> >
> > Were it to do something (ie were you not to have set the locale
> > programmatically by reference to a particular codeset), calling a
> > conversion function on a string literal which depends on the user's
> > locale would be non-portable as you do not know what locale your users
> > may be using.  If you want to hard code UTF-8 into your code, do so
> > directly.
> >
> > I do not understand your comment about UCS4 because your last code
> > sequence uses UTF-8 rather than wide characters (and your preceding
> > code sequence converts to UTF-8 from UTF-16).
> >
> > Chris

> So what's your point? That you don't understand my post?
> The editor on linux is unicode capable. It is clear from my post.
> Setting the locale does affect how Glib::locale_to_utf8 works. Clear
> from my post and docs.
> It is perfectly clear that if conversion does not work (or cannot
> work in this case because of different number of bytes required to
> represent the sequence points), you get garbage or segfault or an
> exception. That is "does not work".
> 
> Suffice to say I'm not interested in this subject anymore. Thanks for your time.

What an extraordinary post.

Calling Glib::locale_to_utf8(), if the locale has been set
programmatically to a UTF-8 locale, does nothing.
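
Since no conversion takes place in that case, the literal has to be
valid UTF-8 already.  A minimal sketch of one way to guarantee that,
whatever codeset the editor saves in, is to spell the bytes out with
escape sequences.  The helper names below are hypothetical and only the
standard library is used:

```cpp
#include <cstddef>
#include <string>

// "üö" written as explicit UTF-8 bytes:
// U+00FC is C3 BC and U+00F6 is C3 B6.
// Hypothetical helper; the bytes do not depend on how the file is saved.
inline std::string hardcoded_umlauts()
{
    return "\xc3\xbc\xc3\xb6";
}

// Count code points by skipping UTF-8 continuation bytes (10xxxxxx).
inline std::size_t utf8_length(const std::string& s)
{
    std::size_t n = 0;
    for (unsigned char c : s)
        if ((c & 0xC0) != 0x80)
            ++n;
    return n;
}
```

The string holds four bytes encoding two code points, and can be handed
to a Glib::ustring constructor without calling any locale conversion.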

Your editor is likely to write in the system locale of your environment,
whether or not it is Unicode capable.  Many editors can be set to a
different codeset, but you did not indicate whether that is what you had
done.  Anyway, you asked what might be wrong and I was offering a
possible answer.  You can be sure, however, that the cause is not "because
of different number of bytes required to represent the sequence points"!
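
Separately from the locale question, the width of wchar_t does matter
for the earlier w2ustring() helper, which reinterprets the wide string
as UTF-16.  A rough sketch of a conversion that does not assume a
particular width follows.  The names append_utf8 and wide_to_utf8 are
hypothetical, the code assumes wchar_t holds UTF-16 when it is 2 bytes
and UTF-32 otherwise, and it does not validate its input:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Append one code point to out, encoded as UTF-8.
inline void append_utf8(std::string& out, std::uint32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper.  Assumption: 2-byte wchar_t means UTF-16,
// anything wider means UTF-32.  Lone surrogates are passed through
// unchecked, so this is a sketch rather than a validating converter.
inline std::string wide_to_utf8(const std::wstring& w)
{
    std::string out;
    for (std::size_t i = 0; i < w.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(w[i]);
        if (sizeof(wchar_t) == 2) {
            cp &= 0xFFFF;  // drop any sign extension of a 16-bit unit
            if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < w.size()) {
                std::uint32_t lo = static_cast<std::uint32_t>(w[i + 1]) & 0xFFFF;
                if (lo >= 0xDC00 && lo <= 0xDFFF) {
                    // Combine a surrogate pair into one code point.
                    cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                    ++i;
                }
            }
        }
        append_utf8(out, cp);
    }
    return out;
}
```

The result is a plain UTF-8 std::string, so it can be passed to a
Glib::ustring constructor on either platform.  Writing the wide literal
with universal character names, as in L"\u00fc\u00f6", also keeps it
independent of the editor's codeset.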

I hope your next topic interests you more.

Chris



