Re: Glib::ustring and C++11 utf-8 literals



[Oops, sorry Chris, forgot to CC to the list]

Thank you for the explanation, it made many things clearer for me! So
when developing a desktop application, the best rule would be to use
gettext() almost everywhere, and on the rare occasions when that isn't
possible, to use u8 literals with unicode code points. I will have a
closer look at gettext, as I've never used it before.

Messing with the encoding of the source code seems problematic, and
I'd rather avoid it.

Thanks,
Dennis

On 2012-02-16 13:05, Chris Vine wrote:

On Thu, 16 Feb 2012 11:04:19 +0100
Dénes Almási <denes rudanium org> wrote:

Hi! As one can see on Wikipedia, C++11 offers the ability to create
UTF-8 string literals
(http://en.wikipedia.org/wiki/C%2B%2B11#New_string_literals [1]). Is it
possible to pass these safely to Glib::ustring when constructing them?
I suspect that the Glib::ustring::ustring(const char* src, size_type n)
constructor will do the job. Is this right?

All string literals are null terminated after conversion to the
execution character set, so you can pass one to the constructor taking
a const char*. The requirement of Glib::ustring is that this execution
character set must be UTF-8.
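
For example, something along these lines should work (an untested
sketch, assuming glibmm is set up and the execution character set is
indeed UTF-8):

    #include <glibmm/ustring.h>
    #include <iostream>

    int main()
    {
        // A u8 literal is a null-terminated array of char holding UTF-8
        // bytes, so the plain const char* constructor is sufficient; the
        // (const char*, size_type) overload only matters for data that
        // is not null-terminated.
        Glib::ustring greeting(u8"hello, world");
        std::cout << greeting << std::endl; // glibmm supplies operator<<
        return 0;
    }

With a pure-ASCII literal like this nothing can go wrong; the
interesting cases are the non-ASCII ones below.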

I have not done any playing about with C++11 string literals, but as I
understand it you should be OK with a string literal with the 'u8'
prefix, assuming the compiler is able to perform the conversion from
the source character set (what your code editor spits out) to this
execution character set (see §2.2/5: "Each source character set member
in a character literal or a string literal, as well as each escape
sequence and universal-character-name in a character literal or a
non-raw string literal, is converted to the corresponding member of the
execution character set"; and §2.14.5/7: "A string literal that begins
with u8, such as u8"asdf", is a UTF-8 string literal and is initialized
with the given characters as encoded in UTF-8").

The key to this is my "assuming the compiler ..." above: the problem
is that you have to let the compiler know what your source character
set is in order for it to perform this conversion. In the absence of
the appropriate switch, gcc assumes your source file is in your locale
encoding, which makes source files with non-ASCII string literals
non-portable unless you are importing whole unicode code points (\uXXXX
universal character names) into your u8 string (for which purpose the
u8 prefix is genuinely useful).
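
To make that concrete, here is a sketch of the code-point approach (not
tested; the point is that the source file stays pure ASCII, so its
encoding no longer matters):

    #include <glibmm/ustring.h>
    #include <iostream>

    int main()
    {
        // \u00E9 is LATIN SMALL LETTER E WITH ACUTE. With the u8 prefix
        // it is encoded as the UTF-8 bytes 0xC3 0xA9 whatever encoding
        // the source file happens to be saved in.
        Glib::ustring s(u8"caf\u00E9");
        std::cout << s << std::endl;
        return 0;
    }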

For more on gcc, see
http://gcc.gnu.org/onlinedocs/cpp/Character-sets.html and also the gcc
switch documentation:

"-finput-charset=charset: Set the input character set, used for
translation from the character set of the input file to the source
character set used by GCC. If the locale does not specify, or GCC
cannot get this information from the locale, the default is UTF-8.
This can be overridden by either the locale or this command line
option. Currently the command line option takes precedence if there's
a conflict. charset can be any encoding supported by the system's
iconv library routine."
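
By way of contrast, if you type the non-ASCII character directly into
the literal and save the file in some other encoding, you have to tell
gcc so; otherwise it guesses from the locale. A rough, untested sketch
of that situation:

    // sketch.cc -- imagine this file saved as ISO-8859-1, so the 'é'
    // in the literal below is the single byte 0xE9 on disk.
    #include <glibmm/ustring.h>

    int main()
    {
        // Compile with something like:
        //   g++ -std=c++0x -finput-charset=ISO-8859-1 sketch.cc \
        //       `pkg-config --cflags --libs glibmm-2.4`
        // so that gcc knows how to convert the literal to UTF-8.
        Glib::ustring s(u8"café");
        return 0;
    }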

If you use Windows, I believe VS uses Windows ANSI as its default
source encoding, but you would need to look that up if that is your
platform.

This makes it almost always better to pass in your string literals
programmatically, say via gettext().
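
For completeness, the gettext() route looks roughly like this (an
untested sketch; "myapp" and the locale directory are placeholders for
whatever your build system sets up):

    #include <libintl.h>
    #include <locale.h>
    #include <glibmm/ustring.h>

    #define _(String) gettext(String)

    int main()
    {
        setlocale(LC_ALL, "");
        bindtextdomain("myapp", "/usr/share/locale"); // placeholder
        bind_textdomain_codeset("myapp", "UTF-8");    // keep ustring happy
        textdomain("myapp");

        // The literal in the source stays plain ASCII; translators
        // supply any non-ASCII text via the .po/.mo catalogues.
        Glib::ustring msg(_("Hello, world"));
        return 0;
    }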

Chris



Links:
------
[1] http://en.wikipedia.org/wiki/C%2B%2B11#New_string_literals

