Re: Glib::ustring and C++11 utf-8 literals
- From: Dénes Almási <denes rudanium org>
- To: Chris Vine <chris cvine freeserve co uk>
- Cc: gtkmm-list gnome org
- Subject: Re: Glib::ustring and C++11 utf-8 literals
- Date: Fri, 17 Feb 2012 10:15:18 +0100
[Oops, sorry Chris, forgot to CC to the list]
Thank you for the explanation, it made many things clearer for me! So when
developing a desktop application, the best rule would be to use gettext()
almost everywhere, and on the rare occasions when one can't use that, use
u8 literals with unicode code points. I will have a closer look at gettext,
as I've never used it before.

Messing with the encoding of the source code seems problematic and I'd
rather avoid it.
Thanks,
Dennis
On 2012-02-16 13:05, Chris Vine wrote:
On Thu, 16 Feb 2012 11:04:19 +0100
Dénes Almási <denes rudanium org> wrote:
Hi! As one can see on wikipedia, C++11 offers the ability to create
utf-8 string literals
(http://en.wikipedia.org/wiki/C%2B%2B11#New_string_literals). Is it
possible to pass these safely to Glib::ustring when constructing them? I
suspect that the Glib::ustring::ustring(const char* src, size_type n)
constructor will do the job. Is this right?
All string literals are null-terminated after conversion to the
execution character set, so you can pass one to the constructor taking a
const char*. The requirement of Glib::ustring is that this execution
character set must be UTF-8.
I have not done any playing about with C++11 string literals, but as I
understand it you should be OK with a string literal with the 'u8'
prefix, assuming the compiler is able to perform the conversion from the
source character set (what your code editor spits out) to this execution
character set (see §2.2/5: "Each source character set member in a
character literal or a string literal, as well as each escape sequence
and universal-character-name in a character literal or a non-raw string
literal, is converted to the corresponding member of the execution
character set"; and §2.14.5/7: "A string literal that begins with u8,
such as u8"asdf", is a UTF-8 string literal and is initialized with the
given characters as encoded in UTF-8").
The key to this is my "assuming the compiler ..." above: the problem is
that you have to let the compiler know what your source character set is
in order for it to perform this conversion, and gcc will, in the absence
of the appropriate switch, assume your source file is in your locale
encoding, which makes source files non-portable with non-ASCII string
literals unless you are importing whole unicode code points into your
u8 string (for which purpose the u8 prefix is genuinely useful).
For more on gcc, see
http://gcc.gnu.org/onlinedocs/cpp/Character-sets.html and also the gcc
switch documentation:

"-finput-charset=charset: Set the input character set, used for
translation from the character set of the input file to the source
character set used by GCC. If the locale does not specify, or GCC cannot
get this information from the locale, the default is UTF-8. This can be
overridden by either the locale or this command line option. Currently
the command line option takes precedence if there's a conflict. charset
can be any encoding supported by the system's iconv library routine."
If you use Windows, I believe VS uses Windows ANSI as its default
source encoding, but you would need to look it up if that is your
platform.
This makes it almost always better to pass in your string literals
programmatically, say via gettext().
Chris