Re: G_UTF8String: Boxed Type Proposal
- From: Simon McVittie <simon mcvittie collabora co uk>
- To: gtk-devel-list gnome org
- Subject: Re: G_UTF8String: Boxed Type Proposal
- Date: Thu, 17 Mar 2016 20:48:34 +0000
On 17/03/16 20:29, Matthias Clasen wrote:
> Terminology can certainly be confusing at times, but I think that a
> Unicode character is a perfectly well-defined entity, notwithstanding
> the fact that it can be represented in various encodings (a utf8
> sequence, a ucs4 word, a utf-16 surrogate pair, etc).

You mean a code point, then (that's basically what gunichar is). I think
the reason Unicode people are so pedantic about "code point" is because
a code point may or may not be what you actually mean when you say
"character", whereas it's rare that I see "code point" used with a
meaning other than its Unicode one.
More precisely, a Unicode code point is an abstract entity indexed by a
number, such as U+0041 LATIN CAPITAL LETTER A or U+262D HAMMER AND
SICKLE, which can only be concretely represented as some particular byte
sequence by passing it through an encoding like UCS-4, UTF-8 or
ISO-8859-1. Some encodings are more obvious than others, and in
particular non-Unicode encodings like ISO-8859-1 cannot represent every
Unicode code point.
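
To make that concrete, here is a rough sketch in Python rather than GLib. The helper name show_encodings is made up for illustration, and Python's "utf-32-be" codec stands in for UCS-4 here. It takes the two code points mentioned above and pushes each through three encodings:

```python
import unicodedata


def show_encodings(code_point):
    """Return the Unicode name of a code point and its bytes in a few encodings.

    An encoding that cannot represent the code point maps to None.
    (Hypothetical helper, for illustration only.)
    """
    ch = chr(code_point)
    name = unicodedata.name(ch)
    results = {}
    # "utf-32-be" is used as a stand-in for UCS-4 (big-endian, no BOM).
    for encoding in ("utf-8", "utf-32-be", "iso-8859-1"):
        try:
            results[encoding] = ch.encode(encoding)
        except UnicodeEncodeError:
            # Non-Unicode encodings cannot represent every code point.
            results[encoding] = None
    return name, results


if __name__ == "__main__":
    for cp in (0x0041, 0x262D):
        name, encodings = show_encodings(cp)
        print(f"U+{cp:04X} {name}")
        for encoding, data in encodings.items():
            if data is None:
                print(f"  {encoding}: not representable")
            else:
                hex_bytes = " ".join(f"{b:02x}" for b in data)
                print(f"  {encoding}: {hex_bytes}")
```

U+0041 comes out as the single byte 41 in both UTF-8 and ISO-8859-1 and as 00 00 00 41 in UTF-32, whereas U+262D is e2 98 ad in UTF-8, 00 00 26 2d in UTF-32, and has no ISO-8859-1 representation at all. The code point is the same abstract thing in every case; only the bytes differ.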
--
Simon McVittie
Collabora Ltd. <http://www.collabora.com/>