Re: G_UTF8String: Boxed Type Proposal



On Thu, Mar 17, 2016 at 4:09 PM, Jasper St. Pierre
<jstpierre mecheye net> wrote:
The major issue is that "Unicode character" doesn't have a good
definition. The most likely definition is a "Unicode code point",
however, Windows uses "Unicode character" to mean a UTF-16 byte
sequence, which means that any code point above the Basic Multilingual
Plane is really composed of two "Unicode characters", which are, of
course, surrogate pairs.

Terminology can certainly be confusing at times, but I think that a
Unicode character is a perfectly well-defined entity, notwithstanding
the fact that it can be represented in various encodings (a UTF-8
sequence, a UCS-4 word, a UTF-16 surrogate pair, etc).
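
To make that concrete, here is a minimal sketch in Python (chosen only
for brevity; it is not GLib code and not part of the proposal). The
character U+1F600 is an arbitrary example I picked from outside the
Basic Multilingual Plane, and the variable names are illustrative. It
is one character throughout; only its encoded form changes.

```python
# One Unicode character (code point), viewed through three encodings.
# U+1F600 is an arbitrary example above the Basic Multilingual Plane.
ch = "\U0001F600"

code_point = ord(ch)                  # the abstract character's number
utf8_bytes = ch.encode("utf-8")       # 4 bytes in UTF-8
utf16_bytes = ch.encode("utf-16-be")  # 4 bytes, i.e. two 16-bit code units
utf32_bytes = ch.encode("utf-32-be")  # 4 bytes, i.e. one 32-bit word

# Split the UTF-16 bytes into 16-bit code units to expose the surrogate pair.
utf16_units = [
    int.from_bytes(utf16_bytes[i:i + 2], "big")
    for i in range(0, len(utf16_bytes), 2)
]

print(f"code point : U+{code_point:04X}")
print(f"UTF-8      : {utf8_bytes.hex(' ')}")
print(f"UTF-16     : {' '.join(f'{u:04X}' for u in utf16_units)}")
print(f"UTF-32     : {utf32_bytes.hex(' ')}")

# Decoding any of the three forms gives back the same single character.
assert utf8_bytes.decode("utf-8") == ch
assert utf16_bytes.decode("utf-16-be") == ch
assert utf32_bytes.decode("utf-32-be") == ch
```

The two UTF-16 code units are what the Windows usage quoted above would
count as two "Unicode characters", yet every decoder recovers a single
code point from them.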

