Re: Why do Glib::ustring::operator[] and at() return values, not references?



On Sat, 24 Jun 2017 19:48:07 +0100
Daniel Boles <dboles src gmail com> wrote:
On 24 June 2017 at 19:12, Chris Vine <vine35792468 gmail com> wrote:

On Sat, 24 Jun 2017 19:08:36 +0100
Chris Vine <vine35792468 gmail com> wrote:
 
It is because UTF-8 is a multibyte encoding, and any one
character may require between 1 and 4 bytes to represent it.  If
you were allowed to change a byte at will you would be able to
introduce invalid encoding sequences.  As to the absence of
documentation, maybe it is because this was thought to be
self-evident, dunno.  

And I should perhaps also make the point that these operators
return a 32-bit Unicode character, not a byte, which is consequent
on the same point.  If you allowed mutation, the length of the
string (in bytes) might change.  
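
To make the point about byte length concrete, here is a minimal sketch in
plain C++ (it does not use glibmm). The helper names encode_utf8,
utf8_seq_len and replace_char are hypothetical, written only for this
illustration, and they assume well-formed UTF-8 input and an in-range
character index. The sketch shows that overwriting one character can
change the number of bytes in the string, which is why handing out a
mutable reference into the buffer would not work.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Encode one Unicode code point as UTF-8 (1 to 4 bytes, per RFC 3629).
std::string encode_utf8(char32_t cp) {
    // Surrogates and values above U+10FFFF are not valid code points.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Length in bytes of a UTF-8 sequence, judged from its lead byte.
// Assumes the byte really is a valid lead byte.
std::size_t utf8_seq_len(unsigned char lead) {
    if (lead < 0x80) return 1;
    if ((lead >> 5) == 0x06) return 2;  // 110xxxxx
    if ((lead >> 4) == 0x0E) return 3;  // 1110xxxx
    return 4;                           // 11110xxx
}

// Return a copy of s with the character at character index `index`
// replaced by cp.  The byte length of the result may differ from s.
std::string replace_char(const std::string& s, std::size_t index, char32_t cp) {
    std::size_t pos = 0;
    for (std::size_t i = 0; i < index; ++i)
        pos += utf8_seq_len(static_cast<unsigned char>(s[pos]));
    const std::size_t old_len = utf8_seq_len(static_cast<unsigned char>(s[pos]));
    std::string result = s;
    result.replace(pos, old_len, encode_utf8(cp));
    return result;
}
```

For example, replace_char("cafe", 3, 0xE9) swaps the final 'e' for U+00E9
and yields a 5-byte string from a 4-byte one, even though the number of
characters is still 4.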

Right, of course. It does seem very obvious now. It seemed to
completely slip my mind that we're dealing with characters of
arbitrary width, not e.g. UTF-16. :( Thanks for the comprehensive
answer to a stupid question!

UTF-16 is also a variable width encoding, with surrogate pairs for
anything outside the basic multilingual plane.  Which is why UTF-16 is
regarded by many as a fairly unhelpful encoding.  It does have the
feature that for the average Japanese text it occupies slightly
less space than UTF-8.  The same is not true of Chinese text though.
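
As a rough illustration of the surrogate pair point, the following
plain C++ sketch compares per-character storage in the two encodings.
The function names utf8_bytes, utf16_bytes and to_surrogates are
hypothetical helpers for this example only, and the sketch looks at
single code points, so it says nothing definite about the total size of
any real text, which depends on the mix of characters.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Bytes needed to store one code point in UTF-8.
std::size_t utf8_bytes(char32_t cp) {
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

// Bytes needed to store one code point in UTF-16: one 16-bit unit
// inside the basic multilingual plane, two (a surrogate pair) outside.
std::size_t utf16_bytes(char32_t cp) {
    assert(cp <= 0x10FFFF);
    return cp < 0x10000 ? 2 : 4;
}

// Split a code point outside the BMP into its high and low surrogates.
std::pair<char16_t, char16_t> to_surrogates(char32_t cp) {
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    const char32_t v = cp - 0x10000;  // 20 significant bits remain
    return { static_cast<char16_t>(0xD800 + (v >> 10)),
             static_cast<char16_t>(0xDC00 + (v & 0x3FF)) };
}
```

With these helpers, the hiragana letter U+3042 takes 3 bytes in UTF-8
and 2 in UTF-16, an ASCII 'A' takes 1 and 2 respectively, and U+1F600
takes 4 bytes in both, with to_surrogates giving the pair 0xD83D,
0xDE00.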

