Re: Faster UTF-8 decoding in GLib


On Saturday, 27.03.2010 at 16:51 -0400, Behdad Esfahbod wrote:
> On 03/27/2010 04:27 PM, Daniel Elstner wrote:
> >
> > It is not meant to check for errors.
> Good point.
> > I think it is totally arbitrary to handle some potential errors but not
> > others.  And I think the current implementation does not do that check
> > either -- it will behave differently, but it is still undefined.
> The current implementation definitely does the check:

OK, looks like I misremembered.  My bad.  However, it is not documented
as such:

 * g_utf8_get_char:
 * @p: a pointer to Unicode character encoded as UTF-8
 * Converts a sequence of bytes encoded as UTF-8 to a Unicode character.
 * If @p does not point to a valid UTF-8 encoded character, results are
 * undefined. If you are not sure that the bytes are complete
 * valid Unicode characters, you should use g_utf8_get_char_validated()
 * instead.
 * Return value: the resulting character

> Anyway.  Nice construct :).  For future reference, it must be used with 32bit
> ints only.  Otherwise it can go wrong.

Thanks! :-)

Well, I assume that ints are at least 32 bits wide on any platform
supported by GLib.  But if you meant to say that it would break with
larger ints, I don't see why.  As long as the type is unsigned, it
should be fine.
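
To make the width argument concrete, here is a rough sketch.  It is not
the code posted in this thread; it is a hypothetical shift-and-mask
decoder written only for illustration, and the names DEFINE_DECODER,
decode_u32, decode_u64 and decode_checked are made up.  It assumes
well-formed UTF-8 input.  The same loop is instantiated with a 32-bit
and a 64-bit unsigned accumulator; for valid sequences the highest bit
ever touched is bit 25, so both widths should agree.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical decoder, parameterised on the accumulator type acc_t.
 * Assumes p points at a well-formed UTF-8 sequence; like
 * g_utf8_get_char(), the result for invalid input is undefined.
 *
 * Each round shifts in six payload bits from a continuation byte and
 * moves the mask up by five bits, so the mask tracks where the next
 * length bit of the lead byte has ended up.  The loop stops at the
 * first zero length bit, and the final AND strips the length prefix.
 */
#define DEFINE_DECODER(name, acc_t)                              \
    static uint32_t name(const unsigned char *p)                 \
    {                                                            \
        acc_t c = *p;                                            \
        if (c & 0x80) {                                          \
            acc_t mask = 0x40;                                   \
            do {                                                 \
                c = (acc_t)(c << 6) | (acc_t)(*++p & 0x3F);      \
                mask <<= 5;                                      \
            } while (c & mask);                                  \
            c &= mask - 1;                                       \
        }                                                        \
        return (uint32_t)c;                                      \
    }

DEFINE_DECODER(decode_u32, uint32_t)  /* 32-bit unsigned accumulator */
DEFINE_DECODER(decode_u64, uint64_t)  /* wider unsigned accumulator  */

/* Decode with both widths and insist that they agree. */
static uint32_t decode_checked(const unsigned char *p)
{
    uint32_t narrow = decode_u32(p);
    uint32_t wide = decode_u64(p);
    assert(narrow == wide);
    return narrow;
}
```

Calling decode_checked() on valid one- to four-byte sequences exercises
both instantiations.  With invalid lead bytes the two widths may well
diverge, but that case is documented as undefined anyway.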

