Re: Faster UTF-8 decoding in GLib



On 03/27/2010 04:27 PM, Daniel Elstner wrote:
> Hi,
> 
> On Saturday, 27.03.2010, at 16:12 -0400, Behdad Esfahbod wrote:
> 
>> Err, you're right.  My bad.  It's still broken though since it doesn't check
>> that the fragment bytes all start with the bits 10.  Missing error checking.

Looking at:
http://git.collabora.co.uk/?p=user/zabaluev/glib.git;a=commitdiff;h=9ace0f84dcbb7d95996c93c2236e0ec0253ee479

> It is not meant to check for errors.

Good point.

> I think it is totally arbitrary to handle some potential errors but not
> others.  And I think the current implementation does not do that check
> either -- it will behave differently, but it is still undefined.

The current implementation definitely does the check:

-  for ((Count) = 1; (Count) < (Len); ++(Count)) \
-    {                                          \
-      if (((Chars)[(Count)] & 0xc0) != 0x80)   \
-       {                                       \
-         (Result) = -1;                        \
-         break;                                \
-       }                                       \
-      (Result) <<= 6;                          \
-      (Result) |= ((Chars)[(Count)] & 0x3f);   \
-    }
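
For reference, the check that loop performs can be sketched as a standalone
function.  This is only an illustration: decode_utf8_checked is a made-up
name, not GLib API, and deriving the length from the lead byte is my
assumption about how the macro's Len is obtained.  It rejects continuation
bytes that do not match 10xxxxxx, but it does not reject overlong forms or
surrogates.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of GLib: decode one UTF-8 sequence at p,
 * with at most avail bytes readable.  Returns the code point, or -1 if
 * the lead byte is invalid, the input is too short, or a continuation
 * byte does not have the form 10xxxxxx. */
static int32_t
decode_utf8_checked (const unsigned char *p, size_t avail)
{
  int32_t result;
  size_t len, i;

  if (avail == 0)
    return -1;

  /* Sequence length and payload bits come from the lead byte. */
  if (p[0] < 0x80)                { len = 1; result = p[0]; }
  else if ((p[0] & 0xe0) == 0xc0) { len = 2; result = p[0] & 0x1f; }
  else if ((p[0] & 0xf0) == 0xe0) { len = 3; result = p[0] & 0x0f; }
  else if ((p[0] & 0xf8) == 0xf0) { len = 4; result = p[0] & 0x07; }
  else
    return -1;  /* stray continuation byte or invalid lead byte */

  if (len > avail)
    return -1;

  for (i = 1; i < len; i++)
    {
      /* Same test as the removed loop: top two bits must be 10. */
      if ((p[i] & 0xc0) != 0x80)
        return -1;
      result = (result << 6) | (p[i] & 0x3f);
    }

  return result;
}
```

With this sketch, the two bytes c3 a9 decode to U+00E9, while c3 41 yields -1
because 0x41 is not a continuation byte.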

Anyway.  Nice construct :).  For future reference, it must be used with 32-bit
ints only; otherwise it can go wrong.

behdad


> --Daniel
> 
> 
> 

