Re: Faster UTF-8 decoding in GLib


On Tuesday, 16.03.2010 at 17:20 +0200, Mikhail Zabaluev wrote:

> I've made a glib branch where I tried to optimize the UTF-8 decoding routines:
> The new code uses a table of unrolled functions to decode byte
> sequences, dispatched by the first character. g_utf8_get_char() got an
> inlined implementation.

Ouch.  I'm not sure that's such a great idea -- indirect calls usually
defeat branch prediction.  I would advise testing on different CPU
types.  Also, table lookups have their downsides -- more cache
pressure, the GOT needs to be fetched, etc.
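
To make the concern concrete, here is a minimal sketch of what a
function-table dispatch could look like.  It is not the code from the
branch; the names (sketch_decode_*, sketch_dispatch, sketch_table_get_char)
and the 32-entry table indexed by the top five bits of the lead byte are
assumptions made for illustration.  It also does no validation of
continuation bytes or overlong forms.

```c
#include <stdint.h>

/* Hypothetical names throughout; this only illustrates the technique. */
typedef uint32_t (*sketch_decode_fn)(const unsigned char *p);

static uint32_t sketch_decode_1(const unsigned char *p)
{
    return p[0];
}

static uint32_t sketch_decode_2(const unsigned char *p)
{
    return ((uint32_t)(p[0] & 0x1F) << 6) | (uint32_t)(p[1] & 0x3F);
}

static uint32_t sketch_decode_3(const unsigned char *p)
{
    return ((uint32_t)(p[0] & 0x0F) << 12)
         | ((uint32_t)(p[1] & 0x3F) << 6)
         |  (uint32_t)(p[2] & 0x3F);
}

static uint32_t sketch_decode_4(const unsigned char *p)
{
    return ((uint32_t)(p[0] & 0x07) << 18)
         | ((uint32_t)(p[1] & 0x3F) << 12)
         | ((uint32_t)(p[2] & 0x3F) << 6)
         |  (uint32_t)(p[3] & 0x3F);
}

/* Stray continuation byte or invalid lead byte. */
static uint32_t sketch_decode_bad(const unsigned char *p)
{
    (void)p;
    return 0xFFFFFFFFu;
}

/* Indexed by lead_byte >> 3, so 32 entries instead of 256. */
static const sketch_decode_fn sketch_dispatch[32] = {
    /* 0x00..0x7F: ASCII */
    sketch_decode_1, sketch_decode_1, sketch_decode_1, sketch_decode_1,
    sketch_decode_1, sketch_decode_1, sketch_decode_1, sketch_decode_1,
    sketch_decode_1, sketch_decode_1, sketch_decode_1, sketch_decode_1,
    sketch_decode_1, sketch_decode_1, sketch_decode_1, sketch_decode_1,
    /* 0x80..0xBF: continuation bytes, not valid as a lead byte */
    sketch_decode_bad, sketch_decode_bad, sketch_decode_bad, sketch_decode_bad,
    sketch_decode_bad, sketch_decode_bad, sketch_decode_bad, sketch_decode_bad,
    /* 0xC0..0xDF: two-byte sequences */
    sketch_decode_2, sketch_decode_2, sketch_decode_2, sketch_decode_2,
    /* 0xE0..0xEF: three-byte sequences */
    sketch_decode_3, sketch_decode_3,
    /* 0xF0..0xF7: four-byte sequences */
    sketch_decode_4,
    /* 0xF8..0xFF: invalid */
    sketch_decode_bad
};

/* One table load plus one indirect call per character. */
static uint32_t sketch_table_get_char(const unsigned char *p)
{
    return sketch_dispatch[p[0] >> 3](p);
}
```

The indirect call in sketch_table_get_char() is the part whose target
the CPU has to predict; on mixed-script text the target changes from
character to character.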

If you are interested, you may check out the glibmm solution, which gets
away without using any tables whatsoever:

I'd love to have numbers on how my implementation competes with
table-based solutions, in particular when the function is inlined.
So, if you have the time... :-)
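
For reference, a table-free decoder can be sketched with plain
comparisons on the lead byte.  This is not the glibmm code, only a
minimal illustration of the general idea; the names
(sketch_branchy_get_char, sketch_utf8_checksum) are made up, and the
checksum loop assumes well-formed, NUL-terminated input.  A loop like
that is the kind of thing one would time against a table-based variant.

```c
#include <stdint.h>

/* Hypothetical sketch: decode one character using only comparisons,
 * no lookup table.  Stores the sequence length in *len.  No validation
 * of continuation bytes or overlong forms is done. */
static inline uint32_t sketch_branchy_get_char(const unsigned char *p, int *len)
{
    uint32_t c = p[0];

    if (c < 0x80) {                       /* 0xxxxxxx */
        *len = 1;
        return c;
    }
    if ((c & 0xE0) == 0xC0) {             /* 110xxxxx 10xxxxxx */
        *len = 2;
        return ((c & 0x1F) << 6) | (uint32_t)(p[1] & 0x3F);
    }
    if ((c & 0xF0) == 0xE0) {             /* 1110xxxx + 2 continuation bytes */
        *len = 3;
        return ((c & 0x0F) << 12)
             | ((uint32_t)(p[1] & 0x3F) << 6)
             |  (uint32_t)(p[2] & 0x3F);
    }
    if ((c & 0xF8) == 0xF0) {             /* 11110xxx + 3 continuation bytes */
        *len = 4;
        return ((c & 0x07) << 18)
             | ((uint32_t)(p[1] & 0x3F) << 12)
             | ((uint32_t)(p[2] & 0x3F) << 6)
             |  (uint32_t)(p[3] & 0x3F);
    }

    *len = 1;                             /* invalid lead byte: skip it */
    return 0xFFFFFFFFu;
}

/* Walk a well-formed, NUL-terminated string and sum the code points.
 * Returning the sum keeps the compiler from optimizing the loop away
 * when it is used in a timing run. */
static uint32_t sketch_utf8_checksum(const unsigned char *s)
{
    uint32_t sum = 0;

    while (*s != 0) {
        int len;
        sum += sketch_branchy_get_char(s, &len);
        s += len;
    }
    return sum;
}
```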


