Faster UTF-8 decoding in GLib



Hello,

I've made a GLib branch in which I tried to optimize the UTF-8 decoding routines:
http://git.collabora.co.uk/?p=user/zabaluev/glib.git;a=shortlog;h=refs/heads/fast-utf8

The new code uses a table of unrolled functions to decode byte
sequences, dispatched on the first byte of each sequence.
g_utf8_get_char() now has an inlined implementation.
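
To illustrate the idea, here is a minimal sketch of table dispatch on the
first byte. It is not the code from the branch: all names (decode_fn,
decode_table, init_decode_table, utf8_get_char_sketch) are made up for
this example. It assumes valid input, as g_utf8_get_char() does, handles
only sequences of up to 4 bytes, and does not reject overlong forms.

```c
#include <stdint.h>

/* Hypothetical names throughout; this is not the code from the branch. */
typedef uint32_t (*decode_fn)(const unsigned char *p);

/* One unrolled decoder per sequence length. Input is assumed to be
 * valid UTF-8, so continuation bytes are not checked. */
static uint32_t decode_1(const unsigned char *p)
{
    return p[0];
}

static uint32_t decode_2(const unsigned char *p)
{
    return ((uint32_t)(p[0] & 0x1f) << 6) |
           (uint32_t)(p[1] & 0x3f);
}

static uint32_t decode_3(const unsigned char *p)
{
    return ((uint32_t)(p[0] & 0x0f) << 12) |
           ((uint32_t)(p[1] & 0x3f) << 6) |
           (uint32_t)(p[2] & 0x3f);
}

static uint32_t decode_4(const unsigned char *p)
{
    return ((uint32_t)(p[0] & 0x07) << 18) |
           ((uint32_t)(p[1] & 0x3f) << 12) |
           ((uint32_t)(p[2] & 0x3f) << 6) |
           (uint32_t)(p[3] & 0x3f);
}

/* Stray continuation bytes (0x80..0xBF) and lead bytes 0xF8..0xFF
 * end up here; the sketch returns U+FFFD for them. */
static uint32_t decode_bad(const unsigned char *p)
{
    (void)p;
    return 0xFFFD;
}

/* 256-entry table indexed by the first byte of a sequence. */
static decode_fn decode_table[256];

static void init_decode_table(void)
{
    for (int i = 0; i < 256; i++) {
        if (i < 0x80)
            decode_table[i] = decode_1;
        else if (i < 0xc0)
            decode_table[i] = decode_bad;
        else if (i < 0xe0)
            decode_table[i] = decode_2;
        else if (i < 0xf0)
            decode_table[i] = decode_3;
        else if (i < 0xf8)
            decode_table[i] = decode_4;
        else
            decode_table[i] = decode_bad;
    }
}

/* Decode the character at s with a single indirect call and no
 * branching on the sequence length. */
static uint32_t utf8_get_char_sketch(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    return decode_table[p[0]](p);
}
```

init_decode_table() has to be called once before the first lookup. A
real implementation would more likely use a statically initialized
table to avoid that step.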

I have added a performance test that can also be run against mainline.
Some performance observations on x86, with the code compiled by gcc 4.4.1
using the optimization flags -O3 -march=core2 and run on a ThinkPad T61p:

- There is a 15-50% speedup on g_utf8_get_char(), depending on the text.
- g_utf8_to_ucs4_fast() got a similar boost for ASCII, but curiously,
performance has degraded for Chinese text.
- g_utf8_get_char_extended() and g_utf8_get_char_validated()
surprisingly perform better in the existing branchy implementation
than in my attempted reimplementation using the function tables, so I
left them untouched.

I can get measurements on ARM Cortex-A8 with a Nokia N900, if there is
enough interest.

Feel free to play with it and improve it.
  Mikhail

