Re: Faster UTF-8 decoding in GLib



Hi,

2010/3/16 Daniel Elstner <daniel kitta googlemail com>:
>
> Addendum: It's actually not longish at all, even though it may look like
> that in the C code.  There are exactly two branches.  I bet that many
> macros in GTK+ expand to more than that.

OK, here you go:
http://git.collabora.co.uk/?p=user/zabaluev/glib.git;a=shortlog;h=refs/heads/fast-utf8-elstner

Besides applying your code to the existing functions where it made a
noticeable difference, plus a few opportunistic tweaks, this branch
introduces two new inline functions, g_utf8_iterate() and
g_utf8_iterate_back().
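
For readers who have not looked at the branch, here is a rough sketch of
what an inlined forward/backward UTF-8 iterator can look like. It is not
the code from the branch: the names sketch_utf8_iterate() and
sketch_utf8_iterate_back() and their signatures are made up for
illustration, and the sketch assumes the input is already valid UTF-8
(no validation, no overlong or truncated sequences).

```c
#include <stdint.h>

/* Hypothetical helper, NOT the g_utf8_iterate() from the branch.
 * Decodes the character at *pp and advances *pp past it.
 * Assumes valid UTF-8 input. */
static inline uint32_t
sketch_utf8_iterate (const unsigned char **pp)
{
  const unsigned char *p = *pp;
  uint32_t c = *p++;

  if (c >= 0x80)
    {
      /* Lead byte 110xxxxx has 1 continuation byte,
       * 1110xxxx has 2, 11110xxx has 3. */
      int n = (c >= 0xF0) ? 3 : (c >= 0xE0) ? 2 : 1;

      c &= 0x3F >> n;                 /* payload bits of the lead byte */
      while (n-- > 0)
        c = (c << 6) | (*p++ & 0x3F); /* 6 payload bits per continuation */
    }

  *pp = p;
  return c;
}

/* Hypothetical helper, NOT the g_utf8_iterate_back() from the branch.
 * Decodes the character ending just before *pp and moves *pp to its
 * first byte. Assumes valid UTF-8 input. */
static inline uint32_t
sketch_utf8_iterate_back (const unsigned char **pp)
{
  const unsigned char *p = *pp;
  uint32_t c = *--p;

  if (c >= 0x80)
    {
      uint32_t acc = c & 0x3F;        /* last continuation byte */
      int shift = 6;

      /* Walk back over the remaining continuation bytes (10xxxxxx). */
      while ((*--p & 0xC0) == 0x80)
        {
          acc |= (uint32_t) (*p & 0x3F) << shift;
          shift += 6;
        }

      /* *p is the lead byte; shift / 6 is the continuation count. */
      acc |= (uint32_t) (*p & (0x3F >> (shift / 6))) << shift;
      c = acc;
    }

  *pp = p;
  return c;
}
```

Because both helpers are static inline and take the cursor by pointer, a
caller's loop can keep the cursor in a register with no function call per
character, which is presumably where most of the gain of an inlined
iterator comes from.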

Performance results for Intel Core 2 follow.

The mainline, tested with branch utf8-perftest:

GTest: run: /utf8/perf/get_char
(MAXPERF:ASCII:     164.1 MB/s)
(MAXPERF:Latin-1:   162.8 MB/s)
(MAXPERF:Cyrillic:  200.4 MB/s)
(MAXPERF:Chinese:   234.2 MB/s)
GTest: result: OK
GTest: run: /utf8/perf/get_char-backwards
(MAXPERF:ASCII:     146.2 MB/s)
(MAXPERF:Latin-1:   136.3 MB/s)
(MAXPERF:Cyrillic:  142.7 MB/s)
(MAXPERF:Chinese:   181.0 MB/s)
GTest: result: OK
GTest: run: /utf8/perf/get_char_validated
(MAXPERF:ASCII:     130.5 MB/s)
(MAXPERF:Latin-1:   121.1 MB/s)
(MAXPERF:Cyrillic:  141.7 MB/s)
(MAXPERF:Chinese:   195.1 MB/s)
GTest: result: OK
GTest: run: /utf8/perf/utf8_to_ucs4
(MAXPERF:ASCII:     107.5 MB/s)
(MAXPERF:Latin-1:    95.8 MB/s)
(MAXPERF:Cyrillic:  127.4 MB/s)
(MAXPERF:Chinese:   148.4 MB/s)
GTest: result: OK
GTest: run: /utf8/perf/utf8_to_ucs4_fast
(MAXPERF:ASCII:     125.7 MB/s)
(MAXPERF:Latin-1:   122.1 MB/s)
(MAXPERF:Cyrillic:  173.1 MB/s)
(MAXPERF:Chinese:   300.9 MB/s)
GTest: result: OK

The top of fast-utf8-elstner:

GTest: run: /utf8/perf/iterate
(MAXPERF:ASCII:     570.1 MB/s)
(MAXPERF:Latin-1:   449.5 MB/s)
(MAXPERF:Cyrillic:  395.9 MB/s)
(MAXPERF:Chinese:   561.3 MB/s)
GTest: result: OK
GTest: run: /utf8/perf/iterate_back
(MAXPERF:ASCII:     384.6 MB/s)
(MAXPERF:Latin-1:   364.9 MB/s)
(MAXPERF:Cyrillic:  432.1 MB/s)
(MAXPERF:Chinese:   451.5 MB/s)
GTest: result: OK
GTest: run: /utf8/perf/get_char
(MAXPERF:ASCII:     186.0 MB/s)
(MAXPERF:Latin-1:   171.4 MB/s)
(MAXPERF:Cyrillic:  248.5 MB/s)
(MAXPERF:Chinese:   398.6 MB/s)
GTest: result: OK
GTest: run: /utf8/perf/get_char-backwards
(MAXPERF:ASCII:     138.2 MB/s)
(MAXPERF:Latin-1:   135.3 MB/s)
(MAXPERF:Cyrillic:  173.3 MB/s)
(MAXPERF:Chinese:   264.9 MB/s)
GTest: result: OK
GTest: run: /utf8/perf/get_char_validated
(MAXPERF:ASCII:     128.7 MB/s)
(MAXPERF:Latin-1:   119.3 MB/s)
(MAXPERF:Cyrillic:  143.6 MB/s)
(MAXPERF:Chinese:   210.7 MB/s)
GTest: result: OK
GTest: run: /utf8/perf/utf8_to_ucs4
(MAXPERF:ASCII:      62.7 MB/s)
(MAXPERF:Latin-1:    71.5 MB/s)
(MAXPERF:Cyrillic:  109.7 MB/s)
(MAXPERF:Chinese:   156.8 MB/s)
GTest: result: OK
GTest: run: /utf8/perf/utf8_to_ucs4_fast
(MAXPERF:ASCII:     153.4 MB/s)
(MAXPERF:Latin-1:   149.2 MB/s)
(MAXPERF:Cyrillic:  244.9 MB/s)
(MAXPERF:Chinese:   352.6 MB/s)
GTest: result: OK

Note the poor results for utf8_to_ucs4. They are caused by the flawed
pattern in which G_IMPLEMENT_INLINES is used in GLib, and which I
reproduced in this new code: the source file that is made responsible
for emitting the functions for the non-inline API ends up calling the
non-inlined extern versions itself. A proper implementation would use
a dedicated source file to collect all non-inlined emissions
throughout GLib, but that will wait for another branch.
Without this wart, the performance is better:

GTest: run: /utf8/perf/utf8_to_ucs4
(MAXPERF:ASCII:     128.7 MB/s)
(MAXPERF:Latin-1:   120.8 MB/s)
(MAXPERF:Cyrillic:  151.9 MB/s)
(MAXPERF:Chinese:   211.1 MB/s)
GTest: result: OK
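
To illustrate the "dedicated source file" idea, here is a minimal sketch
using plain C99 inline semantics rather than GLib's actual
G_IMPLEMENT_INLINES machinery, which works differently in detail. The
function name sketch_twice() and the file names in the comments are
hypothetical. The point is that the inline definition lives in the
header, and exactly one otherwise empty translation unit requests the
out-of-line copy, so no file with real code is forced onto the extern
version.

```c
/* --- would live in a header, e.g. a hypothetical sketch-inlines.h ---
 * In C99, an inline definition without "extern" does not by itself
 * emit an external symbol; every includer can inline it. */
inline int
sketch_twice (int x)
{
  return 2 * x;
}

/* --- would live in ONE dedicated file, e.g. a hypothetical
 * sketch-inlines.c that contains nothing but such declarations ---
 * Redeclaring with "extern" makes this translation unit emit the
 * out-of-line definition, for callers that take the function's
 * address or are compiled without inlining. */
extern inline int sketch_twice (int x);
```

This relies on C99 (or later) inline rules; the older GNU89 rules give
"extern inline" nearly the opposite meaning, so a portable version would
need to hide the keywords behind macros.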

Enjoy,
  Mikhail

