Re: Faster UTF-8 decoding in GLib



Hi,

2010/3/16 Daniel Elstner <daniel kitta googlemail com>:
> Rock!
> /me jumps up and down, full of joy
>
> I'm a bit surprised that my version was minimally slower than mainline
> for the ASCII case.  Given that the GOT lookup is avoided, I would not
> have expected that.  Maybe it would be different on i386.
>
> If you have the time, it would also be very cool if we had a direct
> comparison between my implementation and yours, with both functions
> inline and both functions non-inline.

I could try that, after I put your version to good internal use, where
it already shows a larger effect. Note that my current tests do not
account for any hidden costs of inlining longish, branchy code.

> In any case, thanks a lot for taking the time to profile this.  I never
> got around to doing that myself.

Yes, I pretty much assumed that nobody would look at this stuff unless
there were some numbers behind it.
Regardless of any other changes, the performance test itself is worth
picking up.

> Oh, one thing is important to note, if this is indeed considered for
> adoption in GLib:  The new implementation will behave differently for
> invalid input.  Not that it matters much, since in that case the result
> is undefined anyway.

I already made some minor changes to restrict what it produces (for
example, c & 0x3f is safer than c - 0x80), and it should pass the test
suite, which has a lot of cases for invalid input with some expected
output. My understanding is that unvalidated decoding should also
accept the malformed UTF-8 that various software produces and still
yield some meaningful output.
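
To illustrate the masking point, here is a minimal sketch. The helper
names cont_mask and cont_sub are made up for this example and are not
taken from GLib or from either patch. Both extract the payload of a
UTF-8 continuation byte and agree on valid input (0x80..0xBF), but only
the mask keeps the result within 6 bits when the byte is invalid.

```c
/* Hypothetical helpers for illustration only, not GLib code. */

/* Mask variant: the result is always in 0x00..0x3f, whatever c is. */
static unsigned int cont_mask(unsigned char c)
{
    return c & 0x3fu;
}

/* Subtract variant: correct only for 0x80..0xbf. A byte below 0x80
 * wraps around to a huge unsigned value, and a byte above 0xbf spills
 * past the 6 payload bits. */
static unsigned int cont_sub(unsigned char c)
{
    return (unsigned int)c - 0x80u;
}
```

For the valid byte 0xA9 both return 0x29. For the stray lead byte 0xC3
the mask gives 0x03 while the subtraction gives 0x43, which would
corrupt the higher bits once it is OR-ed into the code point.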

I have a hypothetical explanation of the test results (disclaimer: I'm
far from a processor guru with a valid Lauterbach license). The inlined
table-dispatched call trades a PLT call for a GOT data lookup plus a
call to an address held in a register, which is essentially the same
thing. The fetch prediction actually works quite well for runs of text
where all characters have the same byte length; this also explains the
less stellar results for German and Russian texts, where ASCII and
two-byte sequences are mixed. But the optimizer and the CPU do a better
job with the loops and branches of more traditional implementations
when they have the freedom to use them inline.
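
For readers who want to see the two shapes side by side, here is a
rough sketch of what I mean by "table-dispatched" versus "traditional".
It is an assumption-laden illustration, not the code from either patch
or from GLib: all names (decode_table, decode_branch, dec1..dec4) are
invented, there is no validation, and stray continuation bytes are
simply returned as-is.

```c
#include <stdint.h>

typedef uint32_t (*decode_fn)(const unsigned char *p);

/* One decoder per sequence length; continuation bytes are masked. */
static uint32_t dec1(const unsigned char *p) { return p[0]; }
static uint32_t dec2(const unsigned char *p)
{
    return ((p[0] & 0x1fu) << 6) | (p[1] & 0x3fu);
}
static uint32_t dec3(const unsigned char *p)
{
    return ((p[0] & 0x0fu) << 12) | ((p[1] & 0x3fu) << 6) | (p[2] & 0x3fu);
}
static uint32_t dec4(const unsigned char *p)
{
    return ((p[0] & 0x07u) << 18) | ((p[1] & 0x3fu) << 12)
         | ((p[2] & 0x3fu) << 6) | (p[3] & 0x3fu);
}

/* Table indexed by the top four bits of the lead byte. */
static const decode_fn dispatch[16] = {
    dec1, dec1, dec1, dec1, dec1, dec1, dec1, dec1, /* 0x00..0x7f ASCII */
    dec1, dec1, dec1, dec1,                         /* 0x80..0xbf stray */
    dec2, dec2,                                     /* 0xc0..0xdf       */
    dec3,                                           /* 0xe0..0xef       */
    dec4                                            /* 0xf0..0xff       */
};

/* Table-dispatched: one data load plus an indirect call via a register. */
static uint32_t decode_table(const unsigned char *p)
{
    return dispatch[p[0] >> 4](p);
}

/* Traditional: a chain of compares the compiler can inline and reorder. */
static uint32_t decode_branch(const unsigned char *p)
{
    unsigned char c = p[0];

    if (c < 0xc0) return c;        /* ASCII or stray continuation byte */
    if (c < 0xe0) return dec2(p);
    if (c < 0xf0) return dec3(p);
    return dec4(p);
}
```

Both functions return the same code point for the same input; they
differ only in how the sequence length is selected, which is the part
the indirect-call prediction and the optimizer treat differently.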

-- 
  Mikhail

