Re: Trivial patch reducing fp mults in pango-cairo
- From: Behdad Esfahbod <behdad behdad org>
- To: Daniel Amelang <daniel amelang gmail com>
- Cc: performance-list gnome org
- Date: Wed, 13 Dec 2006 17:06:16 -0500
On Mon, 2006-12-11 at 12:56 -0800, Daniel Amelang wrote:
> Assuming we're only talking about pangocairo for the moment, no. Since
> Behdad made glyph_extents pretty much go away entirely, now there is
> only one place in pangocairo that is responsible for FP burn (in all
> the test cases that I've seen): the loop in
> pango_cairo_renderer_draw_glyphs. For each glyph, this statement is
> executed (for both x and y):
>
> double cx = crenderer->x_offset +
> (double)(x + x_position + gi->geometry.x_offset) / PANGO_SCALE;
>
> Which produces an int->double conversion (slow), a double multiply [1]
> (very slow) and a double add (very slow). Twice for each glyph on
> every expose. The resulting double is later used to populate the x and
> y members (doubles also) of the cairo_glyph_t that is sent to
> cairo_show_glyphs.
>
> I really think that the above line(s) is responsible for pretty much
> all the __muldf3, __adddf3 and __floatsidf we see here (at the top of
> the profile):
>
> http://folks.o-hand.com/~jorn/pango-benchmarks/28-pango-1.15.1/pango-cairo.txt
>
> I have an idea of how to get rid of these FP ops, too, but I've been
> concentrating on cairo at the moment and haven't gotten around to
> coding anything up yet. I'll outline the idea here in case someone
> wants to beat me to it (and so I can refer back to it later when I
> forget :)
>
> First, we can convert the crenderer offsets to fixed point before we
> enter the loop. This will allow us to eliminate the __adddf3, as the
> result of the (x + x_position + gi->geometry.x_offset) expression is a
> fixed point number anyway (right, Behdad?), so we can just change the
> expression to (x + x_position + gi->geometry.x_offset +
> crenderer_x_offset_fixed).
Yeah, kind of. Instead of changing the crenderer offsets themselves, we
can convert them to fixed point once, outside the loop. Pango units have
10 fractional bits (PANGO_SCALE is 1024), so it shouldn't be that bad.
But theoretically speaking, this is going to narrow the valid coordinate
range. So maybe I'll test it, and if the optimization works, use a
tighter loop or something.
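
To make the "convert once, outside the loop" idea concrete, here is a minimal sketch. It is not the actual pango_cairo_renderer_draw_glyphs code: glyph_geom, glyph_xs and SCALE are hypothetical stand-ins (SCALE plays the role of PANGO_SCALE), and the rounding of the offset to the nearest 1/SCALE is an assumption about what would be acceptable.

```c
#include <assert.h>

#define SCALE 1024  /* stand-in for PANGO_SCALE */

/* Hypothetical, simplified stand-in for a glyph's geometry (Pango units). */
typedef struct { int x_offset, y_offset, width; } glyph_geom;

/* Fill xs[i] with the x coordinate of each glyph, in double pixels.
 * base_x plays the role of crenderer->x_offset; x is the run origin
 * in Pango units. */
static void glyph_xs(double base_x, int x, const glyph_geom *g, int n, double *xs)
{
    /* Convert the double offset to fixed point once, before the loop.
     * This rounds base_x to the nearest 1/SCALE and limits the usable
     * range to what fits in an int. */
    int base_fixed = (int)(base_x * SCALE + (base_x < 0 ? -0.5 : 0.5));
    int x_position = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        /* The sum is all-integer now. One int->double conversion and one
         * scaling by the constant remain; the per-glyph double add is gone. */
        xs[i] = (double)(base_fixed + x + x_position + g[i].x_offset) / SCALE;
        x_position += g[i].width;
    }
}
```

With base_x = 10.5, x = 2048 and two glyphs of width 1024 (the second with x_offset 512), this gives 12.5 and 14.0. The cost is exactly the range/precision trade-off mentioned above: offsets that are not multiples of 1/SCALE get rounded.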
Another free optimization is to not recompute cy if
gi->geometry.y_offset is zero.
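
A rough sketch of that shortcut, again with hypothetical names (glyph_geom, glyph_ys, SCALE) rather than the real renderer code: the baseline y is computed once, and the floating-point work is only redone for glyphs that actually carry a vertical offset. The return value exists only so the effect can be observed.

```c
#include <assert.h>

#define SCALE 1024  /* stand-in for PANGO_SCALE */

/* Hypothetical, simplified stand-in for a glyph's geometry (Pango units). */
typedef struct { int x_offset, y_offset, width; } glyph_geom;

/* Fill ys[i] with the y coordinate of each glyph and return how many
 * times the coordinate had to be recomputed. */
static int glyph_ys(double base_y, int y, const glyph_geom *g, int n, double *ys)
{
    /* Shared by every glyph whose y_offset is zero. */
    const double cy_base = base_y + (double) y / SCALE;
    int recomputed = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        if (g[i].y_offset == 0) {
            ys[i] = cy_base;  /* no floating-point work for this glyph */
        } else {
            ys[i] = base_y + (double)(y + g[i].y_offset) / SCALE;
            recomputed++;
        }
    }
    return recomputed;
}
```

For ordinary horizontal text most glyphs have a zero y_offset, so the else branch should be rare; how much that saves in practice is untested here.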
> What's left is the conversion from fixed->double, which can be done
> w/out the __mul or the __floatsidf. Basically, the number of leading
> zeros in the fixed point number can be used to determine the exponent
> value of the target double, and since the number is in fixed point,
> you'll need to use a bias that is adjusted for the size of the
> fractional part of the fixed point number. After shifting the fixed
> point the proper amount (based on the number of leading zeros again),
> you'll have your exponent and mantissa all set to pack into a union.
> Copy the double from the union into the cairo_glyph coordinate, and
> you're done. Need to watch out for some special cases, but I think
> that the approach is sound.
Well, this is kinda hitting the limit. You are basically rewriting soft
float routines. First, I'm not sure it's much faster (ok, you can skip
some details, so it's got to be faster), second, you are mostly shifting
time from __mul to library functions. I'd rather leave these to the
compiler. Has anyone tested compiling recent pango+cairo with
softfloats on small systems?
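
For reference, the conversion described in the quoted paragraph could look roughly like the sketch below. It assumes IEEE 754 binary64 doubles with the same byte order as 64-bit integers, a 32-bit fixed-point input, and 10 fractional bits; fixed_to_double and FRAC_BITS are made-up names, and memcpy stands in for the union. Whether this actually beats the compiler's soft-float helpers is exactly the open question, and nothing here has been benchmarked.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FRAC_BITS 10  /* assumed fractional bits of the fixed-point input */

/* Convert a fixed-point value to double without a multiply or a generic
 * int->double conversion, by packing sign, exponent and mantissa by hand. */
static double fixed_to_double(int32_t v)
{
    uint32_t m, probe;
    uint64_t bits;
    int p = 31;  /* will hold the index of the highest set bit of m */
    double d;

    assert(sizeof(double) == sizeof(uint64_t));
    if (v == 0)
        return 0.0;  /* special case: no leading one bit to normalize on */

    /* Magnitude as unsigned; also correct for INT32_MIN. */
    m = v < 0 ? 0u - (uint32_t) v : (uint32_t) v;

    /* Count leading zeros by shifting until the top bit is set. */
    for (probe = m; !(probe & 0x80000000u); probe <<= 1)
        p--;

    bits  = (uint64_t)(v < 0) << 63;                 /* sign */
    bits |= (uint64_t)(1023 + p - FRAC_BITS) << 52;  /* bias adjusted for the fraction */
    /* Move the leading one to bit 52, then drop it (it is implicit). */
    bits |= ((uint64_t) m << (52 - p)) & ((UINT64_C(1) << 52) - 1);

    memcpy(&d, &bits, sizeof d);
    return d;
}
```

For example, fixed_to_double(1536) yields 1.5 and fixed_to_double(-512) yields -0.5. Since a 32-bit magnitude always fits in the 52-bit mantissa, no rounding is needed, which is one of the "details" that can be skipped compared with a general soft-float routine.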
> Once that is done, pangocairo should be pretty much FP free for the
> typical code paths that I would expect to see on the 770. On
> timetext.c or the torturer's GtkTextView, I don't think you'll see
> _that_ much improvement (percentage-wise) from this change until you
> get Xan's XRender glyph optimization into cairo, as that is a bigger
> bottleneck ATM, I think.
Yeah, if you compare the overall profiles with pangocairo ones,
pangocairo is taking like less than 5% of the time (possibly much less).
Nothing to be gained here.
> [1] Because the denominator of the divide is a constant, the compiler
> converts it into a multiply, which is faster.
>
> Dan
--
behdad
http://behdad.org/
"Those who would give up Essential Liberty to purchase a little
Temporary Safety, deserve neither Liberty nor Safety."
-- Benjamin Franklin, 1759