Re: Trivial patch reducing fp mults in pango-cairo



On 12/11/06, Eero Tamminen <ext-eero tamminen nokia com> wrote:
> Hi,
>
> ext Jorn Baayen wrote:
> > Argh! It would have been too simple to be true. The difference in
> > profiles must be due to general profiling fuzziness.
>
> Is there anything left where both what is being calculated and
> the result are integers, but they are cast to and multiplied with
> floating point values for additional accuracy?

Assuming we're only talking about pangocairo for the moment, no. Since
Behdad made glyph_extents pretty much go away entirely, now there is
only one place in pangocairo that is responsible for FP burn (in all
the test cases that I've seen): the loop in
pango_cairo_renderer_draw_glyphs. For each glyph, this statement is
executed (for both x and y):

double cx = crenderer->x_offset +
 (double)(x + x_position + gi->geometry.x_offset) / PANGO_SCALE;

This produces an int->double conversion (slow), a double multiply [1]
(very slow) and a double add (very slow), twice for each glyph on
every expose. The resulting double is later used to populate the x and
y members (also doubles) of the cairo_glyph_t that is sent to
cairo_show_glyphs.

I really think that the statement above is responsible for pretty much
all the __muldf3, __adddf3 and __floatsidf calls we see here (at the
top of the profile):

http://folks.o-hand.com/~jorn/pango-benchmarks/28-pango-1.15.1/pango-cairo.txt

I have an idea of how to get rid of these FP ops, too, but I've been
concentrating on cairo at the moment and haven't gotten around to
coding anything up yet. I'll outline the idea here in case someone
wants to beat me to it (and so I can refer back to it later when I
forget :)

First, we can convert the crenderer offsets to fixed point before we
enter the loop. This will allow us to eliminate the __adddf3, as the
result of the (x + x_position + gi->geometry.x_offset) expression is a
fixed point number anyway (right, Behdad?), so we can just change the
expression to (x + x_position + gi->geometry.x_offset +
crenderer_x_offset_fixed).
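
A minimal sketch of this first step, assuming PANGO_SCALE is 1024 and
that the renderer offset can be rounded to the nearest Pango unit
without anyone noticing. The function names and the separate
glyph_offset parameter are hypothetical, chosen only to mirror the
expression above; this is not code from pangocairo.

```c
#define PANGO_SCALE 1024  /* Pango units per device unit */

/* Hypothetical helper: round a double offset to the nearest Pango unit.
 * Done once, before entering the per-glyph loop. */
static int offset_to_fixed(double offset)
{
    double scaled = offset * PANGO_SCALE;
    return (int)(scaled + (scaled >= 0 ? 0.5 : -0.5));
}

/* Current form: int->double conversion, multiply and double add
 * for every coordinate of every glyph. */
static double glyph_coord_fp(double offset, int x, int x_position,
                             int glyph_offset)
{
    return offset + (double)(x + x_position + glyph_offset) / PANGO_SCALE;
}

/* Sketched form: the offset is folded in with an integer add, so the
 * double add disappears. The final fixed->double conversion remains. */
static double glyph_coord_fixed(int offset_fixed, int x, int x_position,
                                int glyph_offset)
{
    return (double)(x + x_position + glyph_offset + offset_fixed)
           / PANGO_SCALE;
}
```

The two forms agree exactly only when the offset is a multiple of
1/1024; otherwise the fixed point version differs by at most half a
Pango unit because of the rounding in offset_to_fixed.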

What's left is the conversion from fixed->double, which can be done
without the __muldf3 or the __floatsidf. Basically, the number of
leading zeros in the fixed point number determines the exponent of the
target double, and since the number is in fixed point, the exponent
bias has to be adjusted by the number of fractional bits in the fixed
point format. After shifting the fixed point value by the proper
amount (again based on the number of leading zeros), you have your
exponent and mantissa all set to pack into a union. Copy the double
out of the union into the cairo_glyph_t coordinate, and you're done.
A few special cases (zero and negative values, at least) need
watching, but I think the approach is sound.
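
The conversion described above could be sketched roughly as follows.
This is an illustration under stated assumptions, not tested
pangocairo code: it assumes a 32-bit fixed point value with 10
fractional bits (PANGO_SCALE == 1024), an IEEE 754 binary64 double,
and a platform where a double and a 64-bit integer share the same byte
order. The names clz32 and fixed_to_double are hypothetical.

```c
#include <stdint.h>

#define FIXED_FRAC_BITS 10  /* PANGO_SCALE == 1024 == 1 << 10 */

/* Portable count of leading zero bits; v must be non-zero. */
static int clz32(uint32_t v)
{
    int n = 0;
    while (!(v & 0x80000000u)) {
        v <<= 1;
        n++;
    }
    return n;
}

/* Hypothetical helper: convert a value in Pango units to a double
 * using only integer operations. Assumes IEEE 754 binary64 and that
 * double and uint64_t have the same byte order. */
static double fixed_to_double(int32_t fixed)
{
    union { uint64_t bits; double value; } pack;
    uint32_t mag;
    uint64_t sign, exponent, mantissa;
    int lz;

    if (fixed == 0)
        return 0.0;  /* special case: zero has no leading 1 bit */

    sign = fixed < 0;
    /* Magnitude via unsigned negation, which is safe for INT32_MIN. */
    mag = sign ? 0u - (uint32_t)fixed : (uint32_t)fixed;

    lz = clz32(mag);
    /* The leading 1 sits at bit (31 - lz); the usual bias of 1023 is
     * reduced by the number of fractional bits. */
    exponent = (uint64_t)(1023 + 31 - FIXED_FRAC_BITS - lz);
    /* Move the leading 1 up to bit 52, then mask it off, since it is
     * implicit in the binary64 format. */
    mantissa = ((uint64_t)mag << (21 + lz)) & 0x000FFFFFFFFFFFFFull;

    pack.bits = (sign << 63) | (exponent << 52) | mantissa;
    return pack.value;
}
```

Every 32-bit integer fits in the 52-bit mantissa, so no rounding is
involved and the result should match (double)fixed / PANGO_SCALE
exactly. Whether it is actually faster than the soft-float library
routines would need to be measured on the target.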

Once that is done, pangocairo should be pretty much FP free for the
typical code paths that I would expect to see on the 770. On
timetext.c or the torturer's GtkTextView, I don't think you'll see
_that_ much improvement (percentage-wise) from this change until you
get Xan's XRender glyph optimization into cairo, as that is a bigger
bottleneck ATM, I think.

[1] Because the denominator of the divide is a constant, the compiler
converts it into a multiply, which is faster.
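
A small sketch of why that substitution is safe here, assuming
PANGO_SCALE is 1024: the reciprocal of a power of two is exactly
representable as a double, so dividing by 1024 and multiplying by
1.0/1024 give bit-identical results. For a constant whose reciprocal
is not exact, the two can differ, and compilers generally do not make
the swap unless asked to relax floating point rules. The function
names below are hypothetical.

```c
#define PANGO_SCALE 1024  /* a power of two, so 1.0/1024 is exact */

/* The form written in the source. */
static double scale_by_divide(int units)
{
    return (double)units / PANGO_SCALE;
}

/* The form the compiler is expected to emit instead. */
static double scale_by_multiply(int units)
{
    return (double)units * (1.0 / PANGO_SCALE);
}
```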

Dan


