Re: Gtk performance issues from a user's point of view

From: Carl Worth <cworth cworth org>
To: zuh iki fi
Cc: cairo cairographics org, Performance-list gnome org
Subject: Re: Gtk performance issues from a user's point of view
Date: Mon, 09 Oct 2006 10:48:00 -0700

On Mon, 9 Oct 2006 07:56:27 +0300, "Kalle Vahlman" wrote:
>
> _gtk_marshal_BOOLEAN__BOXED         0,03  46,28
>   gtk_button_expose                 0,03  24,88
>   scw_view_expose                   0,00   8,96
> ...
...
> The new tesselator was supposed to be up to four times faster than the
> old one, but running the same test with the old one yields a different
> result:
>
> _gtk_marshal_BOOLEAN__BOXED         0,05  33,43
>   gtk_button_expose                 0,00  10,91
>   scw_view_expose                   0,02  10,13
>   gtk_label_expose                  0,00   3,54
>   meta_frames_expose_event          0,00   3,16

We've only got relative percentages to work with, which isn't a lot
here, but let's try. If we assume that the scw_view_expose time is
independent of the tessellator, (which is perhaps unlikely), then we
can say that gtk_button_expose takes about 2.8 times as long as
scw_view_expose with the new tessellator (24,88 vs. 8,96) and roughly
the same time as scw_view_expose with the old tessellator (10,91 vs.
10,13).

So there is probably a tessellator slowdown here. Is this on a no-FPU
machine? The 4x improvement I saw, (which was really just one number
from one test case---not necessarily representative, etc. etc.), was
on an x86 laptop.

If the algorithmic improvements are all correct then the new
tessellator should be doing fewer intersection computations than the
old one. It might actually be interesting for you to throw a counter
into _line_segs_intersect_ceil (for the old tessellator, being careful
not to instrument the first instance, which is not compiled due to an
"#if 0"), and into _cairo_bo_edge_intersect (for the new tessellator).
That would let us see if the algorithmic stuff is working correctly.

Meanwhile, there are non-trivial changes to the per-intersection time
in the two implementations. The old one uses floating-point arithmetic
while the new one is currently using 128-bit fixed-point arithmetic.
One thing I'm going to be doing this week is implementing 64-bit
fixed-point arithmetic when possible, and also doing some timing on a
Nokia 770.

Meanwhile, I'd still be interested in a higher-level view of what's
happening here. Why is the tessellator such a significant aspect of
drawing a button? What kinds of cairo operations is ClearLooks doing
here?

On that topic, here is a cairo performance bug that Benjamin
identified for me this week (other theme authors reported having the
same problem):

 * Using cairo_stroke to draw a rectangle outline one pixel wide is
   much slower than using cairo_fill to draw exactly the same thing.

Here is example code for drawing the rectangle:

Slow:
	cairo_rectangle (cr, 0.5, 0.5, w - 1, h - 1);
	cairo_set_line_width (cr, 1.0);
	cairo_stroke (cr);

Fast:
	cairo_rectangle (cr, 0, 0, w, h);
	cairo_rectangle (cr, 1, 1, w - 2, h - 2);
	cairo_set_fill_rule (cr, CAIRO_FILL_RULE_EVEN_ODD);
	cairo_fill (cr);

So that's something we definitely need to fix, particularly since this
operation is much more intuitively performed with stroke() instead of
fill(). It also affects things like L-shaped stroked paths for doing
3D bevelled edges. And it would be a shame for theme authors to
have to ignore cairo_stroke() for performance reasons and manually
compute a path around the stroked outline for things like this.

I'm pretty sure I understand what's happening in a case like
this. When using fill(), cairo tessellates and then successfully
extracts a pixel-based region from the resulting trapezoids. This lets
cairo use a very fast path for drawing.

I believe that in the case of stroke() we are missing the fast path
because there are some non-pixel-aligned trapezoids in the resulting
set, (think four small squares used for the miter joins in the
corners---the interior edges of each are on the half-integer
coordinate). So even though the resulting region is identical, our
region-extractor bails out when it finds a single non-pixel-aligned
trapezoid.

So one fix for this problem would be to change the implementation of
stroke() to first compute a new path representing the outline of the
stroke, (something like the stroke_to_path() operation we've proposed),
and then hand that path over to the existing fill() machinery. This
is something I've been wanting to do for a while anyway, (it will fix
an old self-intersection bug that we have had identified with a
failing test case for a long time). But I've been waiting until we had
a faster tessellator before we started relying on it for stroke as
well as fill, (thinking, mistakenly, that stroke would generally be
faster than fill since it avoids the tessellator).

Another thing we identified at GUADEC is that the dashed stroke for
the focus rectangle that GTK+ uses is also a performance problem. This
is another case that we should ensure is going through the fast path
for whole-pixel regions.

Anyway, there's lots of fun stuff to do here, and I hope we can get
lots of people helping out with this stuff inside of cairo.

-Carl


