On Mon, 9 Oct 2006 07:56:27 +0300, "Kalle Vahlman" wrote:
>
>  gtk_marshal_BOOLEAN__BOXED      0,03    46,28
>  gtk_button_expose               0,03    24,88
>  scw_view_expose                 0,00     8,96
> ...
...
> The new tesselator was supposed to be up to four times faster than the
> old one, but running the same test with the old one yields a different
> result:
>
>  _gtk_marshal_BOOLEAN__BOXED     0,05    33,43
>  gtk_button_expose               0,00    10,91
>  scw_view_expose                 0,02    10,13
>  gtk_label_expose                0,00     3,54
>  meta_frames_expose_event        0,00     3,16

We've only got relative percentages to work with, which isn't a lot
here, but let's try. If we assume that the scw_view_expose time is
independent of the tessellator (which is perhaps unlikely), then
gtk_button_expose takes about 2.8 times as long as scw_view_expose
with the new tessellator (24,88 vs. 8,96) and roughly the same time
with the old tessellator (10,91 vs. 10,13). So there is probably a
tessellator slowdown here.

Is this on a no-FPU machine? The 4x improvement I saw (which was
really just one number from one test case, not necessarily
representative, etc. etc.) was on an x86 laptop.

If the algorithmic improvements are all correct then the new
tessellator should be doing fewer intersection computations than the
old one. It might actually be interesting for you to throw a counter
into _line_segs_intersect_ceil (for the old tessellator, being careful
not to instrument the first instance, which is not compiled due to an
"#if 0"), and into _cairo_bo_edge_intersect (for the new tessellator).
That would let us see if the algorithmic stuff is working correctly.

Meanwhile, there are non-trivial changes to the per-intersection time
in the two implementations. The old one uses floating-point arithmetic
while the new one is currently using 128-bit fixed-point arithmetic.
One thing I'm going to be doing this week is implementing 64-bit
fixed-point arithmetic when possible, and also doing some timing on a
Nokia 770.

Meanwhile, I'd still be interested in a higher-level view of what's
happening here.
Why is the tessellator such a significant aspect of drawing a button?
What kinds of cairo operations is ClearLooks doing here?

On that topic, here is a cairo performance bug that Benjamin
identified for me this week (other theme authors reported having the
same problem):

  * Using cairo_stroke to draw a single-pixel rectangle is much slower
    than using cairo_fill to draw exactly the same thing.

Here is example code for drawing the rectangle:

  Slow:

    cairo_rectangle (cr, 0.5, 0.5, w - 1, h - 1);
    cairo_set_line_width (cr, 1.0);
    cairo_stroke (cr);

  Fast:

    cairo_rectangle (cr, 0, 0, w, h);
    cairo_rectangle (cr, 1, 1, w - 2, h - 2);
    cairo_set_fill_rule (cr, CAIRO_FILL_RULE_EVEN_ODD);
    cairo_fill (cr);

So that's something we definitely need to fix, particularly since this
operation is much more intuitively performed with stroke() instead of
fill(). It also affects things like L-shaped stroked paths for doing
3D bevelled edges. And it would be a shame for theme authors to have
to ignore cairo_stroke() for performance reasons and manually compute
a path around the stroked outline for things like this.

I'm pretty sure I understand what's happening in a case like this.
When using fill(), cairo tessellates and then successfully extracts a
pixel-based region from the resulting trapezoids. This lets cairo use
a very fast path for drawing. I believe that in the case of stroke()
we are missing the fast path because there are some non-pixel-aligned
trapezoids in the resulting set (think of the four small squares used
for the miter joins in the corners; the interior edges of each lie on
half-integer coordinates). So even though the resulting region is
identical, our region-extractor bails out when it finds a single
non-pixel-aligned trapezoid.
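To illustrate the bail-out described above, here is a minimal sketch
in plain C. It does not use cairo at all, and every name in it (box_t,
fill_boxes, stroke_boxes, boxes_are_pixel_aligned) is made up for the
example rather than taken from cairo's internals. The split of the
stroke into four edge boxes plus four corner squares is also only an
assumption based on the description above, not the tessellator's real
output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an axis-aligned trapezoid (not a cairo type). */
typedef struct {
    double x1, y1, x2, y2;
} box_t;

/* True if v has no fractional part (adequate for small coordinates). */
static bool
is_whole (double v)
{
    return (double) (long) v == v;
}

/* All-or-nothing check: a single box with a fractional coordinate
 * rejects the whole set, even if the union covers whole pixels. */
bool
boxes_are_pixel_aligned (const box_t *boxes, size_t n)
{
    size_t i;

    assert (boxes != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (! is_whole (boxes[i].x1) || ! is_whole (boxes[i].y1) ||
            ! is_whole (boxes[i].x2) || ! is_whole (boxes[i].y2))
            return false;
    }
    return true;
}

/* Assumed decomposition of the even-odd fill of two nested rectangles:
 * a top band, left and right columns, and a bottom band. */
size_t
fill_boxes (double w, double h, box_t out[4])
{
    out[0] = (box_t) { 0,     0,     w, 1     };
    out[1] = (box_t) { 0,     1,     1, h - 1 };
    out[2] = (box_t) { w - 1, 1,     w, h - 1 };
    out[3] = (box_t) { 0,     h - 1, w, h     };
    return 4;
}

/* Assumed decomposition of the 1-pixel stroke along the half-pixel path:
 * four edge boxes plus four half-pixel squares for the miter joins. */
size_t
stroke_boxes (double w, double h, box_t out[8])
{
    out[0] = (box_t) { 0.5,     0,       w - 0.5, 1       };  /* top edge */
    out[1] = (box_t) { 0.5,     h - 1,   w - 0.5, h       };  /* bottom edge */
    out[2] = (box_t) { 0,       0.5,     1,       h - 0.5 };  /* left edge */
    out[3] = (box_t) { w - 1,   0.5,     w,       h - 0.5 };  /* right edge */
    out[4] = (box_t) { 0,       0,       0.5,     0.5     };  /* joins */
    out[5] = (box_t) { w - 0.5, 0,       w,       0.5     };
    out[6] = (box_t) { 0,       h - 0.5, 0.5,     h       };
    out[7] = (box_t) { w - 0.5, h - 0.5, w,       h       };
    return 8;
}
```

Passing the result of fill_boxes to boxes_are_pixel_aligned returns
true, while the result of stroke_boxes returns false for the same w
and h, even though both sets cover the same one-pixel frame. That is
the sense in which one stray trapezoid costs us the fast path.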
So one fix for this problem would be to change the implementation of
stroke() to first compute a new path representing the outline of the
stroke (something like the stroke_to_path() operation we've proposed)
and then hand that path over to the existing fill() machinery.

This is something I've been wanting to do for a while anyway (it will
fix an old self-intersection bug that we have had identified, with a
failing test case, for a long time). But I've been waiting until we
had a faster tessellator before we started relying on it for stroke as
well as fill (thinking, mistakenly, that stroke would generally be
faster than fill since it avoids the tessellator).

Another thing we identified at GUADEC is that the dashed stroke for
the focus rectangle that GTK+ uses is also a performance problem. This
is another case that we should ensure is going through the fast path
for whole-pixel regions.

Anyway, there's lots of fun stuff to do here, and I hope we can get
lots of people helping out with this stuff inside of cairo.

-Carl