Re: [cairo] Initial cairo performance results from Nokia 770



On Tue, 10 Oct 2006 16:12:08 -0700, Carl Worth wrote:
> As before, I'll just attach them here, and follow up to add a bit of
> analysis.

First a quick scan of things that jump out from the results with the
image backend:

[ # ]  backend-content                     test-size   mean ms std dev. iterations
[ 32]    image-rgba       paint_linear_rgb_over-128    22.498  0.18%   100
[ 33]    image-rgba     paint_linear_rgb_source-128    18.625  0.45%   100
[ 34]    image-rgba      paint_linear_rgba_over-128    22.498  0.16%   100
[ 35]    image-rgba    paint_linear_rgba_source-128    18.620  0.46%   100
[ 36]    image-rgba       paint_radial_rgb_over-128   355.267  0.75%   100
[ 37]    image-rgba     paint_radial_rgb_source-128   351.519  0.75%   100
[ 38]    image-rgba      paint_radial_rgba_over-128   356.131  0.61%   100
[ 39]    image-rgba    paint_radial_rgba_source-128   352.099  0.78%   100

Here we see that radial gradients are 17 times slower than linear
gradients on this device. This is a big difference compared to the
results on my x86 laptop where radial gradients are only 2 times
slower than linear gradients.

So this is definitely a problem spot, and I'm looking forward to
watching how David Turner's gradient improvements help here.
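The "17 times" figure is a rough average. For anyone who wants to redo the arithmetic, here is a tiny C helper that recomputes the per-row ratios from the mean times in rows [32]-[39] above. The function names are just made up for this note.

```c
#include <assert.h>
#include <stdio.h>

/* How many times longer `slow_ms` takes than `fast_ms`. */
double slowdown(double slow_ms, double fast_ms)
{
    assert(fast_ms > 0.0);
    return slow_ms / fast_ms;
}

/* Mean times (ms) copied from the image-rgba rows [32]-[39] above. */
void print_radial_vs_linear(void)
{
    static const char *const names[] = {
        "rgb_over", "rgb_source", "rgba_over", "rgba_source"
    };
    static const double linear_ms[] = { 22.498, 18.625, 22.498, 18.620 };
    static const double radial_ms[] = { 355.267, 351.519, 356.131, 352.099 };

    for (int i = 0; i < 4; i++)
        printf("%-12s radial/linear = %.1fx\n", names[i],
               slowdown(radial_ms[i], linear_ms[i]));
}
```

With those numbers it prints about 15.8x for the two OVER rows and 18.9x for the two SOURCE rows. The ratio is larger for SOURCE mostly because linear SOURCE is faster than linear OVER.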

Next, I want to analyze the performance of the fundamental paint
operation for both the image and the xlib backends. I'll use only the
512x512 pixel case since it should have the best numbers, (and the
smaller cases seem to show similar trends):

[ 60]    image-rgba        paint_solid_rgb_over-512     8.626  0.17%   100
[ 61]    image-rgba      paint_solid_rgb_source-512     8.625  0.18%   100
[ 62]    image-rgba       paint_solid_rgba_over-512    65.942  0.53%   100
[ 63]    image-rgba     paint_solid_rgba_source-512     8.634  0.17%   100
[ 64]    image-rgba        paint_image_rgb_over-512    17.172  0.18%   100
[ 65]    image-rgba      paint_image_rgb_source-512    17.162  0.19%   100
[ 66]    image-rgba       paint_image_rgba_over-512    77.566  0.47%   100
[ 67]    image-rgba     paint_image_rgba_source-512     9.414  0.23%   100

OK. There is some interesting data to be seen above.

First, let's assume that the 8-9 ms time represents a well-optimized
blit speed. So, in the case of a solid color, we're getting that good
speed in the 3 cases that are blits (rgb_over, rgb_source, and
rgba_source). And when the source pattern is an image instead of a
solid color we also get a good speed for the blit case (rgba_source).
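To put "8-9 ms" in perspective, here is the arithmetic as a small C helper. It assumes 4 bytes per pixel, as in cairo's RGB24 and ARGB32 image formats, and counts only the bytes written to the destination. Whether roughly 120 MB/s really is a good blit speed for this hardware is an assumption; these numbers don't prove it.

```c
#include <assert.h>

/* Megabytes (10^6 bytes) per second needed to write `bytes` in `ms` milliseconds. */
double throughput_mb_per_s(double bytes, double ms)
{
    assert(ms > 0.0);
    return bytes / (ms / 1000.0) / 1e6;
}

/* Bytes in a width x height surface at `bytes_per_pixel`. */
double surface_bytes(int width, int height, int bytes_per_pixel)
{
    assert(width > 0 && height > 0 && bytes_per_pixel > 0);
    return (double) width * height * bytes_per_pixel;
}
```

For the solid rgb_over row, `throughput_mb_per_s(surface_bytes(512, 512, 4), 8.626)` comes out at about 121.6 MB/s of destination writes. An image-source blit also has to read the same number of bytes from the source.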

Two other image cases (rgb_over and rgb_source) are slightly harder than
blits since we have to expand the data to include a constant alpha
channel that does not exist in the source surface. These cases are 2x
slower than a blit. Is that expected? Or could we easily do better
than that?
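For reference, the per-pixel work in that "expand" case is roughly the following. This is a minimal sketch, not pixman's actual code. It assumes cairo's RGB24 layout (32 bits per pixel with the top byte unused) and an a8r8g8b8 destination. An opaque pixel is its own premultiplied value, so only the alpha byte has to change. If the blit path is a tuned memcpy while this goes through a generic per-pixel loop, that alone might explain part of the 2x, though I haven't checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy one row of x8r8g8b8 pixels (top byte unused) into a8r8g8b8,
 * forcing alpha to 0xff. The color bytes are copied unchanged. */
void expand_row_xrgb_to_argb(uint32_t *dst, const uint32_t *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] | 0xff000000u;
}
```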

Meanwhile, the slowest cases above are the two where we are actually
doing something "hard", (having to blend a source surface over a
destination where both have alpha). These are the solid_rgba_over and
image_rgba_over cases. Currently they are 8x slower than a blit. Is
that just what it costs to do the multiplication of the blend?
Maybe.
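One way to judge that is to look at what OVER actually requires per pixel. The sketch below assumes premultiplied a8r8g8b8 pixels, for which OVER is result = src + dst * (1 - src_alpha) on each channel. The helper mul_un8 is hypothetical: it uses the common shift-based trick for a rounded divide by 255 instead of a real division. That adds a multiply per channel on top of the loads and stores a blit already does. So some slowdown relative to a blit is unavoidable, but whether 8x is reasonable depends on how well the real inner loop is optimized.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rounded (a * b) / 255 for 8-bit a and b, without a division. */
static uint8_t mul_un8(uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t) a * b + 0x80;
    return (uint8_t) ((t + (t >> 8)) >> 8);
}

/* Premultiplied OVER for one a8r8g8b8 pixel:
 * result = src + dst * (1 - src_alpha), channel by channel. */
uint32_t over_pixel(uint32_t src, uint32_t dst)
{
    uint8_t inv_alpha = (uint8_t) (255 - (src >> 24));
    uint32_t result = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t s = (src >> shift) & 0xff;
        uint32_t d = mul_un8((uint8_t) ((dst >> shift) & 0xff), inv_alpha);
        result |= (s + d) << shift; /* <= 255 for valid premultiplied input */
    }
    return result;
}

/* Blend a row of source pixels over a row of destination pixels. */
void over_row(uint32_t *dst, const uint32_t *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = over_pixel(src[i], dst[i]);
}
```

For example, over_pixel(0x80808080, 0xff000000), a half-transparent grey over opaque black, gives 0xff808080.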

Let's next look at how the same cases change when there is no alpha
channel in the destination. I'll show only the rows that have
significantly different numbers than above:

[ 64]    image-rgb         paint_image_rgb_over-512     9.623  0.23%   100
[ 65]    image-rgb       paint_image_rgb_source-512     9.611  0.22%   100
[ 67]    image-rgb      paint_image_rgba_source-512    12.140  0.42%   100

The rgb_over and rgb_source cases have now changed from "copy and
augment with constant alpha" to "simple blit" and the numbers reflect
that. That's good.

The rgba_source case is a bit funny. We've got an alpha channel in the
source surface, but not in the destination. I'm not quite sure what
the semantics of SOURCE are in that case. Is it still a simple blit,
(just not caring what we put in the unused bits of the destination)?
If so, why is it 25% slower? If not, what extra work is it doing? It's
obviously not costing as much as the complementary case where a SOURCE
from rgb->rgba is 2x slower than a blit. So comparing the rgb->rgba
and rgba->rgb implementations might be useful.
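To make that comparison concrete, here are two plausible ways to implement SOURCE from rgba to rgb. Both assume (and I'm not sure of this) that the result is the premultiplied color channels with the alpha byte dropped. Neither is claimed to be what pixman does, and the function names are hypothetical. If the unused byte may hold anything, a row-wise memcpy suffices. If the implementation insists on clearing that byte, every pixel goes through a load, mask and store. That is the kind of extra work that could show up as a 25% slowdown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Variant 1: treat the destination's top byte as "don't care" and
 * copy the row exactly like a blit. */
void src_argb_to_xrgb_copy(uint32_t *dst, const uint32_t *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, n * sizeof *dst);
}

/* Variant 2: copy pixel by pixel and clear the unused top byte.
 * The color channels end up identical to variant 1. */
void src_argb_to_xrgb_masked(uint32_t *dst, const uint32_t *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] & 0x00ffffffu;
}
```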

OK, that's the image backend. Now let's look at these same cases with
the xlib backend:

[ 60]     xlib-rgba        paint_solid_rgb_over-512     9.672  0.47%   100
[ 61]     xlib-rgba      paint_solid_rgb_source-512     9.663  0.47%   100
[ 62]     xlib-rgba       paint_solid_rgba_over-512   436.860  0.45%   100
[ 63]     xlib-rgba     paint_solid_rgba_source-512     9.627  0.46%   100
[ 64]     xlib-rgba        paint_image_rgb_over-512   200.226  0.55%   100
[ 65]     xlib-rgba      paint_image_rgb_source-512   179.953  0.56%   100
[ 66]     xlib-rgba       paint_image_rgba_over-512   142.724  0.56%   100
[ 67]     xlib-rgba     paint_image_rgba_source-512    62.047  0.47%   100

Here we can see some of the same patterns as in the image case. Solid
color blits are all acting well. But here, the image_rgba_source which
was a fast blit above now has a 6x performance hit. So there's a
definite performance bug there.

Also, the image_rgb_over and image_rgb_source cases which are "blit
and set alpha channel to constant" are 18-20 times slower than blits,
(compared to 2x slower with the image backend), so that looks like
another performance bug.

Finally, the solid_rgba_over case is horrible. With the image backend
it was 8x slower than a blit, here it is 45x slower!  Remarkably, this
solid-color case is 3x slower than the image_rgba_over
case. Something's really broken when it takes cairo 3 times longer to
render a solid color than a complete image. Meanwhile that image
blending itself is almost 15x slower than a blit, (compared to only
8x slower with the image backend).
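To see why that's surprising, here's a sketch of OVER with a constant color, assuming premultiplied a8r8g8b8 pixels. The helper names are hypothetical, and this is not the X server's actual code path. Compared with blending an image, there is no source pixel to load and 1 - alpha is computed once. A fully opaque color also degenerates to a plain fill. So the solid case never needs more work per pixel than the image case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rounded (a * b) / 255 for 8-bit a and b, without a division. */
static uint8_t mul_un8(uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t) a * b + 0x80;
    return (uint8_t) ((t + (t >> 8)) >> 8);
}

/* OVER a constant premultiplied a8r8g8b8 color onto n destination pixels. */
void over_solid_row(uint32_t *dst, size_t n, uint32_t color)
{
    assert(dst != NULL);
    uint8_t inv_alpha = (uint8_t) (255 - (color >> 24)); /* computed once */

    if (inv_alpha == 0) { /* opaque color: OVER is just a fill */
        for (size_t i = 0; i < n; i++)
            dst[i] = color;
        return;
    }
    for (size_t i = 0; i < n; i++) {
        uint32_t d = dst[i], result = 0;
        for (int shift = 0; shift < 32; shift += 8) {
            uint32_t c = (color >> shift) & 0xff;
            uint8_t dc = (uint8_t) ((d >> shift) & 0xff);
            result |= (c + mul_un8(dc, inv_alpha)) << shift;
        }
        dst[i] = result;
    }
}
```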

In addition, with the xlib backend, we can look at what happens when
the source pattern is an xlib surface rather than an image surface:

[ 68]     xlib-rgba      paint_similar_rgb_over-512   148.130  0.56%   100
[ 69]     xlib-rgba    paint_similar_rgb_source-512   127.762  0.29%   100
[ 70]     xlib-rgba     paint_similar_rgba_over-512    91.166  0.40%   100
[ 71]     xlib-rgba   paint_similar_rgba_source-512    10.681  0.44%   100

There's one very encouraging point here, namely that the rgba_source
time is down close to what we expect for a blit, (just about 10%
slower than solid_rgba_source). So it looks like we've got at least one
thing right in the xlib backend!

The other cases here are also faster than the corresponding
image-surface source pattern cases with the xlib backend, but not as
significantly. The rgb_over and rgb_source "blit and set alpha channel
to constant" cases are 13-15x slower than a blit (compared to 18-20x
with image-surface sources on the xlib backend), but they still don't
compare favorably with the image backend, where these cases are only
2x slower than a blit.

Finally, the "hard" case of actually blending one surface over another
(rgba_over) is here about 9x slower than a blit (rgba_source). That
does compare quite favorably with the 8x we saw in the image backend.

So there are definitely some performance bugs in the xlib backend. It
will probably take a combination of fixes in both cairo and the X
server to address all of these. Some of the cairo fixes should be
really easy, (things like replacing OVER with SOURCE if the source
pattern has no alpha channel). Almost any clearly identified
performance bug in the above can be fixed by appropriately calling
code that already exists, so that's encouraging.

Finally, let's look at what happens when the xlib destination surface
does not have an alpha channel:

[ 60]     xlib-rgb         paint_solid_rgb_over-512     5.208  0.86%   100
[ 61]     xlib-rgb       paint_solid_rgb_source-512     5.204  0.89%   100
[ 62]     xlib-rgb        paint_solid_rgba_over-512   537.259  0.53%   100
[ 63]     xlib-rgb      paint_solid_rgba_source-512     5.122  0.77%   100
[ 64]     xlib-rgb         paint_image_rgb_over-512   218.250  0.14%   100
[ 65]     xlib-rgb       paint_image_rgb_source-512   199.015  0.14%   100
[ 66]     xlib-rgb        paint_image_rgba_over-512   176.539  0.13%   100
[ 67]     xlib-rgb      paint_image_rgba_source-512   197.005  0.12%   100
[ 68]     xlib-rgb       paint_similar_rgb_over-512     5.917  0.74%   100
[ 69]     xlib-rgb     paint_similar_rgb_source-512     5.918  0.72%   100
[ 70]     xlib-rgb      paint_similar_rgba_over-512   125.132  0.10%   100
[ 71]     xlib-rgb    paint_similar_rgba_source-512   145.492  0.10%   100

Interestingly, all of the blits got nearly twice as fast.
Perhaps someone more familiar with the details of this X server could
easily explain why that is.

The remainder of the tests seemed to follow a pattern similar to the
image backend.

Wow, that was a lot of prose and a lot of numbers. I don't know if
anyone is really going to absorb all that. It probably would have been
better for me to rewrite this by grouping the operations that should
have similar performance characteristics. That would have made the
problematic cases stand out much better. But it's late, and I'd rather
just send this out now that I've typed it all up.

Looking forward to lots of good improvements...

-Carl
