Pixel formats and blitting performance



Hello!

On 10 Oct 2007 21:46:38 +0200, Soeren Sandmann <sandmann@daimi.au.dk> wrote:
> "Björn Lindqvist" <bjourne@gmail.com> writes:
>
> > Incidentally, blitting pixbufs is slower than it has to be because its
> > format rarely matches the X11 server which uses either xRGB32 or
> > ARGB32.
>
> I don't disagree with anything else you say, but this "performance
> issue" is really a non-issue. We are talking about data on the client
> here, so there is no way the X server can hardware accelerate those
> blits [1]. This means we are back to software in both cases, and in
> that case the difference between convert_and_copy_data() and
> copy_data() is completely negligible.

I'm not sure about that, but what I am (almost) sure about is that
blitting pixbufs is slow. :) Here is a small test program in SDL
(exhibit A):


----------------------------------------------------------------------
#include <SDL/SDL.h>
#include <sys/time.h>
#include <stdio.h>

#define N_LOOPS 1000
#define SRC_DEPTH   32
#define DST_DEPTH   32

int main (int argc, char *argv[])
{
    SDL_Init (SDL_INIT_EVERYTHING);
    SDL_Surface *screen =
        SDL_SetVideoMode (500, 500, DST_DEPTH, SDL_SWSURFACE);
    /* Create the source surface with the screen's channel masks so
     * that the two formats match whenever SRC_DEPTH == DST_DEPTH. */
    SDL_Surface *surface =
        SDL_CreateRGBSurface (SDL_SWSURFACE,
                              500, 500, SRC_DEPTH,
                              screen->format->Rmask,
                              screen->format->Gmask,
                              screen->format->Bmask,
                              0);
    /* Solid red with the typical 0x00ff0000 Rmask. */
    SDL_FillRect (surface, NULL, 0xffff0000);

    struct timeval tv_start, tv_stop;
    gettimeofday (&tv_start, NULL);
    for (int n = 0; n < N_LOOPS; n++)
    {
        SDL_BlitSurface (surface, NULL, screen, NULL);
        SDL_UpdateRect (screen, 0, 0, 0, 0);
    }
    gettimeofday (&tv_stop, NULL);
    double elapsed = (tv_stop.tv_sec - tv_start.tv_sec) * 1000000.0 +
        (tv_stop.tv_usec - tv_start.tv_usec);

    double ms_tot = elapsed / 1000.0;
    printf ("total %.2f ms, per loop %.2f\n", ms_tot, ms_tot / N_LOOPS);

    SDL_Quit ();
    return 0;
}
----------------------------------------------------------------------

On X11, this just copies a source surface to an XWindow using
XShmPutImage.
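
For reference, the XShm path looks roughly like this under the hood (a
minimal sketch with all error handling omitted - not SDL's actual
code; compile with -lX11 -lXext):

----------------------------------------------------------------------
#include <X11/Xlib.h>
#include <X11/extensions/XShm.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int main (void)
{
    Display *dpy = XOpenDisplay (NULL);
    int scr = DefaultScreen (dpy);
    Window win = XCreateSimpleWindow (dpy, RootWindow (dpy, scr),
                                      0, 0, 500, 500, 0, 0, 0);
    XMapWindow (dpy, win);
    GC gc = XCreateGC (dpy, win, 0, NULL);

    /* Create an XImage whose pixel buffer lives in a shared memory
     * segment visible to both the client and the server. */
    XShmSegmentInfo shminfo;
    XImage *image = XShmCreateImage (dpy, DefaultVisual (dpy, scr),
                                     DefaultDepth (dpy, scr), ZPixmap,
                                     NULL, &shminfo, 500, 500);
    shminfo.shmid = shmget (IPC_PRIVATE,
                            image->bytes_per_line * image->height,
                            IPC_CREAT | 0600);
    shminfo.shmaddr = image->data = shmat (shminfo.shmid, NULL, 0);
    shminfo.readOnly = False;
    XShmAttach (dpy, &shminfo);

    /* The client writes pixels into image->data and the server reads
     * them straight out of shared memory - no copy over the socket,
     * which is what makes this the fast path. */
    XShmPutImage (dpy, win, gc, image, 0, 0, 0, 0, 500, 500, False);
    XSync (dpy, False);
    return 0;
}
----------------------------------------------------------------------

And here is the same thing using GdkPixbufs (exhibit B):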

----------------------------------------------------------------------
#include <gtk/gtk.h>
#include <stdio.h>
#include <sys/time.h>

#define N_LOOPS 1000

int main (int argc, char *argv[])
{
    gtk_init (&argc, &argv);

    GtkWidget *win = gtk_window_new (GTK_WINDOW_TOPLEVEL);
    gtk_window_set_default_size (GTK_WINDOW (win), 500, 500);
    gtk_widget_show (win);

    /* A 500x500 pixbuf with an alpha channel, filled with opaque red
     * (gdk_pixbuf_fill takes 0xRRGGBBAA). */
    GdkPixbuf *src = gdk_pixbuf_new (GDK_COLORSPACE_RGB, TRUE, 8, 500, 500);
    gdk_pixbuf_fill (src, 0xff0000ff);

    struct timeval start, stop;

    gettimeofday (&start, NULL);
    for (int n = 0; n < N_LOOPS; n++)
    {
        gdk_draw_pixbuf (win->window, NULL, src,
                         0, 0,
                         0, 0,
                         -1, -1,
                         GDK_RGB_DITHER_NONE, 0, 0);
    }
    gettimeofday (&stop, NULL);
    double elapsed = (stop.tv_sec - start.tv_sec) * 1000000.0 +
        (stop.tv_usec - start.tv_usec);

    double ms_tot = elapsed / 1000.0;
    printf ("total %.2f ms, per loop %.2f\n", ms_tot, ms_tot / N_LOOPS);
    return 0;
}
----------------------------------------------------------------------

AFAIK, SDL performance is pretty damn good and probably as fast as you
can get on Linux. By changing SRC_DEPTH and DST_DEPTH in exhibit A, I
get different timings (best result for each):

    SRC_DEPTH = 24, DST_DEPTH = 24: 6.59 ms/loop
    SRC_DEPTH = 32, DST_DEPTH = 24: 6.55 ms/loop
    SRC_DEPTH = 24, DST_DEPTH = 32: 5.89 ms/loop
    SRC_DEPTH = 32, DST_DEPTH = 32: 1.84 ms/loop

I have a very fast computer and graphics card (NV43 GeForce 6200), so
these timings may not be representative of everyone's hardware. But at
least in this case I think it is clear that using the correct pixel
format pays off *a lot*.
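
To see why, consider what the blit has to do in each case. The two
helpers below are only my own illustration (hypothetical names, not
SDL's actual converter), but they show the shape of the work: with
matching formats a row is a straight copy, with mismatched formats
every single pixel must be repacked.

----------------------------------------------------------------------
#include <stdint.h>
#include <string.h>

/* Matching formats: the whole row moves with a single memcpy. */
static void
copy_row_32_to_32 (uint32_t *dst, const uint32_t *src, int width)
{
    memcpy (dst, src, width * 4);
}

/* Mismatched formats: every pixel is loaded, repacked and stored
 * individually - here packed 24-bit RGB into xRGB32 words. */
static void
convert_row_24_to_32 (uint32_t *dst, const uint8_t *src, int width)
{
    for (int x = 0; x < width; x++)
        dst[x] = (0xffu << 24)
               | (src[3 * x + 0] << 16)   /* R */
               | (src[3 * x + 1] << 8)    /* G */
               |  src[3 * x + 2];         /* B */
}
----------------------------------------------------------------------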

Here are the results for the GDK test (pixbuf with and without an
alpha channel, best result for each):

    ALPHA    = 8.07 ms/loop
    NO ALPHA = 2.10 ms/loop
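
I do not know exactly what GDK does with the alpha channel here, but
whatever it is, an RGBA pixbuf cannot be handed to an xRGB32 server
as-is: at minimum the packed R,G,B,A bytes must be repacked per pixel,
and if the alpha is actually composited against the window contents
the per-pixel cost grows further (on X11 that would also mean reading
pixels back from the server first). A sketch of the compositing case
(again my own illustration, not GDK's actual code):

----------------------------------------------------------------------
#include <stdint.h>

/* Source-over compositing of one row of non-premultiplied RGBA bytes
 * onto xRGB32 words - several multiplies per pixel on top of the
 * repacking. Illustrative only. */
static void
composite_row_rgba_onto_xrgb32 (uint32_t *dst, const uint8_t *src,
                                int width)
{
    for (int x = 0; x < width; x++)
    {
        uint32_t d = dst[x];
        unsigned a = src[4 * x + 3], na = 255 - a;
        unsigned r = (src[4 * x + 0] * a + ((d >> 16) & 0xff) * na) / 255;
        unsigned g = (src[4 * x + 1] * a + ((d >>  8) & 0xff) * na) / 255;
        unsigned b = (src[4 * x + 2] * a + ( d        & 0xff) * na) / 255;
        dst[x] = (r << 16) | (g << 8) | b;
    }
}
----------------------------------------------------------------------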

Unless my tests are really flawed (they very well might be - I'm not
an expert), it seems to me that avoiding pixel format conversions is
extremely important. Both GdkPixbuf and SDL use XShmPutImage to draw
with, and you can test the performance of that call alone:

    $ x11perf -shmput500
    ...
    20000 trep @   1.2978 msec (   771.0/sec): ShmPutImage 500x500 square

So SDL adds 1.84 - 1.30 = 0.54 ms of overhead per blit and GDK adds
2.10 - 1.30 = 0.80 ms. That could be improved quite a bit, I think.


-- 
best regards, Björn


