Re: Some performance notes



At 20:20 05.08.01 -0400, Owen Taylor wrote:
>
>Here's some notes about current performance. The benchmarking 
>I did to get these conclusions was a combination of:
>
> - control-c profiling of opaque resizing (run under gdb,
>   other window 'sleep 5 && killall -TRAP lt-testgtk', see
>   where you are.)
>
> - Running Hans's testgtk benchmark with some different 
>   compilation options.
>
> - Other debugging printf tests - for instance, printing out the
>   results of timing each call to gtk_container_idle_sizer()
>
>So, without including hard data (I don't have much), 
How could you do that ? I remember someone telling me :-)

 "How are you measuring performance? "20 times slower" is not really
  a useful statement. " 

So I have attached some (maybe partly unfair) benchmarking results.
There are three output files of:

testgtk --bench=all:10 > bench-release.log
testgtk --bench=all:10 --gtk-unbuffered > bench-release-unbuffered.log

(Full compiler optimization, absolutely no G_DEBUG support. The
 second test run uses a patched gtk; the patch was sent to the list a while ago.)

testgtk --bench=all:10 > bench-gtk-win32-prod.log
(this is the unfair one. Handle with care, because some widgets have 
 changed. It is done with a patched version of testgtk from the
 gtk-1-3-win32-production branch, before Pango, double-buffering, gobject)

Overall results: the latest and greatest version is slower by at least
the ratio of its version number step (assuming 1.3 and 2.0),
even when running with "--gtk-unbuffered". 
Look at the timing of deprecated widgets like 'clist', 'ctree' but
also 'panes', 'progress bar', etc.

Using double-buffering adds 20-30% to the execution time on windoze too.

The magical "20 times slower" is almost visible if you compare the
previous 'label' with the current one. Obviously the new one has to render
much more, so it is a really unfair comparison. But still I think it
would be nice to get the factor five optimization for the 'auto-wrap 
feature' by adding a new function pango_paragraph_balance () or the
like. (Search for 'unbalanced' in gtklabel.c to find the unoptimized code)

>here are my observations:
>
> [...]
>
> * With debugging turned off, the bulk of time was spent in
>   the signal emission code and its GValue handling (40-50%)
>
This may be most of the issue shown above, because the 
gtk-1-3-win32-production branch is still using the gtk object / signal
system.

> * ~10% of the non-debug time was spent copying GdkEvent structures.
>   This is easily fixed by adding G_SIGNAL_STATIC_SCOPE to
>   the GdkEvent signals. (this was part of the signal overhead
>   mentioned above)
>
This is adding '| G_SIGNAL_TYPE_STATIC_SCOPE' to every 
g_signal_new (..., GDK_TYPE_EVENT), right ?
It appears to really give a performance gain of about 10%.
Ok to apply a patch ?
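
To make the copy overhead concrete, here is a toy model in plain C. It is
not GLib code: ToyEvent, toy_emit_copying and toy_emit_static are names made
up for illustration. As far as I understand the flag, without it the emission
code copies the boxed event for every emission, while with it the caller's
pointer is passed through because the value is promised to stay valid for
the whole emission.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for GdkEvent; the real union has more members. */
typedef struct {
    int  type;
    int  x, y;
    char payload[64];
} ToyEvent;

typedef void (*ToyHandler)(const ToyEvent *event, void *user_data);

/* Number of event copies made so far, kept only to show the difference. */
unsigned long toy_copies_made = 0;

/* Models a boxed argument WITHOUT static scope: copy, call, free. */
void toy_emit_copying(const ToyEvent *event, ToyHandler handler, void *user_data)
{
    ToyEvent *copy;

    assert(event != NULL && handler != NULL);
    copy = malloc(sizeof *copy);
    if (copy == NULL)
        abort();
    memcpy(copy, event, sizeof *copy);
    toy_copies_made++;
    handler(copy, user_data);
    free(copy);
}

/* Models a boxed argument WITH static scope: the caller guarantees the
 * event outlives the emission, so the pointer is handed over as-is. */
void toy_emit_static(const ToyEvent *event, ToyHandler handler, void *user_data)
{
    assert(event != NULL && handler != NULL);
    handler(event, user_data);
}

/* Example handler: adds the event's x coordinate to an int accumulator. */
void toy_sum_x_handler(const ToyEvent *event, void *user_data)
{
    int *sum = user_data;
    *sum += event->x;
}
```

Both emitters deliver the same data to the handler; only the first one pays
for an allocation and a copy per emission, which is where the ~10% would come from.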

> * When opaque-resizing, another ~10 percent of non-debug time
>   was spent maintaining invalid regions. This is probably quite 
>   optimizable - some extra region copies are made, and it looks like
>   adding a "completely invalid" flag to GdkWindow might allow bypassing
>   a bunch of computations, since it seems like windows were
>   getting invalidated repeatedly.
>
> * The overhead of double buffering is rather hard to measure exactly 
>   since it is spread between the client and the server. Things
>   seemed to be 20-30% faster with double-buffering turned off.
>   (But looked a lot worse.)
>
Try: 'testgtk --gtk-unbuffered --bench=all' :-)

I still don't think it 'looked a lot worse' on win32, but this is
probably a problem of the different expose / invalidation handling. 
Will ask some questions in a later mail ...

>   A lot of creation and destruction of graphics contexts and 
>   setting of clip rectangles could be avoided. This probably
>   would cut down the client side overhead to very little. 
>
> * Most of the other time spent looked pretty spread out - though
>   once we tackle the obvious stuff, more bottlenecks may
>   be apparent.
>
>Generally, on my tests on a 400mhz celeron I felt fairly good about
>the overall performance with debugging off. Opaque resizing was a
>little more sluggish than I would like, but other operations seemed
>pretty snappy, and it would definitely have been useable on a slower
>machine.
>
Could you give it a try with an 'appropriate' graphics card? (Using
a 32 MB card in a 400 MHz Celeron appears a little unusual to me.)
Maybe it's all that snappy on your machine because many of the
pixmaps are cacheable on the graphics card, but what if the
selection criterion for the graphics card was the resolution to be
displayed (1024x768x16bit => ca. 2 MB)?

>I would like to get the overhead of debugging down to the point where
>we can ship with --enable-debug=minimum as we did for GTK+-1.2, but if
>necessary we can go with --disable-debug for production builds and
>just encourage developers to use --enable-debug versions.
>
IMHO this would take away one of the really nice features of Gtk+:
catching, rather than crashing on, 'smaller' programmer errors.

>If we can cut down the signal overhead some, and do a bit of work at
>the top of the remaining profile, I think we'll do nicely.
>
Agreed, with the above noted exceptions.
 
	Hans

Attachment: gtk-bench.zip
Description: Zip archive

-------- Hans "at" Breuer "dot" Org -----------
Tell me what you need, and I'll tell you how to 
get along without it.                -- Dilbert

