Re: Speeding up thumbnail generation (like multi threaded). Thoughts please.



On Tue, 2009-09-29 at 22:59 +0200, Mark wrote:

> hehe that was the idea indeed ^_^ and i will continue with that.
> I will test the large factors tomorrow.
> 
> For now i'm happy with 100% cpu usage on all my cores (4).
> with the code posted in my previous message i only had 70% cpu usage
> so there was a bottleneck and it wasn't the HDD nor the CPU.
> Now that's fixed with giving each thread more then one (5 actually)
> images of it's own before locking and refilling the queue of 5 so now
> there is 100% cpu usage in the multi threaded benchmark.
> 
> http://codepad.org/PKnp69qW

Ok, i tested this a bit, and my results are not the same as yours.

I tested on a directory with 1348 jpeg files, each around 5 megapixels,
totalling 3.1 gig of data.
Before each test I ran (as root):
 sync; echo 3 > /proc/sys/vm/drop_caches
This flushes the caches, for two reasons: to make the tests comparable
(i.e. same cache status), and to make the tests realistic (nobody
thumbnails 3 gig of files that are all in the cache).

Your test is scaling to size 200, which is not the thumbnail size (128),
but let's ignore that for now.

// GLib Thumbnailing Benchmark

There is a bug in the benchmark: it saves the original pixbuf rather
than the thumbnailed one, which makes it very slow. With that fixed I
get this timing:

real    3m40.876s
user    3m19.667s
sys     0m2.542s

Same test, but using gnome_desktop_thumbnail_scale_down_pixbuf():

real    3m34.784s
user    3m13.926s
sys     0m2.479s

So, for me gnome_desktop_thumbnail_scale_down_pixbuf() is ~3% faster
(which makes some sense, as it's using a simpler algorithm). Did you
compile your benchmark app with full optimization? (Since you have an
in-line copy of the scale_down_pixbuf function, this is required.)
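
For anyone who wants to redo the arithmetic on these numbers, here is a
minimal sketch that turns time(1)-style "real" values into seconds and
computes the relative saving. The helper names parse_time_seconds() and
percent_saved() are made up for illustration; they are not part of the
benchmark.

```c
#include <stdio.h>

/* Parse a time(1)-style duration such as "3m40.876s" into seconds.
 * Returns a negative value if the string does not have that shape. */
static double parse_time_seconds(const char *s)
{
    int minutes = 0;
    double seconds = 0.0;

    if (sscanf(s, "%dm%lfs", &minutes, &seconds) != 2)
        return -1.0;
    return minutes * 60.0 + seconds;
}

/* Percentage of wall-clock time that `candidate` saves over `baseline`. */
static double percent_saved(double baseline, double candidate)
{
    return (baseline - candidate) / baseline * 100.0;
}
```

Calling percent_saved(parse_time_seconds("3m40.876s"),
parse_time_seconds("3m34.784s")) gives a bit under 3, which is where the
~3% figure comes from.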

(The rest of the tests are all run with gdk_pixbuf_scale_simple for easy
comparison.)

// Glib more rapid thumbnailing benchmark

real    1m56.650s
user    1m24.030s
sys     0m2.622s

Here we can see that the jpeg loading trick really helps us.
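
A rough sketch of the idea, assuming the "jpeg loading trick" means
libjpeg's decode-time downscaling, where the decoder can emit the image
at 1/2, 1/4 or 1/8 size directly so far fewer pixels are ever produced.
The function pick_scale_denom() is a hypothetical helper that only shows
how such a denominator could be chosen; it is not taken from gdk-pixbuf
or from the benchmark.

```c
/* Choose a decode denominator (1, 2, 4 or 8) so that the longest side of
 * the decoded image is still at least target_size pixels. The remaining
 * reduction is then left to an ordinary scaler. */
static int pick_scale_denom(int src_width, int src_height, int target_size)
{
    int longest = src_width > src_height ? src_width : src_height;
    int denom = 1;

    if (target_size <= 0)
        return 1;               /* nothing sensible to aim for */

    while (denom < 8) {
        int next = denom * 2;

        /* Stop before the decode would drop below the target size. */
        if (longest / next < target_size)
            break;
        denom = next;
    }
    return denom;
}
```

For a 5 megapixel image of, say, 2592x1944 and a target of 200, this
picks 8, so the decoder hands over roughly 324x243 pixels instead of the
full frame.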

// Glib threaded thumbnailing

My machine has 2 cores, not 4 like yours.

With the default 4 threads:

real    2m2.194s
user    1m25.437s
sys     0m2.982s

Changed to use two threads:

real    1m53.783s
user    1m25.948s
sys     0m2.966s

If we use the same number of threads as cpus we go slightly faster than
the single-threaded run (approximately 2.5% less time). However, if we
use more threads than that, things are actually slower.
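
For reference, the batching scheme from the quoted message (each thread
claims five images per lock acquisition) can be sketched with plain
pthreads as below. This is not the code from the codepad link; the names
WorkQueue, claim_batch() and run_workers() are invented here, and the
per-image work is a placeholder where load, scale and save would go.

```c
#include <pthread.h>
#include <stddef.h>

static const size_t BATCH_SIZE = 5;   /* batch size from the quoted message */

typedef struct {
    pthread_mutex_t lock;
    size_t next;                      /* first item nobody has claimed yet */
    size_t total;
} WorkQueue;

typedef struct {
    WorkQueue *queue;
    int *done;                        /* one counter per item */
} WorkerArgs;

/* Claim up to BATCH_SIZE items with a single lock/unlock pair. */
static size_t claim_batch(WorkQueue *q, size_t *start)
{
    pthread_mutex_lock(&q->lock);
    size_t remaining = q->total - q->next;
    size_t n = remaining < BATCH_SIZE ? remaining : BATCH_SIZE;
    *start = q->next;
    q->next += n;
    pthread_mutex_unlock(&q->lock);
    return n;
}

static void *worker(void *arg)
{
    WorkerArgs *a = arg;
    size_t start, n;

    while ((n = claim_batch(a->queue, &start)) > 0)
        for (size_t i = start; i < start + n; i++)
            a->done[i] += 1;          /* placeholder for thumbnailing item i */
    return NULL;
}

/* Process `total` items on n_threads workers; returns 0 on success. */
static int run_workers(size_t total, int n_threads, int *done)
{
    enum { MAX_THREADS = 64 };
    pthread_t threads[MAX_THREADS];
    WorkQueue q;
    WorkerArgs args = { &q, done };
    int started = 0;

    if (n_threads < 1 || n_threads > MAX_THREADS)
        return -1;
    if (pthread_mutex_init(&q.lock, NULL) != 0)
        return -1;
    q.next = 0;
    q.total = total;

    while (started < n_threads &&
           pthread_create(&threads[started], NULL, worker, &args) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    pthread_mutex_destroy(&q.lock);
    return started == n_threads ? 0 : -1;
}
```

Each index is claimed by exactly one thread, so the workers never touch
the same item, and the lock is taken once per five images rather than
once per image.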

I've got 4 gigs of memory, so not everything will fit in the cache, but
the caches would probably help a bit. To verify this I ran the same
two-thread example without blowing the caches first:

real    1m36.681s
user    1m21.610s
sys     0m2.501s

So, slightly better, and we can see that the real time is getting nearer
to the user time, which means that less time was spent waiting on disk.
However, I'm not sure how interesting a cached benchmark is. Nobody will
thumbnail the same files twice.

Now, what does this mean for Nautilus....

Well, nautilus loads the files using gdk-pixbuf io-based resizing, which
is essentially what "Glib more rapid" does, i.e. it uses the jpeg
loading trick and scales using gdk_pixbuf_scale_simple. It calls
gnome_desktop_thumbnail_scale_down_pixbuf() only when an external
thumbnailer returns an oversize result (i.e. very seldom).

Given the above result this is not ideal. The ideal would be to use the
jpeg loader trick but then downscale with
gnome_desktop_thumbnail_scale_down_pixbuf(), although that is hard to
implement given the pixbuf APIs.

Nautilus uses only one thread for thumbnailing, and upping this to the
number of cpus of the machine could gain us a slight advantage, at the
risk of starving the rest of nautilus by the increase in i/o traffic.
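
As a sketch of what "the number of cpus" could look like in code, the
snippet below asks the OS for the online CPU count and applies a cap so
the i/o load stays bounded. thumbnail_thread_count() is a hypothetical
helper, not an existing nautilus function, and _SC_NPROCESSORS_ONLN is a
common extension (Linux, the BSDs, macOS) rather than strict POSIX.

```c
#include <unistd.h>

/* Number of thumbnailing threads to start: one per online CPU, never
 * fewer than 1, and at most max_threads when max_threads is >= 1.
 * Passing max_threads < 1 means "no cap". */
static int thumbnail_thread_count(int max_threads)
{
    long cpus = sysconf(_SC_NPROCESSORS_ONLN);

    if (cpus < 1)
        cpus = 1;               /* query failed or is unsupported */
    if (max_threads >= 1 && cpus > max_threads)
        cpus = max_threads;
    return (int)cpus;
}
```

Whether a cap is needed at all, and what it should be, depends on how
much the extra i/o actually hurts the rest of nautilus; the numbers
above do not answer that.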




