Re: Speeding up thumbnail generation (like multi threaded). Thoughts please.
- From: Christian Hergert <chris dronelabs com>
- To: Mark <markg85 gmail com>
- Cc: gtk-devel-list gnome org, David Zeuthen <david fubar dk>
- Subject: Re: Speeding up thumbnail generation (like multi threaded). Thoughts please.
- Date: Fri, 28 Aug 2009 16:04:39 -0700
On Fri, Aug 28, 2009 at 11:49 PM, Christian Hergert <chris dronelabs com> wrote:
Hi,
What you mentioned is good information to start hunting. Was the CPU time
related to IO wait at all? Always get accurate numbers before performance
tuning. "Measure, measure, measure" or so the mantra goes.
Perhaps a stupid question, but what is a good way of profiling I/O? CPU
is easy, but I've never done I/O.
In this case my HDD is certainly able to do more than 10 thumbnails
per second; however, I could see a potential issue when someone with a
slower HDD and a faster CPU than mine is thumbnailing a lot of images.
There the HDD will likely be the bottleneck.
You can do something really crude by reading from /proc/pid/* (man proc
for more info). Or you could try using some tools like sysstat,
oprofile, SystemTap, etc. We really need a generic profiling tool that
can do all of this stuff from a single interface. However, at the
current time, I've been most successful with just writing one-off
graphing code for the specific problem. For example, put in some
g_print() lines, grep for those, and then graph them using your favorite
plotter or cairo goodness.
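
As a rough illustration of the "really crude" /proc approach, here is a
minimal Python sketch that snapshots the counters in /proc/<pid>/io before
and after a piece of work. The helper names (parse_proc_io, read_proc_io,
io_delta) are made up for this example. It assumes Linux with per-task I/O
accounting enabled, and simply returns an empty dict where that file is not
available.

```python
def parse_proc_io(text):
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value)
    return counters


def read_proc_io(pid="self"):
    """Return the I/O counters for a process, or {} if they cannot be read."""
    try:
        with open(f"/proc/{pid}/io") as f:
            return parse_proc_io(f.read())
    except OSError:
        # Not Linux, no I/O accounting, or no permission: report nothing.
        return {}


def io_delta(before, after):
    """Difference between two snapshots, for the keys present in both."""
    return {key: after[key] - before[key] for key in after if key in before}


if __name__ == "__main__":
    before = read_proc_io()
    # ... run the thumbnailing benchmark here ...
    after = read_proc_io()
    print(io_delta(before, after))
```

In that file, rchar counts every byte passed through read-style system
calls, cache hits included, while read_bytes tries to count only what was
actually fetched from storage. A large gap between the two suggests the
files were already cached.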
Unfortunately, the symptom you see regarding IO will very likely change
under a different processing model. If the problem is truly CPU bound then
you will only be starting IO requests after you were done processing. This
means valuable time is wasted while waiting for the pages to be loaded into
the buffers. The code will just be blocking while this is going on.
And how can i test that?
ltrace works for simple non-threaded applications. Basically you should
see in the profiling timings that one work item happens sequentially
after the previous one, such as (load, process, load, process, ...).
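
If ltrace is not an option, the same sequential pattern can be made visible
with hand-rolled timestamps, in the spirit of the g_print() approach. A
minimal Python sketch follows. The names stage, timeline and is_sequential
are invented for this example and are not part of any thumbnailing code.

```python
import time
from contextlib import contextmanager

# Each entry is (label, start_time, wall_seconds, cpu_seconds).
timeline = []


@contextmanager
def stage(label):
    """Time one stage and print a grep-friendly line when it finishes."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    try:
        yield
    finally:
        wall = time.perf_counter() - wall_start
        cpu = time.process_time() - cpu_start
        timeline.append((label, wall_start, wall, cpu))
        print(f"TIMING {label} wall={wall:.6f} cpu={cpu:.6f}")


def is_sequential(entries):
    """True if no stage starts before the previously started stage ended."""
    ordered = sorted(entries, key=lambda entry: entry[1])
    return all(prev[1] + prev[2] <= cur[1]
               for prev, cur in zip(ordered, ordered[1:]))


if __name__ == "__main__":
    with stage("load"):
        time.sleep(0.01)       # stand-in for reading an image from disk
    with stage("process"):
        sum(range(100000))     # stand-in for scaling the image
    print("sequential:", is_sequential(timeline))
```

A stage whose wall time is much larger than its CPU time was mostly waiting,
for example on I/O. A stage where the two are close was CPU bound.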
I would hate to provide conjecture about the proper design until we have
more measurements. It is a good idea to optimize the single-threaded
approach before the multi-core approach, since it would have to be done
anyway and is likely a less complex problem before additional threads
are involved.
What could be done easily is every time an item starts processing it could
asynchronously begin loading the next image using gio. This means the
kernel can start paging that file into the vfs cache while you are
processing the image. This of course would still mean you are limited to a
single processor doing the scaling. But if the problem is in fact cpu
bound, that next image will almost always be loaded by the time you finish
the scale, meaning you've maximized the processing potential per core.
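
The idea of loading the next image while the current one is being processed
can be sketched as follows. This is only an illustration in Python: a single
background thread stands in for gio's asynchronous loading, and read_file,
process_with_prefetch and the process callback are hypothetical names.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def read_file(path):
    """Blocking read of a whole file; runs on the loader thread."""
    with open(path, "rb") as f:
        return f.read()


def process_with_prefetch(paths, process):
    """Process files in order while the next file loads in the background.

    `paths` is a list of file names. `process` takes the file's bytes and
    returns a result (for example, a scaled thumbnail).
    """
    results = []
    if not paths:
        return results
    with ThreadPoolExecutor(max_workers=1) as loader:
        pending = loader.submit(read_file, paths[0])
        for index in range(len(paths)):
            data = pending.result()  # blocks only if the load is not done
            if index + 1 < len(paths):
                # Kick off the next load before starting the CPU work.
                pending = loader.submit(read_file, paths[index + 1])
            results.append(process(data))
    return results


if __name__ == "__main__":
    names = [n for n in os.listdir(".") if os.path.isfile(n)]
    print(process_with_prefetch(names, len))
```

Processing still happens on one core, as described above. The only change is
that the wait for the next file overlaps with the current scale.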
That sounds like a nice way to optimize it for one core. But is any
optimization possible in my case, since I have 100% CPU usage on one
core with just the benchmark?
You can't properly optimize for the multi-core scenario until the
single-core scenario is fixed.
To support multi-core, as it sounds like you want, a queue could be used
to store the upcoming work items. A worker per core, for example, can get
their next file from that queue. FWIW, I wrote a library, iris[1], built
specifically for doing work like this while efficiently using threads with
minimum lock-contention. It would allow for scaling up threads to the
number of cores and back down when they are no longer needed.
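
To make the queue idea concrete, here is a minimal Python sketch with one
worker thread per core pulling items from a shared queue. It does not use
iris and does not scale the number of threads up and down. run_workers and
its parameters are names invented for this example.

```python
import os
import queue
import threading


def run_workers(items, work, num_workers=None):
    """Apply `work` to every item using a shared queue and N worker threads.

    Results come back in the same order as `items`. Defaults to one worker
    per core as reported by os.cpu_count().
    """
    num_workers = num_workers or os.cpu_count() or 1
    tasks = queue.Queue()
    for index, item in enumerate(items):
        tasks.put((index, item))

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                index, item = tasks.get_nowait()
            except queue.Empty:
                return  # no work left for this worker
            value = work(item)
            with lock:
                results[index] = value

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return [results[i] for i in range(len(results))]


if __name__ == "__main__":
    print(run_workers(["a.png", "b.png", "c.png"], str.upper))
```

One caveat specific to this sketch: CPython threads only run in parallel
while the work releases the interpreter lock, as native image libraries
often do. Error handling is also left out, so a `work` function that raises
will break the result list.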
That sounds very interesting.
Just one question about the queue. Would it be better to thread the
application (nautilus) or the library (glib)? If your answer is the
library, then the queue has to be passed from nautilus to glib. I would
say glib, because all applications benefit from it without adjusting
their code.
I haven't looked at this code in detail yet, so I cannot confirm or
deny. My initial assumption would be that the thumbnailing API (again,
I have no experience with it yet) should be restructured around an
asynchronous design (begin/end methods) and the synchronous
implementation built around that. And of course, nobody should use the
synchronous version unless they *really* have a reason to.
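
To show the shape of that design, here is a loose Python sketch in which the
asynchronous call is the primitive and the synchronous call is a thin
wrapper that waits on it. thumbnail_begin, thumbnail_sync and make_thumbnail
are hypothetical names, not the existing thumbnailing API, and a callback
receiving (result, error) stands in for a separate "end" method.

```python
import threading


def thumbnail_begin(path, make_thumbnail, on_done):
    """Start thumbnailing in the background and return immediately.

    `on_done(result, error)` is called from the worker thread when the
    work finishes. Exactly one of the two arguments is None.
    """
    def run():
        try:
            result, error = make_thumbnail(path), None
        except Exception as exc:
            result, error = None, exc
        on_done(result, error)

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread


def thumbnail_sync(path, make_thumbnail):
    """Synchronous version, built on top of the asynchronous one."""
    done = threading.Event()
    outcome = {}

    def on_done(result, error):
        outcome["result"], outcome["error"] = result, error
        done.set()

    thumbnail_begin(path, make_thumbnail, on_done)
    done.wait()  # block the caller until the background work completes
    if outcome["error"] is not None:
        raise outcome["error"]
    return outcome["result"]


if __name__ == "__main__":
    print(thumbnail_sync("photo.png", lambda p: "thumb-of-" + p))
```

With this layering there is only one implementation of the actual work, and
callers that can avoid blocking use thumbnail_begin directly.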
FWIW, I would be willing to help hack on this, but I'm swamped for at
least the next few weeks.
-- Christian