Re: Speeding up thumbnail generation (like multi threaded). Thoughts	please.
- From: muppet <scott asofyet org>
- To: Mark <markg85 gmail com>
- Cc: Gtk+ Developers <gtk-devel-list gnome org>
- Subject: Re: Speeding up thumbnail generation (like multi threaded). Thoughts	please.
- Date: Sun, 30 Aug 2009 23:20:41 -0400
On Aug 29, 2009, at 8:08 AM, Mark wrote:
Oke, i did some ltracing now. Here is the ltrace output for 5 images
that get thumbnailed. I hope anyone here could help me a bit in
explaining this output since it's not all obvious to me.
Before you read the output. i put a 2 second sleep after a image is
done and the pixbuffs are removed from memory. This makes it a lot
more easy to identify where the image loading/scaling/saving is
happening.
The output is here: http://codepad.org/ydtCcJ5d
I find rows of timestamps hard to read for a sense of how long each  
thing took, so i loaded this up in a home-grown trace visualizer i  
created a while back (inspired by Federico's time charts -- this is an  
interactive version).  It's missing the ability to zoom the timeline,  
so the left side is a bit squished, but you can get the idea.  Here's  
a screenshot of the first image's data:
http://asofyet.org/muppet/tmp/thumbnail-lprof-viz.png
You can see that the blue lines with gaps in between represent SYS_read 
() (reading the input file), then another, smaller gap for the scaling  
(doing the work), followed by a chunk of orage for the memcpy()s and  
writes (outputting the thumnail).  As you'd expect, it takes longer to  
read the full file than to write the thumbnail.
Now for the splitting up that i did to sort out io and cpu.
This is all in the 1251553123 timeframe so it's just one image.
1251553123.625556
_ZStlsIcSt11char_traitsIcESaIcEERSt13basic_ostreamIT_T0_ES7_RKSbIS4_S5_T1_E 
(0x613de0,
0x701ac0, 16, 15416, 0x7ffffa4b2a90) = 0x613de0
1251553123.625635 _ZNSsC1Ev(0x7ffffa4b2f90, 0x6ed100, 8, 8,
0x5f7974696e616d75) = 0x7fcaa64de158
1251553123.625683
_ZNSt18basic_stringstreamIcSt11char_traitsIcESaIcEE3strERKSs 
(0x7ffffa4b2de0,
0x7ffffa4b2f90, 0x7ffffa4b2f90, 8, 0x5f7974696e616d75) = 0x781578
1251553123.625733 _ZNSsD1Ev(0x7ffffa4b2f90, 0x781578, 0x781578, 0,
0x781778) = 0x7fcaa64de158
1251553123.625781
_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc 
(0x7ffffa4b2df0,
0x40c950, 16, 0, 0x781778) = 0x7ffffa4b2df0
1251553123.625830
_ZStlsIcSt11char_traitsIcESaIcEERSt13basic_ostreamIT_T0_ES7_RKSbIS4_S5_T1_E 
(0x7ffffa4b2df0,
0x701ac0, 88, 88, 0x2f6e6f6974616d69) = 0x7ffffa4b2df0
1251553123.625878
_ZNKSt18basic_stringstreamIcSt11char_traitsIcESaIcEE3strEv 
(0x7ffffa4b2fa0,
0x7ffffa4b2de0, 0x7ffffa4b2de0, 88, 0x5f7974696e616d75) =
0x7ffffa4b2fa0
1251553123.625929 _ZNSsaSERKSs(0x7ffffa4b2f60, 0x7ffffa4b2fa0,
0x7ffffa4b2fa0, 0x7fcaa64de140, 0x69525f7974696e61) = 0x7ffffa4b2f60
1251553123.625977 _ZNSsD1Ev(0x7ffffa4b2fa0, 0, 0x786c50, 0x5cf210,
0x69525f7974696e61) = 0x782018
I have no clue what is going on in those _Z* functions.. what are  
those?
Those are the real symbol names of C++ functions and methods.  The C++  
compiler encodes the return type and parameter types into the real  
symbol names, as described by the C++ ABI.  If you pass the -C or -- 
demangle option to ltrace, he'll make those into something readable.   
Anyway, above you can see stuff like basic_ostream and  
basic_stringstream char_traits, so this is some app code doing C++  
standard library I/O.
1251553123.626120 gdk_pixbuf_new_from_file(0x782018, 0x7ffffa4b2f88,
0x7ffffa4b2f88, 0x4a992f63, 0x69525f7974696e61 <unfinished ...>
1251553123.626161
SYS_open("/home/mark/ThumbnailingBenchmarks/2000_Wallpapers- 
Reanimation/Sea_of_Humanity_Rio_de_Janeiro_Brazil."...,
0, 0666) = 4
1251553123.626216 SYS_fstat(4,  
0x7ffffa4b1b20)                             = 0
1251553123.626238 SYS_mmap(0, 4096, 3, 34, 0xffffffff)
   = 0x7fcaa928a000
1251553123.626273 SYS_read(4, "\377\330\377\340", 4096)
   = 4096
1251553123.626355 SYS_lseek(4, 0,  
0)                                       = 0
1251553123.626379 SYS_rt_sigprocmask(0, 0, 0x7ffffa4a1af0, 8,  
0)           = 0
1251553123.626417 SYS_read(4, "\377\330\377\340", 65536)
   = 65536
1251553123.636544 SYS_read(4,
"Ob\316Z\330\271\003\030\007\004TJ\t-\312N\347f>i 
\001\344\236\347\332\271/\023\370b\373Z 
\324\214\366\332\2040\256\320\241Y7c 
\025\240\276"\201\320\223\003\257\004",
65536) = 65536
1251553123.643681 SYS_read(4,
"\221\301\256\332)u9gwt\212_\331r\\[[\264eS1\202\304\234\222ME 
\007\207\334\027\3350\371rO<Tz\307\332-\322\3365\221\227l8
\022\006k*\326[\247.\032W#\034\214\340\032\350\216\333\034\365%\016v 
\254n\256\213\003\002~\320\243\007\200Mmj7\247G 
\360\235\265\265\276\331U\345b\\"...,
65536) = 65536
1251553123.650134 SYS_read(4,
"\3749w\241\276\254/\2368\334\022eq\202\207\270\375j\343Bo 
\2315\251\234\261t\242\242\357\243\330\304\360\307\207\216\257i` 
\333\205\241\215\216\340\374\345Go 
\326\275 \337\341\235\265\314\022\006\324\345X\336M\376 \031A\\ 
\356\203\033\266\224#\323\360#\212\365\241R 
\314\001\233\216\325\351\220=\325\207\207#i%\020\334L"...,
65536) = 65536
1251553123.656189 SYS_read(4,
"\357\255\254\214w\321\233\231Y\367o\017\234\n\332p \004>7s\323\004R 
\205<",
65536) = 65536
1251553123.662128 SYS_read(4,
"\272\206\250\222i\236'm\303\316{\333wU'\223\217\363\372S 
\224\264\320\342\204/d\374\277\366\323\242\370e\243\330H\251w\250Zl 
\235'\016\222\311\300UA\226\030\374\253WA:w\210<W\255\335\315\022\\K 
\347f\022\343;Tq\376\025\3473k\232\234\227q\013\206o(n\350pM% 
\216\275\253\370f\376\333R"...,
65536) = 65536
1251553123.668090 SYS_read(4, "\016p", 65536)
   = 65536
1251553123.673774 SYS_read(4, "s\00426A\367\377", 65536)
   = 23360
1251553123.673819 SYS_read(4, "",  
40960)                                   = 0
1251553123.676090 SYS_close 
(4)                                             = 0
1251553123.676119 SYS_munmap(0x7fcaa928a000,  
4096)                         = 0
1251553123.676199 <... gdk_pixbuf_new_from_file resumed> )
   = 0x7818f0
Here is a obvious file read going on but what i don't understand is
why are there multiple SYS_read commands? I thought it was just
reading the contents of one file in one read command..
The reads correspond to a loop in the internal function  
_gdk_pixbuf_generic_image_load():
http://git.gnome.org/cgit/gtk+/tree/gdk-pixbuf/gdk-pixbuf-io.c#n905
Each loader module specifies how big a chunk of the file it wants to  
read at a time, and this code reads chunks of that size until the load  
is finished.  This loader says it wants to read the compressed data in  
64K chunks.
1251553123.676269 printf("\n - %f", ... <unfinished ...>
1251553123.676301 SYS_write(1,
"Sea_of_Humanity_Rio_de_Janeiro_Brazil.jpg\n", 42) = 42
1251553123.676364 <... printf resumed> )
So, printf is writing to a file as well? what's happening here?
printf() is defined to write to standard output.  File descriptor 1 is  
standard output.  (Remember, in unix, Everything Is A File.)
Then right after 1251553123.676364 there are 9 more _Z* functions..
what do they do?
More c++ iostreams magic.  Building an output filename?
1251553123.676818 gdk_pixbuf_get_width(0x7818f0, 200, 2, 200,
0x7974696e616d7548) = 1600
1251553123.676866 gdk_pixbuf_get_height(0x7818f0, 200, 0x75a310, 200,
0x7974696e616d7548) = 1200
1251553123.676915 gdk_pixbuf_scale_simple(0x7818f0, 200, 150, 2,
0x7974696e616d7548) = 0x781940
1251553123.682973
Actual image downscaling to a thumbnail is here. obviously cpu stuff
and quite fast.
1251553123.683186 gdk_pixbuf_save(0x781940, 0x782d18, 0x40c9f7,
0x7ffffa4b2f88, 0 <unfinished ...>
....
range of memcopy functions
1251553123.688447 memcpy(0x0079cc02,
"D\274Y\257\255\313u\0356\306\234U\337\267\326n\3169\373\364\315mx 
\311\313{\331\211b\003Q\246%'\024lZ\nd\305\266\002\004Nl
H\203", 8190) = 0x0079cc02
1251553123.688507 SYS_write(4, "\211PNG\r\n\032\n", 4096)
   = 4096
1251553123.688547 SYS_write(4,
"\234\335d\304\332D\262)\213`i\222\212a\273", 4096) = 4096
....
and that roughly 4 more times. I guess a total of ~100 memcpy's
I have a few questions here
- Why are there so many memcopy functions? and could it be possible
that it's a memcopy for every horizontal pixel line (since that would
be roughly equal to the amount of memcpy commands)
- Why is it doing memcpy (~20 times) the writing to a file then memcpy
again then writing again till the image is done.. is that logical? I
might miss the point but i don't see the logic in that. It seems to
write chuncks os data instead of writing all in one go.
The filenames are truncated in the trace output, so i can only guess  
what the output format actually is.  Is it compressed?  What sort of  
compression?  Since it's a thumbnail, and all the memcpy()s are the  
same length (601 bytes), i'll guess it's probably uncompressed, e.g.  
PPM or XV's P7.  The repeated short memcpy()s look like writing out  
scanlines.  The mmap() of a 4K buffer, followed by a series of memcpy() 
s and then a write() follow the pattern of buffered i/o that you'd get  
with
    FILE * fp = fopen (outname, "wb");
    for (i = 0 ; i < scanlines ; i++) {
            next_scanline (scanline, sizeof (scanline), &length);
            fwrite (scanline, length, sizeof (char), fp);
    }
Again, i haven't seen the code, so i'm just saying what the trace  
looks like.
1251553123.706736 SYS_write(4,
"\245\335\375*\300P\024\2067wQI>\250.\036t;"0,\014\006G 
\243\370\231\363\307'O>\363\351\373\357;=\256\\\255\356\bDP\004\203Q 
\230\240\210X"j\001\0042,\233\303\006\001S\024\005W<\256\313r 
\334\311d\377\367\022m\243\235\241\310\251\013_\335*T 
\364\203\365\007\237/\375\366Z&\031\333z"...,
3091) = 3091
1251553123.706790 SYS_close 
(4)                                             = 0
1251553123.707040 SYS_munmap(0x7fcaa928a000,  
4096)                         = 0
1251553123.707090 <... gdk_pixbuf_save  
resumed> )                          = 1
The image is saved to the drive here.
Actually, i'll bet it was that write *and* the several preceding it.   
Each one was only 4K, and the last one was only 3091 bytes.
After that some more _Z* functions and some gdk unref function for
memory that was reserved. not much strange there.
So, if i'm correct (please say so if i'm not) every single *_read*
function is a io call. same with write and close. But how do i get the
numbers from this that you wanted, Cristian?  It is showing the read
-- cpu -- write part (read as in reading the origional image, cpu as
in downscaling it, write as in saving it)
Sorry for asking so much questions about this but it's my first time
with this level of profiling. and i really would like to learn and
just know this kind of stuff.
Thanx for your patience,
Mark.
--
One, two, free, four, five, six, sebben, eight, nine, ten, elebben,  
twull, fourteen, sickteen, sebbenteen, eightteen, elebbenteen,  
fiffeen, elebbenteen!
  -- Zella, aged three, counting to twenty.
[
Date Prev][
Date Next]   [
Thread Prev][
Thread Next]   
[
Thread Index]
[
Date Index]
[
Author Index]