Re: [Vala] GObject allocation / referencing... Why is so slow?



On Sat, Jan 15, 2011 at 20:56:45 +0100, Jan Hudec wrote:
On Sat, Jan 15, 2011 at 19:58:37 +0100, Marco Trevisan (Treviño) wrote:
However, I don't think that an atomic add or fetch-and-add (or call them
inc and dec-and-test), correctly used in plain C, would cause all this
performance gap.

Nor do I, but I'll try it with the above implementation substituted into your
test code.

Hm, they actually show a significant difference here. I slightly modified your
tests (changed test_object_unref from
'TestObject *test_object_unref(TestObject **)' to
'void test_object_unref(TestObject *)') and ran them with both a simple
increment/decrement and the atomic __sync_* functions. The results are
somewhat surprising. For 100000000 iterations, the

 - single-threaded variant ran in 8.78s
 - thread-safe variant ran in about 11.50s (with much bigger fluctuation
   between runs)
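
For reference, here is a minimal sketch of the two counting variants, roughly
as benchmarked (the struct layout, function names and driver loop are a
paraphrase, not the exact test code):

#include <stdlib.h>

typedef struct {
        int ref_count;
        /* payload omitted */
} TestObject;

static TestObject *test_object_new (void)
{
        TestObject *obj = calloc (1, sizeof (TestObject));
        obj->ref_count = 1;
        return obj;
}

/* single-threaded variant: plain increment/decrement */
static TestObject *test_object_ref (TestObject *obj)
{
        obj->ref_count++;
        return obj;
}

static void test_object_unref (TestObject *obj)
{
        if (--obj->ref_count == 0)
                free (obj);
}

/* thread-safe variant: gcc __sync_* builtins; on x86 the increment
 * becomes a single 'lock xaddl' -- no mutex, no function call */
static TestObject *test_object_ref_atomic (TestObject *obj)
{
        __sync_add_and_fetch (&obj->ref_count, 1);
        return obj;
}

static void test_object_unref_atomic (TestObject *obj)
{
        if (__sync_sub_and_fetch (&obj->ref_count, 1) == 0)
                free (obj);
}

int main (void)
{
        for (long i = 0; i < 100000000L; i++) {
                TestObject *obj = test_object_new ();
                test_object_ref (obj);    /* or test_object_ref_atomic */
                test_object_unref (obj);
                test_object_unref (obj);  /* count drops to 0, object is freed */
        }
        return 0;
}

Both variants call the same calloc and free per iteration; only the counting
operations differ.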

This was a quite surprising result, given that I only replaced the reference
count increment and decrement with their thread-safe counterparts, and that
only replaces a normal increment/decrement with one involving a 'lock xaddl'
instruction -- no locks, no extra function calls or anything.

So I retried with unoptimized code. Now both versions run for about 16 seconds,
and the thread-safe variant is at most 0.5s slower, and in some runs not slower
at all.

That seems to indicate that the surrounding code itself takes a significant
portion of the time (since all versions called the same calloc and free) and
that the lock xaddl itself carries minimal performance penalty compared to a
regular add.

So the only remaining explanation for the large difference in run times of the
optimized code is the inability to optimize across the thread-safe
increment/decrement (the compiler did a significant amount of inlining -- the
ref and unref code was copied all over the generated assembly).
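
A tiny illustration of what gets lost (my own example, not taken from the
test; the function name is made up): once the plain ref/unref are inlined,
the optimizer sees a matched increment/decrement on the same field and can
largely fold the pair away, while the __sync_* builtins are full barriers,
so the equivalent pair must stay as two locked read-modify-write instructions
and nothing can be combined or moved across them.

/* Plain variant, after inlining: gcc can collapse the matched ++/-- and
 * keep ref_count out of memory across the surrounding code.
 * Atomic variant: both __sync_* operations must be emitted, and they act
 * as barriers for the loads and stores around them. */
static void churn (TestObject *obj)
{
        test_object_ref (obj);
        /* ... use obj ... */
        test_object_unref (obj);
}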

Now Vala uses functions from GLib to do the reference counting. Since they
are defined in the library, they cannot be inlined, and since most
optimizations are not allowed across a function call, I expect the performance
penalty to be quite significant.
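
In outline, that means the generated C pays this per reference (simplified
sketch; g_object_ref/g_object_unref are the real GObject entry points, the
wrapper function here is just for illustration):

#include <glib-object.h>

/* Each ref/unref in Vala-generated code is an out-of-line call into
 * libgobject, which then performs an atomic operation on the instance's
 * reference count.  The call boundary alone already prevents inlining
 * and any optimization across it. */
static void use_object (GObject *obj)
{
        g_object_ref (obj);     /* library call + atomic increment */
        /* ... */
        g_object_unref (obj);   /* library call + atomic decrement-and-test */
}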

On a side note, I also tried C++ using shared_ptr (C++0x, supported since
gcc 4.4), which ran for 17.0s (optimized). I don't have Dova installed to
compare, though.

-- 
                                                 Jan 'Bulb' Hudec <bulb ucw cz>


