[g-a-devel]performance tracing ...
- From: Michael Meeks <michael ximian com>
- To: Bill Haneman <bill haneman sun com>, Draghi Puterity <mp baum de>
- Cc: accessibility mailing list <gnome-accessibility-devel gnome org>
- Subject: [g-a-devel]performance tracing ...
- Date: 13 Dec 2002 12:15:23 +0000
Hi guys,
So - I got a tad annoyed with the sluggishness of my (fast) system
while running all the speech, magnifier and other good stuff
concurrently. So I generated this trace of gnopernicus with
ORBIT2_DEBUG=traces:timings.
The timings are ~the difference between the timestamp at start and end
of the method [ not so clear - it'd be nicer to have a delta I guess,
patches welcome ].
Anyway; it shows some interesting sillies / areas for effective
optimisation I thought people might like to know:
http://primates.ximian.com/~michael/a11y-log.txt
There are some amusing sections:
First of all we start with what looks like a horrible mis-use of gconf,
there is something really nasty in the structure of what is being done
there; it doesn't look like a good fit for how to use gconf; ask on
gconf-list how you should be storing this data - the answer is prolly an
XML file in ~. You can hide some of the mess and improve performance
though by doing a:
gconf_client_add_dir (client, "/foo",
GCONF_CLIENT_PRELOAD_RECURSIVE, NULL);
Which will pre-load and cache the whole thing in one roundtrip.
For those that missed this basic lesson in high-school; roundtrips are
expensive - everything you do must be to try and avoid them; this is
particularly crucial in CORBA interface design as we'll see.
So after ~500 lines of gconfd roundtrip calls we get to:
p15839 1039708403.752603 : ([0x8081808])->ref () 1039708403.752887
p15839 1039708403.752910 : ([0x8081808])->unref () 1039708403.753186
we do a load of redundant roundtrips doing the 'ping' of the registry;
which is mostly a waste of time; we can tell if the connection is dead
on the object in-proc instantly with
ORBitConnectionStatus
ORBit_small_get_connection_status (CORBA_Object obj);
(ORBIT_CONNECTION_DISCONNECTED)
but that's a small point; I saw ~110 of these in the trace.
We then get onto mouse movement - where I see great slews of:
: ([0x80aa8d8])->notifyEvent ({ 'mouse:abs', [0x80cf9d8], 0x3c5, 0x257,
{ tc:tk_null, null } })
: ([0x80cf9d8])->getRole () =>: 67
: ([0x80cf9d8])->queryInterface ('IDL:Accessibility/Text:1.0') =>: [nil]
: ([0x80aa8d8])->notifyEvent ({ 'mouse:rel', [0x80cf9d8], 0xfffffffe,
0x0, { tc:tk_null, null } })
: ([0x80cf9d8])->getRole () =>: 67
: ([0x80cf9d8])->queryInterface ('IDL:Accessibility/Text:1.0') =>: [nil]
repeated over and over. Now of course, we can't do much about the event
notification, but we can certainly cache the queryInterface result - an
object is contractually not allowed to aggregate new interfaces when it
has been exposed. We can also cache the 'role' I imagine [ I assume we
don't have some silly role-changed signal emission stuff, and that if
you want an object to change role you replace it in the hierarchy ].
So - with those two bits cached, we bin ~all of our roundtrip traffic
inside the event handler. It would makes things like:
p15839 1039708450.779810 :
([0x80d1fb8])->getRole () =>: 23
([0x80d1fb8])->getRole () =>: 23
([0x80d1fb8])->getRole () =>: 23
([0x80d1fb8])->getRole () =>: 23
([0x80d1fb8])->getRole () =>: 23 1039708450.782008
take 10ns instead of 10ms ;-)
This:
p15839 1039708453.463617 : ([0x80cf620])->queryInterface
('IDL:Accessibility/Table:1.0') =>: [0x80d08b8] 1039708453.464088
p15839 1039708453.464121 : ([0x80d08b8])->unref () 1039708453.464424
p15839 1039708453.464496 : ([0x80cf620])->queryInterface
('IDL:Accessibility/Table:1.0') =>: [0x80d1f90] 1039708453.464911
Seems to suggest that we need to get people to avoid using the 'isAFoo'
method on things, and instead do a getFoo - esp. when they're going to
use the result right next to it ;-) the pattern comes up again and
again:
p15839 1039708453.665682 : ([0x80cf730])->queryInterface
('IDL:Accessibility/Action:1.0') =>: [0x80d1d28] 1039708453.666279
p15839 1039708453.666314 : ([0x80d1d28])->unref () 1039708453.666604
p15839 1039708453.666674 : ([0x80cf730])->queryInterface
('IDL:Accessibility/Action:1.0') =>: [0x80d1cf0] 1039708453.667205
Making one wonder about the usefulness / wisdom of the 'isAFoo'
methods.
Then of course - there is the tragedy of the StateSet interface:
p15839 1039708453.467308 : ([0x80cf730])->getState () =>: [0x80d1cf0]
1039708453.468123
p15839 1039708453.468196 : ([0x80d1cf0])->contains (1) =>: 0
1039708453.468581
p15839 1039708453.468607 : ([0x80d1cf0])->contains (4) =>: 0
1039708453.468919
p15839 1039708453.468944 : ([0x80d1cf0])->contains (0) =>: 0
1039708453.469252
p15839 1039708453.469277 : ([0x80d1cf0])->contains (7) =>: 0
1039708453.469583
p15839 1039708453.469607 : ([0x80d1cf0])->contains (9) =>: 0
1039708453.469915
p15839 1039708453.469939 : ([0x80d1cf0])->contains (10) =>: 0
1039708453.470247
p15839 1039708453.470270 : ([0x80d1cf0])->contains (11) =>: 0
1039708453.470578
p15839 1039708453.470602 : ([0x80d1cf0])->contains (12) =>: 0
1039708453.470909
p15839 1039708453.470933 : ([0x80d1cf0])->contains (16) =>: 0
1039708453.471240
p15839 1039708453.471263 : ([0x80d1cf0])->contains (20) =>: 0
1039708453.471570
p15839 1039708453.471594 : ([0x80d1cf0])->contains (23) =>: 1
1039708453.471876
p15839 1039708453.471899 : ([0x80d1cf0])->contains (30) =>: 1
1039708453.663837
p15839 1039708453.663876 : ([0x80d1cf0])->contains (25) =>: 1
1039708453.664245
p15839 1039708453.664270 : ([0x80d1cf0])->contains (15) =>: 0
1039708453.664580
p15839 1039708453.664604 : ([0x80d1cf0])->contains (8) =>: 1
1039708453.664912
p15839 1039708453.664972 : ([0x80d1cf0])->unref () 1039708453.665263
Again and again; 540 calls to 'contains' each to the next with a race
condition; and that's before you consider the hell that would break
loose if someone actually used an interface like:
AccessibleStateSet *
AccessibleStateSet_compare (AccessibleStateSet *obj,
AccessibleStateSet *obj2);
with statesets from two different applications.
This interface needs junking completely - there is hope for that of
course; we could make the 'getStateSet' API just do a 'getStates' and
then implement all the methods as children of a purely local state set.
[ anyone polling on a state-set to wait for a change needs shooting
anyway ]. So with that we could save another scad of calls [ and notably
200ms for the above call set (a pretty bad one but ...) ].
I think if we do that - we'll get an extremely snappy system which'll
be a pleasure to use :-) [ at least on a fast machine ].
Is there any controversy there ? and/or problems ? I think a small
amount of work on caching would do wonders for performance, although
changing the way the StateSet works will provide by far the biggest win
- I think that's the no. 1 performance killer.
HTH,
Michael.
--
mmeeks gnu org <><, Pseudo Engineer, itinerant idiot
[
Date Prev][
Date Next] [
Thread Prev][
Thread Next]
[
Thread Index]
[
Date Index]
[
Author Index]