[g-a-devel]performance tracing ...



Hi guys,

	So - I got a tad annoyed with the sluggishness of my (fast) system
while running all the speech, magnifier and other good stuff
concurrently. So I generated this trace of gnopernicus with
ORBIT2_DEBUG=traces:timings.

	The timings are ~the difference between the timestamp at start and end
of the method [ not so clear - it'd be nicer to have a delta I guess,
patches welcome ].

	Anyway; it shows some interesting sillies / areas for effective
optimisation I thought people might like to know:

	http://primates.ximian.com/~michael/a11y-log.txt

	There are some amusing sections:

	First of all we start with what looks like a horrible mis-use of gconf,
there is something really nasty in the structure of what is being done
there; it doesn't look like a good fit for how to use gconf; ask on
gconf-list how you should be storing this data - the answer is prolly an
XML file in ~. You can hide some of the mess and improve performance
though by doing a: 

	gconf_client_add_dir (client, "/foo",
			      GCONF_CLIENT_PRELOAD_RECURSIVE, NULL);

	Which will pre-load and cache the whole thing in one roundtrip.

	For those that missed this basic lesson in high-school; roundtrips are
expensive - everything you do must be to try and avoid them; this is
particularly crucial in CORBA interface design as we'll see.

	So after ~500 lines of gconfd roundtrip calls we get to:

p15839 1039708403.752603 : ([0x8081808])->ref () 1039708403.752887 
p15839 1039708403.752910 : ([0x8081808])->unref () 1039708403.753186 

	we do a load of redundant roundtrips doing the 'ping' of the registry;
which is mostly a waste of time; we can tell if the connection is dead
on the object in-proc instantly with

	ORBitConnectionStatus
	ORBit_small_get_connection_status (CORBA_Object obj);
	(ORBIT_CONNECTION_DISCONNECTED)

	but that's a small point; I saw ~110 of these in the trace.

	We then get onto mouse movement - where I see great slews of:

: ([0x80aa8d8])->notifyEvent ({ 'mouse:abs', [0x80cf9d8], 0x3c5, 0x257,
{ tc:tk_null, null } })
: ([0x80cf9d8])->getRole () =>: 67
: ([0x80cf9d8])->queryInterface ('IDL:Accessibility/Text:1.0') =>: [nil]


: ([0x80aa8d8])->notifyEvent ({ 'mouse:rel', [0x80cf9d8], 0xfffffffe,
0x0, { tc:tk_null, null } })
: ([0x80cf9d8])->getRole () =>: 67
: ([0x80cf9d8])->queryInterface ('IDL:Accessibility/Text:1.0') =>: [nil]
	repeated over and over. Now of course, we can't do much about the event
notification, but we can certainly cache the queryInterface result - an
object is contractually not allowed to aggregate new interfaces when it
has been exposed. We can also cache the 'role' I imagine [ I assume we
don't have some silly role-changed signal emission stuff, and that if
you want an object to change role you replace it in the hierarchy ].

	So - with those two bits cached, we bin ~all of our roundtrip traffic
inside the event handler. It would makes things like:

p15839 1039708450.779810 :
([0x80d1fb8])->getRole () =>: 23 
([0x80d1fb8])->getRole () =>: 23
([0x80d1fb8])->getRole () =>: 23
([0x80d1fb8])->getRole () =>: 23 
([0x80d1fb8])->getRole () =>: 23 1039708450.782008 

	take 10ns instead of 10ms ;-)

	This:

p15839 1039708453.463617 : ([0x80cf620])->queryInterface
('IDL:Accessibility/Table:1.0') =>: [0x80d08b8] 1039708453.464088 
p15839 1039708453.464121 : ([0x80d08b8])->unref () 1039708453.464424 
p15839 1039708453.464496 : ([0x80cf620])->queryInterface
('IDL:Accessibility/Table:1.0') =>: [0x80d1f90] 1039708453.464911 

	Seems to suggest that we need to get people to avoid using the 'isAFoo'
method on things, and instead do a getFoo - esp. when they're going to
use the result right next to it ;-) the pattern comes up again and
again:

p15839 1039708453.665682 : ([0x80cf730])->queryInterface
('IDL:Accessibility/Action:1.0') =>: [0x80d1d28] 1039708453.666279 
p15839 1039708453.666314 : ([0x80d1d28])->unref () 1039708453.666604 
p15839 1039708453.666674 : ([0x80cf730])->queryInterface
('IDL:Accessibility/Action:1.0') =>: [0x80d1cf0] 1039708453.667205 

	Making one wonder about the usefulness / wisdom of the 'isAFoo'
methods.

	Then of course - there is the tragedy of the StateSet interface:

p15839 1039708453.467308 : ([0x80cf730])->getState () =>: [0x80d1cf0]
1039708453.468123 
p15839 1039708453.468196 : ([0x80d1cf0])->contains (1) =>: 0
1039708453.468581 
p15839 1039708453.468607 : ([0x80d1cf0])->contains (4) =>: 0
1039708453.468919 
p15839 1039708453.468944 : ([0x80d1cf0])->contains (0) =>: 0
1039708453.469252 
p15839 1039708453.469277 : ([0x80d1cf0])->contains (7) =>: 0
1039708453.469583 
p15839 1039708453.469607 : ([0x80d1cf0])->contains (9) =>: 0
1039708453.469915 
p15839 1039708453.469939 : ([0x80d1cf0])->contains (10) =>: 0
1039708453.470247 
p15839 1039708453.470270 : ([0x80d1cf0])->contains (11) =>: 0
1039708453.470578 
p15839 1039708453.470602 : ([0x80d1cf0])->contains (12) =>: 0
1039708453.470909 
p15839 1039708453.470933 : ([0x80d1cf0])->contains (16) =>: 0
1039708453.471240 
p15839 1039708453.471263 : ([0x80d1cf0])->contains (20) =>: 0
1039708453.471570 
p15839 1039708453.471594 : ([0x80d1cf0])->contains (23) =>: 1
1039708453.471876 
p15839 1039708453.471899 : ([0x80d1cf0])->contains (30) =>: 1
1039708453.663837 
p15839 1039708453.663876 : ([0x80d1cf0])->contains (25) =>: 1
1039708453.664245 
p15839 1039708453.664270 : ([0x80d1cf0])->contains (15) =>: 0
1039708453.664580 
p15839 1039708453.664604 : ([0x80d1cf0])->contains (8) =>: 1
1039708453.664912 
p15839 1039708453.664972 : ([0x80d1cf0])->unref () 1039708453.665263 

	Again and again; 540 calls to 'contains' each to the next with a race
condition; and that's before you consider the hell that would break
loose if someone actually used an interface like: 

	AccessibleStateSet *
	AccessibleStateSet_compare (AccessibleStateSet *obj,
	                            AccessibleStateSet *obj2);

	with statesets from two different applications.

	This interface needs junking completely - there is hope for that of
course; we could make the 'getStateSet' API just do a 'getStates' and
then implement all the methods as children of a purely local state set.
[ anyone polling on a state-set to wait for a change needs shooting
anyway ]. So with that we could save another scad of calls [ and notably
200ms for the above call set (a pretty bad one but ...) ].

	I think if we do that - we'll get an extremely snappy system which'll
be a pleasure to use :-) [ at least on a fast machine ].

	Is there any controversy there ? and/or problems ? I think a small
amount of work on caching would do wonders for performance, although
changing the way the StateSet works will provide by far the biggest win
- I think that's the no. 1 performance killer.

	HTH,

		Michael.

-- 
 mmeeks gnu org  <><, Pseudo Engineer, itinerant idiot




[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]