Re: Some initial thoughts about 2.4



 --- Daniel Egger <degger fhm edu> wrote:
> On Mon, 2002-12-30 at 20:02, Owen Taylor wrote:
> 
> > The idea of object file reordering is to put infrequently used functions
> > into different pages than frequently used functions. GTK+-2.0 has
> > a lot of code:
> 
> Interesting, this is a new idea to me. However, given that a page is
> traditionally 4 or 8k, I'm wondering how many functions one can group
> together, considering that the whole library is 2.1M (from your
> statistics). Which effects would we want to trigger? Better utilisation
> of I-cache? Preventing paging? If paging, which sort of paging?
> 

One important thing this helps optimise is TLB usage. From real experiments 
in doing this, even a very simplistic and incomplete reordering of often-called 
functions in glib, gobject, gtk+ and gdk can easily reduce the 
startup time of applications by ~10%. Unfortunately I don't have a good /
solid methodology or a script that automates it yet, and the results I got 
aren't too usable, as nautilus took a negative hit of 10% for some important
operations. I also couldn't get the right profile data out of 
nautilus.
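
For what it's worth, the data collection I have in mind is roughly along
these lines (just a sketch, not the method behind the numbers above; the
__cyg_profile_* hooks are the standard GCC ones, the output path is made up):
build the code you want profiled with -finstrument-functions and log which
functions actually get entered during startup.

/* trace.c: minimal sketch.  Link this into the app and build the code you
 * want profiled with -finstrument-functions; the hook names are the
 * standard GCC ones, the output path is just an example. */
#include <stdio.h>

static FILE *trace_out;

__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    (void)call_site;
    if (!trace_out)
        trace_out = fopen("/tmp/startup-trace.txt", "w");
    if (trace_out)
        fprintf(trace_out, "%p\n", this_fn);  /* address of the entered function */
}

__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    (void)this_fn;
    (void)call_site;                          /* exit events not needed here */
}

The logged addresses can be mapped back to symbol names afterwards, e.g. with
addr2line against the unstripped library once you've subtracted the load
address from /proc/<pid>/maps.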

> > By reordering the functions in the executable, I think it would
> > be possible to reduce the amount of pages that have to be loaded off
> > the disk for the first app that uses GTK+ by a significant
> > fraction.
> 
> Okay, say the library is mmapped in and the OS is configured not to
> do readahead but instead page in the missing functions in fractions of
> whole pages as we walk through the application - how much gain would you 
> estimate from improving locality? And how would we figure out which
> functions to put together, considering that different applications surely
> have completely different footprints?
> 

I'm slightly sceptical about there being a benefit from grouping functions
based on whether they are used or not, even if you made sure that the normal
ordering of functions is preserved (so you don't destroy the locality of
reference that is already there due to the distribution of functions across
source files).

As for the ordering - it's a relatively safe bet that you would want to do it 
based on call graph information, whether a static call graph or one augmented 
by, say, call count information. 
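
To make that concrete, here is a toy aggregator (purely illustrative - a real
tool would walk the call graph rather than just count raw samples): it reads
one function name per line on stdin and prints names most-frequently-seen
first, which could then feed whatever placement step you use (for instance
-ffunction-sections plus a linker script).

/* order.c: toy aggregator.  Reads one function name per line on stdin
 * (e.g. a trace already run through addr2line/nm), counts occurrences and
 * prints names most-frequent first.  This only illustrates the bookkeeping. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct entry { char name[128]; unsigned long count; };

static struct entry table[4096];
static int n_entries;

static struct entry *lookup(const char *name)
{
    for (int i = 0; i < n_entries; i++)
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    if (n_entries == (int)(sizeof(table) / sizeof(table[0])))
        return NULL;                       /* table full: drop the sample */
    strncpy(table[n_entries].name, name, sizeof(table[0].name) - 1);
    return &table[n_entries++];
}

static int by_count(const void *a, const void *b)
{
    const struct entry *ea = a, *eb = b;
    /* sort descending by count */
    return (eb->count > ea->count) - (eb->count < ea->count);
}

int main(void)
{
    char line[128];
    while (fgets(line, sizeof(line), stdin)) {
        line[strcspn(line, "\n")] = '\0';
        struct entry *e = lookup(line);
        if (e)
            e->count++;
    }
    qsort(table, n_entries, sizeof(table[0]), by_count);
    for (int i = 0; i < n_entries; i++)
        printf("%s\n", table[i].name);     /* feed this to the placement step */
    return 0;
}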

> > (though prelinking did make a 10-20% difference for gnome-terminal
> > in some timings i did)
> 
> This is interesting. Dynamic linking in the C case is bog simple and
> really hard to speed up. How did you achieve that?
> 

Taking care of symbol counts and making sure library dependencies that 
aren't used don't get loaded / resolved for symbols is quite important.
It's all data the dynamic linker does not have to touch, or touches in a
different order (with, say, lazy loading). I'm not sure how prelinking and
direct binding (which the Solaris linker has) differ, but probably not much -
it's definitely beneficial, and a lot of the benefit comes from not having to
search for symbols.
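
This isn't prelinking, but a crude way to see the cost of symbol resolution is
to compare lazy vs. eager binding when dlopen()ing a library. RTLD_LAZY and
RTLD_NOW are the standard flags; the library name below is only an example:

/* bindcost.c: rough sketch, not a proper benchmark.  Times a single
 * dlopen() with lazy or eager binding as a proxy for symbol-resolution
 * cost.  Build with: gcc -o bindcost bindcost.c -ldl */
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <dlfcn.h>

static double now(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

int main(int argc, char **argv)
{
    /* library name is just an example; pass another one on the command line */
    const char *lib = (argc > 1) ? argv[1] : "libgtk-x11-2.0.so.0";
    int flags = (argc > 2 && strcmp(argv[2], "now") == 0) ? RTLD_NOW : RTLD_LAZY;

    double t0 = now();
    void *handle = dlopen(lib, flags);
    double t1 = now();

    if (!handle) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return 1;
    }
    printf("%s with %s: %.3f ms\n", lib,
           (flags == RTLD_NOW) ? "RTLD_NOW" : "RTLD_LAZY",
           (t1 - t0) * 1000.0);
    dlclose(handle);
    return 0;
}

Run it once per flag in separate processes (ideally with a warm page cache) so
the numbers aren't skewed by the first run paging the library in.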

A similar benefit can be had from sanitizing the link lines, only linking 
against directly needed dependencies, and letting library dependency loading do the rest.

> Do you have any pointers or papers? This looks like an interesting
> area for some research.
> 

Look on citeseer.org - the technical term for it is code colocation. 
But several aspects of it seem to have only been looked at for HPC,
and even then only for static programs, not shared libraries. And we probably 
aren't at the stage where we need to worry about cacheline colouring 
to make better use of the i-cache yet.

> -- 
> Servus,
>        Daniel
> 

