Re: Some initial thoughts about 2.4



On Tue, 2002-12-31 at 01:39, Owen Taylor wrote:

> There is quite a bit of precedent for it ... Microsoft has done
> it for a long time. SGI had a utility called 'cord' that did
> this. Nat Friedman did some experimentation under the 'grope' name
> some years ago, which showed promise (something like halving the
> load time for GCC) though nothing that useful came out of it.

And this is really function reordering, and not prelinking of the kind
Apple is also doing?

> That 2.1M is 7000+ functions, so we are talking something on
> the order of 15 functions on a 4k page.

Sure, but then again we have larger and smaller functions, and they
would have to fit completely into one page. And as you implicitly
stated yourself: only some special groupings make sense.
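
As a rough check of the quoted estimate, here is a minimal sketch of the
arithmetic. It assumes that "2.1M" means 2.1 MiB of code and that the
page size is 4096 bytes; the function name is made up for illustration.

```python
PAGE_SIZE = 4096  # 4k page, as in the quoted estimate


def functions_per_page(text_bytes, n_functions, page_size=PAGE_SIZE):
    """Return (number of pages, average functions per page).

    The page count is rounded up, since a partly filled page still
    has to be mapped as a whole page.
    """
    pages = -(-text_bytes // page_size)  # ceiling division
    return pages, n_functions / pages


# Assumption: "2.1M" is 2.1 MiB of code holding 7000 functions.
pages, per_page = functions_per_page(int(2.1 * 1024 * 1024), 7000)
print(pages, round(per_page, 1))
```

Under these assumptions the average comes out at roughly 13 functions
per page, which matches "on the order of 15". It is only an average and
says nothing about how the large and small functions are distributed.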

> The optimization part is the hard part, certainly.
> I-Cache is relatively difficult to instrument. 

Not anymore: performance counters on several architectures, together
with several tools, give quite a good picture of cache performance.
Even without instrumenting a whole program (though getting the complete
picture certainly makes more sense), one can instrument an isolated
sequence such as a common hotspot and figure out quite well where
optimisation might sensibly be applied.

> > Okay, say the library is mmapped in and the OS is configured to not
> > do readahead but instead page in the missing functions in fractions of
> > whole pages as we walk through the application, how much gain would you 
> > estimate by improving locality? 

> An OS that doesn't read in whole pages is really a bit too far
> from my experience to make any guesses at.

Sorry, my bad. The fractions referred to parts of the whole library; of
course pages are the granules on any major OS. Unfortunately it is hard
to tell how much of the library will be paged in at once, because many
systems page in much more than just 4k.
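
To make the locality question a bit more concrete, here is a toy model
that counts how many 4k pages get touched when a set of "hot" functions
is called, before and after moving those functions to the front. The
layout, the sizes and the hot set are all invented for illustration, and
readahead (the very effect mentioned above) is deliberately ignored.

```python
PAGE_SIZE = 4096


def pages_touched(layout, hot, page_size=PAGE_SIZE):
    """Return the set of page indices covered by the hot functions.

    layout is a list of (name, size_in_bytes) in link order; hot is a
    set of function names that are actually called.
    """
    touched = set()
    offset = 0
    for name, size in layout:
        if name in hot and size > 0:
            first = offset // page_size
            last = (offset + size - 1) // page_size
            touched.update(range(first, last + 1))
        offset += size
    return touched


def hot_first(layout, hot):
    """Reorder the layout so that all hot functions come first."""
    return ([f for f in layout if f[0] in hot] +
            [f for f in layout if f[0] not in hot])


# Hypothetical library: 64 functions of 1024 bytes, every 4th one is hot.
layout = [(f"fn{i}", 1024) for i in range(64)]
hot = {f"fn{i}" for i in range(0, 64, 4)}

before = len(pages_touched(layout, hot))
after = len(pages_touched(hot_first(layout, hot), hot))
print(before, after)
```

In this contrived worst case every hot function sits on its own page,
so grouping them reduces the touched pages from 16 to 4. A real library
will be less extreme, and a system that pages in more than 4k at once
shrinks the difference further.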

> Apps are different, but not *that* different. That is, every
> app uses gtk_widget_show_all(), nothing will use
> gtk_progress_bar_set_discrete_blocks().

Heck, I'm an outsider.... :)

> libgtk has some 2000 relocations in it that have to be processed
> at startup. And remember that any page with a relocation has to 
> be copied and can't be shared between apps.

So what you really want to do is avoid relocations?
 
> I thought there was a whitepaper on Jakub Jelinek's prelinking stuff,
> but I don't see it in a quick search. There may be useful docs in
> the prelink tarball:

Prelinking is pretty heavily used nowadays. Andreas Jäger also has quite
some experience in this area and helped to dramatically speed up KDE.
C++ is a completely different matter, though, because of its higher
level of indirection.

> For object file reordering, I don't have a reference off-hand but
> it shouldn't be that hard to dig something up.

Hard to find, actually; I googled for it but nothing great showed up.
BTW: the only way I see to do this comfortably is on GNU platforms with
an ld recipe, which seems like quite a lot of work for a guesstimated
10% startup improvement; would that really give us an edge?
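
For what it is worth, one way such an ld recipe might look: if the
library is compiled with gcc's -ffunction-sections, each function lands
in its own .text.<name> input section, and a linker script can list
those sections in the desired order. The sketch below only generates
that fragment from a list of hot functions. The helper name and the hot
list are hypothetical, and a real linker script needs much more than
this one output section.

```python
def text_ordering_fragment(hot_functions):
    """Build a .text output-section description for a GNU ld script.

    Assumes the objects were built with -ffunction-sections, so that
    function foo lives in input section .text.foo. Hot functions are
    placed first, in the order given; everything else follows.
    """
    lines = ["  .text : {"]
    for name in hot_functions:
        lines.append(f"    *(.text.{name})")
    lines.append("    *(.text .text.*)")  # all remaining code
    lines.append("  }")
    return "\n".join(lines)


# Hypothetical hot list, e.g. derived from profiling application startup.
fragment = text_ordering_fragment(["gtk_widget_show_all", "gtk_init"])
print(fragment)
```

The hard part stays the same as before: deciding which functions belong
in the hot list, and in which order.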

-- 
Servus,
       Daniel

Attachment: signature.asc
Description: This is a digitally signed message part


