On Tue, 2002-12-31 at 01:39, Owen Taylor wrote:

> There is quite a bit of precedent for it ... Microsoft has done
> it for a long time. SGI had a utility called 'cord' that did
> this. Nat Friedman did some experimentation under the 'grope' name
> some years ago, which showed promise (something like halving the
> load time for GCC) though nothing that useful came out of it.

And this is really function reordering, and not prelinking as Apple
also does it?

> That 2.1M is 7000+ functions, so we are talking something on
> the order of 15 functions on a 4k page.

Sure, but then again we have larger and smaller functions, and they
would have to fit completely into one page. And as you implicitly
stated yourself: only some special groupings make sense.

> The optimization part is the hard part, certainly.
> I-Cache is relatively difficult to instrument.

Not anymore; performance counters on several architectures, together
with several tools, give quite a good picture of cache performance.
Even without instrumenting a whole program (though getting the
complete picture certainly makes more sense), looking at an isolated
sequence such as a common hotspot shows quite well where optimisation
might sensibly be applied.

> > Okay, say the library is mmapped in and the OS is configured to not
> > do readahead but instead page in the missing functions in fractions of
> > whole pages as we walk through the application, how much gain would you
> > estimate by improving locality?

> An OS that doesn't read in whole pages is really a bit too far
> from my experience to make any guesses at.

Sorry, my bad. The fractions referred to parts of the whole library;
of course pages are the granules on any major OS. Unfortunately it is
hard to tell how much of the library will be paged in at once, because
many systems page in much more than just 4k.

> Apps are different, but not *that* different. That is, every
> app uses gtk_widget_show_all(), nothing will use
> gtk_progress_bar_set_discrete_blocks().
Heck, I'm an outsider.... :)

> libgtk has some 2000 relocations in it that have to be processed
> at startup. And remember that any page with a relocation has to
> be copied and can't be shared between apps.

So what you really want to do is avoid relocations?

> I thought there was a whitepaper on Jakub Jelinek's prelinking stuff,
> but I don't see it in a quick search. There may useful docs in
> the prelink tarball:

Prelinking is pretty heavily used nowadays. Andreas Jäger also has
quite some experience in this area and helped to dramatically speed up
KDE. C++ is a completely different matter, though, because of the
higher level of indirection.

> For object file reordering, I don't have reference off-hand but
> it shouldn't be that hard to dig something up.

Hard to find, actually; I googled for it but nothing great showed up.

BTW: the only chance I see to realize that comfortably is on GNU
platforms with an ld recipe, which seems like quite a lot of work for a
guesstimated 10% startup improvement; would that really give us an
edge?

-- 
Cheers,
Daniel
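
To put rough numbers on the locality question discussed above, here is a
minimal Python sketch. It is not something posted in the thread, and
everything in it is an assumption made for illustration: the uniform
300-byte function size (2.1M spread over 7000 functions), the choice of
every 14th function as "used at startup", and the helper names
pages_touched() and reorder_hot_first(). It only counts which 4k pages
contain at least one byte of a hot function before and after grouping
the hot functions together; it says nothing about readahead or
relocations.

```python
PAGE_SIZE = 4096  # bytes; the 4k page size mentioned in the thread


def pages_touched(layout, hot):
    """Return the set of page indices that hold bytes of any hot function.

    `layout` is a list of (name, size_in_bytes) pairs in link order;
    functions are assumed to be laid out back to back from offset 0.
    """
    touched = set()
    offset = 0
    for name, size in layout:
        if name in hot and size > 0:
            first = offset // PAGE_SIZE
            last = (offset + size - 1) // PAGE_SIZE
            touched.update(range(first, last + 1))
        offset += size
    return touched


def reorder_hot_first(layout, hot):
    """Move hot functions to the front, keeping relative order otherwise."""
    hot_part = [f for f in layout if f[0] in hot]
    cold_part = [f for f in layout if f[0] not in hot]
    return hot_part + cold_part


# Hypothetical library: 7000 functions of 300 bytes each (about 2.1M),
# with every 14th function needed at startup (an assumed worst case).
layout = [(f"func_{i}", 300) for i in range(7000)]
hot = {f"func_{i}" for i in range(0, 7000, 14)}

before = len(pages_touched(layout, hot))
after = len(pages_touched(reorder_hot_first(layout, hot), hot))
print("pages touched before reordering:", before)
print("pages touched after reordering: ", after)
```

With these made-up inputs the 500 hot functions are spread so that each
one starts on a different page, while after grouping they occupy 150000
contiguous bytes, i.e. 37 pages. Real libraries have uneven function
sizes and the kernel pages in more than 4k at a time, so the actual
gain would have to be measured.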