Re: Floating point in pango
- From: Richard Purdie <richard openedhand com>
- To: Federico Mena Quintero <federico ximian com>
- Cc: performance-list gnome org, Behdad Esfahbod <behdad behdad org>
- Subject: Re: Floating point in pango
- Date: Fri, 14 Jul 2006 12:53:49 +0100
On Thu, 2006-07-13 at 18:11 -0500, Federico Mena Quintero wrote:
> On Thu, 2006-07-13 at 20:16 +0100, Richard Purdie wrote:
>
> > Also of note is that the time spent in no-vmlinux increased
> > significantly (11%). These tests were done against a hardfloat image so
> > floating point instructions cause exceptions and get handled in kernel
> > space. That jump is very likely the consequence of a significant
> > increase in floating point instruction usage.
>
> But is that really floating point handlers in the kernel, or is it
> something else?
To prove it, I asked Jorn to run the tests again, this time resolving the
kernel symbols. The results are at:
http://www.o-hand.com/~jorn/pango-benchmarks/28/full-report.txt
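(For anyone wanting to reproduce this: pointing opcontrol at an
uncompressed vmlinux is what lets opreport resolve the kernel symbols.
With the 0.9.x tools it goes roughly like this, paths being placeholders:

    opcontrol --vmlinux=/path/to/vmlinux   # rather than --no-vmlinux
    opcontrol --start
    ...run the benchmark...
    opcontrol --stop
    opreport -l

though check the oprofile documentation for the exact invocation on your
setup.)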
These functions are all part of the kernel's floating point emulator
(nwfpe); the columns are samples, % of total samples, image and symbol:
4255 1.7651 vmlinux do_fpe
3316 1.3756 vmlinux PerformLDF
3120 1.2943 vmlinux EmulateAll
2500 1.0371 vmlinux EmulateCPRT
2058 0.8537 vmlinux addFloat64Sigs
2002 0.8305 vmlinux EmulateCPDO
1937 0.8035 vmlinux DoubleCPDO
1863 0.7728 vmlinux EmulateCPDT
1515 0.6285 vmlinux roundAndPackFloat64
1296 0.5376 vmlinux checkCondition
1247 0.5173 vmlinux float64_mul
1214 0.5036 vmlinux PerformSTF
1175 0.4874 vmlinux emulate
865 0.3588 vmlinux float64_to_int32
839 0.3481 vmlinux SetRoundingMode
625 0.2593 vmlinux float64_add
620 0.2572 vmlinux PerformFIX
515 0.2136 vmlinux roundAndPackInt32
451 0.1871 vmlinux SetRoundingPrecision
377 0.1564 vmlinux float64_is_nan
372 0.1543 vmlinux PerformSFM
369 0.1531 vmlinux PerformLFM
355 0.1473 vmlinux nwfpe_enter
336 0.1394 vmlinux ret_from_exception
299 0.1240 vmlinux PerformFLT
284 0.1178 vmlinux float_raise
At least some of the processor load in the following functions is also
related to the above, since the context switches and user-space data
transfers involved in taking those exceptions would trigger them:
17374 7.2074 vmlinux xscale_mc_clear_user_page
1813 0.7521 vmlinux __flush_whole_cache
815 0.3381 vmlinux __dabt_usr
701 0.2908 vmlinux unmap_vmas
785 0.3256 vmlinux mc_copy_user_page
747 0.3099 vmlinux free_hot_cold_page
658 0.2730 vmlinux update_mmu_cache
631 0.2618 vmlinux get_page_from_freelist
535 0.2219 vmlinux xscale_flush_user_cache_range
593 0.2460 vmlinux cpu_xscale_switch_mm
573 0.2377 vmlinux schedule
443 0.1838 vmlinux __handle_mm_fault
392 0.1626 vmlinux __get_user_4
313 0.1298 vmlinux __arch_copy_to_user
293 0.1215 vmlinux find_vma
So of the 30% of the time spent in the kernel, a significant fraction is
spent in the floating point code, as I suspected: the emulator functions
listed above alone account for roughly 14% of the total samples, before
counting any of this related overhead.
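This is why moving Pango over to integer arithmetic matters so much on
these machines. As a minimal sketch of the fixed-point approach (Pango's
units are already integers in 1/1024ths, PANGO_SCALE == 1024; the helper
names below are illustrative, not actual Pango API), something along
these lines stays entirely in the integer ALU and never traps into
do_fpe:

    #include <stdint.h>

    /* 22.10 fixed point, matching Pango's 1/1024 units (PANGO_SCALE). */
    typedef int32_t fixed_t;
    #define FIXED_ONE 1024

    /* Multiply two fixed-point values, widening to 64 bits so the
       intermediate product can't overflow. Integer-only, so there is
       no floating point exception for the kernel to emulate. */
    static inline fixed_t fixed_mul (fixed_t a, fixed_t b)
    {
        return (fixed_t) (((int64_t) a * b) / FIXED_ONE);
    }

    /* Round a (non-negative) fixed-point value to the nearest int. */
    static inline int fixed_round (fixed_t f)
    {
        return (f + FIXED_ONE / 2) / FIXED_ONE;
    }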
> Does oprofile give you stack traces for where each function is called?
> Sysprof gives you that and it is fantastic.
oprofile does give stack traces, both in user and kernel space, although
someone has broken the kernel-side support for this on ARM in recent
kernels. I'm looking into fixing it.
> [Someone should port Sysprof to the ARM; it can't be hard.]
Technically, oprofile can do everything sysprof can and a lot more
besides (plus it already works on ARM), so as a profiler it is the more
capable piece of software. sysprof has a pretty GUI, though, which seems
to attract people more than the capabilities do.
The real task is for someone to write a nice GUI for viewing oprofile
traces. The problem is that sysprof's UI can't handle all the extra data
oprofile can provide, and oprofile's flexibility has worked against
everyone who has tried to design a good GUI for it. For reference,
oprofile's data collection and analysis are two totally separate
programs and can run on different machines (of different architectures).
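(As an aside, and from memory so treat it as a sketch: recent oprofile
releases ship an oparchive tool which packages the samples together with
the binaries they reference, so on a small ARM target the workflow can
be reduced to roughly

    oparchive -o /tmp/session         # on the target, after profiling
    opreport -l archive:/tmp/session  # on the desktop, after copying it

with all of the heavy analysis running on the desktop machine.)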
Richard