Re: Floating point in pango



On Thu, 2006-07-13 at 18:11 -0500, Federico Mena Quintero wrote: 
> On Thu, 2006-07-13 at 20:16 +0100, Richard Purdie wrote:
> 
> > Also of note is that the time spent in no-vmlinux increased
> > significantly (11%). These tests were done against a hardfloat image so
> > floating point instructions cause exceptions and get handled in kernel
> > space. That jump is very likely the consequence of a significant
> > increase in floating point instruction usage.
> 
> But is that really floating point handlers in the kernel, or is it
> something else?


To check, I asked Jorn to rerun the tests with the kernel symbols
resolved. The results are at:

http://www.o-hand.com/~jorn/pango-benchmarks/28/full-report.txt

These functions are all part of the floating point emulator:

4255      1.7651  vmlinux                  do_fpe
3316      1.3756  vmlinux                  PerformLDF
3120      1.2943  vmlinux                  EmulateAll
2500      1.0371  vmlinux                  EmulateCPRT
2058      0.8537  vmlinux                  addFloat64Sigs
2002      0.8305  vmlinux                  EmulateCPDO
1937      0.8035  vmlinux                  DoubleCPDO
1863      0.7728  vmlinux                  EmulateCPDT
1515      0.6285  vmlinux                  roundAndPackFloat64
1296      0.5376  vmlinux                  checkCondition
1247      0.5173  vmlinux                  float64_mul
1214      0.5036  vmlinux                  PerformSTF
1175      0.4874  vmlinux                  emulate
865       0.3588  vmlinux                  float64_to_int32
839       0.3481  vmlinux                  SetRoundingMode
625       0.2593  vmlinux                  float64_add
620       0.2572  vmlinux                  PerformFIX
515       0.2136  vmlinux                  roundAndPackInt32
451       0.1871  vmlinux                  SetRoundingPrecision
377       0.1564  vmlinux                  float64_is_nan
372       0.1543  vmlinux                  PerformSFM
369       0.1531  vmlinux                  PerformLFM
355       0.1473  vmlinux                  nwfpe_enter
336       0.1394  vmlinux                  ret_from_exception
299       0.1240  vmlinux                  PerformFLT
284       0.1178  vmlinux                  float_raise

At least some of the processor load in the following functions is also
related to the emulator, since the context switches and userspace data
transfers it causes would trigger them:

17374     7.2074  vmlinux                  xscale_mc_clear_user_page 
1813      0.7521  vmlinux                  __flush_whole_cache
815       0.3381  vmlinux                  __dabt_usr
701       0.2908  vmlinux                  unmap_vmas
785       0.3256  vmlinux                  mc_copy_user_page
747       0.3099  vmlinux                  free_hot_cold_page
658       0.2730  vmlinux                  update_mmu_cache
631       0.2618  vmlinux                  get_page_from_freelist
535       0.2219  vmlinux                  xscale_flush_user_cache_range
593       0.2460  vmlinux                  cpu_xscale_switch_mm
573       0.2377  vmlinux                  schedule
443       0.1838  vmlinux                  __handle_mm_fault
392       0.1626  vmlinux                  __get_user_4
313       0.1298  vmlinux                  __arch_copy_to_user
293       0.1215  vmlinux                  find_vma

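So of the 30% of the time spent in the kernel, a significant fraction is
spent in the floating point code, as I suspected. The emulator functions
in the first list alone add up to about 14% of all samples, which is
nearly half of the kernel time.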
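
As a rough check on that fraction, the emulator rows from the first
listing can be summed. This is only a sketch in Python: the sample
counts and percentages are copied from the listing above, the 30% kernel
share is the figure quoted in this message, and the helper name
summarize() is made up for illustration.

```python
# Rows copied from the floating point emulator listing above:
# sample count, percentage of all samples, symbol name.
FPE_ROWS = """\
4255 1.7651 do_fpe
3316 1.3756 PerformLDF
3120 1.2943 EmulateAll
2500 1.0371 EmulateCPRT
2058 0.8537 addFloat64Sigs
2002 0.8305 EmulateCPDO
1937 0.8035 DoubleCPDO
1863 0.7728 EmulateCPDT
1515 0.6285 roundAndPackFloat64
1296 0.5376 checkCondition
1247 0.5173 float64_mul
1214 0.5036 PerformSTF
1175 0.4874 emulate
865 0.3588 float64_to_int32
839 0.3481 SetRoundingMode
625 0.2593 float64_add
620 0.2572 PerformFIX
515 0.2136 roundAndPackInt32
451 0.1871 SetRoundingPrecision
377 0.1564 float64_is_nan
372 0.1543 PerformSFM
369 0.1531 PerformLFM
355 0.1473 nwfpe_enter
336 0.1394 ret_from_exception
299 0.1240 PerformFLT
284 0.1178 float_raise
"""

# Share of total time spent in the kernel, as stated in this message.
KERNEL_SHARE_PERCENT = 30.0


def summarize(rows):
    """Return (total samples, total percentage) for the given rows."""
    total_samples = 0
    total_percent = 0.0
    for line in rows.splitlines():
        if not line.strip():
            continue  # skip blank lines
        samples, percent, _symbol = line.split()
        total_samples += int(samples)
        total_percent += float(percent)
    return total_samples, total_percent


fpe_samples, fpe_percent = summarize(FPE_ROWS)
print(f"emulator samples:        {fpe_samples}")
print(f"share of all samples:    {fpe_percent:.2f}%")
print(f"share of kernel time:    {fpe_percent / KERNEL_SHARE_PERCENT:.0%}")
```

This only counts the functions named in the first listing. It does not
attribute any of the page clearing, cache flushing or scheduling cost
from the second listing to the emulator, so it is a lower bound.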

> Does oprofile give you stack traces for where each function is called?
> Sysprof gives you that and it is fantastic.

oprofile does give stack traces, in both user and kernel space, although
someone has broken the kernel side of it on ARM in recent kernels. I'm
looking into fixing it.

> [Someone should port Sysprof to the ARM; it can't be hard.]

Technically, oprofile can do everything sysprof can and a lot more
besides (plus it already works on ARM). As a profiler, oprofile is
therefore the better piece of software. sysprof has a pretty GUI though,
and that seems to attract people more than the capabilities do.

The real task is for someone to write a nice GUI for viewing oprofile
traces. The problem is that sysprof's UI can't handle all the extra data
oprofile can provide, and oprofile's power has been a disadvantage
whenever people have tried to design a good GUI for it. For reference,
oprofile's data collection and analysis are two totally separate
programs and can run on different machines (of different architectures).

Richard



