Re: high-level text drawing



Hi,

The slowness may be explained by the following points:
- pango xft uses antialiasing => drawing a string need the knowledge of the 
underlying pixel => overhead. 
- you use an exported X window (non local Xserver) => the data exchange due 
to the previous point is slow due to the network link (on a local machine, I did 
not see any difference in my tests due to antialiasing). Exceed may not support XRENDER extension.

I did some profiling work, and my conclusion is: unicode string analysis and 
the related memory allocation increase is the primary cause of slowness.
Drawing is, at least on a local server, not the problem.

- pango text is UTF8 with potential right-to-left and left-to-right text, so text 
analysis and glyph lookup is inherently more complex than in xdrawstring.

This seems to be confirmed by the profiling.

- font rasterisation: 

During the test, I used the default fontconfig font "Sans", which is an OpenType
font on my box. The X core font was a bitmapped adobe-courier font. But it 
looks like rasterisation is not the bottleneck, on my system.

I used oprofile to profile my system. First, I profiled it under no load, for 
different duration (10, 20, 30 and 40s) to see how the kernel, libc, X behave.
Then I tested the XDrawString, and finally the gtk2 drawing.

I used a modified test program, that allows to select the test (x or gtk2) at 
runtime.

System:
XFree86 4.3.99, ATI Radeon All-In-Wonder 32Mb
Linux 2.6.0-mm1
glibc 2.3.3 (cvs) + NPTL
PII-266
RAM 128Mb
Window manager: WindowMaker
gtk2 suite from CVS (november)

System noise (or system profile under no specific load):

Profiling the system without any interactive or batch process to monitor
kernel, libc and X noise.

The system is profiled for 10 seconds:

date; opcontrol --start; sleep 10 ;opcontrol --shutdown; date;

opreport --long-filenames

sleep 10:

CPU: PII, speed 267.28 MHz (estimated)
Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a unit mask of 0x00 (No unit mask) 
count 100000
CPU_CLK_UNHALT...|
  samples|      %|
------------------
      953 64.4790 /usr/src/linux/vmlinux
      340 23.0041 /bin/bash
       76  5.1421 /lib/ld-2.3.3.so
       46  3.1123 /lib/libc-2.3.3.so
       21  1.4208 /usr/X11R6/bin/XFree86
       11  0.7442 /usr/X11R6/lib/libX11.so.6.2
        7  0.4736 /usr/bin/grep
        7  0.4736 /usr/bin/oprofiled
        4  0.2706 /usr/local/bin/wmaker
        4  0.2706 /oprofile
        4  0.2706 /usr/bin/op_help
        2  0.1353 /usr/local/bin/wterm
        1  0.0677 /usr/X11R6/lib/libXt.so.6.0
        1  0.0677 /usr/local/lib/libgcc_s.so.1
        1  0.0677 /usr/bin/gawk

sleep 20:

CPU_CLK_UNHALT...|
  samples|      %|
------------------
     1555 73.1077 /usr/src/linux/vmlinux
      338 15.8909 /bin/bash
       71  3.3380 /lib/ld-2.3.3.so
       57  2.6798 /lib/libc-2.3.3.so
       42  1.9746 /usr/X11R6/bin/XFree86
       14  0.6582 /usr/X11R6/lib/libX11.so.6.2
       12  0.5642 /usr/local/bin/wmaker
        9  0.4231 /usr/bin/oprofiled
        7  0.3291 /oprofile
        5  0.2351 /usr/bin/op_help
        4  0.1881 /usr/local/bin/wterm
        3  0.1410 /uhci_hcd
        3  0.1410 /usr/bin/grep
        2  0.0940 /usr/local/bin/ntpd
        2  0.0940 /usr/bin/gawk
        1  0.0470 /lib/libm-2.3.3.so
        1  0.0470 /tulip
        1  0.0470 /usr/bin/seq

sleep 30:

CPU_CLK_UNHALT...|
  samples|      %|
------------------
     2138 77.6890 /usr/src/linux/vmlinux
      341 12.3910 /bin/bash
       75  2.7253 /lib/ld-2.3.3.so
       68  2.4709 /lib/libc-2.3.3.so
       54  1.9622 /usr/X11R6/bin/XFree86
       21  0.7631 /usr/X11R6/lib/libX11.so.6.2
       12  0.4360 /usr/bin/oprofiled
        9  0.3270 /usr/local/bin/wmaker
        9  0.3270 /oprofile
        6  0.2180 /uhci_hcd
        4  0.1453 /usr/local/bin/wterm
        4  0.1453 /tulip
        3  0.1090 /usr/bin/grep
        2  0.0727 /usr/bin/op_help
        1  0.0363 /usr/X11R6/lib/libXt.so.6.0
        1  0.0363 /usr/local/bin/ntpd
        1  0.0363 /usr/local/bin/xscreensaver
        1  0.0363 /usr/local/lib/libpopt.so.0.0.0
        1  0.0363 /usr/bin/gawk
        1  0.0363 /usr/sbin/syslogd

sleep 40:

CPU_CLK_UNHALT...|
  samples|      %|
------------------
     2806 81.3805 /usr/src/linux/vmlinux
      333  9.6578 /bin/bash
       70  2.0302 /lib/ld-2.3.3.so
       65  1.8852 /usr/X11R6/bin/XFree86
       63  1.8271 /lib/libc-2.3.3.so
       40  1.1601 /usr/X11R6/lib/libX11.so.6.2
       21  0.6090 /usr/bin/oprofiled
       14  0.4060 /usr/local/bin/wmaker
        9  0.2610 /oprofile
        5  0.1450 /usr/bin/gawk
        4  0.1160 /usr/bin/grep
        3  0.0870 /usr/local/bin/wterm
        3  0.0870 /tulip
        2  0.0580 /uhci_hcd
        2  0.0580 /usr/bin/op_help
        2  0.0580 /usr/sbin/syslogd
        1  0.0290 /bin/cat
        1  0.0290 /usr/X11R6/lib/libXt.so.6.0
        1  0.0290 /usr/local/bin/ntpd
        1  0.0290 /usr/local/bin/xemacs-21.5.16
        1  0.0290 /usr/bin/tr
        1  0.0290 /usr/sbin/nscd

as expected, the  binaries that have increasing samples are
the kernel, XFree86 and libX11 and some other that are of a lower 
order. 

we note that /bin/bash uses approximately 338 samples
the kernel uses approximately: 61 t + 327 samples, t being time expressed in seconds
XFree86: 1.5 t + 9.5
libc: points are too much scattered.

XDrawString
-----------

date; opcontrol --start; ./a.out x 600 ;opcontrol --shutdown; date;
Mon Dec 29 22:42:33 CET 2003
Using 2.6+ OProfile kernel interface.
Reading module info.
Using log file /var/lib/oprofile/oprofiled.log
Daemon started.
Profiler running.
Start 71
gtk2: 0
x:    23
Stopping profiling.
Killing daemon.
Mon Dec 29 22:42:37 CET 2003

oprofiled 4 seconds, drawing test lasted 0.23s

CPU_CLK_UNHALT...|
  samples|      %|
------------------
     1154 26.8935 /lib/libc-2.3.3.so
      933 21.7432 /usr/lib/libfontconfig.so.1.0.4
      905 21.0907 /usr/src/linux/vmlinux
      338  7.8770 /bin/bash
      265  6.1757 /usr/X11R6/bin/XFree86
      244  5.6863 /lib/ld-2.3.3.so
      116  2.7033 /usr/X11R6/lib/libX11.so.6.2
       51  1.1885 /usr/local/lib/libwraster.so.3.0.0
       42  0.9788 /usr/local/lib/libgdk-x11-2.0.so.0.300.0
       42  0.9788 /usr/local/lib/libgobject-2.0.so.0.300.0
       38  0.8856 /usr/local/lib/libglib-2.0.so.0.300.0
       36  0.8390 /home/mhadasht/tmp/a.out
       25  0.5826 /usr/local/bin/wmaker
       22  0.5127 /usr/bin/oprofiled
       18  0.4195 /oprofile
       11  0.2564 /usr/local/bin/wterm
        9  0.2097 /usr/local/lib/libexpat.so.0.4.0
        7  0.1631 /usr/local/lib/libgtk-x11-2.0.so.0.300.0
        7  0.1631 /usr/lib/libfreetype.so.6.3.4
        4  0.0932 /usr/bin/op_help
        3  0.0699 /usr/X11R6/lib/libXt.so.6.0
        3  0.0699 /usr/local/lib/libpango-1.0.so.0.300.1
        3  0.0699 /tulip
        3  0.0699 /usr/bin/gawk
        2  0.0466 /usr/X11R6/lib/libXft.so.2.1.1
        2  0.0466 /usr/X11R6/lib/libXrender.so.1.2.2
        2  0.0466 /usr/bin/grep
        1  0.0233 /bin/cat
        1  0.0233 /usr/local/lib/libatk-1.0.so.0.304.0
        1  0.0233 /usr/local/lib/libpangoft2-1.0.so.0.300.1
        1  0.0233 /usr/local/lib/libpangoxft-1.0.so.0.300.1
        1  0.0233 /usr/local/lib/libstdc++.so.5.0.5
        1  0.0233 /lib/libdl-2.3.3.so

a run for system noise test shows that for a 3 seconds sleep, libc uses 38 samples. kernel uses 905 samples, 
that's 359 more, ie 1.77 times more.
This test is done to allow for a qualitative comparison vs gtk2 (which libraries
are used) and to see the influence of the sprintf() function call.


libc 27% of total time. Top 10 functions:

samples  %        symbol name
207      17.5127  _IO_vfprintf_internal
176      14.8900  index
124      10.4907  _IO_default_xsputn_internal
79        6.6836  strlen
67        5.6684  strcmp
64        5.4146  _int_malloc
46        3.8917  memcpy
34        2.8765  memmove
33        2.7919  __find_specmb
27        2.2843  __GI___strtol_internal

libfontconfig 22%. Top 10 functions:
samples  %        symbol name
120      12.9032  FcNameParse
105      11.2903  FcCacheReadString
85        9.1398  FcLangSetIndex
77        8.2796  FcNameParseCharSet
46        4.9462  FcStrCmpIgnoreCase
44        4.7312  FcPatternFreeze
38        4.0860  anonymous symbol from section .plt
30        3.2258  FcNameGetObjectType
24        2.5806  FcPatternPosition
23        2.4731  FcNameGetConstant

GTK2
----

opcontrol --reset;date; opcontrol --start; ./a.out gtk2 600 ;opcontrol --shutdown; date;
Mon Dec 29 23:00:28 CET 2003
Using 2.6+ OProfile kernel interface.
Reading module info.
Using log file /var/lib/oprofile/oprofiled.log
Daemon started.
Profiler running.
Start 72
gtk2: 1856
x:    0
Stopping profiling.
Killing daemon.
Mon Dec 29 23:00:51 CET 2003

oprofiled: 23s, drawing test lasted 18.56s


CPU_CLK_UNHALT...|
  samples|      %|
------------------
    14871 26.9960 /usr/local/lib/libpango-1.0.so.0.300.1
     9806 17.8013 /lib/libc-2.3.3.so
     6440 11.6908 /usr/local/lib/libglib-2.0.so.0.300.0
     6280 11.4004 /usr/X11R6/lib/libXft.so.2.1.1
     3497  6.3483 /usr/local/lib/libpangoxft-1.0.so.0.300.1
     3142  5.7038 /usr/local/lib/libgobject-2.0.so.0.300.0
     2591  4.7036 /usr/src/linux/vmlinux
     2031  3.6870 /usr/local/lib/libgdk-x11-2.0.so.0.300.0
     1642  2.9808 /usr/X11R6/bin/XFree86
     1274  2.3127 /usr/local/lib/libpangoft2-1.0.so.0.300.1
     1032  1.8734 /usr/lib/libfontconfig.so.1.0.4
      610  1.1074 /usr/local/lib/pango/1.4.0/modules/pango-basic-fc.so
      507  0.9204 /usr/X11R6/lib/libXrender.so.1.2.2
      340  0.6172 /bin/bash
      255  0.4629 /lib/ld-2.3.3.so
      242  0.4393 /usr/bin/oprofiled
      138  0.2505 /oprofile
      137  0.2487 /home/mhadasht/tmp/a.out
       77  0.1398 /usr/X11R6/lib/libX11.so.6.2
       53  0.0962 /usr/local/lib/libwraster.so.3.0.0
       31  0.0563 /usr/local/bin/wmaker
       26  0.0472 /usr/lib/libz.so.1.1.4
       12  0.0218 /usr/lib/libfreetype.so.6.3.4

Compared to XDrawString, libfontconfig overhead is approximately the
same, and is used for ~0.5s.  

libc usage increased 9 times compared to XDrawString

(kernel samples increased by 1044 samples (2591 vs 1547 at rest), which 
means an increase by a factor of 1.67. )

Most used libraries:
libpango is the most used library.
libc usage increased much more
libglib is very much used
libXft 
libpangoxft
libgobject

libpango 27%. Top functions:

samples  %        symbol name
1295      8.7082  pango_glyph_string_extents_range
1233      8.2913  pango_default_break
1117      7.5113  fribidi_analyse_string
666       4.4785  __i686.get_pc_thunk.bx
474       3.1874  pango_script_for_unichar
472       3.1740  anonymous symbol from section .plt
399       2.6831  pango_font_get_glyph_extents
377       2.5351  get_pair_index
375       2.5217  itemize_state_process_run
288       1.9367  itemize_state_init
- 10 functions
288       1.9367  pango_layout_line_get_extents
286       1.9232  pango_layout_check_lines
274       1.8425  pango_layout_get_extents_internal
269       1.8089  pango_script_iter_next
- 50%

Most of the functions deal with string analysis.

There is not a single pango function that stands out and uses most cpu.
Thus pango optimization would mean many small optimization in many of its
string analysis functions.


libc 18%. Top 10 functions:

samples  %        symbol name
2992     30.5119  _int_malloc
1022     10.4222  __cfree
958       9.7695  __malloc
897       9.1475  _IO_vfprintf_internal
-- 50%
844       8.6070  _int_free
462       4.7114  malloc_consolidate
385       3.9262  __i686.get_pc_thunk.bx
371       3.7834  memcpy
285       2.9064  strlen
206       2.1008  __calloc
-- 75% 

there are many memory allocation operations that are done internally. There
are also more _IO_vfprintf_internal calls compared to XDrawString
~65% of libc usage is for memory handling
~10% in string creation (printf)
~6% in memory copy or string length calculation


libglib 12%. Top 10 functions
samples  %        symbol name
673      10.4503  g_hash_table_lookup
477       7.4068  g_utf8_get_char
437       6.7857  g_malloc
408       6.3354  g_utf8_strlen
355       5.5124  g_free
315       4.8913  _g_slist_alloc
298       4.6273  g_utf8_validate
291       4.5186  g_slist_free
-- 50%
286       4.4410  g_unichar_type
239       3.7112  g_utf8_to_ucs4_fast
-- 58%

~26% is spent in unicode handling: get_char, strlen, validate, type,
        and conversion from utf8 to ucs4 (why the latter?)
~10% in hash table lookup
~30% in memory allocation stuff

libXft 11%. Top 10 functions:
samples  %        symbol name
2632     41.9108  XftGlyphExtents
1214     19.3312  XftGlyphSpecRender
-- ~60%
748      11.9108  XftFontCheckGlyph
418       6.6561  XftCharIndex
259       4.1242  XftDrawSrcPicture
-- ~80%
241       3.8376  __i686.get_pc_thunk.bx
189       3.0096  _XftSetFace
140       2.2293  _XftDisplayInfoGet
-- ~90%
123       1.9586  XftLockFace
98        1.5605  XftDrawGlyphSpec

glyph analysis accounts for 62%, while drawing accounts for 20 + 4 + 2 = 26%

Conclusion:
-----------

There is not a specific function that seems to be really inefficient.
Thus, to optimize pango, many layers and functions may need investigation and optimization:

- optimize global pango string analysis functions. This is difficult:
many function would need local optimizations or the structure of the
underlying algorithm should be optimized, if possible.

-  decrease number of memory allocation/deallocation, if possible

- optimize Xft glyph analysis

It should be noted that it is not the drawing that consumes most CPU
time but data analysis and handling.

The difference between XDrawString and gtk2 is mainly the very good unicode support, which introduces a great 
deal of complexity, due to the eventual use of different languages, fonts, etc.


Cheers

-- 
Melvin Hadasht



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]