Re: high-level text drawing
- From: Melvin Hadasht <melvin hadasht free fr>
- To: gtk-app-devel-list gnome org
- Subject: Re: high-level text drawing
- Date: Tue, 30 Dec 2003 00:36:17 +0100
Hi,
The slowness may be explained by the following points:
- pango xft uses antialiasing => drawing a string need the knowledge of the
underlying pixel => overhead.
- you use an exported X window (non local Xserver) => the data exchange due
to the previous point is slow due to the network link (on a local machine, I did
not see any difference in my tests due to antialiasing). Exceed may not support XRENDER extension.
I did some profiling work, and my conclusion is: unicode string analysis and
the related memory allocation increase is the primary cause of slowness.
Drawing is, at least on a local server, not the problem.
- pango text is UTF8 with potential right-to-left and left-to-right text, so text
analysis and glyph lookup is inherently more complex than in xdrawstring.
This seems to be confirmed by the profiling.
- font rasterisation:
During the test, I used the default fontconfig font "Sans", which is an OpenType
font on my box. The X core font was a bitmapped adobe-courier font. But it
looks like rasterisation is not the bottleneck, on my system.
I used oprofile to profile my system. First, I profiled it under no load, for
different duration (10, 20, 30 and 40s) to see how the kernel, libc, X behave.
Then I tested the XDrawString, and finally the gtk2 drawing.
I used a modified test program, that allows to select the test (x or gtk2) at
runtime.
System:
XFree86 4.3.99, ATI Radeon All-In-Wonder 32Mb
Linux 2.6.0-mm1
glibc 2.3.3 (cvs) + NPTL
PII-266
RAM 128Mb
Window manager: WindowMaker
gtk2 suite from CVS (november)
System noise (or system profile under no specific load):
Profiling the system without any interactive or batch process to monitor
kernel, libc and X noise.
The system is profiled for 10 seconds:
date; opcontrol --start; sleep 10 ;opcontrol --shutdown; date;
opreport --long-filenames
sleep 10:
CPU: PII, speed 267.28 MHz (estimated)
Counted CPU_CLK_UNHALTED events (clocks processor is not halted) with a unit mask of 0x00 (No unit mask)
count 100000
CPU_CLK_UNHALT...|
samples| %|
------------------
953 64.4790 /usr/src/linux/vmlinux
340 23.0041 /bin/bash
76 5.1421 /lib/ld-2.3.3.so
46 3.1123 /lib/libc-2.3.3.so
21 1.4208 /usr/X11R6/bin/XFree86
11 0.7442 /usr/X11R6/lib/libX11.so.6.2
7 0.4736 /usr/bin/grep
7 0.4736 /usr/bin/oprofiled
4 0.2706 /usr/local/bin/wmaker
4 0.2706 /oprofile
4 0.2706 /usr/bin/op_help
2 0.1353 /usr/local/bin/wterm
1 0.0677 /usr/X11R6/lib/libXt.so.6.0
1 0.0677 /usr/local/lib/libgcc_s.so.1
1 0.0677 /usr/bin/gawk
sleep 20:
CPU_CLK_UNHALT...|
samples| %|
------------------
1555 73.1077 /usr/src/linux/vmlinux
338 15.8909 /bin/bash
71 3.3380 /lib/ld-2.3.3.so
57 2.6798 /lib/libc-2.3.3.so
42 1.9746 /usr/X11R6/bin/XFree86
14 0.6582 /usr/X11R6/lib/libX11.so.6.2
12 0.5642 /usr/local/bin/wmaker
9 0.4231 /usr/bin/oprofiled
7 0.3291 /oprofile
5 0.2351 /usr/bin/op_help
4 0.1881 /usr/local/bin/wterm
3 0.1410 /uhci_hcd
3 0.1410 /usr/bin/grep
2 0.0940 /usr/local/bin/ntpd
2 0.0940 /usr/bin/gawk
1 0.0470 /lib/libm-2.3.3.so
1 0.0470 /tulip
1 0.0470 /usr/bin/seq
sleep 30:
CPU_CLK_UNHALT...|
samples| %|
------------------
2138 77.6890 /usr/src/linux/vmlinux
341 12.3910 /bin/bash
75 2.7253 /lib/ld-2.3.3.so
68 2.4709 /lib/libc-2.3.3.so
54 1.9622 /usr/X11R6/bin/XFree86
21 0.7631 /usr/X11R6/lib/libX11.so.6.2
12 0.4360 /usr/bin/oprofiled
9 0.3270 /usr/local/bin/wmaker
9 0.3270 /oprofile
6 0.2180 /uhci_hcd
4 0.1453 /usr/local/bin/wterm
4 0.1453 /tulip
3 0.1090 /usr/bin/grep
2 0.0727 /usr/bin/op_help
1 0.0363 /usr/X11R6/lib/libXt.so.6.0
1 0.0363 /usr/local/bin/ntpd
1 0.0363 /usr/local/bin/xscreensaver
1 0.0363 /usr/local/lib/libpopt.so.0.0.0
1 0.0363 /usr/bin/gawk
1 0.0363 /usr/sbin/syslogd
sleep 40:
CPU_CLK_UNHALT...|
samples| %|
------------------
2806 81.3805 /usr/src/linux/vmlinux
333 9.6578 /bin/bash
70 2.0302 /lib/ld-2.3.3.so
65 1.8852 /usr/X11R6/bin/XFree86
63 1.8271 /lib/libc-2.3.3.so
40 1.1601 /usr/X11R6/lib/libX11.so.6.2
21 0.6090 /usr/bin/oprofiled
14 0.4060 /usr/local/bin/wmaker
9 0.2610 /oprofile
5 0.1450 /usr/bin/gawk
4 0.1160 /usr/bin/grep
3 0.0870 /usr/local/bin/wterm
3 0.0870 /tulip
2 0.0580 /uhci_hcd
2 0.0580 /usr/bin/op_help
2 0.0580 /usr/sbin/syslogd
1 0.0290 /bin/cat
1 0.0290 /usr/X11R6/lib/libXt.so.6.0
1 0.0290 /usr/local/bin/ntpd
1 0.0290 /usr/local/bin/xemacs-21.5.16
1 0.0290 /usr/bin/tr
1 0.0290 /usr/sbin/nscd
as expected, the binaries that have increasing samples are
the kernel, XFree86 and libX11 and some other that are of a lower
order.
we note that /bin/bash uses approximately 338 samples
the kernel uses approximately: 61 t + 327 samples, t being time expressed in seconds
XFree86: 1.5 t + 9.5
libc: points are too much scattered.
XDrawString
-----------
date; opcontrol --start; ./a.out x 600 ;opcontrol --shutdown; date;
Mon Dec 29 22:42:33 CET 2003
Using 2.6+ OProfile kernel interface.
Reading module info.
Using log file /var/lib/oprofile/oprofiled.log
Daemon started.
Profiler running.
Start 71
gtk2: 0
x: 23
Stopping profiling.
Killing daemon.
Mon Dec 29 22:42:37 CET 2003
oprofiled 4 seconds, drawing test lasted 0.23s
CPU_CLK_UNHALT...|
samples| %|
------------------
1154 26.8935 /lib/libc-2.3.3.so
933 21.7432 /usr/lib/libfontconfig.so.1.0.4
905 21.0907 /usr/src/linux/vmlinux
338 7.8770 /bin/bash
265 6.1757 /usr/X11R6/bin/XFree86
244 5.6863 /lib/ld-2.3.3.so
116 2.7033 /usr/X11R6/lib/libX11.so.6.2
51 1.1885 /usr/local/lib/libwraster.so.3.0.0
42 0.9788 /usr/local/lib/libgdk-x11-2.0.so.0.300.0
42 0.9788 /usr/local/lib/libgobject-2.0.so.0.300.0
38 0.8856 /usr/local/lib/libglib-2.0.so.0.300.0
36 0.8390 /home/mhadasht/tmp/a.out
25 0.5826 /usr/local/bin/wmaker
22 0.5127 /usr/bin/oprofiled
18 0.4195 /oprofile
11 0.2564 /usr/local/bin/wterm
9 0.2097 /usr/local/lib/libexpat.so.0.4.0
7 0.1631 /usr/local/lib/libgtk-x11-2.0.so.0.300.0
7 0.1631 /usr/lib/libfreetype.so.6.3.4
4 0.0932 /usr/bin/op_help
3 0.0699 /usr/X11R6/lib/libXt.so.6.0
3 0.0699 /usr/local/lib/libpango-1.0.so.0.300.1
3 0.0699 /tulip
3 0.0699 /usr/bin/gawk
2 0.0466 /usr/X11R6/lib/libXft.so.2.1.1
2 0.0466 /usr/X11R6/lib/libXrender.so.1.2.2
2 0.0466 /usr/bin/grep
1 0.0233 /bin/cat
1 0.0233 /usr/local/lib/libatk-1.0.so.0.304.0
1 0.0233 /usr/local/lib/libpangoft2-1.0.so.0.300.1
1 0.0233 /usr/local/lib/libpangoxft-1.0.so.0.300.1
1 0.0233 /usr/local/lib/libstdc++.so.5.0.5
1 0.0233 /lib/libdl-2.3.3.so
a run for system noise test shows that for a 3 seconds sleep, libc uses 38 samples. kernel uses 905 samples,
that's 359 more, ie 1.77 times more.
This test is done to allow for a qualitative comparison vs gtk2 (which libraries
are used) and to see the influence of the sprintf() function call.
libc 27% of total time. Top 10 functions:
samples % symbol name
207 17.5127 _IO_vfprintf_internal
176 14.8900 index
124 10.4907 _IO_default_xsputn_internal
79 6.6836 strlen
67 5.6684 strcmp
64 5.4146 _int_malloc
46 3.8917 memcpy
34 2.8765 memmove
33 2.7919 __find_specmb
27 2.2843 __GI___strtol_internal
libfontconfig 22%. Top 10 functions:
samples % symbol name
120 12.9032 FcNameParse
105 11.2903 FcCacheReadString
85 9.1398 FcLangSetIndex
77 8.2796 FcNameParseCharSet
46 4.9462 FcStrCmpIgnoreCase
44 4.7312 FcPatternFreeze
38 4.0860 anonymous symbol from section .plt
30 3.2258 FcNameGetObjectType
24 2.5806 FcPatternPosition
23 2.4731 FcNameGetConstant
GTK2
----
opcontrol --reset;date; opcontrol --start; ./a.out gtk2 600 ;opcontrol --shutdown; date;
Mon Dec 29 23:00:28 CET 2003
Using 2.6+ OProfile kernel interface.
Reading module info.
Using log file /var/lib/oprofile/oprofiled.log
Daemon started.
Profiler running.
Start 72
gtk2: 1856
x: 0
Stopping profiling.
Killing daemon.
Mon Dec 29 23:00:51 CET 2003
oprofiled: 23s, drawing test lasted 18.56s
CPU_CLK_UNHALT...|
samples| %|
------------------
14871 26.9960 /usr/local/lib/libpango-1.0.so.0.300.1
9806 17.8013 /lib/libc-2.3.3.so
6440 11.6908 /usr/local/lib/libglib-2.0.so.0.300.0
6280 11.4004 /usr/X11R6/lib/libXft.so.2.1.1
3497 6.3483 /usr/local/lib/libpangoxft-1.0.so.0.300.1
3142 5.7038 /usr/local/lib/libgobject-2.0.so.0.300.0
2591 4.7036 /usr/src/linux/vmlinux
2031 3.6870 /usr/local/lib/libgdk-x11-2.0.so.0.300.0
1642 2.9808 /usr/X11R6/bin/XFree86
1274 2.3127 /usr/local/lib/libpangoft2-1.0.so.0.300.1
1032 1.8734 /usr/lib/libfontconfig.so.1.0.4
610 1.1074 /usr/local/lib/pango/1.4.0/modules/pango-basic-fc.so
507 0.9204 /usr/X11R6/lib/libXrender.so.1.2.2
340 0.6172 /bin/bash
255 0.4629 /lib/ld-2.3.3.so
242 0.4393 /usr/bin/oprofiled
138 0.2505 /oprofile
137 0.2487 /home/mhadasht/tmp/a.out
77 0.1398 /usr/X11R6/lib/libX11.so.6.2
53 0.0962 /usr/local/lib/libwraster.so.3.0.0
31 0.0563 /usr/local/bin/wmaker
26 0.0472 /usr/lib/libz.so.1.1.4
12 0.0218 /usr/lib/libfreetype.so.6.3.4
Compared to XDrawString, libfontconfig overhead is approximately the
same, and is used for ~0.5s.
libc usage increased 9 times compared to XDrawString
(kernel samples increased by 1044 samples (2591 vs 1547 at rest), which
means an increase by a factor of 1.67. )
Most used libraries:
libpango is the most used library.
libc usage increased much more
libglib is very much used
libXft
libpangoxft
libgobject
libpango 27%. Top functions:
samples % symbol name
1295 8.7082 pango_glyph_string_extents_range
1233 8.2913 pango_default_break
1117 7.5113 fribidi_analyse_string
666 4.4785 __i686.get_pc_thunk.bx
474 3.1874 pango_script_for_unichar
472 3.1740 anonymous symbol from section .plt
399 2.6831 pango_font_get_glyph_extents
377 2.5351 get_pair_index
375 2.5217 itemize_state_process_run
288 1.9367 itemize_state_init
- 10 functions
288 1.9367 pango_layout_line_get_extents
286 1.9232 pango_layout_check_lines
274 1.8425 pango_layout_get_extents_internal
269 1.8089 pango_script_iter_next
- 50%
Most of the functions deal with string analysis.
There is not a single pango function that stands out and uses most cpu.
Thus pango optimization would mean many small optimization in many of its
string analysis functions.
libc 18%. Top 10 functions:
samples % symbol name
2992 30.5119 _int_malloc
1022 10.4222 __cfree
958 9.7695 __malloc
897 9.1475 _IO_vfprintf_internal
-- 50%
844 8.6070 _int_free
462 4.7114 malloc_consolidate
385 3.9262 __i686.get_pc_thunk.bx
371 3.7834 memcpy
285 2.9064 strlen
206 2.1008 __calloc
-- 75%
there are many memory allocation operations that are done internally. There
are also more _IO_vfprintf_internal calls compared to XDrawString
~65% of libc usage is for memory handling
~10% in string creation (printf)
~6% in memory copy or string length calculation
libglib 12%. Top 10 functions
samples % symbol name
673 10.4503 g_hash_table_lookup
477 7.4068 g_utf8_get_char
437 6.7857 g_malloc
408 6.3354 g_utf8_strlen
355 5.5124 g_free
315 4.8913 _g_slist_alloc
298 4.6273 g_utf8_validate
291 4.5186 g_slist_free
-- 50%
286 4.4410 g_unichar_type
239 3.7112 g_utf8_to_ucs4_fast
-- 58%
~26% is spent in unicode handling: get_char, strlen, validate, type,
and conversion from utf8 to ucs4 (why the latter?)
~10% in hash table lookup
~30% in memory allocation stuff
libXft 11%. Top 10 functions:
samples % symbol name
2632 41.9108 XftGlyphExtents
1214 19.3312 XftGlyphSpecRender
-- ~60%
748 11.9108 XftFontCheckGlyph
418 6.6561 XftCharIndex
259 4.1242 XftDrawSrcPicture
-- ~80%
241 3.8376 __i686.get_pc_thunk.bx
189 3.0096 _XftSetFace
140 2.2293 _XftDisplayInfoGet
-- ~90%
123 1.9586 XftLockFace
98 1.5605 XftDrawGlyphSpec
glyph analysis accounts for 62%, while drawing accounts for 20 + 4 + 2 = 26%
Conclusion:
-----------
There is not a specific function that seems to be really inefficient.
Thus, to optimize pango, many layers and functions may need investigation and optimization:
- optimize global pango string analysis functions. This is difficult:
many function would need local optimizations or the structure of the
underlying algorithm should be optimized, if possible.
- decrease number of memory allocation/deallocation, if possible
- optimize Xft glyph analysis
It should be noted that it is not the drawing that consumes most CPU
time but data analysis and handling.
The difference between XDrawString and gtk2 is mainly the very good unicode support, which introduces a great
deal of complexity, due to the eventual use of different languages, fonts, etc.
Cheers
--
Melvin Hadasht
[
Date Prev][
Date Next] [
Thread Prev][
Thread Next]
[
Thread Index]
[
Date Index]
[
Author Index]