[gnome-devel-docs] optimization-guide: Conversion from DocBook to Mallard



commit 9ba0d959b739eb59c2f7bd5a26906c4735481c87
Author: Lavanya <lavanyagunasekar gmail com>
Date:   Tue Aug 6 17:55:37 2013 +0200

    optimization-guide: Conversion from DocBook to Mallard

 .../C/{optimization-harmful.xml => harmful.page}   |  111 ++++----
 optimization-guide/C/{index.docbook => index.page} |   36 ++-
 .../{optimization-intro.xml => introduction.page}  |  280 ++++++++++----------
 optimization-guide/C/massif.page                   |  180 +++++++++++++
 optimization-guide/C/optimization-massif.xml       |  171 ------------
 5 files changed, 403 insertions(+), 375 deletions(-)
---
diff --git a/optimization-guide/C/optimization-harmful.xml b/optimization-guide/C/harmful.page
similarity index 61%
rename from optimization-guide/C/optimization-harmful.xml
rename to optimization-guide/C/harmful.page
index 6cfd23f..55c55ed 100644
--- a/optimization-guide/C/optimization-harmful.xml
+++ b/optimization-guide/C/harmful.page
@@ -1,68 +1,71 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<chapter>
+<page xmlns="http://projectmallard.org/1.0/"
+      type="guide" style="task"
+      id="harmful">
+    <info>
+     <link type="guide" xref="index#harm"/>
+    </info>
     <title>Disk Seeks Considered Harmful</title>
-
-    <para>
+    <p>
         Disk seeks are one of the most expensive operations you can possibly perform. You might not know 
this from looking at how many of them we perform, but trust me, they are. Consequently, please refrain from 
the following suboptimal behavior:
-    </para>
-    <itemizedlist>
-        <listitem>
-            <para>
+    </p>
+    <list type="unordered">
+        <item>
+            <p>
                 Placing lots of small files all over the disk.
-            </para>
-        </listitem>
-        <listitem>
-            <para>
+            </p>
+        </item>
+        <item>
+            <p>
                 Opening, stating, and reading lots of files all over the disk
-            </para>
-        </listitem>
-        <listitem>
-            <para>
+            </p>
+        </item>
+        <item>
+            <p>
                 Doing the above on files that are laid out at different times, so as to ensure that they are 
fragmented and cause even more seeking.
-            </para>
-        </listitem>
-        <listitem>
-            <para>
+            </p>
+        </item>
+        <item>
+            <p>
                 Doing the above on files that are in different directories, so as to ensure that they are in 
different cylinder groups and cause even more seeking.
-            </para>
-        </listitem>
-        <listitem>
-            <para>
+            </p>
+        </item>
+        <item>
+            <p>
                 Repeatedly doing the above when it only needs to be done once.
-            </para>
-        </listitem>
-    </itemizedlist>
-    <para>
+            </p>
+        </item>
+    </list>
+    <p>
         Ways in which you can optimize your code to be seek-friendly:
-    </para>
-    <itemizedlist>
-        <listitem>
-            <para>
+    </p>
+    <list type="unordered">
+        <item>
+            <p>
                 Consolidate data into a single file.
-            </para>
-        </listitem>
-        <listitem>
-            <para>
+            </p>
+        </item>
+        <item>
+            <p>
                 Keep data together in the same directory.
-            </para>
-        </listitem>
-        <listitem>
-            <para>
+            </p>
+        </item>
+        <item>
+            <p>
                 Cache data so as to not need to reread constantly.
-            </para>
-        </listitem>
-        <listitem>
-            <para>
+            </p>
+        </item>
+        <item>
+            <p>
                 Share data so as not to have to reread it from disk when each application loads.
-            </para>
-        </listitem>
-        <listitem>
-            <para>
+            </p>
+        </item>
+        <item>
+            <p>
                 Consider caching all of the data in a single binary file that is properly aligned and can be 
mmaped.
-            </para>
-        </listitem>
-    </itemizedlist>
-    <para>
+            </p>
+        </item>
+    </list>
+    <p>
        The trouble with disk seeks is compounded for reads, which is unfortunately what we are doing.  
Remember, reads are generally synchronous while writes are asynchronous.  This only compounds the problem, 
serializing each read, and contributing to program latency.
-    </para>
-</chapter>
+    </p>
+</page>        
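
For illustration, here is a minimal C sketch of the first and last hints in the seek-friendly list above
(consolidate data into a single file and mmap it). The file name cache.bin and its layout are hypothetical,
not part of the guide:

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int
main (void)
{
    /* One consolidated file instead of many small ones scattered over
     * the disk: a single open() and one contiguous, seek-friendly read. */
    int fd = open ("cache.bin", O_RDONLY);
    if (fd < 0)
        return 1;

    struct stat st;
    if (fstat (fd, &st) < 0)
        return 1;

    /* Map the file read-only; the pages are shared between every process
     * that maps the same cache, so the data is read from disk at most once. */
    void *data = mmap (NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (data == MAP_FAILED)
        return 1;

    printf ("mapped %ld bytes of cached data\n", (long) st.st_size);

    munmap (data, st.st_size);
    close (fd);
    return 0;
}
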
diff --git a/optimization-guide/C/index.docbook b/optimization-guide/C/index.page
similarity index 67%
rename from optimization-guide/C/index.docbook
rename to optimization-guide/C/index.page
index 6226c52..e5d1d80 100644
--- a/optimization-guide/C/index.docbook
+++ b/optimization-guide/C/index.page
@@ -1,9 +1,7 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
- "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd";>
-<book id="index">
-    <title>Optimizing GNOME Software</title>
-    <bookinfo>
+<page xmlns="http://projectmallard.org/1.0/"
+      type="guide" style="task"
+      id="index">
+  <info>
       <publisher role="maintainer">
         <publishername>GNOME Documentation Project</publishername>
       </publisher>
@@ -24,7 +22,7 @@
         <firstname>Robert</firstname>
         <surname>Love</surname>
       </author>
-      
+
       <legalnotice>
         <para>
             Permission is granted to copy, distribute and/or modify this
@@ -46,7 +44,7 @@
             or initial caps.
         </para>
       </legalnotice>
-      
+
       <revhistory>
           <revision>
               <revnumber>0.1</revnumber>
@@ -65,11 +63,19 @@
               This section contains guides and tutorials for optimizing your software.
           </para>
       </abstract>
+  </info>
+  <title>Optimization Guide</title>
+  <section id="intro" style="2column">
+     <title>Introduction</title>
+     <p>This is a brief introduction to optimization, both the hows and the whys. Details of individual 
tools and techniques are left for later articles, but a collection of hints and tricks is provided.</p>
+  </section>
+  <section id="massif" style="2column">
+     <title>Massif</title>
+     <p>This article describes how to use the <app>Massif</app> heap profiler with GNOME 
applications. We describe how to invoke, interpret, and act on the output of 
<app>Massif</app>. The <app>Swell Foop</app> game is used as an example.</p>
+  </section>
+  <section id="harm" style="2column">
+     <title>Harmfulness</title>
+     <p>Disk seeks are one of the most expensive operations you can possibly perform. You might not know 
this from looking at how many of them we perform, but trust me, they are. Consequently, please refrain from 
the following suboptimal behavior:</p>
+  </section>
 
-    </bookinfo>
-    
-    <include href="optimization-intro.xml" xmlns="http://www.w3.org/2001/XInclude"; />
-    <include href="optimization-massif.xml" xmlns="http://www.w3.org/2001/XInclude"; />
-    <include href="optimization-harmful.xml" xmlns="http://www.w3.org/2001/XInclude"; />
-    
-</book>
+</page>
diff --git a/optimization-guide/C/optimization-intro.xml b/optimization-guide/C/introduction.page
similarity index 69%
rename from optimization-guide/C/optimization-intro.xml
rename to optimization-guide/C/introduction.page
index 3a0e12c..ffaef2e 100644
--- a/optimization-guide/C/optimization-intro.xml
+++ b/optimization-guide/C/introduction.page
@@ -1,165 +1,175 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<chapter>
-    <title>The Quick Guide to Optimizing GNOME Programs</title>
-    
-    <para>
-        This is a brief introduction to optimization, both the hows and the whys. Details of individual 
tools and techniques are left for later articles, but a collection of hints and tricks is provided.
-    </para>
-    
-    <sect1 id="optimization-intro-TBL-what-are-we-optimizing">
-        <title>What are we Optimizing?</title>
-        <para>
-            When we optimize for GNOME the first thing to remember is this: we are not trying to make the 
program better, we are trying to make the person using the computer happier.
-        </para>
-        <para>
+<page xmlns="http://projectmallard.org/1.0/"
+      type="guide" style="task"
+      id="introduction">
+   <info>
+     <link type="guide" xref="index#intro"/>
+   </info>
+         <title>What are we Optimizing?</title>
+        <p>
+            When we optimize for GNOME the first thing to remember is this: we are not trying to make the 
program better, we are trying to make the person using the computer happier.</p>
+        <p>
             Better programs make people happier, but there are some improvements that will make them a lot 
happier than others: Responsiveness, start-up time, easy-to-access commands, and not having the computer go 
into swap the moment more than two programs are open.
-        </para>
-        <para>
+        </p>
+        <p>
             Traditional optimization tackles concepts like CPU use, code size, the number of mouse clicks 
and the memory use of the program. This second list has been chosen to correlate with the first list; however, 
there is an important difference: The person using GNOME doesn't care about the second list, but they care a 
lot about the first list. When optimizing GNOME programs we will reduce CPU use, memory use and all those 
things, but these are the means to the end, not the final goal. We are optimizing for people.
-        </para>
-    </sect1>
-    
-    <sect1 id="optimization-intro-TBL-doing-optimization">
-        <title>Doing the Optimization</title>
-        <para>
+        </p>
+
+       <section id="doing-the-optimization">        
+          <title>Doing the Optimization</title>
+        <p>
             The previous section omitted one important qualifier: To optimize something it has to be 
measurable. You can't measure happiness. However, you can measure start-up time so you can tell if you have 
improved it. Happiness will then, hopefully, follow.
-        </para>
-        <para>
-            Optimization is the process of measurement, refinement and re-measurement. So the first thing 
you must do is find a way to measure what you are optimizing. Ideally this measurement is a single number, 
for example: the time taken to perform a task. This is your benchmark, it is the only way to tell if you are 
winning or losing. There is a big difference between a program that <emphasis>should</emphasis> be fast and a 
program that <emphasis>is</emphasis> fast.
-        </para>
-        <para>
+        </p>
+        <p>
+            Optimization is the process of measurement, refinement and re-measurement. So the first thing 
you must do is find a way to measure what you are optimizing. Ideally this measurement is a single number, 
for example: the time taken to perform a task. This is your benchmark; it is the only way to tell if you are 
winning or losing. There is a big difference between a program that <em>should</em> be fast and a program 
that <em>is</em> fast.
+        </p>
+        <p>
             Once you have a basic benchmark you need to find out why your code is not doing as well as it 
should. It is tempting to do this by inspection: just looking at the code and trying to spot something that 
looks like it needs improvement. You will invariably be wrong. Using a profiler to get a detailed break-down 
of what your program really does is the only way to be sure.
-        </para>
-        <para>
+        </p>
+        <p>
             Usually the problem is isolated to small sections of code. Pick the worst place and concentrate 
on that first. Once that is done, rerun the profiler and repeat. As you proceed, the gains made at each step 
will get less and less; at some point you will have to decide that the results are good enough. If your 
efforts are only extracting 10% improvements then you are well past the point where you should have stopped.
-        </para>
-        <para>
+        </p>
+        <p>
             Don't forget the big picture. For example, rather than just trying to speed up a piece of code, 
ask yourself if it needs to be run at all. Could it be combined with another piece of code? Can the results 
of previous calculations be saved and reused? It won't even need to be optimized if it is in a place where 
the user is never going to notice it. Worse still, the code may already be optimized and is doing the heavy 
calculations now to avoid doing them later. Code does not run in isolation and neither does the optimization 
process.
-        </para>
-    </sect1>
-    <sect1 id="optimization-intro-TBL-hints">
+        </p>
+       </section>
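
A minimal sketch of the kind of benchmark the section above calls for, assuming GLib is available;
task_to_measure() is a hypothetical stand-in for whatever operation is being optimized, and averaging
over many runs follows the timing advice given under Traps for the Unwary below:

#include <glib.h>

/* Hypothetical stand-in for the operation being optimized. */
static void
task_to_measure (void)
{
    g_usleep (100);
}

int
main (void)
{
    const int runs = 1000;

    /* Monotonic clock in microseconds; unaffected by wall-clock
     * adjustments while the benchmark runs. */
    gint64 start = g_get_monotonic_time ();

    for (int i = 0; i < runs; i++)
        task_to_measure ();

    gint64 elapsed = g_get_monotonic_time () - start;

    /* Average over many runs so timer resolution and background
     * noise matter less. */
    g_print ("average: %.3f ms per run\n", (elapsed / 1000.0) / runs);
    return 0;
}
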
+
+       <section id="hints">
         <title>Hints</title>
-        <itemizedlist>
+
+        <terms>
+          <item>
             <title>The Fundamentals</title>
-            <listitem>
-                <para>
+            <list type="ordered">
+            <item>
+                <p>
                     Re-run your benchmark after every change you make to the code and keep a log of 
everything you change and how it affects the benchmark. This lets you undo mistakes and also helps you not to 
repeat mistakes.
-                </para>
-            </listitem>
-            <listitem>
-                <para>
+                </p>
+            </item>
+            <item>
+                <p>
                     Make sure your code is correct and bug-free before optimizing it. Check that it remains 
correct and bug-free after optimization.
-                </para>
-            </listitem>
-            <listitem>
-                <para>
+                </p>
+            </item>
+            <item>
+                <p>
                     Optimize at the high level before optimizing the details.
-                </para>
-            </listitem>
-            <listitem>
-                <para>
+                </p>
+            </item>
+            <item>
+                <p>
                     Use the right algorithm. The classic text-book example is using quick-sort instead of 
bubble-sort. There are many others, some save memory, some save CPU. Also, see what shortcuts you can make: 
you can do quicker than quick-sort if you are prepared to make some compromises.
-                </para>
-            </listitem>
-            <listitem>
-                <para>
+                </p>
+            </item>
+            <item>
+                <p>
                     Optimization is a trade-off. Caching results speeds up calculations, but increases 
memory use. Saving data to disk saves memory, but costs time when it is loaded back from disk.
-                </para>
-            </listitem>
-            <listitem>
-                <para>
+                      </p>
+            </item>
+            <item>
+                <p>
                    Make sure you choose a wide variety of inputs to optimize against. If you don't, it is 
easy to end up with a piece of code carefully optimized for one file and no others.
-                </para>
-            </listitem>
-            <listitem>
-                <para>
+                </p>
+            </item>
+            <item>
+                <p>
                     Avoid expensive operations: Multiple small disk reads. Using up lots of memory so disk 
swapping becomes necessary. Avoid anything that writes or reads from the hard disk unnecessarily. The network 
is slow too. Also avoid graphics operations that need a response from the X server.
-                </para>
-            </listitem>
-        </itemizedlist>
-        <itemizedlist>
+                </p>
+           </item>
+           </list>
+        </item>
+        <item>
             <title>Traps for the Unwary</title>
-            <listitem>
-                <para>
+            <list type="ordered">
+            <item>
+                <p>
                     Beware of side effects. There can often be strange interactions between different 
sections of code; a speed-up in one part can slow another part down.
-                </para>
-            </listitem>
-            <listitem>
-                <para>
+                </p>
+            </item>
+            <item>
+                <p>
                     When timing code, even on a quiet system, events outside the program add noise to the 
timing results. Average over multiple runs. If the code is very short, timer resolution is also a problem. In 
this case measure the time the computer takes to run the code 100 or 1000 times. If the times you are 
recording are longer than a few seconds, you should be OK.
-                </para>
-            </listitem>
-            <listitem>
-                <para>
+                </p>
+            </item>
+            <item>
+                <p>
                     It is very easy to be misled by the profiler. There are stories of people optimizing the 
operating system idle-loop because that is where it spent all its time! Don't optimize code that does nothing 
the user cares about.
-                </para>
-            </listitem>
-            <listitem>
-                <para>
+                </p>
+            </item>
+            <item>
+                <p>
                     Remember the resources on the X server. Your program's memory usage doesn't include the 
pixmaps that are stored in the X server's process, but they are still using up memory. Use xrestop to see 
what resources your program is using.
-                </para>
-            </listitem>
-        </itemizedlist>
-        <itemizedlist>
-            <title>Low Level Hints</title>
-            <listitem>
-                <para>
+                </p>
+            </item>
+           </list>
+        </item>
+       <item>
+          <title>Low Level Hints</title>
+            <list type="ordered">
+            <item>
+                <p>
                     When optimizing memory use, be wary of the difference between peak usage and average 
memory usage. Some memory is almost always allocated; this is usually bad. Some is only briefly allocated; 
this may be quite acceptable. Tools like massif use the concept of space-time, the product of memory used and 
the duration it was allocated for, instead.
-                </para>
-            </listitem>
-            <listitem>
-                <para>
+                </p>
+            </item>
+            <item>
+                <p>
                    Time simplified bits of code that do only the things you know are essential; this gives 
an absolute lower limit on the time your code will take. For example, when optimizing a loop, time the empty 
loop. If that is still too long no amount of micro-optimization will help and you will have to change your 
design. Make sure the compiler doesn't optimize away your empty loop.
-                </para>
-            </listitem>
-            <listitem>
-                <para>
+                </p>
+            </item>
+            <item>
+                <p>
                     Move code out from inside loops. A slightly more complicated piece of code that is 
executed once is far quicker than a simple piece of code executed a thousand times. Avoid calling slow code 
often.
-                </para>
-            </listitem>
-            <listitem>
-                <para>
-                      Give the compiler as many hints as possible. Use the const keyword. Use 
<envar>G_INLINE_FUNC</envar> for short, frequently called, functions. Look up <envar>G_GNUC_PURE</envar>, 
<envar>G_LIKELY</envar> and the other glib miscellaneous macros. Use the macros instead of gcc-specific 
keywords to ensure portability.
-                </para>
-            </listitem>
-            <listitem>
-                <para>
+                </p>
+            </item>
+            <item>
+                <p>
+                      Give the compiler as many hints as possible. Use the const keyword. Use 
<code>G_INLINE_FUNC</code> for short, frequently called, functions. Look up <code>G_GNUC_PURE</code>, 
<code>G_LIKELY</code> and the other glib miscellaneous macros. Use the macros instead of gcc-specific 
keywords to ensure portability.
+                </p>
+            </item>
+            <item>
+                <p>
                     Don't use assembly language. It is not portable and, while it may be fast on one 
processor, it is not even guaranteed to be fast on every processor that supports that architecture (e.g. 
Athlon vs. Pentium 4).
-                </para>
-            </listitem>
-            <listitem>
-                <para>
+                </p>
+            </item>
+            <item>
+                <p>
                     Don't rewrite an existing library routine unless you are sure it is unnecessarily slow. 
Many CPU-intensive library routines have already been optimized. Conversely, some library routines are slow, 
especially ones that make system calls to the operating system.
-                </para>
-            </listitem>
-            <listitem>
-                <para>
+                </p>
+            </item>
+            <item>
+                <p>
                     Minimize the number of libraries you link to. The fewer libraries to link in, the faster 
the program starts. This is a difficult thing to do with GNOME.
-                </para>
-            </listitem>
-        </itemizedlist>
-        <itemizedlist>
-            <title>High Level Tricks</title>
-            <listitem>
-                <para>
+                </p>
+            </item>
+           </list>
+        </item>
+       <item>
+          <title>High Level Tricks</title>
+          <list type="ordered">
+            <item>
+                <p>
                    Take advantage of concurrency. This doesn't just mean using multiple processors; it also 
means taking advantage of the time the user spends thinking about what they are going to do next to perform 
some calculations in anticipation. Do calculations while waiting for data to be loaded off disk. Take 
advantage of multiple resources, use them all at once.
-                </para>
-            </listitem>
-            <listitem>
-                <para>
+                </p>
+            </item>
+            <item>
+                <p>
                    Cheat. The user only has to think that the computer is fast; it doesn't matter whether 
it actually is or not. It is the time between the command and the answer that is important; it doesn't matter 
if the response is pre-calculated, cached, or will in fact be worked out later at a more convenient time, as 
long as the user gets what they expect.
-                </para>
-            </listitem>
-            <listitem>
-                <para>
+                </p>
+            </item>
+            <item>
+                <p>
                     Do things in the idle loop. It is easier to program than using full multi-threading but 
still gets things done out of the user's eye. Be careful though: if you spend too long in the idle loop your 
program will become sluggish. So regularly give control back to the main loop.
-                </para>
-            </listitem>
-            <listitem>
-                <para>
+                </p>
+            </item>
+            <item>
+                <p>
                     If all else fails, tell the user that the code is going to be slow and put up a progress 
bar. They won't be as happy as if you had just presented the results, but they will at least know the program 
hasn't crashed and they can go get a cup of coffee.
-                </para>
-            </listitem>
-        </itemizedlist>
-    </sect1>
-</chapter>
+                </p>
+            </item>
+         </list>
+        </item>
+      </terms>
+    </section>
+  </page>
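
As a hedged illustration of the compiler-hint advice in the Low Level Hints list above (the functions here
are invented for the example, not taken from any GNOME module):

#include <glib.h>

/* G_GNUC_PURE: the result depends only on the arguments, so the
 * compiler is free to merge repeated calls. */
static int square (int x) G_GNUC_PURE;

static int
square (int x)
{
    return x * x;
}

/* const tells the compiler (and the reader) the data is not modified;
 * G_UNLIKELY marks the error path as rare. */
static int
sum_of_squares (const int *values, int n)
{
    int total = 0;

    if (G_UNLIKELY (values == NULL || n <= 0))
        return 0;

    for (int i = 0; i < n; i++)
        total += square (values[i]);

    return total;
}

int
main (void)
{
    const int data[] = { 1, 2, 3, 4 };

    g_print ("%d\n", sum_of_squares (data, 4));
    return 0;
}
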
+
+
+
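
And a sketch of the idle-loop trick from the High Level Tricks list, using GLib's main loop;
do_one_chunk() and the chunk count are placeholders for the real work being spread out:

#include <glib.h>

#define TOTAL_CHUNKS 100

static int chunks_done = 0;

/* Placeholder for one small slice of the expensive calculation. */
static void
do_one_chunk (void)
{
    g_usleep (1000);
}

/* Idle callback: do a little work, then return control to the
 * main loop so the program stays responsive. */
static gboolean
work_on_idle (gpointer user_data)
{
    GMainLoop *loop = user_data;

    do_one_chunk ();

    if (++chunks_done < TOTAL_CHUNKS)
        return TRUE;            /* run again on the next idle */

    g_main_loop_quit (loop);
    return FALSE;               /* finished, remove the idle source */
}

int
main (void)
{
    GMainLoop *loop = g_main_loop_new (NULL, FALSE);

    g_idle_add (work_on_idle, loop);
    g_main_loop_run (loop);

    g_main_loop_unref (loop);
    return 0;
}
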
diff --git a/optimization-guide/C/massif.page b/optimization-guide/C/massif.page
new file mode 100644
index 0000000..886aa85
--- /dev/null
+++ b/optimization-guide/C/massif.page
@@ -0,0 +1,180 @@
+<page xmlns="http://projectmallard.org/1.0/"
+      type="guide" style="task"
+      id="massif">
+    <info>
+      <link type="guide" xref="index#massif"/>
+    </info>
+      <title>Using <app>Massif</app> for Profiling Memory Use in GNOME Software</title>
+
+    <p>
+        This article describes how to use the <app>Massif</app> heap profiler with GNOME applications. We 
describe how to invoke, interpret, and act on the output of <app>Massif</app>. The <app>Swell Foop</app> game 
is used as an example. </p>
+   <section id="optimization-massif-TBL-intro">
+        <title>Introduction</title>
+        <p>
+            <app>Massif</app> is a member of the <link href="http://valgrind.org/">valgrind</link> suite of 
memory-profiling tools. Its purpose is to give a detailed view of dynamic memory usage during the lifetime of 
the program. Specifically it records the memory use of the heap and the stack.
+        </p>
+        <p>
+            The heap is the region of memory which is allocated with functions like malloc. It grows on 
demand and is usually the largest region of memory in a program. The stack is where all the local data for 
functions is stored. This includes the "automatic" variables in C and the return address for subroutines. The 
stack is typically a lot smaller and a lot more active than the heap. We won't consider the stack explicitly 
since <app>Massif</app> treats it as though it were just another part of the heap. <app>Massif</app> also 
gives information about how much memory is used to manage the heap.  </p>
+        <p>
+            <app>Massif</app> produces two output files: a graphical overview in a postscript file and a 
detailed breakdown in a text file.
+        </p>
+    </section>
+    <section id="optimization-massif-TBL-using-massif">
+        <title>Using <app>Massif</app> with GNOME</title>
+        <p>
+          <app>Massif</app> has very few options and for many programs does not need them. However for GNOME 
applications, where memory allocation might be buried deep in either glib or GTK, the number of levels down 
the call-stack Massif descends needs to be increased. This is achieved using the --depth parameter. By 
default this is 3; increasing it to 5 will guarantee the call-stack reaches down to your code. One or two 
more levels may also be desirable to provide your code with some context. Since the level of detail becomes 
quickly overwhelming, it is best to start with a smaller depth parameter and only increase it when it 
becomes apparent that it isn't sufficient.
+        </p>
+        <p>
+            It is also useful to tell <app>Massif</app> which functions allocate memory in glib. It removes 
an unnecessary layer of function calls from the reports and gives you a clearer idea of what code is 
allocating memory. The allocating functions in glib are g_malloc, g_malloc0, g_realloc, g_try_malloc, and 
g_mem_chunk_alloc. You use the --alloc-fn option to tell Massif about them.
+        </p>
+        <p>
+            Your command-line should therefore look something like:
+        </p>
+        <code>
+valgrind --tool=massif --depth=5  --alloc-fn=g_malloc --alloc-fn=g_realloc --alloc-fn=g_try_malloc \
+         --alloc-fn=g_malloc0 --alloc-fn=g_mem_chunk_alloc swell-foop
+        </code>
+        <p>
+            <app>Swell Foop</app> is the program we will be using as an example. Be warned that, since 
valgrind emulates the CPU, it will run <em>very</em> slowly. You will also need a lot of memory. </p>
+    </section>
+    <section id="optimization-massif-TBL-interpreting-results">
+        <title>Interpreting the Results</title>
+        <p>
+            The graphical output of <app>Massif</app> is largely self-explanatory. Each band represents the 
memory allocated by one function over time. Once you identify which bands are using the most memory, usually 
the big thick ones at the top, you will have to consult the text file for the details.
+        </p>
+        <p>
+            The text file is arranged as a hierarchy of sections; at the top is a list of the worst memory 
users arranged in order of decreasing spacetime. Below this are further sections, each breaking the results 
down into finer detail as you proceed down the call-stack. To illustrate this we will use the output of the 
command above.
+        </p>
+        <figure id="optimization-massif-FIG-output-unoptimized">
+            <title><app>Massif</app> output for the unoptimized version of the <app>Swell Foop</app> 
program.</title>
+            <media type="image" src="figures/massif-before.png"/>
+         </figure>
+        <p>
+            <link xref="#optimization-massif-FIG-output-unoptimized" />
+            shows a typical postscript output from
+            <app>Massif</app>. This is the result you would get from
+            playing a single game of <app>Swell Foop</app> (version
+            2.8.0) and then quitting. The postscript file will have a
+            name like <file>massif.12345.ps</file> and the text file
+            will be called <file>massif.12345.txt</file>. The number
+            in the middle is the process ID of the program that was
+            examined. If you actually try this example, you will find
+            two versions of each file, with slightly different
+            numbers; this is because <app>Swell Foop</app> starts a
+            second process and <app>Massif</app>
+            follows that too. We will ignore this second process; it
+            consumes very little memory.</p>
+        <p>
+            At the top of the graph we see a large yellow band labelled gdk_pixbuf_new. This seems like an 
ideal candidate for optimization, but we will need to use the text file to find out what is calling 
gdk_pixbuf_new. The top of the text file will look something like this:
+        </p>
+        <code>
+Command: ./swell-foop
+
+== 0 ===========================
+Heap allocation functions accounted for 90.4% of measured spacetime
+
+Called from:
+  28.8% : 0x6BF83A: gdk_pixbuf_new (in /usr/lib/libgdk_pixbuf-2.0.so.0.400.9)
+
+    6.1% : 0x5A32A5: g_strdup (in /usr/lib/libglib-2.0.so.0.400.6)
+
+    5.9% : 0x510B3C: (within /usr/lib/libfreetype.so.6.3.7)
+
+    3.5% : 0x2A4A6B: __gconv_open (in /lib/tls/libc-2.3.3.so)
+        </code>
+        <p>
+            The line with the '=' signs indicates how far down the stack trace we are; in this case we are 
at the top. After this it lists the heaviest users of memory in order of decreasing spacetime. Spacetime is 
the product of the amount of memory used and how long it was used for. It corresponds to the area of the 
bands in the graph. This part of the file tells us what we already know: most of the spacetime is dedicated 
to gdk_pixbuf_new. To find out what called gdk_pixbuf_new we need to search further down the text file:
+        </p>
+        <code>
+== 4 ===========================
+Context accounted for 28.8% of measured spacetime
+  0x6BF83A: gdk_pixbuf_new (in /usr/lib/libgdk_pixbuf-2.0.so.0.400.9)
+  0x3A998998: (within /usr/lib/gtk-2.0/2.4.0/loaders/libpixbufloader-png.so)
+  0x6C2760: (within /usr/lib/libgdk_pixbuf-2.0.so.0.400.9)
+  0x6C285E: gdk_pixbuf_new_from_file (in /usr/lib/libgdk_pixbuf-2.0.so.0.400.9)
+
+Called from:
+  27.8% : 0x804C1A3: load_scenario (swell-foop.c:463)
+
+    0.9% : 0x3E8095E: (within /usr/lib/libgnomeui-2.so.0.792.0)
+
+  and 1 other insignificant place
+        </code>
+        <p>
+            The first line tells us we are now four levels deep into the stack. Below it is a listing of the 
function calls that lead from here to gdk_pixbuf_new. Finally, there is a list of functions that are at the 
next level down and call these functions. There are, of course, also entries for levels 1, 2, and 3, but this 
is the first level to reach right down through the GDK code to the <app>Swell Foop</app> code. From this 
listing, we can see instantly that the problem code is load_scenario.
+        </p>
+        <p>
+            Now that we know what part of our code is using all the spacetime, we can look at it and find out 
why. It turns out that load_scenario is loading a pixbuf from a file and then never freeing that memory. 
Having identified the problem code, we can start to fix it.
+        </p>
+    </section>
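
The shape of the fix described above, as a rough sketch (the real code is load_scenario in swell-foop.c;
the function name, file argument, and hand-off step here are placeholders): release the pixbuf as soon as
its contents have been handed off, instead of holding it for the lifetime of the program.

#include <gdk-pixbuf/gdk-pixbuf.h>

/* Hypothetical loader illustrating the fix, not the real swell-foop code. */
static void
load_image_sketch (const char *filename)
{
    GError    *error  = NULL;
    GdkPixbuf *pixbuf = gdk_pixbuf_new_from_file (filename, &error);

    if (pixbuf == NULL) {
        g_warning ("could not load %s: %s", filename, error->message);
        g_error_free (error);
        return;
    }

    /* Hand the image data off here, for example by rendering it into a
     * server-side pixmap, then drop the client-side copy instead of
     * keeping it allocated for the rest of the program's life. */
    g_object_unref (pixbuf);
}

int
main (int argc, char **argv)
{
    if (argc > 1)
        load_image_sketch (argv[1]);
    return 0;
}
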
+    <section id="optimization-massif-TBL-acting-on-results">
+        <title>Acting on the Results</title>
+        <p>
+            Reducing spacetime consumption is good, but there are two ways of reducing it and they are not 
equal. You can either reduce the amount of memory allocated, or reduce the amount of time it is allocated 
for. Consider for a moment a model system with only two processes running. Both processes use up almost all 
the physical RAM and if they overlap at all then the system will swap and everything will slow down. 
Obviously if we reduce the memory usage of each process by a factor of two then they can peacefully coexist 
without the need for swapping. If instead we reduce the time the memory is allocated by a factor of two then 
the two programs can coexist, but only as long as their periods of high memory use don't overlap. So it is 
better to reduce the amount of memory allocated.
+        </p>
+        <p>
+            Unfortunately, the choice of optimization is also
+           constrained by the needs of the program. The size of the
+           pixbuf data in <app>Swell Foop</app> is determined by the
+           size of the game's graphics and cannot be easily
+           reduced. However, the amount of time it spends loaded into
memory can be drastically reduced. <link xref="#optimization-massif-FIG-output-optimized" /> shows 
the <app>Massif</app> analysis of <app>Swell Foop</app> after being altered to dispose of the pixbufs once 
the images have been loaded into the X server.
+        </p>
+        <figure id="optimization-massif-FIG-output-optimized">
+            <title><app>Massif</app> output for the optimized <app>Swell Foop</app> program.</title>
+           <media type="image" src="figures/massif-after.png" />
+            </figure>
+        <p>
+            The spacetime use of gdk_pixbuf_new is now a thin band that only spikes briefly (it is now the 
sixteenth band down and shaded magenta). As a bonus, the peak memory use has dropped by 200 kB since the 
spike occurs before other memory is allocated. If two processes like this were run together the chances of 
the peak memory usage coinciding, and hence the risk of swapping, would be quite low.
+        </p>
+        <p>
+            Can we do better? A quick examination of <app>Massif</app>'s text output reveals g_strdup to 
be the new major offender.
+        </p>
+        <code>
+Command: ./swell-foop
+
+== 0 ===========================
+Heap allocation functions accounted for 87.6% of measured spacetime
+
+Called from:
+    7.7% : 0x5A32A5: g_strdup (in /usr/lib/libglib-2.0.so.0.400.6)
+
+    7.6% : 0x43BC9F: (within /usr/lib/libgdk-x11-2.0.so.0.400.9)
+
+    6.9% : 0x510B3C: (within /usr/lib/libfreetype.so.6.3.7)
+
+    5.2% : 0x2A4A6B: __gconv_open (in /lib/tls/libc-2.3.3.so)
+        </code>
+        <p>
+            If we look closer, though, we see that it is called from many, many places.
+        </p>
+        <code>
+== 1 ===========================
+Context accounted for  7.7% of measured spacetime
+  0x5A32A5: g_strdup (in /usr/lib/libglib-2.0.so.0.400.6)
+
+Called from:
+    1.8% : 0x8BF606: gtk_icon_source_copy (in /usr/lib/libgtk-x11-2.0.so.0.400.9)
+
+    1.1% : 0x67AF6B: g_param_spec_internal (in /usr/lib/libgobject-2.0.so.0.400.6)
+
+    0.9% : 0x91FCFC: (within /usr/lib/libgtk-x11-2.0.so.0.400.9)
+
+    0.8% : 0x57EEBF: g_quark_from_string (in /usr/lib/libglib-2.0.so.0.400.6)
+
+  and 155 other insignificant places
+        </code>
+        <p>
+            We now face diminishing returns for our optimization efforts. The graph hints at another 
possible approach: Both the "other" and "heap admin" bands are quite large. This tells us that there are a 
lot of small allocations being made from a variety of places. Eliminating these will be difficult, but if 
they can be grouped then the individual allocations can be larger and the "heap admin" overhead can be 
reduced.
+        </p>
+    </section>
+    <section id="optimization-massif-TBL-caveats">
+        <title>Caveats</title>
+        <p>
+            There are a couple of things to watch out for: Firstly, spacetime is only reported as a 
percentage; you have to compare it to the overall size of the program to decide if the amount of memory is 
worth pursuing. The graph, with its kilobyte vertical axis, is good for this.
+        </p>
+        <p>
+            Secondly, <app>Massif</app> only takes into account the memory used by your own program. 
Resources like pixmaps are stored in the X server and aren't considered by <app>Massif</app>. In the 
<app>Swell Foop</app> example we have actually only moved the memory consumption from client-side pixbufs to 
server-side pixmaps. Even though we cheated, there are performance gains. Keeping the image data in the X 
server makes the graphics routines quicker and removes a lot of inter-process communication. Also, the 
pixmaps will be stored in a native graphics format which is often more compact than the 32-bit RGBA format 
used by gdk_pixbuf. To measure the effect of pixmaps and other X resources, use the <link 
href="http://www.freedesktop.org/Software/xrestop">xrestop</link> program.
+        </p>
+    </section>
+</page>

