[sysprof/ftrace] Update TODO

From: Søren Sandmann Pedersen <ssp src gnome org>
To: svn-commits-list gnome org
Cc:
Subject: [sysprof/ftrace] Update TODO
Date: Sat, 15 Aug 2009 09:01:45 +0000 (UTC)
commit 199fcd2471ecd7c827959972d7c9954eb771505e
Author: Søren Sandmann Pedersen <sandmann daimi au dk>
Date:   Sat Aug 15 04:30:21 2009 -0400

    Update TODO

 TODO        |  660 +++++++++++++++++++++++++++++++----------------------------
 collector.c |    2 +-
 sysprof.c   |    2 +-
 3 files changed, 344 insertions(+), 320 deletions(-)
---
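Note on one of the TODO entries moved by this commit: krh's suggestion is
to hash blocks of the stack and send only the blocks that changed since
last time, sending a cookie (the hashcode) otherwise. A minimal sketch of
that bookkeeping follows, in plain userspace C. The block size, the cap on
remembered hashcodes, the FNV-1a hash and all names (BlockCache,
hash_block, count_blocks_to_send) are assumptions made for illustration;
none of them come from sysprof.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 64      /* assumed block size, not from the TODO */
#define MAX_KNOWN  1024    /* assumed cap on remembered hashcodes */

/* Hypothetical list of hashcodes that have been generated previously. */
typedef struct
{
    uint64_t known[MAX_KNOWN];
    size_t   n_known;
} BlockCache;

/* FNV-1a, chosen here only because it is short; any hash would do. */
static uint64_t
hash_block (const unsigned char *data, size_t len)
{
    uint64_t h = 14695981039346656037ULL;
    size_t i;

    for (i = 0; i < len; i++)
    {
        h ^= data[i];
        h *= 1099511628211ULL;
    }
    return h;
}

static int
cache_contains (const BlockCache *cache, uint64_t h)
{
    size_t i;

    for (i = 0; i < cache->n_known; i++)
    {
        if (cache->known[i] == h)
            return 1;
    }
    return 0;
}

/* Walk the stack copy block by block. A block whose hashcode is already
 * known would be represented by its cookie alone; an unknown block is
 * remembered and counted as one that must be sent in full.
 */
static size_t
count_blocks_to_send (BlockCache *cache,
                      const unsigned char *stack, size_t len)
{
    size_t off, n_full = 0;

    assert (cache != NULL && stack != NULL);

    for (off = 0; off < len; off += BLOCK_SIZE)
    {
        size_t n = (len - off < BLOCK_SIZE) ? len - off : BLOCK_SIZE;
        uint64_t h = hash_block (stack + off, n);

        if (!cache_contains (cache, h))
        {
            if (cache->n_known < MAX_KNOWN)
                cache->known[cache->n_known++] = h;
            n_full++;
        }
    }
    return n_full;
}
```

Calling count_blocks_to_send() twice on an unchanged buffer reports zero
full blocks the second time; changing one byte makes exactly that block
count again. A hash collision would wrongly suppress a changed block, so a
real implementation would have to decide whether that risk is acceptable.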
diff --git a/TODO b/TODO
index b4cde8d..41df1bf 100644
--- a/TODO
+++ b/TODO
@@ -23,16 +23,31 @@ Before 1.0.4:
 
 Before 1.2:
 
-* Give an informative error message if run as root
+* With a high sample rate, sysprof's label updating gets profiled which 
+  then causes the label to change, causing a feedback
+  loop. Possibilities include:
 
-* Find out why gtk_tree_view_columns_autosize() apparently doesn't
-  work on empty tree views.
+	- Use lower sample rate
+	- Update the label less frequently
+	- Ignore samples generated while updating label
 
-* Get rid of O(n^2) string handling in on_read()
+* Possibly look at GtkTreeView issues
 
-* Decide whether to ship the kernel module as an option. Pros: hedge
-  against ftrace breaking, some people run old kernels, old
-  RHEL; con: more code to maintain.
+	- gtk_tree_view_columns_autosize() apparently doesn't
+	  work on empty tree views.
+
+	- Find out how to hack around gtk+ bug causing multiple double
+	  clicks to get eaten.
+
+	- Switching between descendant views is a slow:
+		  - gtk_tree_store_get_path() is O(n^2) and accounts
+		    for 43% of the time.
+		  - GObject signal emission overhead accounts for 18%
+		    of the time. 
+	  Consider adding a forked version of GtkTreeStore with
+	  performance and bug fixes.
+
+* Give an informative error message if run as root
 
 * Hack to disable recursion for binaries without symbols causes the
   symbols to not work the way other symbols do.  A better approach is
@@ -69,36 +84,18 @@ Before 1.2:
 
 * Is the move-to-front in process_locate_map() really worth it?
 
-* Whenever we fail to lock the atomic variable, track this, and send the
-  information to userspace as an indication of the overhead of the profiling.
-  Although there is inherent aliasing here since stack scanning happens at
-  regular intervals.
+* Decide whether to ship the kernel module as an option. Pros: hedge
+  against ftrace breaking, some people run old kernels, old
+  RHEL; Cons: more code to maintain.
 
 * Apparently, if you upgrade the kernel, then don't re-run configure, 
   the kernel Makefile will delete all of /lib/modules/<release>/kernel 
   if you run make install in the module directory. Need to find out what
   is going on.
 
-* Performance:
-	Switching between descendant views is a slow:
-		  - gtk_tree_store_get_path() is O(n^2) and accounts
-		    for 43% of the time.
-		  - GObject signal emission overhead accounts for 18% of
-		    the time. 
-	Consider adding a forked version of GtkTreeStore with
-	performance and bug fixes.
-
-* Make sure that labels look decent in case of "No Map" etc.
-
 * If we end up believing the kernel's own stacktraces, maybe
   /proc/kallsyms shouldn't be parsed until the user hits profile.
 
-* Elf bugs:
-	- error handling for bin_parser is necessary.
-
-	* Find out why all apps have an "In file /usr/bin/<app binary>" below
-	  _libc_main. If possible, maybe make up a name for it.
-
 * vdso stuff:
 	- the "[vdso]" string should be #defined somewhere
 	- Does get_vdso_bytes() belong in process.c?
@@ -122,83 +119,6 @@ Before 1.2:
 
 * Convert things like [heap] and [stack] to more understandable labels.
 
-* Strategies for taking reliable stacktraces.
-
-	Three different kinds of files
-
-	- kernel
-	- vdso
-	- regular elf files
-
-	- kernel
-		- eh_frame annotations, in kernel or in kernel debug
-		- /proc/kallsyms
-		- userspace can look at _stext and _etext to determine 
-		  start and end of kernel text segment
-		- copying kernel stack to userspace
-			- it's always 4096 bytes these days
-		- heuristically determine functions based on address
-			- callbacks on the stack can be identified
-			  by having an offset of 0.
-			- even so there is a lot of false positives.
-		- is eh_frame usually loaded into memory during normal
-		  operation? It is mapped, but probably not paged in,
-		  so we will be taking a few major page faults when we
-		  first profile something.
-			Unless of course, we store the entire stack in
-		  the stackstash. This may use way too much memory though.
-
-		- Locking, possibly useful code:
-
-		/* In principle we should use get_task_mm() but
-		 * that will use task_lock() leading to deadlock
-		 * if somebody already has the lock
-		 */
-		if (spin_is_locked (&current->alloc_lock))
-			printk ("alreadylocked\n");
-		{
-			struct mm_struct *mm = current->mm;
-			if (mm)
-			{
-				printk (KERN_ALERT "stack size: %d (%d)\n",
-					mm->start_stack - regs->REG_STACK_PTR,
-					current->pid);
-				
-				stacksize = mm->start_stack - regs->REG_STACK_PTR;
-			}
-			else
-				stacksize = 1;
-		}
-
-	- regular elf
-		- usually have eh_frame section which is mapped into memory
-		  during normal operation
-		- do stackwalk in kernel based on eh_frame
-		- eh_frame section is usually mapped into memory, so
-		  no file reading in kernel would be necessary.
-		- do stackwalk in userland based on eh_frame
-		- do ebp based stackwalk in kernel
-		- do ebp based stackwalk in userland
-		- do heuristic stackwalk in kernel
-		- do heuristic stackwalk in userland
-
-	- Send heuristic stack trace to user space, along with
-	  location on the stack. Then, in userspace analyze the
-	  machine code to determine the size of the stack frame at any
-	  point. The instructions that would need to be recognized are:
-
-	  	 subl <constant>, %esp
-		 addl <constant>, %esp
-		 leave
-		 jcc
-		 push
-		 pop
-
-	  GCC is unlikely to have different stack sizes at the entry
-	  to a basic block.
-
-	  We can often find a vmlinux in /lib/modules/<uname-r>/build.
-
 * "Expand all" is horrendously slow because update_screenshot gets called 
   for every "expanded" signal. In fact even normal expanding is really
   slow. It's probably hopeless to get decent performance out of GtkTreeView,
@@ -212,6 +132,11 @@ Before 1.2:
 
 * Missing things in binparser.[ch]
 
+	- error handling for bin_parser is necessary.
+
+	* Find out why all apps have an "In file /usr/bin/<app binary>" below
+	  _libc_main. If possible, maybe make up a name for it.
+
 	- it's inconvenient that you have to pass in both a parser _and_
 	  a record. The record should just contain a pointer to the parser.
 	  On the other hand, the result does depend on the parser->offset.
@@ -225,17 +150,11 @@ Before 1.2:
 	- "native endian" is probably not useful. Maybe go back to just
 	  having big/little endian.
 
-  Should probably rethink the whole thing. It's just not very convenient to use, even
-  for simple things like ELF files.
-
-* Rename stack_stash_foreach_by_address() to stack_stash_foreach_unique(),
-  or maybe not ...
+  Should probably rethink the whole thing. It's just not very
+  convenient to use, even for simple things like ELF files.
 
 * Make it compilable against a non-running kernel.
 
-* Maybe report idle time? Although this would come for free with the
-  timelines.
-
 * Fix (deleted) problem. But more generally, whenever we can't display a
   symbol, display an error message instead, ie., 
 	- 'Binary file <xxxx> was deleted/replaced'
@@ -251,8 +170,6 @@ Before 1.2:
 
 * Add spew infrastructure to make remote debugging easier.
 
-* Make it compile and work on x86-64
-
 * With kernel module not installed, select Profiler->Start, then dismiss
   the alert. This causes the start button to appear prelighted. Probably
   just another gtk+ bug.
@@ -285,6 +202,57 @@ Before 1.2:
 			- maybe simply make stackstashes able to 
 			  save themselves.
 
+* See if the auto-expanding can be made more intelligent
+	- "Everything" should be expanded exactly one level
+	- all trees should be expanded at least one level
+
+- See if there is a way to make it distcheck
+
+- grep "FIXME - not10"
+
+- Ability to generate "screenshots" suitable for mail/blog/etc
+	UI: "generate screenshot" menu item pops up a window with
+	a text area + a radio buttons "text/html". When you flick
+	them, the text area is automatically updated.
+	- beginning in CVS:
+		- why does the window not remember its position when
+		  you close it with the close button, but does remember
+		  it when you use the wm button or the menu item? It actually
+		  seems that it only forgets the position when you click the
+		  button with the mouse. But not if you use the keyboard ...
+			This is a gtk+ bug.
+
+- Make busy cursors more intelligent
+	- when you click something in the main list and we don't respond
+		within 50ms (or perhaps when we expect to not be able to do
+		so (can we know the size in advance?))
+	- instead of what we do now: set the busy cursor unconditionally
+	
+- Add view->ancestors/descendants menu items
+
+- rethink caller list, not terribly useful at the moment. Federico suggested
+  listing all ancestors.
+	Done: implemented this idea in CVS HEAD. If we keep it that way,
+	should do a globale s/callers/ancestors on the code.
+  	- not sure it's an improvement. Often it is more interesting to
+	find the immediate callers.
+	- Now it's back to just listing the immediate callers.
+
+- hide internal stuff in ProfileDescendant
+
+
+==== Later =====
+
+* Whenever we fail to lock the atomic variable, track this, and send the
+  information to userspace as an indication of the overhead of the profiling.
+  Although there is inherent aliasing here since stack scanning happens at
+  regular intervals.
+
+* Maybe report idle time? Although this would come for free with the
+  timelines.
+
+* Make it compile and work on x86-64
+
 - rethink loading and saving. Goals
 
 	- Can load 1.0 profiles
@@ -327,8 +295,8 @@ Before 1.2:
 	- make a generic representation of xml files with quarks for strings:
 		struct item { 
 			int begin/end/text;
-			quark text:		-> begin/end/value
-			int id;			-> for begins  that are pointed to
+			quark text:	-> begin/end/value
+			int id;		-> for begins  that are pointed to
 		}
 	  perhaps even with iterators. Should be compact and suitable for both
 	  input and output. As a first cut, perhaps just split out the 
@@ -351,93 +319,6 @@ Before 1.2:
 		
 
 	
-* See if the auto-expanding can be made more intelligent
-	- "Everything" should be expanded exactly one level
-	- all trees should be expanded at least one level
-
-* Send entire stack to user space, then do stackwalking there. That would
-  allow us to do more complex algorithms, like dwarf, in userspace. Though
-  we'd lose the ability to do non-racy file naming. We could pass a list
-  of the process mappings with each stack though. Doing this would also solve
-  the problem of not being able to get maps of processes running as root.
-	Might be too expensive though. User stacks seem to be on the order
-  of 100K usually, which for 200 times a second means a bandwidth of
-  20MB/s, which is probably too much. One question is how much of it
-  usually changes. 
-	Actually it seems that the _interesting_ part of the stack 
-  (ie., from the stack pointer and up) is not that big in many cases. The
-  average stacksize seemed to be about 7700 bytes for gcc compiling gtk+.
-  Even deeply recursive apps like sysprof only generate about 55K stacks.
-
-  Other possibilities:
-
-	- Do heuristic stack walking where it lists all words on the stack
-	  that look like they might be return addresses.
-
-	- Somehow map the application's stack pages into the client. This
-	  is likely difficult or impossible.
-
-	- Another idea: copy all addresses that look like they could be
-	  return addresses, along with the location on the stack. This
-	  just might be enough for a userspace stack walker.
-
-	- Yet another: krh suggests hashing blocks of the stack, then
-	  only sending the blocks that changed since last time.
-
-		- every time you send a stackblock, also send a cookie.
-
-		- whenever you *don't* send a stackblock, send the cookie
-		  instead. That way you always get a complete stacktrace
-		  conceptually.
-
-		- also, that would allow the kernel to just have a simple
-		  hashtable containing the known blocks. Though, that could
-		  become large. Actually there is no reason to store the
-		  blocks; you can just send the hashcode. That way you 
-		  would only need to store a list of hashcodes that we
-		  have generated previously.
-
-	- One problem with doing DWARF walking is that the debug code
-	  will have to be faulted in. This can be a substantial amount
-	  of disk access which is undesirable to have during a
-	  profiling run. Even if we only have to fault in the
-	  .eh_frame_hdr section, that's still 18 pages for gtk+. The 
-	  .eh_frame section for gtk+ is 72 pages.
-
-	  A possibility may be to consider two stacktraces identical
-	  if the only differing values are *outside* the text
-	  segments.  This may work since stack frames tend to be the
-	  same size.  Is there a way of determining the location of
-	  text segments without reading the ELF files? Maybe just
-	  check if it's inside an executable mappign.
-
-	  It is then sufficient in user space to only store one
-	  representative for each set of considered-identical stack
-	  traces.
-
-	  User space storage: Use the stackstash tree. When a new trace
-	  is added, just skip over nodes that differ, but where none of
-	  them points to text segments. Two possibilities then:
-
-	        - when two traces are determined to differ, store them
-	          in completely separate trees. This ensures that we
-	          will never run the dwarf algorithm on an invalid
-	          stack trace, but also means that we won't get shared
-	          prefixes for stacktraces.
-
-		- when two traces are determined to differ, branch off
-		  as currently. This will share more data, but the
-		  dwarf algorithm could be run on invalid traces. It
-		  may work in practice though if the compiler
-		  generally uses fixed stack frames.
-
-		  A twist on is to mark the complete stack traces as
-		  "complete". Then after running the DWARF algorithm,
-		  the generated stack trace can be saved with it. This
-		  way incomplete stack traces branching off a complete
-		  one can be completed using the DWARF information for
-		  the shared part.
-
 * Notes on heuristic stack walking
 
   - We can reject addresses that point exactly to the beginning of a
@@ -521,33 +402,6 @@ Before 1.2:
 	  This means the datastructure will probably have to be done a
 	  little differently.
 
-- See if there is a way to make it distcheck
-
-- grep "FIXME - not10"
-
-- translation should be hooked up 
-
-- Consider adding "at least 5% inclusive cost" filter
-
-- consider having the ability to group a function together with its nearest
-  neighbours. That way we can eliminate some of the effect of 
-	"one function taking 10% of the time"
-  vs.
-	"the same function broken into ten functions each taking 1%"
-  Not clear what the UI looks like though.
-
-- Ability to generate "screenshots" suitable for mail/blog/etc
-	UI: "generate screenshot" menu item pops up a window with
-	a text area + a radio buttons "text/html". When you flick
-	them, the text area is automatically updated.
-	- beginning in CVS:
-		- why does the window not remember its position when
-		  you close it with the close button, but does remember
-		  it when you use the wm button or the menu item? It actually
-		  seems that it only forgets the position when you click the
-		  button with the mouse. But not if you use the keyboard ...
-			This is a gtk+ bug.
-
 - Find out how gdb does backtraces; they may have a better way. Also
   find out what dwarf2 is and how to use it. Look into libunwind.
   It seems gdb is capable of doing backtraces of code that neither has
@@ -562,12 +416,19 @@ http://www.linuxbase.org/spec/booksets/LSB-Embedded/LSB-Embedded/ehframe.html
 	http://cutebugs.net/bozo-profiler/
   which has an elf32 parser/debugger
 
-- Make busy cursors more intelligent
-	- when you click something in the main list and we don't respond
-		within 50ms (or perhaps when we expect to not be able to do
-		so (can we know the size in advance?))
-	- instead of what we do now: set the busy cursor unconditionally
-	
+  There is also libunwind, which seems potentially useful.
+
+- translation should be hooked up 
+
+- Consider adding "at least 5% inclusive cost" filter
+
+- consider having the ability to group a function together with its nearest
+  neighbours. That way we can eliminate some of the effect of 
+	"one function taking 10% of the time"
+  vs.
+	"the same function broken into ten functions each taking 1%"
+  Not clear what the UI looks like though.
+
 - Consider adding ability to show more than one function at a time. Algorithm:
 	Find all relevant nodes;
 	For each relevant node
@@ -593,16 +454,6 @@ http://www.linuxbase.org/spec/booksets/LSB-Embedded/LSB-Embedded/ehframe.html
 	  together with a tree. (Could add radio buttons somewhere in
 	  in the right pane).
 
-- Add view->ancestors/descendants menu items
-
-- rethink caller list, not terribly useful at the moment. Federico suggested
-  listing all ancestors.
-	Done: implemented this idea in CVS HEAD. If we keep it that way,
-	should do a globale s/callers/ancestors on the code.
-  	- not sure it's an improvement. Often it is more interesting to
-	find the immediate callers.
-	- Now it's back to just listing the immediate callers.
-
 - Have kernel module report the file the address was found in
 	Should avoid a lot of potential broken/raciness with dlopen etc.
 	Probably better to send a list of maps with each trace. Which 
@@ -630,15 +481,6 @@ http://www.linuxbase.org/spec/booksets/LSB-Embedded/LSB-Embedded/ehframe.html
 			char filenames [2048];
 		}
 
-- Figure out how Google's pprof script works. Then add real call graph 
-  drawing. (google's script is really simple; uses dot from graphviz).
-  KCacheGrind also uses dot to do graph drawing.
-
-- hide internal stuff in ProfileDescendant
-
-- possibly add dependency on glib 2.8 if it is released at that point.
-  (g_file_replace())
-
 * Some notes about timer interrupt handling in Linux
 
 On an SMP system APIC is used - the interesting file is arch/i386/kernel/apic.c
@@ -660,7 +502,172 @@ When the interrupt happens,
 	from kernel mode to kernel mode, it does _not_ push SS/ESP.
 	It does in both cases push EIP though.
 
-Later:
+* Strategies for taking reliable stacktraces.
+
+	Three different kinds of files
+
+	- kernel
+	- vdso
+	- regular elf files
+
+	- kernel
+		- eh_frame annotations, in kernel or in kernel debug
+		- /proc/kallsyms
+		- userspace can look at _stext and _etext to determine 
+		  start and end of kernel text segment
+		- copying kernel stack to userspace
+			- it's always 4096 bytes these days
+		- heuristically determine functions based on address
+			- callbacks on the stack can be identified
+			  by having an offset of 0.
+			- even so there is a lot of false positives.
+			- return addresses always point to something
+			  immediately after a call instruction.
+		- is eh_frame usually loaded into memory during normal
+		  operation? It is mapped, but probably not paged in,
+		  so we will be taking a few major page faults when we
+		  first profile something.
+			Unless of course, we store the entire stack in
+		  the stackstash. This may use way too much memory though.
+
+		- Locking, possibly useful code:
+
+		/* In principle we should use get_task_mm() but
+		 * that will use task_lock() leading to deadlock
+		 * if somebody already has the lock
+		 */
+		if (spin_is_locked (&current->alloc_lock))
+			printk ("alreadylocked\n");
+		{
+			struct mm_struct *mm = current->mm;
+			if (mm)
+			{
+				printk (KERN_ALERT "stack size: %d (%d)\n",
+					mm->start_stack - regs->REG_STACK_PTR,
+					current->pid);
+				
+				stacksize = 
+				     mm->start_stack - regs->REG_STACK_PTR;
+			}
+			else
+				stacksize = 1;
+		}
+
+	- regular elf
+		- usually have eh_frame section which is mapped into memory
+		  during normal operation
+		- do stackwalk in kernel based on eh_frame
+		- eh_frame section is usually mapped into memory, so
+		  no file reading in kernel would be necessary.
+		- do stackwalk in userland based on eh_frame
+		- do ebp based stackwalk in kernel
+		- do ebp based stackwalk in userland
+		- do heuristic stackwalk in kernel
+		- do heuristic stackwalk in userland
+
+	- Send heuristic stack trace to user space, along with
+	  location on the stack. Then, in userspace analyze the
+	  machine code to determine the size of the stack frame at any
+	  point. The instructions that would need to be recognized are:
+
+	  	 subl <constant>, %esp
+		 addl <constant>, %esp
+		 leave
+		 jcc
+		 push
+		 pop
+
+	  GCC is unlikely to have different stack sizes at the entry
+	  to a basic block.
+
+	  We can often find a vmlinux in /lib/modules/<uname-r>/build.
+
+    * Send entire stack to user space, then do stackwalking
+      there. That would allow us to do more complex algorithms, like
+      dwarf, in userspace. Though we'd lose the ability to do non-racy
+      file naming. We could pass a list of the process mappings with
+      each stack though. Doing this would also solve the problem of
+      not being able to get maps of processes running as root.
+
+      Might be too expensive though. User stacks seem to be on the
+      order of 100K usually, which for 200 times a second means a
+      bandwidth of 20MB/s, which is probably too much. One question is
+      how much of it usually changes.
+
+      Actually it seems that the _interesting_ part of the stack (ie.,
+      from the stack pointer and up) is not that big in many
+      cases. The average stacksize seemed to be about 7700 bytes for
+      gcc compiling gtk+.  Even deeply recursive apps like sysprof
+      only generate about 55K stacks.
+
+      Other possibilities:
+
+	- Do heuristic stack walking where it lists all words on the stack
+	  that look like they might be return addresses.
+
+	- Somehow map the application's stack pages into the client. This
+	  is likely difficult or impossible.
+
+	- Another idea: copy all addresses that look like they could be
+	  return addresses, along with the location on the stack. This
+	  just might be enough for a userspace stack walker.
+
+	- Yet another: krh suggests hashing blocks of the stack, then
+	  only sending the blocks that changed since last time.
+
+		- every time you send a stackblock, also send a cookie.
+
+		- whenever you *don't* send a stackblock, send the cookie
+		  instead. That way you always get a complete stacktrace
+		  conceptually.
+
+		- also, that would allow the kernel to just have a simple
+		  hashtable containing the known blocks. Though, that could
+		  become large. Actually there is no reason to store the
+		  blocks; you can just send the hashcode. That way you 
+		  would only need to store a list of hashcodes that we
+		  have generated previously.
+
+	- One problem with doing DWARF walking is that the debug code
+	  will have to be faulted in. This can be a substantial amount
+	  of disk access which is undesirable to have during a
+	  profiling run. Even if we only have to fault in the
+	  .eh_frame_hdr section, that's still 18 pages for gtk+. The 
+	  .eh_frame section for gtk+ is 72 pages.
+
+	  A possibility may be to consider two stacktraces identical
+	  if the only differing values are *outside* the text
+	  segments.  This may work since stack frames tend to be the
+	  same size.  Is there a way of determining the location of
+	  text segments without reading the ELF files? Maybe just
+	  check if it's inside an executable mappign.
+
+	  It is then sufficient in user space to only store one
+	  representative for each set of considered-identical stack
+	  traces.
+
+	  User space storage: Use the stackstash tree. When a new trace
+	  is added, just skip over nodes that differ, but where none of
+	  them points to text segments. Two possibilities then:
+
+	        - when two traces are determined to differ, store them
+	          in completely separate trees. This ensures that we
+	          will never run the dwarf algorithm on an invalid
+	          stack trace, but also means that we won't get shared
+	          prefixes for stacktraces.
+
+		- when two traces are determined to differ, branch off
+		  as currently. This will share more data, but the
+		  dwarf algorithm could be run on invalid traces. It
+		  may work in practice though if the compiler
+		  generally uses fixed stack frames.
+
+		  A twist on is to mark the complete stack traces as
+		  "complete". Then after running the DWARF algorithm,
+		  the generated stack trace can be saved with it. This
+		  way incomplete stack traces branching off a complete
+		  one can be completed using the DWARF information for
+		  the shared part.
 
 - If the stack trace ends in a memory access instruction, send the
   vma information to userspace. Then have user space
@@ -696,13 +703,11 @@ Later:
 - Applications should be able to say "start profiling", "stop profiling"
   so that you can limit the profiling to specific areas.
 	Idea:
-		Add a new kernel interface that applications uses to say
-		begin/end.
-		Then add a timeline where you can mark interesting regions,
-		for example those that applications have marked interesting.
 
-- Find out how to hack around gtk+ bug causing multiple double clicks 
-  to get eaten.
+		Add a new kernel interface that applications use to
+		say begin/end. Then add a timeline where you can mark
+		interesting regions, for example those that
+		applications have marked interesting.
 
 - Consider what it would take to take stacktraces of other languages such
   as perl, python, java, ruby, or bash. Or scheme.
@@ -727,11 +732,11 @@ Later:
   records.
 
 - Consider this usecase:
-	Someone is considering replacing malloc()/free() with a freelist
-	for a certain data structure. All use of this data structure is 
-	confined to one function, foo(). It is now interesting to know
-	how much time that particular function spends on malloc() and free()
-	combined.
+	Someone is considering replacing malloc()/free() with a
+	freelist for a certain data structure. All use of this data
+	structure is confined to one function, foo(). It is now
+	interesting to know how much time that particular function
+	spends on malloc() and free() combined.
 
 	Possible UI:
 
@@ -792,80 +797,99 @@ Later:
 
    For Memory:  badness=<cache line size not in cache>,		      cookie=<the address>
 
-   Cookies are used to figure out whether an access is really the same, ie., for two identical
-   cookies, the size is still just one, however 
+   Cookies are used to figure out whether an access is really the
+   same, ie., for two identical cookies, the size is still just one,
+   however
    
-   Memory is different from disk because you can't reasonably assume that stuff that has
-   been read will stay in cache (for short profile runs you can assume that with disk,
-   but not for long ones).
+   Memory is different from disk because you can't reasonably assume
+   that stuff that has been read will stay in cache (for short profile
+   runs you can assume that with disk, but not for long ones).
 
-   - Perhaps show a timeline with CPU in one color and disk in one color. Allow people to
-     look at at subintervals of this timeline. Is it useful to look at both CPU and disk at 
-     the same time? Probably not. See also marker discussion above. UI should probably allow
-     double clicking on a marked section and all instances of that one would be marked.
+   - Perhaps show a timeline with CPU in one color and disk in one
+     color. Allow people to look at at subintervals of this
+     timeline. Is it useful to look at both CPU and disk at the same
+     time? Probably not. See also marker discussion above. UI should
+     probably allow double clicking on a marked section and all
+     instances of that one would be marked.
 
-   - Other variation on the timeline idea: Instead of a disk timeline you could have a 
-     list of individual diskaccesses, and be able to select the ones you wanted to
-     get rid of.
+   - Other variation on the timeline idea: Instead of a disk timeline
+     you could have a list of individual diskaccesses, and be able to
+     select the ones you wanted to get rid of.
 
-   - The existing sysprof visualization is not terribly bad, the "self" column is
-     more useful now. 
+   - The existing sysprof visualization is not terribly bad, the
+     "self" column is more useful now.
 
-   - See what files are accessed so that you can get a getter idea of what
-     the system is doing. 
+   - See what files are accessed so that you can get a getter idea of
+     what the system is doing.
 
    - Optimization usecases:
 
-	- A lot of stuff is read synchronously, but it is possible to read
-	  it asynchronously.
-	  Visualization: A timeline with alternating CPU/disk activity. 
+	- A lot of stuff is read synchronously, but it is possible to
+	  read it asynchronously.  Visualization: A timeline with
+	  alternating CPU/disk activity.
 
 	- What function is doing all the synchronous reading, and what
-	  files/offsets is it reading. Visualization: lots of reads across
-	  different files out of one function
+	  files/offsets is it reading. Visualization: lots of reads
+	  across different files out of one function
 
 	- A piece of the program is doing disk I/O. We can drop that
- 	  entire piece of code. Sysprof visualization is ok, although seeing
-	  the files accessed is useful so that we can tell if those files are
-	  not just going to be used in other places. (Gnumeric plugin_init()).
-
-	- A function is reading a file synchronously, but there is other
-	  (CPU/disk) stuff that could be done at the same time. Visualization:
-	  A piece of the timeline is diskbound with little or no CPU used.
-
-	- Want to improve code locality of library or binary. Visualization:
-	  no GUI, just produce a list of functions that should be put first in
-	  the file. Then run the program again until the list converges.
-	  (Valgrind may be more useful here).
+ 	  entire piece of code. Sysprof visualization is ok, although
+ 	  seeing the files accessed is useful so that we can tell if
+ 	  those files are not just going to be used in other
+ 	  places. (Gnumeric plugin_init()).
+
+	- A function is reading a file synchronously, but there is
+	  other (CPU/disk) stuff that could be done at the same
+	  time. Visualization: A piece of the timeline is diskbound
+	  with little or no CPU used.
+
+	- Want to improve code locality of library or
+	  binary. Visualization: no GUI, just produce a list of
+	  functions that should be put first in the file. Then run the
+	  program again until the list converges.  (Valgrind may be
+	  more useful here).
 
 	- Nautilus reads a ton of files, icons + all the files in the
-	  homedirectory. Normal sysprof visualization is probably useful
-	  enough.
+	  homedirectory. Normal sysprof visualization is probably
+	  useful enough.
 
-	- Profiling a login session. 
+	- Profiling a login session.
 
-	- Many applications are running at the same time, doing IPC. It would
-	  be useful if we could figure out what other things a given process
-	  is waiting on. Eg., in poll, find out what processes have the other
-	  ends of the fd's open.
-		Visualization: multiple lines on a graph. Lines join up where
-	  one process is blocking on another. That would show processes holding
-	  up the progress very clearly.
-	  This was suggested by Federico.
+	- Many applications are running at the same time, doing
+	  IPC. It would be useful if we could figure out what other
+	  things a given process is waiting on. Eg., in poll, find out
+	  what processes have the other ends of the fd's open.
+	  Visualization: multiple lines on a graph. Lines join up
+	  where one process is blocking on another. That would show
+	  processes holding up the progress very clearly.  This was
+	  suggested by Federico.
 
-    - Need to report stat() as well. (Where do inode data end up? In the
-      buffer-cache?) Also open() may cause disk reads (seeks).
+    - Need to report stat() as well. (Where do inode data end up? In
+      the buffer-cache?) Also open() may cause disk reads (seeks).
 
     - To generate the timeline we need to know when a disk request is
-      issued and when it is completed. This way we can assign blame to all
-      applications that have issued a disk request at a given point in time. 
+      issued and when it is completed. This way we can assign blame to
+      all applications that have issued a disk request at a given
+      point in time.
 
-      The disk timeline should probably vary in intensity with the number
-      of outstanding disk requests.
+      The disk timeline should probably vary in intensity with the
+      number of outstanding disk requests.
 
 
 -=-=-=-=-=-=-=-=-=-=-=-=-=-=- ALREADY DONE: -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 
+* Get rid of O(n^2) string handling in on_read(). It wasn't O(n^2)
+
+* Rename stack_stash_foreach_by_address() to stack_stash_foreach_unique(),
+  or maybe not ... No.
+
+- Figure out how Google's pprof script works. Then add real call graph 
+  drawing. (google's script is really simple; uses dot from graphviz).
+  KCacheGrind also uses dot to do graph drawing.
+
+* possibly add dependency on glib 2.8 if it is released at that point.
+  (g_file_replace())
+
 * Rename sysprof-text to sysprof-cli
 
 * Find out why the samples label won't right adjust
diff --git a/collector.c b/collector.c
index b3bd984..0c3fcd8 100644
--- a/collector.c
+++ b/collector.c
@@ -420,7 +420,7 @@ start_tracing (Collector  *collector,
 	{ SYSPROF_DIR "current_tracer",        "sysprof" },
 	{ SYSPROF_DIR "trace_options",         "raw" },
 	{ SYSPROF_DIR "trace_options",         "bin" },
-	{ SYSPROF_DIR "sysprof_sample_period", "2000" },
+	{ SYSPROF_DIR "sysprof_sample_period", "5000" },
     };
     
     int fd;
diff --git a/sysprof.c b/sysprof.c
index 38e4c3c..82c0dec 100644
--- a/sysprof.c
+++ b/sysprof.c
@@ -176,7 +176,7 @@ static void
 queue_show_samples (Application *app)
 {
     if (!app->timeout_id)
-	app->timeout_id = g_timeout_add (225, show_samples_timeout, app);
+	app->timeout_id = g_timeout_add (500, show_samples_timeout, app);
 }
 
 static void
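
Note on the sysprof.c hunk above: queue_show_samples() only arms a timer
when app->timeout_id is unset, so any number of requests made while a
refresh is pending collapse into a single label update. Raising the
interval from 225 to 500 is in line with the "Update the label less
frequently" option listed in the TODO. The sketch below imitates that
guard without GLib so that it can be compiled on its own. DemoApp,
demo_timeout_add() and demo_fire_timeout() are hypothetical stand-ins,
not sysprof or GLib code; only the timeout_id field name is taken from
the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the application state. */
typedef struct
{
    unsigned timeout_id;    /* 0 means no refresh is pending */
    unsigned n_scheduled;   /* how many timers were actually armed */
    unsigned n_refreshes;   /* how many label updates ran */
} DemoApp;

/* Stand-in for g_timeout_add(): records the request and returns a
 * nonzero id. The interval is ignored because nothing really waits.
 */
static unsigned
demo_timeout_add (DemoApp *app, unsigned interval_ms)
{
    (void) interval_ms;
    return ++app->n_scheduled;
}

/* Same shape as queue_show_samples() in the diff: arm a timer only if
 * none is pending.
 */
static void
demo_queue_show_samples (DemoApp *app)
{
    assert (app != NULL);

    if (!app->timeout_id)
        app->timeout_id = demo_timeout_add (app, 500);
}

/* Stand-in for the timeout callback firing: update once, then clear
 * the id so that the next request can arm a new timer.
 */
static void
demo_fire_timeout (DemoApp *app)
{
    assert (app != NULL);

    if (app->timeout_id)
    {
        app->n_refreshes++;
        app->timeout_id = 0;
    }
}
```

With this shape, a hundred calls to demo_queue_show_samples() followed by
one demo_fire_timeout() arm one timer and perform one refresh; the
interval bounds how often the label can change regardless of the sample
rate.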