[sysprof/ftrace] Update TODO
- From: Søren Sandmann Pedersen <ssp src gnome org>
- To: svn-commits-list gnome org
- Cc:
- Subject: [sysprof/ftrace] Update TODO
- Date: Sat, 15 Aug 2009 09:01:45 +0000 (UTC)
commit 199fcd2471ecd7c827959972d7c9954eb771505e
Author: Søren Sandmann Pedersen <sandmann daimi au dk>
Date: Sat Aug 15 04:30:21 2009 -0400
Update TODO
TODO | 660 +++++++++++++++++++++++++++++++----------------------------
collector.c | 2 +-
sysprof.c | 2 +-
3 files changed, 344 insertions(+), 320 deletions(-)
---
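Note on the "heuristic stack walking" item in the TODO text below (listing
all words on the stack that look like they might be return addresses, along
with their location on the stack): the following is a minimal sketch of that
scan, not code from sysprof. The names TextRange, Candidate, in_text() and
scan_stack() are hypothetical, and the sketch assumes the caller already has
a copy of the stack and a list of executable address ranges.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration; not part of sysprof. */
typedef struct
{
    uintptr_t start;    /* first address of an executable mapping */
    uintptr_t end;      /* one past the last address */
} TextRange;

typedef struct
{
    size_t    offset;   /* byte offset of the word within the stack copy */
    uintptr_t address;  /* the word itself, a possible return address */
} Candidate;

/* Return 1 if addr lies inside any of the given executable ranges. */
static int
in_text (uintptr_t addr, const TextRange *ranges, size_t n_ranges)
{
    size_t i;

    for (i = 0; i < n_ranges; i++)
    {
        if (addr >= ranges[i].start && addr < ranges[i].end)
            return 1;
    }

    return 0;
}

/* Walk a copied stack word by word and record every word that points
 * into executable memory. Returns the number of candidates stored,
 * which is at most max_out.
 */
static size_t
scan_stack (const uintptr_t *words, size_t n_words,
            const TextRange *ranges, size_t n_ranges,
            Candidate *out, size_t max_out)
{
    size_t i, n_found = 0;

    for (i = 0; i < n_words && n_found < max_out; i++)
    {
        if (in_text (words[i], ranges, n_ranges))
        {
            out[n_found].offset = i * sizeof (uintptr_t);
            out[n_found].address = words[i];
            n_found++;
        }
    }

    return n_found;
}
```

As the TODO text says, a scan like this produces many false positives; the
stored offsets are what a userspace walker would need to filter them.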
diff --git a/TODO b/TODO
index b4cde8d..41df1bf 100644
--- a/TODO
+++ b/TODO
@@ -23,16 +23,31 @@ Before 1.0.4:
Before 1.2:
-* Give an informative error message if run as root
+* With a high sample rate, sysprof's label updating gets profiled which
+ then causes the label to change, causing a feedback
+ loop. Possibilities include:
-* Find out why gtk_tree_view_columns_autosize() apparently doesn't
- work on empty tree views.
+ - Use lower sample rate
+ - Update the label less frequently
+ - Ignore samples generated while updating label
-* Get rid of O(n^2) string handling in on_read()
+* Possibly look at GtkTreeView issues
-* Decide whether to ship the kernel module as an option. Pros: hedge
- against ftrace breaking, some people run old kernels, old
- RHEL; con: more code to maintain.
+ - gtk_tree_view_columns_autosize() apparently doesn't
+ work on empty tree views.
+
+ - Find out how to hack around gtk+ bug causing multiple double
+ clicks to get eaten.
+
+ - Switching between descendant views is a slow:
+ - gtk_tree_store_get_path() is O(n^2) and accounts
+ for 43% of the time.
+ - GObject signal emission overhead accounts for 18%
+ of the time.
+ Consider adding a forked version of GtkTreeStore with
+ performance and bug fixes.
+
+* Give an informative error message if run as root
* Hack to disable recursion for binaries without symbols causes the
symbols to not work the way other symbols do. A better approach is
@@ -69,36 +84,18 @@ Before 1.2:
* Is the move-to-front in process_locate_map() really worth it?
-* Whenever we fail to lock the atomic variable, track this, and send the
- information to userspace as an indication of the overhead of the profiling.
- Although there is inherent aliasing here since stack scanning happens at
- regular intervals.
+* Decide whether to ship the kernel module as an option. Pros: hedge
+ against ftrace breaking, some people run old kernels, old
+ RHEL; Cons: more code to maintain.
* Apparently, if you upgrade the kernel, then don't re-run configure,
the kernel Makefile will delete all of /lib/modules/<release>/kernel
if you run make install in the module directory. Need to find out what
is going on.
-* Performance:
- Switching between descendant views is a slow:
- - gtk_tree_store_get_path() is O(n^2) and accounts
- for 43% of the time.
- - GObject signal emission overhead accounts for 18% of
- the time.
- Consider adding a forked version of GtkTreeStore with
- performance and bug fixes.
-
-* Make sure that labels look decent in case of "No Map" etc.
-
* If we end up believing the kernel's own stacktraces, maybe
/proc/kallsyms shouldn't be parsed until the user hits profile.
-* Elf bugs:
- - error handling for bin_parser is necessary.
-
- * Find out why all apps have an "In file /usr/bin/<app binary>" below
- _libc_main. If possible, maybe make up a name for it.
-
* vdso stuff:
- the "[vdso]" string should be #defined somewhere
- Does get_vdso_bytes() belong in process.c?
@@ -122,83 +119,6 @@ Before 1.2:
* Convert things like [heap] and [stack] to more understandable labels.
-* Strategies for taking reliable stacktraces.
-
- Three different kinds of files
-
- - kernel
- - vdso
- - regular elf files
-
- - kernel
- - eh_frame annotations, in kernel or in kernel debug
- - /proc/kallsyms
- - userspace can look at _stext and _etext to determine
- start and end of kernel text segment
- - copying kernel stack to userspace
- - it's always 4096 bytes these days
- - heuristically determine functions based on address
- - callbacks on the stack can be identified
- by having an offset of 0.
- - even so there is a lot of false positives.
- - is eh_frame usually loaded into memory during normal
- operation? It is mapped, but probably not paged in,
- so we will be taking a few major page faults when we
- first profile something.
- Unless of course, we store the entire stack in
- the stackstash. This may use way too much memory though.
-
- - Locking, possibly useful code:
-
- /* In principle we should use get_task_mm() but
- * that will use task_lock() leading to deadlock
- * if somebody already has the lock
- */
- if (spin_is_locked (&current->alloc_lock))
- printk ("alreadylocked\n");
- {
- struct mm_struct *mm = current->mm;
- if (mm)
- {
- printk (KERN_ALERT "stack size: %d (%d)\n",
- mm->start_stack - regs->REG_STACK_PTR,
- current->pid);
-
- stacksize = mm->start_stack - regs->REG_STACK_PTR;
- }
- else
- stacksize = 1;
- }
-
- - regular elf
- - usually have eh_frame section which is mapped into memory
- during normal operation
- - do stackwalk in kernel based on eh_frame
- - eh_frame section is usually mapped into memory, so
- no file reading in kernel would be necessary.
- - do stackwalk in userland based on eh_frame
- - do ebp based stackwalk in kernel
- - do ebp based stackwalk in userland
- - do heuristic stackwalk in kernel
- - do heuristic stackwalk in userland
-
- - Send heuristic stack trace to user space, along with
- location on the stack. Then, in userspace analyze the
- machine code to determine the size of the stack frame at any
- point. The instructions that would need to be recognized are:
-
- subl <constant>, %esp
- addl <constant>, %esp
- leave
- jcc
- push
- pop
-
- GCC is unlikely to have different stack sizes at the entry
- to a basic block.
-
- We can often find a vmlinux in /lib/modules/<uname-r>/build.
-
* "Expand all" is horrendously slow because update_screenshot gets called
for every "expanded" signal. In fact even normal expanding is really
slow. It's probably hopeless to get decent performance out of GtkTreeView,
@@ -212,6 +132,11 @@ Before 1.2:
* Missing things in binparser.[ch]
+ - error handling for bin_parser is necessary.
+
+ * Find out why all apps have an "In file /usr/bin/<app binary>" below
+ _libc_main. If possible, maybe make up a name for it.
+
- it's inconvenient that you have to pass in both a parser _and_
a record. The record should just contain a pointer to the parser.
On the other hand, the result does depend on the parser->offset.
@@ -225,17 +150,11 @@ Before 1.2:
- "native endian" is probably not useful. Maybe go back to just
having big/little endian.
- Should probably rethink the whole thing. It's just not very convenient to use, even
- for simple things like ELF files.
-
-* Rename stack_stash_foreach_by_address() to stack_stash_foreach_unique(),
- or maybe not ...
+ Should probably rethink the whole thing. It's just not very
+ convenient to use, even for simple things like ELF files.
* Make it compilable against a non-running kernel.
-* Maybe report idle time? Although this would come for free with the
- timelines.
-
* Fix (deleted) problem. But more generally, whenever we can't display a
symbol, display an error message instead, ie.,
- 'Binary file <xxxx> was deleted/replaced'
@@ -251,8 +170,6 @@ Before 1.2:
* Add spew infrastructure to make remote debugging easier.
-* Make it compile and work on x86-64
-
* With kernel module not installed, select Profiler->Start, then dismiss
the alert. This causes the start button to appear prelighted. Probably
just another gtk+ bug.
@@ -285,6 +202,57 @@ Before 1.2:
- maybe simply make stackstashes able to
save themselves.
+* See if the auto-expanding can be made more intelligent
+ - "Everything" should be expanded exactly one level
+ - all trees should be expanded at least one level
+
+- See if there is a way to make it distcheck
+
+- grep "FIXME - not10"
+
+- Ability to generate "screenshots" suitable for mail/blog/etc
+ UI: "generate screenshot" menu item pops up a window with
+ a text area + a radio buttons "text/html". When you flick
+ them, the text area is automatically updated.
+ - beginning in CVS:
+ - why does the window not remember its position when
+ you close it with the close button, but does remember
+ it when you use the wm button or the menu item? It actually
+ seems that it only forgets the position when you click the
+ button with the mouse. But not if you use the keyboard ...
+ This is a gtk+ bug.
+
+- Make busy cursors more intelligent
+ - when you click something in the main list and we don't respond
+ within 50ms (or perhaps when we expect to not be able to do
+ so (can we know the size in advance?))
+ - instead of what we do now: set the busy cursor unconditionally
+
+- Add view->ancestors/descendants menu items
+
+- rethink caller list, not terribly useful at the moment. Federico suggested
+ listing all ancestors.
+ Done: implemented this idea in CVS HEAD. If we keep it that way,
+ should do a globale s/callers/ancestors on the code.
+ - not sure it's an improvement. Often it is more interesting to
+ find the immediate callers.
+ - Now it's back to just listing the immediate callers.
+
+- hide internal stuff in ProfileDescendant
+
+
+==== Later =====
+
+* Whenever we fail to lock the atomic variable, track this, and send the
+ information to userspace as an indication of the overhead of the profiling.
+ Although there is inherent aliasing here since stack scanning happens at
+ regular intervals.
+
+* Maybe report idle time? Although this would come for free with the
+ timelines.
+
+* Make it compile and work on x86-64
+
- rethink loading and saving. Goals
- Can load 1.0 profiles
@@ -327,8 +295,8 @@ Before 1.2:
- make a generic representation of xml files with quarks for strings:
struct item {
int begin/end/text;
- quark text: -> begin/end/value
- int id; -> for begins that are pointed to
+ quark text: -> begin/end/value
+ int id; -> for begins that are pointed to
}
perhaps even with iterators. Should be compact and suitable for both
input and output. As a first cut, perhaps just split out the
@@ -351,93 +319,6 @@ Before 1.2:
-* See if the auto-expanding can be made more intelligent
- - "Everything" should be expanded exactly one level
- - all trees should be expanded at least one level
-
-* Send entire stack to user space, then do stackwalking there. That would
- allow us to do more complex algorithms, like dwarf, in userspace. Though
- we'd lose the ability to do non-racy file naming. We could pass a list
- of the process mappings with each stack though. Doing this would also solve
- the problem of not being able to get maps of processes running as root.
- Might be too expensive though. User stacks seem to be on the order
- of 100K usually, which for 200 times a second means a bandwidth of
- 20MB/s, which is probably too much. One question is how much of it
- usually changes.
- Actually it seems that the _interesting_ part of the stack
- (ie., from the stack pointer and up) is not that big in many cases. The
- average stacksize seemed to be about 7700 bytes for gcc compiling gtk+.
- Even deeply recursive apps like sysprof only generate about 55K stacks.
-
- Other possibilities:
-
- - Do heuristic stack walking where it lists all words on the stack
- that look like they might be return addresses.
-
- - Somehow map the application's stack pages into the client. This
- is likely difficult or impossible.
-
- - Another idea: copy all addresses that look like they could be
- return addresses, along with the location on the stack. This
- just might be enough for a userspace stack walker.
-
- - Yet another: krh suggests hashing blocks of the stack, then
- only sending the blocks that changed since last time.
-
- - every time you send a stackblock, also send a cookie.
-
- - whenever you *don't* send a stackblock, send the cookie
- instead. That way you always get a complete stacktrace
- conceptually.
-
- - also, that would allow the kernel to just have a simple
- hashtable containing the known blocks. Though, that could
- become large. Actually there is no reason to store the
- blocks; you can just send the hashcode. That way you
- would only need to store a list of hashcodes that we
- have generated previously.
-
- - One problem with doing DWARF walking is that the debug code
- will have to be faulted in. This can be a substantial amount
- of disk access which is undesirable to have during a
- profiling run. Even if we only have to fault in the
- .eh_frame_hdr section, that's still 18 pages for gtk+. The
- .eh_frame section for gtk+ is 72 pages.
-
- A possibility may be to consider two stacktraces identical
- if the only differing values are *outside* the text
- segments. This may work since stack frames tend to be the
- same size. Is there a way of determining the location of
- text segments without reading the ELF files? Maybe just
- check if it's inside an executable mappign.
-
- It is then sufficient in user space to only store one
- representative for each set of considered-identical stack
- traces.
-
- User space storage: Use the stackstash tree. When a new trace
- is added, just skip over nodes that differ, but where none of
- them points to text segments. Two possibilities then:
-
- - when two traces are determined to differ, store them
- in completely separate trees. This ensures that we
- will never run the dwarf algorithm on an invalid
- stack trace, but also means that we won't get shared
- prefixes for stacktraces.
-
- - when two traces are determined to differ, branch off
- as currently. This will share more data, but the
- dwarf algorithm could be run on invalid traces. It
- may work in practice though if the compiler
- generally uses fixed stack frames.
-
- A twist on is to mark the complete stack traces as
- "complete". Then after running the DWARF algorithm,
- the generated stack trace can be saved with it. This
- way incomplete stack traces branching off a complete
- one can be completed using the DWARF information for
- the shared part.
-
* Notes on heuristic stack walking
- We can reject addresses that point exactly to the beginning of a
@@ -521,33 +402,6 @@ Before 1.2:
This means the datastructure will probably have to be done a
little differently.
-- See if there is a way to make it distcheck
-
-- grep "FIXME - not10"
-
-- translation should be hooked up
-
-- Consider adding "at least 5% inclusive cost" filter
-
-- consider having the ability to group a function together with its nearest
- neighbours. That way we can eliminate some of the effect of
- "one function taking 10% of the time"
- vs.
- "the same function broken into ten functions each taking 1%"
- Not clear what the UI looks like though.
-
-- Ability to generate "screenshots" suitable for mail/blog/etc
- UI: "generate screenshot" menu item pops up a window with
- a text area + a radio buttons "text/html". When you flick
- them, the text area is automatically updated.
- - beginning in CVS:
- - why does the window not remember its position when
- you close it with the close button, but does remember
- it when you use the wm button or the menu item? It actually
- seems that it only forgets the position when you click the
- button with the mouse. But not if you use the keyboard ...
- This is a gtk+ bug.
-
- Find out how gdb does backtraces; they may have a better way. Also
find out what dwarf2 is and how to use it. Look into libunwind.
It seems gdb is capable of doing backtraces of code that neither has
@@ -562,12 +416,19 @@ http://www.linuxbase.org/spec/booksets/LSB-Embedded/LSB-Embedded/ehframe.html
http://cutebugs.net/bozo-profiler/
which has an elf32 parser/debugger
-- Make busy cursors more intelligent
- - when you click something in the main list and we don't respond
- within 50ms (or perhaps when we expect to not be able to do
- so (can we know the size in advance?))
- - instead of what we do now: set the busy cursor unconditionally
-
+ There is also libunwind, which seems potentially useful.
+
+- translation should be hooked up
+
+- Consider adding "at least 5% inclusive cost" filter
+
+- consider having the ability to group a function together with its nearest
+ neighbours. That way we can eliminate some of the effect of
+ "one function taking 10% of the time"
+ vs.
+ "the same function broken into ten functions each taking 1%"
+ Not clear what the UI looks like though.
+
- Consider adding ability to show more than one function at a time. Algorithm:
Find all relevant nodes;
For each relevant node
@@ -593,16 +454,6 @@ http://www.linuxbase.org/spec/booksets/LSB-Embedded/LSB-Embedded/ehframe.html
together with a tree. (Could add radio buttons somewhere in
in the right pane).
-- Add view->ancestors/descendants menu items
-
-- rethink caller list, not terribly useful at the moment. Federico suggested
- listing all ancestors.
- Done: implemented this idea in CVS HEAD. If we keep it that way,
- should do a globale s/callers/ancestors on the code.
- - not sure it's an improvement. Often it is more interesting to
- find the immediate callers.
- - Now it's back to just listing the immediate callers.
-
- Have kernel module report the file the address was found in
Should avoid a lot of potential broken/raciness with dlopen etc.
Probably better to send a list of maps with each trace. Which
@@ -630,15 +481,6 @@ http://www.linuxbase.org/spec/booksets/LSB-Embedded/LSB-Embedded/ehframe.html
char filenames [2048];
}
-- Figure out how Google's pprof script works. Then add real call graph
- drawing. (google's script is really simple; uses dot from graphviz).
- KCacheGrind also uses dot to do graph drawing.
-
-- hide internal stuff in ProfileDescendant
-
-- possibly add dependency on glib 2.8 if it is released at that point.
- (g_file_replace())
-
* Some notes about timer interrupt handling in Linux
On an SMP system APIC is used - the interesting file is arch/i386/kernel/apic.c
@@ -660,7 +502,172 @@ When the interrupt happens,
from kernel mode to kernel mode, it does _not_ push SS/ESP.
It does in both cases push EIP though.
-Later:
+* Strategies for taking reliable stacktraces.
+
+ Three different kinds of files
+
+ - kernel
+ - vdso
+ - regular elf files
+
+ - kernel
+ - eh_frame annotations, in kernel or in kernel debug
+ - /proc/kallsyms
+ - userspace can look at _stext and _etext to determine
+ start and end of kernel text segment
+ - copying kernel stack to userspace
+ - it's always 4096 bytes these days
+ - heuristically determine functions based on address
+ - callbacks on the stack can be identified
+ by having an offset of 0.
+ - even so there is a lot of false positives.
+ - return addresses always point to something
+ immediately after a call instruction.
+ - is eh_frame usually loaded into memory during normal
+ operation? It is mapped, but probably not paged in,
+ so we will be taking a few major page faults when we
+ first profile something.
+ Unless of course, we store the entire stack in
+ the stackstash. This may use way too much memory though.
+
+ - Locking, possibly useful code:
+
+ /* In principle we should use get_task_mm() but
+ * that will use task_lock() leading to deadlock
+ * if somebody already has the lock
+ */
+ if (spin_is_locked (&current->alloc_lock))
+ printk ("alreadylocked\n");
+ {
+ struct mm_struct *mm = current->mm;
+ if (mm)
+ {
+ printk (KERN_ALERT "stack size: %d (%d)\n",
+ mm->start_stack - regs->REG_STACK_PTR,
+ current->pid);
+
+ stacksize =
+ mm->start_stack - regs->REG_STACK_PTR;
+ }
+ else
+ stacksize = 1;
+ }
+
+ - regular elf
+ - usually have eh_frame section which is mapped into memory
+ during normal operation
+ - do stackwalk in kernel based on eh_frame
+ - eh_frame section is usually mapped into memory, so
+ no file reading in kernel would be necessary.
+ - do stackwalk in userland based on eh_frame
+ - do ebp based stackwalk in kernel
+ - do ebp based stackwalk in userland
+ - do heuristic stackwalk in kernel
+ - do heuristic stackwalk in userland
+
+ - Send heuristic stack trace to user space, along with
+ location on the stack. Then, in userspace analyze the
+ machine code to determine the size of the stack frame at any
+ point. The instructions that would need to be recognized are:
+
+ subl <constant>, %esp
+ addl <constant>, %esp
+ leave
+ jcc
+ push
+ pop
+
+ GCC is unlikely to have different stack sizes at the entry
+ to a basic block.
+
+ We can often find a vmlinux in /lib/modules/<uname-r>/build.
+
+ * Send entire stack to user space, then do stackwalking
+ there. That would allow us to do more complex algorithms, like
+ dwarf, in userspace. Though we'd lose the ability to do non-racy
+ file naming. We could pass a list of the process mappings with
+ each stack though. Doing this would also solve the problem of
+ not being able to get maps of processes running as root.
+
+ Might be too expensive though. User stacks seem to be on the
+ order of 100K usually, which for 200 times a second means a
+ bandwidth of 20MB/s, which is probably too much. One question is
+ how much of it usually changes.
+
+ Actually it seems that the _interesting_ part of the stack (ie.,
+ from the stack pointer and up) is not that big in many
+ cases. The average stacksize seemed to be about 7700 bytes for
+ gcc compiling gtk+. Even deeply recursive apps like sysprof
+ only generate about 55K stacks.
+
+ Other possibilities:
+
+ - Do heuristic stack walking where it lists all words on the stack
+ that look like they might be return addresses.
+
+ - Somehow map the application's stack pages into the client. This
+ is likely difficult or impossible.
+
+ - Another idea: copy all addresses that look like they could be
+ return addresses, along with the location on the stack. This
+ just might be enough for a userspace stack walker.
+
+ - Yet another: krh suggests hashing blocks of the stack, then
+ only sending the blocks that changed since last time.
+
+ - every time you send a stackblock, also send a cookie.
+
+ - whenever you *don't* send a stackblock, send the cookie
+ instead. That way you always get a complete stacktrace
+ conceptually.
+
+ - also, that would allow the kernel to just have a simple
+ hashtable containing the known blocks. Though, that could
+ become large. Actually there is no reason to store the
+ blocks; you can just send the hashcode. That way you
+ would only need to store a list of hashcodes that we
+ have generated previously.
+
+ - One problem with doing DWARF walking is that the debug code
+ will have to be faulted in. This can be a substantial amount
+ of disk access which is undesirable to have during a
+ profiling run. Even if we only have to fault in the
+ .eh_frame_hdr section, that's still 18 pages for gtk+. The
+ .eh_frame section for gtk+ is 72 pages.
+
+ A possibility may be to consider two stacktraces identical
+ if the only differing values are *outside* the text
+ segments. This may work since stack frames tend to be the
+ same size. Is there a way of determining the location of
+ text segments without reading the ELF files? Maybe just
+ check if it's inside an executable mappign.
+
+ It is then sufficient in user space to only store one
+ representative for each set of considered-identical stack
+ traces.
+
+ User space storage: Use the stackstash tree. When a new trace
+ is added, just skip over nodes that differ, but where none of
+ them points to text segments. Two possibilities then:
+
+ - when two traces are determined to differ, store them
+ in completely separate trees. This ensures that we
+ will never run the dwarf algorithm on an invalid
+ stack trace, but also means that we won't get shared
+ prefixes for stacktraces.
+
+ - when two traces are determined to differ, branch off
+ as currently. This will share more data, but the
+ dwarf algorithm could be run on invalid traces. It
+ may work in practice though if the compiler
+ generally uses fixed stack frames.
+
+ A twist on is to mark the complete stack traces as
+ "complete". Then after running the DWARF algorithm,
+ the generated stack trace can be saved with it. This
+ way incomplete stack traces branching off a complete
+ one can be completed using the DWARF information for
+ the shared part.
- If the stack trace ends in a memory access instruction, send the
vma information to userspace. Then have user space
@@ -696,13 +703,11 @@ Later:
- Applications should be able to say "start profiling", "stop profiling"
so that you can limit the profiling to specific areas.
Idea:
- Add a new kernel interface that applications uses to say
- begin/end.
- Then add a timeline where you can mark interesting regions,
- for example those that applications have marked interesting.
-- Find out how to hack around gtk+ bug causing multiple double clicks
- to get eaten.
+ Add a new kernel interface that applications use to
+ say begin/end. Then add a timeline where you can mark
+ interesting regions, for example those that
+ applications have marked interesting.
- Consider what it would take to take stacktraces of other languages such
as perl, python, java, ruby, or bash. Or scheme.
@@ -727,11 +732,11 @@ Later:
records.
- Consider this usecase:
- Someone is considering replacing malloc()/free() with a freelist
- for a certain data structure. All use of this data structure is
- confined to one function, foo(). It is now interesting to know
- how much time that particular function spends on malloc() and free()
- combined.
+ Someone is considering replacing malloc()/free() with a
+ freelist for a certain data structure. All use of this data
+ structure is confined to one function, foo(). It is now
+ interesting to know how much time that particular function
+ spends on malloc() and free() combined.
Possible UI:
@@ -792,80 +797,99 @@ Later:
For Memory: badness=<cache line size not in cache>, cookie=<the address>
- Cookies are used to figure out whether an access is really the same, ie., for two identical
- cookies, the size is still just one, however
+ Cookies are used to figure out whether an access is really the
+ same, ie., for two identical cookies, the size is still just one,
+ however
- Memory is different from disk because you can't reasonably assume that stuff that has
- been read will stay in cache (for short profile runs you can assume that with disk,
- but not for long ones).
+ Memory is different from disk because you can't reasonably assume
+ that stuff that has been read will stay in cache (for short profile
+ runs you can assume that with disk, but not for long ones).
- - Perhaps show a timeline with CPU in one color and disk in one color. Allow people to
- look at at subintervals of this timeline. Is it useful to look at both CPU and disk at
- the same time? Probably not. See also marker discussion above. UI should probably allow
- double clicking on a marked section and all instances of that one would be marked.
+ - Perhaps show a timeline with CPU in one color and disk in one
+ color. Allow people to look at at subintervals of this
+ timeline. Is it useful to look at both CPU and disk at the same
+ time? Probably not. See also marker discussion above. UI should
+ probably allow double clicking on a marked section and all
+ instances of that one would be marked.
- - Other variation on the timeline idea: Instead of a disk timeline you could have a
- list of individual diskaccesses, and be able to select the ones you wanted to
- get rid of.
+ - Other variation on the timeline idea: Instead of a disk timeline
+ you could have a list of individual diskaccesses, and be able to
+ select the ones you wanted to get rid of.
- - The existing sysprof visualization is not terribly bad, the "self" column is
- more useful now.
+ - The existing sysprof visualization is not terribly bad, the
+ "self" column is more useful now.
- - See what files are accessed so that you can get a getter idea of what
- the system is doing.
+ - See what files are accessed so that you can get a getter idea of
+ what the system is doing.
- Optimization usecases:
- - A lot of stuff is read synchronously, but it is possible to read
- it asynchronously.
- Visualization: A timeline with alternating CPU/disk activity.
+ - A lot of stuff is read synchronously, but it is possible to
+ read it asynchronously. Visualization: A timeline with
+ alternating CPU/disk activity.
- What function is doing all the synchronous reading, and what
- files/offsets is it reading. Visualization: lots of reads across
- different files out of one function
+ files/offsets is it reading. Visualization: lots of reads
+ across different files out of one function
- A piece of the program is doing disk I/O. We can drop that
- entire piece of code. Sysprof visualization is ok, although seeing
- the files accessed is useful so that we can tell if those files are
- not just going to be used in other places. (Gnumeric plugin_init()).
-
- - A function is reading a file synchronously, but there is other
- (CPU/disk) stuff that could be done at the same time. Visualization:
- A piece of the timeline is diskbound with little or no CPU used.
-
- - Want to improve code locality of library or binary. Visualization:
- no GUI, just produce a list of functions that should be put first in
- the file. Then run the program again until the list converges.
- (Valgrind may be more useful here).
+ entire piece of code. Sysprof visualization is ok, although
+ seeing the files accessed is useful so that we can tell if
+ those files are not just going to be used in other
+ places. (Gnumeric plugin_init()).
+
+ - A function is reading a file synchronously, but there is
+ other (CPU/disk) stuff that could be done at the same
+ time. Visualization: A piece of the timeline is diskbound
+ with little or no CPU used.
+
+ - Want to improve code locality of library or
+ binary. Visualization: no GUI, just produce a list of
+ functions that should be put first in the file. Then run the
+ program again until the list converges. (Valgrind may be
+ more useful here).
- Nautilus reads a ton of files, icons + all the files in the
- homedirectory. Normal sysprof visualization is probably useful
- enough.
+ homedirectory. Normal sysprof visualization is probably
+ useful enough.
- - Profiling a login session.
+ - Profiling a login session.
- - Many applications are running at the same time, doing IPC. It would
- be useful if we could figure out what other things a given process
- is waiting on. Eg., in poll, find out what processes have the other
- ends of the fd's open.
- Visualization: multiple lines on a graph. Lines join up where
- one process is blocking on another. That would show processes holding
- up the progress very clearly.
- This was suggested by Federico.
+ - Many applications are running at the same time, doing
+ IPC. It would be useful if we could figure out what other
+ things a given process is waiting on. Eg., in poll, find out
+ what processes have the other ends of the fd's open.
+ Visualization: multiple lines on a graph. Lines join up
+ where one process is blocking on another. That would show
+ processes holding up the progress very clearly. This was
+ suggested by Federico.
- - Need to report stat() as well. (Where do inode data end up? In the
- buffer-cache?) Also open() may cause disk reads (seeks).
+ - Need to report stat() as well. (Where do inode data end up? In
+ the buffer-cache?) Also open() may cause disk reads (seeks).
- To generate the timeline we need to know when a disk request is
- issued and when it is completed. This way we can assign blame to all
- applications that have issued a disk request at a given point in time.
+ issued and when it is completed. This way we can assign blame to
+ all applications that have issued a disk request at a given
+ point in time.
- The disk timeline should probably vary in intensity with the number
- of outstanding disk requests.
+ The disk timeline should probably vary in intensity with the
+ number of outstanding disk requests.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=- ALREADY DONE: -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
+* Get rid of O(n^2) string handling in on_read(). It wasn't O(n^2)
+
+* Rename stack_stash_foreach_by_address() to stack_stash_foreach_unique(),
+ or maybe not ... No.
+
+- Figure out how Google's pprof script works. Then add real call graph
+ drawing. (google's script is really simple; uses dot from graphviz).
+ KCacheGrind also uses dot to do graph drawing.
+
+* possibly add dependency on glib 2.8 if it is released at that point.
+ (g_file_replace())
+
* Rename sysprof-text to sysprof-cli
* Find out why the samples label won't right adjust
diff --git a/collector.c b/collector.c
index b3bd984..0c3fcd8 100644
--- a/collector.c
+++ b/collector.c
@@ -420,7 +420,7 @@ start_tracing (Collector *collector,
{ SYSPROF_DIR "current_tracer", "sysprof" },
{ SYSPROF_DIR "trace_options", "raw" },
{ SYSPROF_DIR "trace_options", "bin" },
- { SYSPROF_DIR "sysprof_sample_period", "2000" },
+ { SYSPROF_DIR "sysprof_sample_period", "5000" },
};
int fd;
diff --git a/sysprof.c b/sysprof.c
index 38e4c3c..82c0dec 100644
--- a/sysprof.c
+++ b/sysprof.c
@@ -176,7 +176,7 @@ static void
queue_show_samples (Application *app)
{
if (!app->timeout_id)
- app->timeout_id = g_timeout_add (225, show_samples_timeout, app);
+ app->timeout_id = g_timeout_add (500, show_samples_timeout, app);
}
static void
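
Note on the TODO item above that records krh's suggestion (hash blocks of
the stack, then send only the blocks that changed since last time): a
minimal sketch of the idea follows. It is an illustration under stated
assumptions, not sysprof code. BLOCK_SIZE, MAX_BLOCKS, BlockState,
hash_block() and find_changed_blocks() are hypothetical names, FNV-1a is
chosen only for brevity, and the sketch compares each block with the hash
at the same position in the previous sample rather than keeping the list of
all previously generated hashcodes that the TODO describes.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 256  /* assumed block size in bytes */
#define MAX_BLOCKS 64   /* bytes past MAX_BLOCKS * BLOCK_SIZE are ignored */

/* Hashes of the blocks seen in the previous sample. */
typedef struct
{
    uint32_t hashes[MAX_BLOCKS];
    size_t   n_blocks;
} BlockState;

/* 32-bit FNV-1a hash of one block. */
static uint32_t
hash_block (const unsigned char *data, size_t len)
{
    uint32_t h = 2166136261u;
    size_t i;

    for (i = 0; i < len; i++)
    {
        h ^= data[i];
        h *= 16777619u;
    }

    return h;
}

/* Hash every block of the stack copy and compare with the previous
 * sample. changed[i] is set to 1 for blocks that would have to be sent
 * in full and to 0 for blocks where the hash (the "cookie") suffices.
 * Returns the number of changed blocks and updates the state.
 */
static size_t
find_changed_blocks (BlockState *state,
                     const unsigned char *stack, size_t len,
                     unsigned char *changed)
{
    size_t n = (len + BLOCK_SIZE - 1) / BLOCK_SIZE;
    size_t i, n_changed = 0;

    if (n > MAX_BLOCKS)
        n = MAX_BLOCKS;

    for (i = 0; i < n; i++)
    {
        size_t off = i * BLOCK_SIZE;
        size_t block_len = (len - off < BLOCK_SIZE) ? len - off : BLOCK_SIZE;
        uint32_t h = hash_block (stack + off, block_len);

        changed[i] = (i >= state->n_blocks || state->hashes[i] != h);
        if (changed[i])
            n_changed++;

        state->hashes[i] = h;
    }

    state->n_blocks = n;

    return n_changed;
}
```

With a zero-initialized BlockState, the first sample marks every block as
changed; later samples mark only the blocks whose contents differ from the
previous sample.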