Call for collaborators: bug isolation via remote program sampling



My fellow researchers and I are working on a tool for isolating bugs using random sampling of user executions. This message is an open call for collaboration. Would any of you be interested in working with us to use our bug isolation system with your GNOME projects or distributions? If so, please read on.

This tool is part of UC Berkeley's Open Source Quality Project 
<http://osq.cs.berkeley.edu/>, which investigates methods for improving 
software quality.  In brief, our approach is to collect a little 
information from each of many runs and look for program behaviors that 
vary between successful and unsuccessful runs.  For example, we might 
discover that a program crashes when a particular function call returns 
-1, or when a particular array index exceeds some maximum value.  Even 
for non-deterministic bugs such as heap corruption we can build 
statistical models that show behavior correlated with crashes across 
many runs.

We've already had a few successes in controlled experimental 
environments, including discovering a previously unreported buffer 
overrun.  What we'd like to do now is deploy our system with real 
applications, real bugs, and real users.

Our approach works best when it can see many runs, so we need a 
community of users who are willing to run instrumented code and provide 
us with that raw data to mine for bugs. If you work on a project that 
provides testing binaries to willing guinea pigs, then you're someone 
we'd like to work with.  (Only C projects need apply, though, as that's 
the only language supported by our current implementation.)

The benefit for us is real data; the benefit for you is free help with 
bug hunting.  The sampling is designed to be very low overhead, so the 
performance penalty for you or your users will be modest (our worst 
example so far runs 12% slower when instrumented, and for our best 
example the overhead is literally unmeasurable).  Furthermore, our 
sampling-based approach implicitly "learns" the most about the bugs that 
happen most often, so we may be able to give you the most useful 
information about the bugs that are hitting the largest number of your 
users.

We've written a couple of papers about our approach:

   - "Sampling User Executions for Bug Isolation", a short position
     paper that presents the general approach and describes some
     initial experiments: <http://www.cs.berkeley.edu/~liblit/ramss/>

   - "Bug Isolation via Remote Program Sampling", a much more detailed
     writeup which describes how the instrumentation sampling works,
     measures performance impact, and gives several examples of using
     the system to track down bugs:
     <http://www.cs.berkeley.edu/~liblit/bug-isolation/>

Of course, we're also happy to discuss this with any of you on this list or in person-to-person e-mail. Our goal right now is to find real-world collaborators, so if you are at all interested or if you have any questions, please ask!

				-- Ben Liblit <liblit cs berkeley edu>




