call for collaborators: bug isolation via remote program sampling



My collaborators and I are working on a tool for isolating bugs using random sampling of user executions. This message is an open call for collaboration. Would any of you be interested in working with us to use our bug isolation system with your GNOME projects or distributions? If so, please read on.

This tool is part of UC Berkeley's Open Source Quality Project <http://osq.cs.berkeley.edu/>, which investigates methods for improving software quality. In brief, our approach is to collect a little information from each of many runs and look for program behaviors that vary between successful and unsuccessful runs. For example, we might discover that a program crashes when a particular function call returns -1, or when a particular array index exceeds some maximum value. Even for non-deterministic bugs such as heap corruption we can build statistical models that show behavior correlated with crashes across many runs.

We've already had a few successes in controlled experimental environments, including discovering a previously unreported buffer overrun. What we'd like to do now is deploy our system with real applications, real bugs, and real users.

Our approach works best when it can see many runs, so we need a community of users who are willing to run instrumented code and provide us with that raw data to mine for bugs. If you work on a project that provides testing binaries to willing guinea pigs, then you're someone we'd like to work with. (Only C projects need apply, though, as that's the only language supported by our current implementation.)

The benefit for us is real data; the benefit for you is free help with bug hunting. The sampling is designed to be very low overhead, so the performance penalty for you or your users will be modest (our worst example so far runs 12% slower when instrumented, and for our best example the overhead is literally unmeasurable). Furthermore, our sampling-based approach implicitly "learns" the most about the bugs that happen most often, so we may be able to give you the most useful information about the bugs that are hitting the largest number of your users.

We've written a couple of papers about our approach:

   - "Sampling User Executions for Bug Isolation", a short position
     paper that presents the general approach and describes some
     initial experiments: <http://www.cs.berkeley.edu/~liblit/ramss/>

   - "Bug Isolation via Remote Program Sampling", a much more detailed
     writeup which describes how the instrumentation sampling works,
     measures performance impact, and gives several examples of using
     the system to track down bugs:
     <http://www.cs.berkeley.edu/~liblit/bug-isolation/>

Of course, we're also happy to discuss this with any of you on this list or in person-to-person e-mail. Our goal right now is to find real-world collaborators, so if you are at all interested or if you have any questions, please ask!

				-- Ben Liblit <liblit cs berkeley edu>



