Re: [Evolution-hackers] missing Evolution 1.4.4 source RPMs



Jeffrey Stedfast wrote:
> any way for us to find out more about the types of techniques you are
> working on? anything that might help debug this beast would be cool :-)

In brief, the instrumented code makes a large number of wild guesses about "interesting" behavior, and counts how often these interesting things happen. We then use a suite of statistical machine learning techniques to find *changes* in interesting behavior between runs that fail (crash) and runs that succeed.

"Interesting" can be, well, anything you want to count up. Some examples include:

  - how often each function call returns <0, ==0, or >0

  - how often each conditional branches left versus right

  - how often each assigned variable is <, ==, or > each other
    in-scope variable or source-code constant
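To make the counting idea concrete, here is a rough sketch in Python. It is only an illustration, not the actual instrumentation: the helper names (count_return_sign, count_branch) and the site labels are made up for this example, and the real tool instruments compiled code rather than calling helpers like these.

```python
from collections import Counter

# One shared table of counters, keyed by (site, observed behavior).
counts = Counter()

def count_return_sign(site, value):
    """Record whether a value returned at `site` was <0, ==0, or >0."""
    if value < 0:
        counts[(site, "<0")] += 1
    elif value == 0:
        counts[(site, "==0")] += 1
    else:
        counts[(site, ">0")] += 1
    return value  # pass the value through unchanged

def count_branch(site, condition):
    """Record which way a conditional at `site` went."""
    taken = bool(condition)
    counts[(site, "true" if taken else "false")] += 1
    return taken

# Hypothetical usage: wrap the values of interest at each site.
for v in (-1, 0, 0, 7):
    count_return_sign("fetch_id", v)
for flag in (True, False, True):
    count_branch("branch_17", flag)
```

At the end of a run, the table of counters is all that would be reported; nothing about the actual values survives beyond which bucket they fell into.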

Like I said, these are very wild guesses. Most of them are completely irrelevant. The machine learning algorithms filter out the irrelevant stuff in order to find the few behaviors which are strongly correlated with success versus failure. So, for example, I might be able to tell you that Evolution tends to crash when fetch_ldap_id() returns 0 and it tends to not crash when fetch_ldap_id() returns >0. Or perhaps you tend to crash when a particular "if" condition is true, because that's a rare case that hasn't been tested well.
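As a toy stand-in for that filtering step, the sketch below scores each behavior by how much more often it shows up in crashing runs than in successful ones. This is a deliberately naive score chosen for illustration, not the statistical models actually used; the function name rank_predicates, the "branch_17" label, and the sample data are all hypothetical.

```python
def rank_predicates(runs):
    """Rank behaviors by (share of failing runs) - (share of passing runs).

    `runs` is a list of (crashed, observed) pairs, where `observed` is the
    set of behaviors seen at least once during that run.
    """
    failed = [obs for crashed, obs in runs if crashed]
    passed = [obs for crashed, obs in runs if not crashed]
    predicates = set().union(*(obs for _, obs in runs))
    scores = {}
    for p in predicates:
        fail_rate = sum(p in obs for obs in failed) / len(failed) if failed else 0.0
        pass_rate = sum(p in obs for obs in passed) / len(passed) if passed else 0.0
        scores[p] = fail_rate - pass_rate
    # Highest score first: behaviors most associated with crashing.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Made-up data: two crashing runs and two successful ones.
runs = [
    (True,  {"fetch_ldap_id()==0", "branch_17 true"}),
    (True,  {"fetch_ldap_id()==0"}),
    (False, {"fetch_ldap_id()>0", "branch_17 true"}),
    (False, {"fetch_ldap_id()>0"}),
]
ranked = rank_predicates(runs)
```

In this made-up data, "fetch_ldap_id()==0" lands at the top of the ranking, "fetch_ldap_id()>0" at the bottom, and the branch that appears equally often in both kinds of run scores zero, which is the sort of irrelevant guess that gets filtered out.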

Adding to the challenge is the fact that we don't even count everything on a given run. We randomly sample perhaps 1/100 or 1/1000 of the behavior. This helps us keep overhead down by spending most of the time in fast uninstrumented code. It also has the side effect of improving privacy, as we really cannot learn very much at all about any single run. But in *aggregate*, over many runs, a fair picture of program (mis)behavior will emerge. The statistical models we use treat the sparse sampling as measurement noise; given enough runs, we can still get useful bug clues from sparsely sampled data.
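A small simulation can show why sparse sampling still works in aggregate. The sketch below assumes the simplest possible scheme, an independent coin flip per event at a rate of 1/100; the real sampler may choose its samples differently. The site name, the 30% rate of zero returns, and the run counts are invented for the example.

```python
import random
from collections import Counter

def sampled_run(rng, events, rate):
    """Count each (site, outcome) event with probability `rate`; skip the rest."""
    counts = Counter()
    for site, outcome in events:
        if rng.random() < rate:
            counts[(site, outcome)] += 1
    return counts

rng = random.Random(0)  # fixed seed so the simulation is repeatable
total = Counter()
for _ in range(200):  # 200 simulated runs
    # In each run the hypothetical call returns 0 about 30% of the time.
    events = [("fetch_id", "==0" if rng.random() < 0.3 else ">0")
              for _ in range(1000)]
    total += sampled_run(rng, events, rate=0.01)

zero = total[("fetch_id", "==0")]
positive = total[("fetch_id", ">0")]
estimate = zero / (zero + positive)  # aggregate estimate of the true 30%
```

Any single run contributes only a handful of observations, far too few to say anything about that run, yet the estimate pooled over all runs comes out close to the true rate.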

See <http://www.cs.berkeley.edu/~liblit/sampler/> for some packages we've posted. We still need to write up a good non-technical primer. Until that's done, the "Background Reading" section on that page has pointers to papers describing how this all works in much more detail.

Now, how about those specfiles...?  {taps foot}  :-)



