Dissecting The Bug

Saturday, January 30, 2010

Shifting gears

Attending school forces me into a different mind-set, requiring me to slow down and really think hard about abstract concepts, rather than just fixing bugs as quickly as possible. I like this. It’s also very inspiring to sit in class and listen. It’s been a while and I’ve forgotten, but it’s kind of like being in a conference session without the glitz and knowing it’s going on for several hours, days, and months, vs. a mere 50 minutes. I was impressed, too, by my professor, Paul Ammann, who’s incredibly smart, skilled, and understands the practical needs of technology. This is key—having professors who don’t acknowledge what practitioners have to do to stay employed shows a lack of understanding. Paul, on the other hand, demonstrates a keen grasp of this important concept. Many students, myself included, enjoy abstraction, see value in it, but also need to apply concrete concepts in order to put food on the table and clothes on the kids. Case in point, almost to the minute my job agreed to pay for this class, I got a new project: “Bill, we want you to automate the testing for Project XYZ, our highest priority project”. Little do they know that this course is not about automating testing—it’s about good test design, which, of course, is fundamental to any good quality assurance efforts.

Ok, enough about me …

So, in the most recent class (#2), we discussed a single software fault in painstaking detail. It was a dissection, the main concepts being _observability_ and its antithesis. These are my notes, some reflection, practice, a little poetic license, and likely a mistake or two.

RIP Observability Model

This model articulates three abstract properties that must exist in a program execution in order for a software fault to be observable.
  • Reachability – the fault in the code has to be reachable
  • Infection – the fault has to put the program into an error state. This does not necessarily mean the program will return incorrect results; that is an important distinction! Just because a program is in an error state does not mean it will always produce the wrong result (see below).
  • Propagation – the program needs to exhibit incorrect behavior; e.g., incorrect outputs or behavior or throwing an unspecified exception
A program execution can exhibit zero, one, or two of these properties, but for the fault to be observable, all three criteria must be met.


This code looks simple, and it is, but using simple code helps in grasping the RIP concepts. This is not the same code we covered in class, but it is in the book. The code has a bug; try to find it ...

/**
 * Effects: returns the index of the last element that matches y, else returns -1
 * @return int
 * @throws NullPointerException if x is null
 */
public int lastIndexOf(int[] x, int y) {
  for (int i = x.length - 1; i > 0; i--) {
    if (x[i] == y) return i;
  }
  return -1;
}
A typical JUnit test for this would look something like the following. The test fails, but why?

public void testLastIndexOf() {
  int expected = 0;
  int[] x = {2, 3, 4, 5};
  int y = 2;
  int actual = theClass.lastIndexOf(x, y);
  Assert.assertEquals(expected, actual);
}

The bug isn't too hard to find simply by looking closely at the code. But if this were buried in some stack trace, it would be much harder to locate; here we already know where it is, within a few lines of code. In the real world, we would simply change the > to >=, pat ourselves on the back for swatting the fly, and go grab some Fritos. But with a slightly more rigorous approach, we have an opportunity to explore some interesting and useful concepts: state, and what makes this particular defect observable or not. To be clear, the fault in the code above is the use of the '>' operator.
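For the record, here's a sketch of the one-character fix (same method as above, wrapped in a hypothetical class so it stands alone):

```java
// Corrected version: the loop condition now includes index 0,
// so the first element of the array is also checked.
public class LastIndexOfFixed {
  public static int lastIndexOf(int[] x, int y) {
    for (int i = x.length - 1; i >= 0; i--) {  // >= instead of >
      if (x[i] == y) return i;
    }
    return -1;
  }

  public static void main(String[] args) {
    System.out.println(lastIndexOf(new int[]{2, 3, 4, 5}, 2)); // prints 0
    System.out.println(lastIndexOf(new int[]{1}, 2));          // prints -1
  }
}
```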

The RIP model states that in order for the fault to be observable, it needs to satisfy all three criteria—Reachability, Infection, and Propagation. It’s also helpful to show cases (test cases) where the fault is not observable, yet puts the program in an error state. How can a program be in an error state and not output an error? These are the kinds of bugs many of us have wrestled with—the kind of bug that gnashes its sharp mandibles only under certain circumstances.

The questions to ask are (and these are loosely taken from the book):

  • in which case(s) would the fault not execute?
  • in which case(s) would the fault execute yet not create an error state?
  • in which case(s) would the fault execute and create an error state, but not propagate?
  • in which case(s) would the inputs cause execution of the fault, cause an error state, and propagate?
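As a concrete sketch, one possible input for each question, run against the buggy lastIndexOf above (these are my own inputs, not the book's, and the buggy method is reproduced here so the example is self-contained):

```java
public class RipCases {
  // The buggy version from above (note the i > 0 loop condition).
  static int lastIndexOf(int[] x, int y) {
    for (int i = x.length - 1; i > 0; i--) {
      if (x[i] == y) return i;
    }
    return -1;
  }

  public static void main(String[] args) {
    // 1. Fault not executed: x is null, so x.length throws a
    //    NullPointerException before the loop condition ever runs.
    try { lastIndexOf(null, 2); } catch (NullPointerException e) { /* expected */ }

    // 2. Fault executed, no error state: y sits at an index > 0, so the
    //    buggy program's states match the correct program's exactly.
    System.out.println(lastIndexOf(new int[]{2, 3, 5}, 3)); // prints 1

    // 3. Fault executed, error state, no propagation: index 0 is never
    //    checked, but y isn't there anyway, so -1 is still correct.
    System.out.println(lastIndexOf(new int[]{2, 3, 5}, 4)); // prints -1

    // 4. All three: y occurs only at index 0, which the buggy loop
    //    skips, so we get -1 instead of the expected 0.
    System.out.println(lastIndexOf(new int[]{2, 3, 5}, 2)); // prints -1, should be 0
  }
}
```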

If you're still reading, my guess is that you want to know how you can execute a fault, be in an error state, yet not propagate. The other questions can be figured out without too much trouble, but this one might cramp the brain a bit. First, let's define what state is. State can be defined as a combination of variable/value pairs plus the program counter (PC), which is where the code should be executing. It can also be viewed as a snapshot of the data plus the location of the program's execution pointer. Example:

S0 → S1 → S2 → S3 → S4 → S5, where Sn represents some state of execution within the program.

But sometimes the program skips a state and goes from S3 → S5. This is the error state. If the code is supposed to execute some line and it does not, that's skipping an intended state, which is bad.

In S0 we’re entering the lastIndexOf(...) method and we have values for inputs x and y. In S1 we’re now at for (int i = x.length-1; ...) and our state now has some value for i. What if we have a single-element array, e.g., x = [ 1 ], instead? Because there’s only one element, i gets the value 0, so the bug i > 0 kicks us out of the for loop, pushes us down to the return statement, and then returns the correct result, -1. The error state, and this is the take-home, is the fact that the PC never reaches the if( x[i] == y ) statement.

An analysis of the state of the program when the error state was entered can be presented like this:

Expected State          Actual State
x = [ 1 ]               x = [ 1 ]
y = 2                   y = 2
i = 0                   i = 0
PC = if( x[i]==y )      PC = return -1;  <-- error state

In more concrete terms, this is just saying the code didn’t do what it was supposed to, because it never evaluated the conditional expression if ( x[i]==y ). In RIP terms this is saying that Reachability was satisfied, Infection was also satisfied (an error state occurred), but Propagation did not occur. So, given the inputs, this software fault is not observable. Again, this is the kind of bug that, usually buried deep in some system, causes us pain. It's very difficult to find after the fact, but a simple unit test up front would catch it.

Today, one test I would write would make sure that y was present at every possible index of the array being searched:

x = [2,3,4,5]
x = [3,2,4,5]
x = [3,4,2,5]
x = [3,4,5,2]

Some array shifting algorithm would be useful here to make test case inputs easier to generate!
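A minimal sketch of such a generator (my own hypothetical helper, not from the book or class): rotating a base array left by each offset places the target value at every possible index, once per rotation.

```java
import java.util.Arrays;

// Hypothetical helper: generates every left-rotation of a base array.
// Across all rotations, the target value (here 2) lands at each index once.
public class RotationGenerator {
  static int[] rotateLeft(int[] a, int k) {
    int[] out = new int[a.length];
    for (int i = 0; i < a.length; i++) {
      out[i] = a[(i + k) % a.length];  // element k positions to the right wraps around
    }
    return out;
  }

  public static void main(String[] args) {
    int[] base = {2, 3, 4, 5};
    for (int k = 0; k < base.length; k++) {
      System.out.println(Arrays.toString(rotateLeft(base, k)));
    }
    // Each rotation is a test input; the expected result of
    // lastIndexOf(rotated, 2) is the index where 2 ended up.
  }
}
```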

Chances are, in practice, you would never go into this much detail about such a small fault. This is taking an opportunity to think about bugs, or software faults, in detailed and abstract terms, which is not only a great exercise but also a useful and practical way to reason about the complexities of building quality into software. We're not constrained by a particular language, implementation, or platform, and dealing with abstractions like this transcends much of the industry noise and helps us focus on the principles of software quality, rather than the software itself.

Next up: Coverage Criteria

Test and Be Happy!

bill shelton


Mike said...

Fantastic post! This is a great example of why it is important to study some of the science behind our programming. It teaches us how to think more clearly. Sometimes it takes a little longer to solve the issue at hand, but look at how we will be able to approach similar issues after our personal discovery.

Sometimes we are forced to just "git 'er done", but given a choice, I'd rather not paint over mold. If we are able to recognize and apply proper theory to our projects, we don't have to spend so much effort going back over problems that have already been solved.

I'm really looking forward to your next post.

bill shelton said...

Thanks, Mike.

Just "sometimes" we're forced to git 'er done? ;-) I like the mold analogy! It grows in the dark and dies if exposed.