Dissecting The Bug

Saturday, January 30, 2010

Shifting gears

Attending school forces me into a different mind-set, requiring me to slow down and really think hard about abstract concepts, rather than just fixing bugs as quickly as possible. I like this. It’s also very inspiring to sit in class and listen. It’s been a while and I’ve forgotten, but it’s kind of like being in a conference session without the glitz and knowing it’s going on for several hours, days, and months, vs. a mere 50 minutes. I was impressed, too, by my professor, Paul Ammann, who’s incredibly smart, skilled, and understands the practical needs of technology. This is key: professors who don’t acknowledge what practitioners have to do to stay employed show a lack of understanding. Paul, on the other hand, demonstrates a keen grasp of this important concept. Many students, myself included, enjoy abstraction and see value in it, but also need to apply concrete concepts in order to put food on the table and clothes on the kids. Case in point, almost to the minute my job agreed to pay for this class, I got a new project: “Bill, we want you to automate the testing for Project XYZ, our highest priority project”. Little do they know that this course is not about automating testing; it’s about good test design, which, of course, is fundamental to any good quality assurance effort.

Ok, enough about me …

So, in the most recent class (#2), we discussed a single software fault in painstaking detail. It was dissection, the main concepts being _observability_ and its antithesis. These are my notes, some reflection, practice, a little poetic license, and likely a mistake or two.

RIP Observability Model

This model articulates three abstract properties that must exist in a program execution in order for a software fault to be observable.
  • Reachability – the fault in the code has to be reachable
  • Infection – the fault has to put the program into an error state. This does not necessarily mean the program will return incorrect results, and this is an important distinction! Just because a program is in an error state does not mean that it will always produce the wrong result. See below.
  • Propagation – the program needs to exhibit incorrect behavior; e.g., incorrect outputs or behavior or throwing an unspecified exception
A program execution can exhibit zero, one, or two of these properties, but for the fault to be observable, all three criteria need to be met.


Dissection


This code looks simple, and it is, but using simple code helps in grasping the RIP concepts. This is not the same code we covered in class, but it is in the book. The code has a bug; try to find it …
/**
 * Effects: returns the index of the last element that matches y, else returns -1
 *
 * @return the index of the last element that matches y, or -1
 * @throws NullPointerException if x is null
 */
public int lastIndexOf(int[] x, int y) {

  for (int i = x.length-1; i > 0; i--){
    if( x[i]==y ) return i;
  }

  return -1;
}
A typical JUnit test for this would look something like the following. The test fails, but why?
@Test
public void testLastIndexOf(){
  int expected = 0;
  int[] x = {2, 3, 4, 5};
  int y = 2;
  int actual = theClass.lastIndexOf(x,y);
  Assert.assertEquals( expected, actual );
}

The bug isn't too hard to find simply by looking closely at the code. But if this were buried in some stacktrace it would be much harder to locate; here we already know where it is, within a few lines of code. In the real world, we would simply change the > to >=, pat ourselves on the back for swatting the fly, and go grab some Fritos. But with a slightly more rigorous approach, we have an opportunity to explore some interesting and useful concepts: state, and what makes this particular defect observable or not. To be clear, the fault in the code above is the use of the '>' operator where '>=' is needed, so index 0 is never examined.

The RIP model states that in order for the fault to be observable, it needs to satisfy all three criteria—Reachability, Infection, and Propagation. It’s also helpful to show cases (test cases) where the fault is not observable, yet puts the program in an error state. How can a program be in an error state and not output an error? These are the kinds of bugs many of us have wrestled with—the kind of bug that gnashes its sharp mandibles only under certain circumstances.

The questions to ask are (and these are loosely taken from the book) :

  • in which case(s) would the fault not execute?
  • in which case(s) would the fault execute yet not create an error state?
  • in which case(s) would the fault execute and create an error state, but not propagate?
  • in which case(s) would the inputs cause execution of the fault, cause an error state, and propagate?


If you're still reading, my guess is that you want to know how you can execute a fault, be in an error state, yet not propagate. The other questions can be figured out without too much trouble, but this one might cramp the brain a bit. First, let's define what state is. State can be defined as a combination of variable and value pairs plus the program counter (PC), the location where the code is executing. It can also be viewed as a snapshot of the data plus the location of the program's execution pointer. Example:


S0 → S1 → S2 → S3 → S4 → S5, where Sn represents some state of execution within the program.


But sometimes the program skips a state and goes from S3 → S5. This is the error state. If some code is supposed to execute a line and it does not, that's skipping an intended state, which is bad.

In S0 we’re entering the lastIndexOf(...) method and we have values for inputs x and y. In S1 we’re now at for(int i = x.length-1; ...) and our state now has some value for i. What if we have a single-element array, e.g., x = [ 1 ], instead? Because it has only one element, i gets the value 0, so the faulty condition i > 0 kicks us out of the for loop, pushes us down to the return statement, and returns the correct result, -1 (taking y = 2). The error state, and this is the take-home, is the fact that the PC never reaches the if( x[i] == y ) statement.


An analysis of the state of the program when the error state was entered can be presented like this:

Expected State         Actual State
------------------------------------------------------------------------------------
 x = [ 1 ]             x = [ 1 ]    
 y = 2                 y  = 2
 i = 0                 i = 0     
 PC =  if( x[i]==y )   PC = return -1;  <-- error state

In more concrete terms, this is just saying the code didn’t do what it was supposed to, because it never evaluated the conditional expression if ( x[i]==y ). In RIP terms this is saying that Reachability was satisfied, Infection was also satisfied (an error state occurred), but Propagation did not occur. So, given the inputs, this software fault is not observable. Again, this is the kind of bug that, usually buried deep in some system, causes us pain. It's very difficult to find after the fact, but a simple unit test up front would catch it.
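To make the distinction concrete, here is a minimal Java sketch that runs the faulty loop next to a corrected one and compares them. The class name RipDemo, the helper classify, and the trick of logging visited indexes are my own assumptions for illustration, not anything from the book or the class; comparing the visited indexes is only a rough stand-in for comparing full program states. A null array is left out because it throws before the faulty comparison is ever evaluated.

```java
import java.util.ArrayList;
import java.util.List;

public class RipDemo {

    /** The faulty version from the post, instrumented to log each index it compares. */
    static int lastIndexOfFaulty(int[] x, int y, List<Integer> visited) {
        for (int i = x.length - 1; i > 0; i--) {   // fault: should be i >= 0
            visited.add(i);
            if (x[i] == y) return i;
        }
        return -1;
    }

    /** The corrected version, instrumented the same way. */
    static int lastIndexOfFixed(int[] x, int y, List<Integer> visited) {
        for (int i = x.length - 1; i >= 0; i--) {
            visited.add(i);
            if (x[i] == y) return i;
        }
        return -1;
    }

    /** Runs both versions and reports whether their states and results diverge. */
    static String classify(int[] x, int y) {
        List<Integer> faultyVisited = new ArrayList<>();
        List<Integer> fixedVisited = new ArrayList<>();
        int faulty = lastIndexOfFaulty(x, y, faultyVisited);
        int fixed = lastIndexOfFixed(x, y, fixedVisited);

        boolean infected = !faultyVisited.equals(fixedVisited); // a comparison was skipped
        boolean propagated = faulty != fixed;                   // the caller sees a wrong answer

        if (propagated) return "reached, infected, propagated";
        if (infected) return "reached, infected, not propagated";
        return "reached, not infected";
    }

    public static void main(String[] args) {
        System.out.println(classify(new int[] {}, 2));           // loop never runs in either version
        System.out.println(classify(new int[] {1}, 2));          // index 0 skipped, result still -1
        System.out.println(classify(new int[] {2, 3, 4, 5}, 2)); // index 0 skipped, result wrong
    }
}
```

Only the third input lets a test see the fault; the second is the quiet one discussed above.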

Today, one test I would write would make sure that y is found at every possible index of the array being searched:


Testcases:
x = [2,3,4,5]
x = [3,2,4,5]
x = [3,4,2,5]
x = [3,4,5,2]

Some array shifting algorithm would be useful here to make test case inputs easier to generate!
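One way to generate those inputs, sketched in Java: take a base array that does not contain y and insert y at each position in turn. The names PositionCases and insertAtEachIndex are made up for this example, and the sketch assumes the base array never contains y, so the expected index for each generated case is simply its position in the returned list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PositionCases {

    /**
     * Builds base.length + 1 arrays, each a copy of base with y inserted
     * at a different position (0 through base.length).
     */
    static List<int[]> insertAtEachIndex(int[] base, int y) {
        List<int[]> cases = new ArrayList<>();
        for (int pos = 0; pos <= base.length; pos++) {
            int[] x = new int[base.length + 1];
            System.arraycopy(base, 0, x, 0, pos);                         // elements before y
            x[pos] = y;
            System.arraycopy(base, pos, x, pos + 1, base.length - pos);   // elements after y
            cases.add(x);
        }
        return cases;
    }

    public static void main(String[] args) {
        // Reproduces the four test cases listed above.
        for (int[] x : insertAtEachIndex(new int[] {3, 4, 5}, 2)) {
            System.out.println(Arrays.toString(x));
        }
    }
}
```

In a JUnit test, looping over the list and asserting that lastIndexOf returns the loop counter would cover every position, including index 0 where the fault hides.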

Chances are, in practice, you would never go into this much detail about such a small fault. This is taking an opportunity to think about bugs, or software faults, in detailed and abstract terms, which is not only a great exercise but also a useful and practical way to reason about the complexities of building quality into software. We're not constrained by a particular language implementation or platform, and dealing with abstractions such as this transcends much of the industry noise and helps us to focus on the principles of software quality, rather than the software itself.

Next up: Coverage Criteria

Test and Be Happy!

bill shelton

Software Testing

Monday, January 18, 2010

I “am” plugging this book, and for good reason …


Update (January 22, 2010): The authors (Paul Ammann and Jeff Offutt) donate 100% of the book's royalties to the Software Engineering Scholarship Fund, a scholarship fund for software engineering students.
 
Admittedly, I am a bit biased, being both a student of the authors and a QA geek. To create even further bias, even though I bought a copy of the book over a year ago, I never thoroughly read the preface until yesterday. I was both surprised and flattered to read my name in print, acknowledged as one of the students who contributed and helped with the book. I recall reading and commenting on the text before it was published and then being disappointed that I couldn’t attend the class before I graduated; it was a new course then and I had some graduation requirements to fulfill.

To someone who contributes to the core MXUnit test framework, has written a few blog posts, and has spoken at conferences on testing and TDD, the book’s title may seem misleading. It can be used as an introductory text, but also presents practical, novel, and advanced concepts, such as coverage criteria (graph and logic coverage, input space partitioning, and syntax-based testing). I suspect one of the main reasons for the title is to make it accessible at many levels, and this is a big asset. Introduction to Software Testing can be used both as a text and as a test engineer or developer’s reference.

One of the goals of the book is to strike a balance between theory and practice. The book’s introduction establishes the context of testing in software engineering from a more formal process point of view. (It might be helpful or more accessible if the book were to also mention Agile methods, in which testing is an integral part of the process.) The book also identifies the fundamental weakness we all have to deal with in respect to testing: testing can show the presence of failures, not their absence. This key point applies to both manual and automated testing and at all levels, unit through acceptance. In other words, based on your tests, you can’t prove that your code does not have bugs. This is one of the main problems the book addresses through exploration of coverage criteria. Through coverage criteria, is it possible to describe and possibly measure the relative quality of your application? …

Now, some 4+ years after receiving a master’s, I’m going back to school, and the first class I’m taking is SWE-637, Software Testing, the class I missed on the first go-around. I’m excited to dig deep; however, having a full-time job and being a parent drives me to try to work as smart as possible (that’s a constant learning process). Adding a graduate software engineering program to that makes for very little time for anything other than that which is absolutely necessary. With that, my goal for this semester is to focus on automated test design: how best to design automated tests in order to cover as many requirement criteria as possible in the least amount of space and time. I’d like to use functional testing as the primary medium and (as a bonus) develop a good grasp of the upcoming 2.0 version of Selenium (with Webdriver) as the test harness. But the focus will be on the principles of good test design rather than the tools.

Coverage criteria and test design are largely what the course and the book address. After focusing on TDD, unit testing, and building a testing tool, I still have the unanswered question of “what makes a test a good test?”. And how can I measure how much of a class, component, system, object, etc., is adequately covered by my tests? I know that when I write a lot of tests and take the time to reason about what I’m doing, the quality of my software is better. I know, too, that coverage tools can let me know what parts of my code have not been exercised. Yet still, there’s that lingering question that tests can’t show the absence of defects …

Test and be Happy!

bill shelton

Using Eclipse MAT to track down a ColdFusion server Out-of-Memory Error

Monday, January 11, 2010

Recently, this came across my desk:  “This process is causing ColdFusion to crash on the DEV server. It hasn’t crashed the developer’s local machine. Can you take a look?”

In this post, I’ll show how I came to the answer in this specific case. I’m putting this out there because the solution was both surprising and simple. I’m also writing about it because I want to show how I came to the answer very quickly, rather than spending hours going down a bunch of dark alleys.

The Short Answer

For the impatient, or the people who got here from Google, here’s the conclusion: ColdFusion Server Monitor can, indeed, cause your memory to spike, and crash your server, with “Profiling” turned on. It’s common knowledge that keeping the “Memory Tracking” turned on can crash CF; however, it’s rare that you hear about “Profiling” causing troubles.  In my case, I believe the reason was setting the Database “Slowest Queries” and “Most Frequently Run” Queries to be very aggressive. For Slowest Queries, I had it set to give me the last couple hundred queries of 0 seconds. For Most Frequently run, I changed the default from 20 to at least 100.

To solve the server crash, I simply turned Profiling off.

 

The Journey

For those interested in seeing how I concluded Profiling (or, rather, my aggressive settings) was to blame, read on.

The Usual Suspects

I started by checking the usual OutOfMemory culprits:

  • Debugging on?
  • Memory Monitoring on?

Nope.

I went to server monitor and looked at the database stats: it confirmed my gut reaction, which was that the process was running thousands of queries. This isn’t necessarily a server killer in and of itself, but in my experience, OutOfMemory problems usually lead back to database access. So I went to the source code and checked out the queries that server monitor indicated were frequently run queries. I was looking for the tell-tale server killer: a couple of queries in a loop, and the loop having the possibility of running tens of thousands of times. What I saw was perhaps questionable, but nothing shocking or easily identifiable as the offender. So, I put this in the back of my head and went to the next step in my process: get a heap dump.

By now, I’m about 10 minutes into the investigation. The 2 most obvious suspects weren’t implicated, and the 3rd most obvious was suspicious but not definitive. Now, at this point, JRun was up over 1GB of RAM, so a heap dump and parsing the dump would take a long time.

Getting a Heap Dump

I RDP’d into the dev server, using “mstsc /admin”. You might need to use /console instead. In my experience, using jmap over RDP usually requires one of these options.

I always have a JDK on the CF Server, so these instructions assume you either have a JDK installed or can install one.

  • Open a cmd, and type: jmap -dump:format=b,file=c:\jrun4\heapdump.bin <pid>, where pid is the Windows process ID of jrun.exe

This will put a heap dump file at c:\jrun4\heapdump.bin. If you want it somewhere else, just change the file param to the jmap command. Now, we need something to parse this file and show us what’s sucking up all the memory.

Analyzing a Heap Dump with Eclipse MAT

Before I describe the few steps I took, I want to point out Brian Ghidinelli’s excellent set of posts on analyzing memory problems in ColdFusion. My post is meant to be  a quick and dirty description of how I found an answer to a memory problem quickly using Eclipse MAT. Brian’s articles are exhaustive, and I strongly encourage you to check them out if you want to *really* learn how to do this stuff.

Now then: I opened the MAT perspective in Eclipse (this is on 3.5), and I opened the heapdump.bin file that jmap spit out. It spun for quite some time, and then I got an Out Of Memory error in Eclipse. Ironic, eh? This one was easy to identify, since I was running Eclipse with only 512MB of RAM, and the file I was parsing was over a gig. So I set Eclipse to run with 1350MB of RAM by using the -Xmx option in eclipse.ini, and tried to open the file again. This time, it got pretty far, gave an error, but it still had generated some data. It didn’t give the full information one would need for an exhaustive analysis, but it gave me enough. Had I needed more information, I would’ve followed Brian’s instructions in this post for using Eclipse MAT to parse very large files.

Once I had some information on screen, I clicked the “Dominator Tree” link, and this is what I saw:

[Image: bigheap, the Dominator Tree view in Eclipse MAT]

Go ahead. Open that file. Do you see that line at the top… the one with 22 million instances of a single class name consuming almost 1GB of Memory? Yeah… that’s the problem. His name is coldfusion.monitor.stack.CFStackFrame, and he was my huckleberry.

At this point in my investigation, I do not know exactly what that class is, but judging by its name, I’m pretty sure it’s related to the CF server monitor. I open server monitor, turn off just the “Monitoring” option, and re-run the application. It crashes within minutes.

I restart CF, open server monitor, and turn Profiling off. I re-run the application, and it successfully completes. I then turn Monitoring back on – to be sure that it’s isolated to Profiling – and re-run the application. It completes.

If there’s a lesson, it’s:

After you spend a few minutes investigating and ruling out the usual suspects, stop speculating. Obtain data, and let it guide you.

What would your CFML look like with closures?

Thursday, January 7, 2010

In this article, I’ll describe my exploration of closures: what they are, why programmers love them so darn much, and what your CFML code might look like if they were to be introduced into the language. A paragraph worth of disclaimers wherein I basically admit “I don’t know what I’m talking about” will follow. But I figured that was no way to pull you in, so… without further ado, let’s start.

The Problem

I’m learning that most advances in one’s thinking start with a problem. This should be common-sense. It’s not like we sit in our chairs all day thinking “I wonder if I can come up with a nifty solution to something today.” Rather, we have problems to solve, we think about the existing solutions, and every once in a while, we realize that what we currently can do isn’t what we should do. Sometimes, the solution hasn’t been born yet. This originality is rare and I’ll leave it to really, really, really smart people. Often, the currently-unavailable solution has already been born, just in other contexts. In my case, the problem I was trying to solve would have been neatly handled if CFML had closures, which I knew about from various other languages. So let’s talk about the problem, and why closures are a nice way to approach it. I apologize in advance if this specific problem seems confusing. I thought about drumming up a fake problem for sake of clarity, but I decided against that. Real-world problems, though harder to initially grok, have always helped me more fully understand things, even if it takes two reads.

I have a “Formatter” component. It’s a utility component that does some all-purpose formatting of financial data. Dollars, percents, that sort of junk. This is CF, not Java, and as such we’re not in the business of passing around “Money” objects when a simple decimal value will do. Consequently, it’s useful to be able to pass these values through a Formatter object for display purposes (think: barcharts, tabular data, etc) while still retaining their raw value for use in calculations and whatnot.

I have a “TableBuilder” component. Its purpose is to take arrays of raw data in an “addTableRow” function and turn them into tables for display. This builder accepts various formatting options and the raw data (one row at a time), and can, when requested, add additional rows/columns based on the raw data (doing other calculations, etc) as a convenience. When it prints the raw data, it uses the Formatter.

I have a “ThingThatUsesTableBuilder” component. This component’s job is to build a part of a PDF document. It does all manner of boring things – put some text here, put some text there, add some junk here, draw a box there – and also it needs to plop a table smack dab in the middle. It will use the TableBuilder for that purpose.

So: ThingThatUsesTableBuilder --> uses TableBuilder --> uses Formatter

When this TableBuilder was conceived, it was only ever meant to display integer values. As in, if it received decimal values, it rounded them. It was a “Very Bad Thing”(tm) if decimals ever displayed in these tables. Litigation, torture, foreclosure, you name it. Consequently, we built that behavior into this thing. “If you are a decimal, ye shall be smoted into an int”.

Cue the old timey Detective music. And then “She” came along.

You know how it is. You nail down the specs. You argue incessantly about “will we ever need this to be more flexible?” “Avaunt! and quit my sight!”  And so it goes. You build it to spec, and then down the road, things change. And you end up in my shoes:

“Marc, what we really need, for this one table, for this one client, for just the first row, is that the decimals should appear. And here’s all the behavior around decimals”. When you’re dealing with Other People’s Money(tm), this stuff can get tricky.

Fortunately, this formatting business can all be nicely tucked away into the Formatter object and your calculation “engine” (engine means “a thing with a shitload of ‘if’ statements”) can remain blissfully unaware of the unwholesome demands being placed upon its sanctity. Unfortunately, the TableBuilder looks like this:

drawTextFromMacro(textMacroName, formatter.formatWholeDollar(data[col],true,false), startX, startY, startAmountRightX);

See what it’s doing? Except in this one case, we don’t want that. In this one case, we want something different… we want a decimal amount in the cell, for one call to addTableRow(), in a TableBuilder whose reason for being is to print whole dollars.

Existing Solutions

Myriad approaches exist.

  1. Change the TableBuilder to not expect the data to be raw, but to instead expect the data to be formatted. This would perhaps be acceptable if the TableBuilder didn’t also want to perform additional calculations on the incoming data. And so if we make this change, we must also change the input data to have those calcs done ahead of time. Small inconvenience, perhaps, but it does start to make the method signatures on this component get really hairy.
  2. Accept a formatter object as an arg to the function that does the drawing. This would work, too, since CF enables dynamic method calls via CFINVOKE. But in order to do this, I’d need to pass the formatter object, the function to be called, and the arguments to that function. So now my method signature just ballooned by 3. Blech.
  3. Stop being lame with this “Formatter” object business, and create a different formatter for all the possible conditions I might encounter: WholeDollarFormatter, DecimalFormatter, PercentFormatter, and on and on. Then, I just construct an appropriate instance inside my loop that’s adding data, based on whether I need the row to be WholeDollar or Decimal Formatted, and pass it as an arg. All of the OO smarties right about now – who’ve gotten this far without wanting to go kick their cats b/c I’m so stupid for not having gone this route from day 1 – will probably say this is a good option. Yeah, probably.
  4. Add other approaches here. The internets love to make suggestions. So pretend I thought of all your suggestions, and pretend they’re here, in bullet point #4. :-)
  5. If your mind went to “FormatterStrategyFactory”, you need to go get yourself a beer.

Where My Head Is

Lately, I’ve been thinking that maybe the strict OO that I have come to love isn’t all that hot. No, it’s not because some douchebag wrote a blog post about how he loves procedural CF and he can’t “get” OO. It’s more about how perhaps OO has run its course, and perhaps since the world is going multi-core, and parallelism is just the way things are, and since “state” is the root of all evil with respect to parallel computing, that maybe OO isn’t the best model in the new world. Enter stage left, Functional Programming. I’ve been doing a lot of javascript programming lately, dabbling in Groovy, and reading Programming in Scala. So somewhere in my brain, I have this ADD-addled set of neurons that can’t get a moment’s peace, and as I hit this dumbass dollar formatting problem – and all I want to do is get to lunch, get through the day, and go the hell home – this thought pops into my head:

[Image: closures_tweet]

It kind of came unbidden. Up until that point, I had not done any serious investigation into closures, except what I knew about them from javascript. But it’s funny: in javascript, closures aren’t “a thing”. They are “the way things are”. It’s part of the language, like parentheses and colons and var and array.push(). If you’ve ever used jQuery or EXT, you’ve used closures and you maybe didn’t even know it.

All of this is to say that in my head, thanks to the fact that I can’t focus on one thing lately for more than a few hours, I had these other languages creeping in saying “If you were programming in ME, this is how we’d solve it and you’d be on your way out the door”.

I do not know if that is good or bad. Maybe it’s lazy? Maybe it’s impure?

But think about this:  Wouldn’t it be sweet, if in my current position, I could tell my function: “Hey, Function, when you format this data, I want you to do it this way”

And maybe it would look like this:

tableBuilder.addTableRow(
  data=someArray,
  blah=whatever,
  formatter="decimalFormatter"
);

And you’re thinking: “But dude, that looks exactly like what you described above, with the different Formatter classes, each with a single ‘format’ method or whatever.” But it’s not. Instead, formatter is a function, not an instance of a DecimalFormatter object I’ve declared elsewhere. That’s right, a function. And when tableBuilder does its drawText(), it does it like this:

drawTextFromMacro(textMacroName, formatter(data[col]), startX, startY, startAmountRightX);

 

What your CFML might look like if it had closures

You should be wondering: “How did he define this function? What does it look like?”. And here, in imaginary CF, is the answer:

decimalFormatter = function(value){Formatter.formatWholeDollar(value,true,false)};

tableBuilder.addTableRow(
  data=someArray,
  blah=whatever,
  formatter="decimalFormatter"
);

What’s happening in this imaginary example is that I’ve defined a function in “ThingThatUsesTableBuilder”, named it “decimalFormatter”, and passed that function into the tableBuilder.addTableRow() method. TableBuilder then calls that function as if it were some function inside of itself. It looks completely natural. But notice this: the call to the formatter(…) function has no idea about what that Formatter object is. It doesn’t know about its state. Hell, for all it knows, formatter() uses some other object to do its work. It doesn’t know, it doesn’t care. It just does it.

Another example: Maybe the call to addTableRow looks like this:

tableBuilder.addTableRow(
  data=someArray,
  blah=whatever,
  formatter="function(value){Formatter.formatWholeDollar(value,true,false)}"
);

See that: it’s defining the function on the fly. It uses the “guts” from that Formatter object that I discussed way back at the beginning, passing some additional params to control its behavior. Then, when drawTextFromMacro() gets that function as the formatter, it knows what the “Formatter” object is (this is the magic of closures), and it applies that formatWholeDollar() function to the data.
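For comparison, here is roughly how the same idea can be sketched in Java, where a lambda plays the role of the function literal. Everything in it is a hypothetical stand-in: this Formatter and TableBuilder are toy classes, and the meaning I give the two boolean arguments of formatWholeDollar (add a dollar sign, show decimals) is an assumption made only so the example produces visible output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

public class ClosureTableDemo {

    /** Toy stand-in for the post's Formatter; the boolean meanings are assumed. */
    static class Formatter {
        String formatWholeDollar(double value, boolean addDollarSign, boolean showDecimals) {
            String text = showDecimals
                    ? String.format(Locale.US, "%.2f", value)
                    : String.valueOf(Math.round(value));
            return addDollarSign ? "$" + text : text;
        }
    }

    /** Toy stand-in for TableBuilder: it never references Formatter at all. */
    static class TableBuilder {
        final List<String> cells = new ArrayList<>();

        void addTableRow(double[] data, Function<Double, String> formatter) {
            for (double value : data) {
                cells.add(formatter.apply(value)); // just calls whatever it was handed
            }
        }
    }

    public static void main(String[] args) {
        Formatter formatter = new Formatter();
        TableBuilder tableBuilder = new TableBuilder();
        double[] someArray = {1.5, 2.25};

        // Each lambda captures the local variable `formatter` and carries it along.
        tableBuilder.addTableRow(someArray, value -> formatter.formatWholeDollar(value, true, false));
        tableBuilder.addTableRow(someArray, value -> formatter.formatWholeDollar(value, true, true));

        System.out.println(tableBuilder.cells);
    }
}
```

The point to notice is that TableBuilder compiles without knowing Formatter exists; the decision about whole dollars versus decimals lives entirely at the call site.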

Which brings us to….

What is a Closure?

I will get this wrong. I will miss the nuance and subtlety and beauty and power. That’s why you go find the answers yourself. So I’m going to give you my definition, based on a lot of reading and experimentation:

Closures are function literals + party invites.

Let’s talk about literals. Literals are the things on the right. y = 5;  5 is an int literal. you = “rad”;  “rad” is a String literal. In MXUnitHombres = {frameworkJunkie=”Bill”, mockMaster=”Bob”, pluginDude=”Marc”, residentGrump=”Adam”, seleniumGuru=”MikeH”, cohibaSupplier=”MikeR”}, the part in braces is a Struct literal.

decimalFormatter = function(value){…..} shows a function literal on the right.

tableBuilder.addTableRow(
  data=someArray,
  blah=whatever,
  formatter="function(value){Formatter.formatWholeDollar(value,true,false)}"
);

shows a function literal, passed as an argument to some other function.

Party Invites: When you define your function literal, that function might do simple things to the incoming arguments and not need the support of any additional functions/objects/variables. In that case, the thing you define really isn’t a closure. It’s just a function. But when you need your function to “remember” things about the world in which it was defined – in my case, I want my function to “remember” what the Formatter object was – that’s where we get into closures. In my example above, where I created the decimalFormatter function, its contents included a reference to the Formatter object I created earlier. When I pass my function into other objects and functions, it will still work – even though the recipient of my function has no notion of this “Formatter” object. What I’ve done is I’ve invited the Formatter object to the party. I’ve said “Hey, you, you come along with me”.
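The difference between a plain function literal and one with party invites can be shown in a few lines of Java. This is only a sketch with made-up names (PartyInvites, prefixer, doubler); one Java-specific detail worth knowing is that a lambda can only capture local variables that are final or effectively final.

```java
import java.util.function.Function;

public class PartyInvites {

    /** Returns a function that still remembers prefix after this method has returned. */
    static Function<Integer, String> prefixer(String prefix) {
        return value -> prefix + value; // prefix is the invited guest
    }

    public static void main(String[] args) {
        // Works only on its own argument: a function literal with nothing captured.
        Function<Integer, Integer> doubler = value -> value * 2;

        // Captures prefix from the scope in which it was defined.
        Function<Integer, String> dollars = prefixer("$");

        System.out.println(doubler.apply(21));
        System.out.println(dollars.apply(5));
    }
}
```

By the time dollars is called, the prefixer method is long gone, yet the "$" it was given is still available to the function.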

At risk of offending the purity gods, I’ll say this. Think of it like if your function literal “kept knowing about all the variables you invited into it even after that function was passed into a place that had no knowledge of those variables”. Ben Nadel probably does as good a job as I’ve seen of graphically describing this, in the context of javascript.

So for example, look at that code up above, where I’m doing formatter=”function(value){Formatter.formatWholeDollar…}” as an argument to addTableRow(). In the refactored example, now that TableBuilder is no longer responsible for formatting, it doesn’t know anything about this Formatter object. It’s been relieved of that responsibility. How is it supposed to know what it means when it gets a function that calls this “Formatter.formatWholeDollar” thing?  Well….

That’s closures. It knows about it because that’s how closures operate. They “enclose” or “close up” the stack around them and stuff ’em in a knapsack and take it with them on their journey. They say to TableBuilder “tut tut, TableBuilder. Don’t fret. When you call me, I’m going to call this Formatter thing that I’ve got in me pocket, and ye shan’t worry, nor shall ye hurt him, because he’s mine and ye can’t keep him. So call me, I’ll do what I do, and go about yer business.”

Why is This Preferable?

In my case, this approach is preferable because it lets me very cleanly and easily encapsulate the behavior I want very close to the point at which I’ll use it. Immediately before I call the function that adds the row of data, I define the formatting it will use. Now… all the nastiness behind formatWholeDollar() is safely tucked away. We want that stuff out of the picture. We do want to control those last two args though (whatever they are) for the purpose of just this one row of data. So for this one row, we say “I want to call formatWholeDollar, with these two extra args”. And we do that by defining a function that does just that.

What’s it look like in real life?

As part of my foray into closures, I wanted to see what it’d feel like to do a rough approximation of my problem in some different languages. I chose the ones closest to me at the moment: javascript, groovy, and scala.

Here’s some javascript code that does kinda the same thing (stripped down for clarity). This is full HTML, so that muddies the picture somewhat, but I wanted to give you something you could copy/paste/run right in your browser.  It defines a Formatter object, an array, a function that does stuff with the array and takes a formatter function. Then it defines two functions (dollarAdder and zeroAdder). In the HTML, the onClick attributes call the doStuffWithArray function, passing in the function.

<html>

 <head>
 
 <script>
     //ignore... just for printing stuff to screen
  function addRow(data){
   var newline = document.createTextNode(data);
   document.getElementById("content").appendChild(newline);
   document.getElementById("content").appendChild(document.createElement('br'));
  }
 
 
  // declared with var so they are explicit globals the onclick handlers can reach
  var values = [1,2,3,4,5];

  var Formatter = new Object();
  Formatter.formatWithOptions = function(value,addDollarSign,addTwoZeroes){
   if(addDollarSign) value = '$' + value;
   if(addTwoZeroes) value += '.00';
   return value;
  };

    
  function doStuffWithArray(array,formatter){
   addRow("Doing stuff with " + array);
   // index loop: for...in walks property names and isn't meant for arrays
   for(var i = 0; i < array.length; i++){
    addRow("Using item " + array[i] + " as " + formatter(array[i]) );
   }
  }
  
  var dollarAdder = function(value){return Formatter.formatWithOptions(value,true,false)};
  var zeroAdder = function(value){return Formatter.formatWithOptions(value,false,true)};
 </script>
 
 
 </head>
 
 <body>
 
  <input type="button" value="dollar adder" onclick="doStuffWithArray(values,dollarAdder);">
  <input type="button" value="zero adder" onclick="doStuffWithArray(values,zeroAdder);">
  <input type="button" value="inline dollar and zero adder" onclick="doStuffWithArray(values,function(value){return Formatter.formatWithOptions(value,true,true)});">
  
  <div id="content"></div>
 </body>

</html>

Run that, click the buttons, and you’ll see it in action.

Here’s some Groovy:

package formatters;
class Formatter { 
 def formatWithOptions(Number value, boolean addDollarSign, boolean addTwoZeroes){
  def newValue = value.toString()
  if(addDollarSign) newValue = "\$" + newValue
  if(addTwoZeroes) newValue += ".00"
  return newValue
 } 
}

// --- separate file: Main ---
import formatters.Formatter

class Main {

 static main(args) {
  def formatter = new Formatter()
  def values = [1,2,3,4,5]
  def dollarFormatter = {formatter.formatWithOptions(it,true,false)}
  def zeroFormatter = {formatter.formatWithOptions(it,false,true)}
  
  def main = new Main()
  main.doStuffWithList(values,dollarFormatter)
  main.doStuffWithList(values,zeroFormatter)
  main.doStuffWithList(values,{formatter.formatWithOptions(it,true,true)})
 }
 
 def doStuffWithList(list,fn){
  println "Doing stuff with " + list  
  list.each {println "using item " + it + " as " + fn(it)}
  println "Doing other stuff..."
 }

}

Looks awfully similar to the JavaScript, doesn’t it? The main difference is Groovy’s implicit “it” parameter: a closure that takes a single argument gets it for free, whereas in the JavaScript I named the parameter myself (Groovy lets you name it too, as in { value -> ... }). The other difference is that I think this Groovy example more clearly demonstrates the “Party Invite” concept. I’ve defined the “formatter” instance in the main method, and I’ve defined the dollarFormatter and zeroFormatter function literals in main, using that formatter instance. I pass those functions as arguments to another function (doStuffWithList), which lives in a separate instance of Main, and that function should by all reasoning have no notion of what “formatter.formatWithOptions….” means. But it does, because we invited formatter to the party when we defined the function. This is why doStuffWithList doesn’t complain. In the JavaScript example, you could argue it only works because the “Formatter” I’ve declared is global. In this Groovy example, formatter is a local variable in main, so that is clearly NOT what’s happening.
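For what it’s worth, the same experiment can be run in JavaScript with the global taken away. This is a sketch, not a rewrite of the page above: localFormatter, main, and canSeeFormatterDirectly are names I’ve invented for the purpose, and output is returned as an array rather than written to the page so it runs outside a browser.

```javascript
// Top-level function: it cannot see anything declared inside main().
function doStuffWithList(list, fn) {
  var lines = [];
  for (var i = 0; i < list.length; i++) {
    lines.push("using item " + list[i] + " as " + fn(list[i]));
  }
  return lines;
}

// Proves the point: outside main(), the name simply does not exist.
function canSeeFormatterDirectly() {
  try {
    localFormatter; // not in scope here, so this throws a ReferenceError
    return true;
  } catch (e) {
    return false;
  }
}

function main() {
  // Local to main, like "def formatter = new Formatter()" in the Groovy version.
  var localFormatter = {
    formatWithOptions: function (value, addDollarSign, addTwoZeroes) {
      var result = String(value);
      if (addDollarSign) result = "$" + result;
      if (addTwoZeroes) result += ".00";
      return result;
    }
  };
  var zeroFormatter = function (value) {
    return localFormatter.formatWithOptions(value, false, true);
  };
  return doStuffWithList([1, 2, 3], zeroFormatter);
}

console.log(main().join("\n"));
// using item 1 as 1.00
// using item 2 as 2.00
// using item 3 as 3.00
console.log(canSeeFormatterDirectly()); // false
```

So the closure works in JavaScript without any help from globals, just as it does in Groovy.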

 

And finally, some Scala:

package formatters

object Formatter {

 def formatWithOptions(value:Number, addDollarSign:Boolean, addTwoZeroes:Boolean):String = {
   var newValue:String = value.toString()
   if(addDollarSign) newValue = "$" + newValue
   if(addTwoZeroes) newValue += ".00"
   return newValue
 }
}

// --- separate file: Main ---
package main

import formatters._;

object Main {
  def main(args : Array[String]) : Unit = {
    val values:List[Number] = List(1,2,3,4,5)
    println (Formatter.formatWithOptions(values(1),true,false))
    
    //formatter.formatWithOptions(value,true,false)
    doStuffWithList(values,dollarFormatter)
    doStuffWithList(values,zeroFormatter)
    doStuffWithList(values, Formatter.formatWithOptions(_,true,true)  );
    
  }
  def dollarFormatter = Formatter.formatWithOptions(_:Number,true,false)
  def zeroFormatter = Formatter.formatWithOptions(_:Number,false,true)
  
  def doStuffWithList(list:List[Number], fn:(Number) => String ) : Unit = {
    println("Doing stuff with " + list)
    for(item <- list) println ("using item " + item + " as " + fn(item))
    println("Doing other stuff")
  }
}

I’m least comfortable with Scala, so it’s most likely I’m A) doing it a little wrong, B) doing it all kinds of wrong, or C) doing all this in 20 lines when I could contrive some angle-bracket-dash-colon-underscore-ridden beast to do it in a single line of code. The two main differences here are that I defined the named formatter functions outside the place where I was using them (I couldn’t get them defined inline with the rest of the code, like I could with Groovy and JavaScript, though that’s probably b/c I don’t know what I’m doing; the third call to doStuffWithList does pass one inline), and the use of “_” as the argument name. Think of it as like “it” for Groovy, but with symbols instead, because Scala is mad for symbols.

Final Thoughts

I included examples from other languages because I wanted to give you the feel of closures. I do not intend to come across as one of those people who wants things in CF just because the cool kids have them.  Still… I like closures, a lot, and I believe that if they were added to CFML, in time many programmers would look back on the days before closures and think “Remember them days, Buck?” “Yeah, Grady. Them days sucked.” Closures add a level of convenience that will be shown to exceed even the convenience of struct and array literals added in CF8.

The risk with closures is that they can invite a return to spaghetti, with large chunks of functionality stuck in-line. This is a Bad Thing(tm) and good programmers would be wise to avoid this trap. You’ll notice in my examples that I used closures more as a tidying-up. In fact, these examples show no original functionality. This is particularly important for testability. I trust that Formatter.formatWithOptions(value,true,false) and …(value,true,true) and …(value) all work correctly because they’ve all been unit tested. I do not care to test the closure I’ve defined because all it does is call functions on other objects. If CFML ever gets closures, you’ll start seeing all manner of bloggers railing against The Monster Closure. It will be good advice, so heed it.
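In JavaScript terms, the division of labor might look something like the sketch below. The expectEqual helper is a hand-rolled stand-in for a real test framework, and the Formatter here is a simplified assumption, not production code.

```javascript
var Formatter = {
  formatWithOptions: function (value, addDollarSign, addTwoZeroes) {
    var result = String(value);
    if (addDollarSign) result = "$" + result;
    if (addTwoZeroes) result += ".00";
    return result;
  }
};

// Tiny check so the sketch runs without a test framework.
function expectEqual(actual, expected) {
  if (actual !== expected) {
    throw new Error("expected " + expected + " but got " + actual);
  }
}

// The logic lives in Formatter, so that is what gets tested.
expectEqual(Formatter.formatWithOptions(7, true, false), "$7");
expectEqual(Formatter.formatWithOptions(7, false, true), "7.00");
expectEqual(Formatter.formatWithOptions(7, true, true), "$7.00");

// The closure only forwards to tested code; it adds no logic of its own.
var dollarAdder = function (value) {
  return Formatter.formatWithOptions(value, true, false);
};

console.log(dollarAdder(7)); // $7
```

The moment a closure grows branches or loops of its own, it has logic worth testing, and that is usually the sign it wants to be a named, tested function instead.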

I am most certainly not the first person to want Closures in CFML. One of my favorite CF dudes has been talking about it for a while, and Sean Corfield told me that he planned to formally submit a recommendation to the CFML Advisory Committee. CF makes things easier, and I’d like to hear your thoughts on how you might or might not use this feature were it added.

Finally, I welcome any comments, criticisms, dialog, corrections, clarifications, etc. This was a “journey” of sorts for me. I used it to learn a lot more than I knew before, but surely I’ve gotten things wrong and so I’ll gladly correct any misinformation.

 

--Marc

MXUnit Eclipse Plugin 1.2 Released

Friday, January 1, 2010

Happy New Year!

We’re happy to announce a new version of the MXUnit Eclipse plugin is available. The goal of this release is to simplify configuration for projects with respect to helping the plugin figure out the “cfc root” (i.e., turning c:\myproject\myapps\test\whatever\SomeTest.cfc into “myapps.test.whatever.SomeTest”). It’s always been a sore spot… this business of “What’s my webroot? Do I use the webroot or project properties? What if I have my project set up like this…?” 
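To be clear about what “figuring out the CFC path” means, here is a small JavaScript sketch of the idea. It is not the plugin’s actual code (the plugin is an Eclipse plugin, and its real logic may differ); toCfcPath is a made-up function, and I’m assuming c:\myproject is the directory the path is measured from.

```javascript
// Hypothetical helper: turn a file path under rootDir into a dotted CFC path.
function toCfcPath(filePath, rootDir) {
  // Use forward slashes throughout and drop any trailing slash.
  var normalize = function (p) {
    return p.replace(/\\/g, "/").replace(/\/+$/, "");
  };
  var file = normalize(filePath);
  var root = normalize(rootDir);

  if (file.toLowerCase().indexOf(root.toLowerCase() + "/") !== 0) {
    throw new Error("file is not under the root directory");
  }

  return file
    .slice(root.length + 1)  // drop the root and the slash after it
    .replace(/\.cfc$/i, "")  // drop the file extension
    .split("/")
    .join(".");
}

console.log(toCfcPath("c:\\myproject\\myapps\\test\\whatever\\SomeTest.cfc", "c:\\myproject"));
// myapps.test.whatever.SomeTest
```

Everything hinges on which directory counts as the root, and that is exactly the setting this release tries to make easier to get right.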

With this release, the “webroot” stuff is gone. The CFC root is now configured only at the project level. For current projects that live at the webroot (like “c:\inetpub\wwwroot\MyApp”), this should not require you to reconfigure anything. For projects that live in other places, you might need to spend a minute configuring when you update the plugin. Of my dozen or so Eclipse projects, I had to update one. But my hope is that this change results in a much clearer configuration path for new users.

In addition, the plugin’s built-in help has been expanded and reorganized. Now, when you click on the green “Help” (?) icon on the plugin view, you get a much simplified view of the help, designed to take you to the two most commonly asked questions: How do I configure this thing to get the correct CFC path, and what’s this RemoteFacade URL all about? It looks like this:

[Screenshot: mxunitplugin_help_1]

If you want more, there’s a link there to take you to the full help. When you click that link, you get taken to the introduction page for the Help. That looks like this:

[Screenshot: mxunitplugin_help_1_1]

 

If you click the “Show in Contents” Button toward the top right of the view, you get the tree view of the help (This is the same view you’d get if you opened Help from “Help – Help Contents -- MXUnit”). If you expand the tree view, you’ll see the new organization of the Help content for the plugin. It looks thusly:

[Screenshot: mxunitplugin_help_2]

I added new documentation for configuring projects that need an Application.cfc, for projects that test CF9 ORM components, and all new documentation for the new way in which the CFC path for a test file is derived and configured. In addition, I added documentation on some things that MXUnit or the plugin doesn’t do but which weren’t clearly specified. Finally, I expanded the “Resources” section. Notice too that there are now links to “Troubleshooting”, “Frequently Asked Questions”, and “Tips and Tricks”, all from our new Wiki (created and hosted by MXUnit hombre Adam Haskell). Side note: if you have any thoughts on what content should be added to the wiki, or how it could be better organized, we want to hear it! I wanted to put this documentation on the wiki, and not directly in the plugin, so that I could easily update it without having to push new releases of the plugin. Plus, this way, more people can contribute.

As always, if you have problems with or questions about the plugin, the fastest way to get answers is via the Google group.

Test and be Happy.

--Marc