A Generic ColdFusion Struct Visitor

Friday, July 10, 2009

Warning: No unit tests accompany this code.

Recently, I mentioned on Twitter that I wish CF had a regex version of StructFindValue, and this started a conversation between me, Ben Nadel, and Nathan Mische. Ben and Nathan, both of whom I swear are code factories, had code out there within hours to do what I was looking for. Bless their hearts. While I normally try to hold to the “Twitter is not a Chat” maxim, in this case, I ran with it and it yielded fruit.

During our conversation, what I came to realize was that I didn’t want just a reStructFindValue, what I wanted was a generic means of looping over structures and then “doing stuff” with each key that it encountered, as it walked recursively through a structure, no matter how deep/nested.

Some of you may recognize this as the Visitor pattern. I tweeted thusly:

@bennadel what ColdFusion really needs is a StructVisitor. With that, a man could do great things.

2:12 PM Jul 7th from TweetDeck in reply to bennadel

Inspired by our conversation, and by the quick contributions of these two fine gents, I decided to take a quick stab at a basic implementation of some code, which I’ll call “StructEach.cfc” (Nathan called his version StructEach, and I liked it). It should behave like so:

  • Take a Struct and a Visitor object as arguments to an “each()” function
  • Loop over all keys in the input Struct, recursing as it went
  • For all keys in the struct, pass the key and struct value into the “visit()” function of the Visitor object
  • The visit function of the Visitor must return a boolean indicating whether the code should continue looping over the struct
  • The visit function of the Visitor can do whatever the hell it wants to do.

Before showing the code for StructEach.cfc, let’s look at why you’d want to use something like this, and then how you would use it.

Why? What are the use cases for this?

  • Perhaps you’d like to take a structure and convert it to XML
  • Perhaps you’d like to do really weird or granular searching of the stuff inside the structure, even if the structure has arrays in it, queries, etc.
  • Perhaps you’d like to “dump” the values of a struct, but you want them in textareas b/c you need to see exactly what the strings look like, whitespace and all
  • Perhaps you’d like to do a deep comparison of two structures and report the differences in a way that’s easy to read

That last one, by the way, is my own personal use case for MXUnit.

What Kind of Structures should it take?

For me, I wanted this thing to be able to handle nastiness, not just simple structures. At the very least, it should take a struct that was this ugly:

startstruct

How do you use it?

First off, you’d write a CFC that implemented the “visit” function. Here’s a really simple visitor that just outputs anything passed into it, along with the full path to the value’s key in the structure

OutputVisitor.cfc

<cfcomponent hint="" output="false">
        <cffunction name="visit" output="true" access="public" returntype="boolean" hint="">
                <cfargument name="key" type="string" required="true"/>
                <cfargument name="value" type="any" required="true"/>
                
                <cfif isSimpleValue(value)>
                        <cfoutput>#key# = #value# <br></cfoutput>
                <cfelse>
                        <cfdump var="#value#" label="#key#" expand="true">
                </cfif>
                
                <cfreturn true>
        </cffunction>
</cfcomponent>

You’d then use this visitor like so:

<cfset outputVisitor = createObject("component","OutputVisitor")>
<cfset runner = createObject("component","StructEach")>
<cfset runner.each(struct,outputVisitor)>

In the case of the sample struct shown above, you’d get this output:

outputvisitor

Where it gets interesting

That’s not very useful, is it? But it demonstrates how you’d use it. What you can trust in your visitor is that for every key in the structure, your visit function will get called. That’s the sweetness of this approach… you’re free to do whatever you wish with the data coming in.

This gets interesting when you start to consider how this is all implemented: you’re passing an object into the StructEach… think about that. Objects can contain state. This means that you can “remember” things through the iterations of the struct, as your visit function is repeatedly hit.

Let’s look at an example that implements “regex find” functionality over a structure. My implementation will

  • provide an init() function for taking in searchString and scope arguments
  • return a struct containing keys “found”, “path”, and “value” as soon as it finds something that matches the regular expression, and it will return false so that the StructEach will stop searching. Consider this a findOneOf() >
  • return an array containing multiple structs if scope is set to “all”. Consider this a findAll() >

You’d call it like this:

<!--- a RegExFindVisitor that searches for 'even'; as soon as one is found, it'll stop searching --->
<cfset regexFindVisitor = createObject("component","RegExFindVisitor").init("even","one")>
<!--- a RegExFindVisitor that searches for 'even' or 'ix'; it will search every key in the struct --->
<cfset regexFindAllVisitor = createObject("component","RegExFindVisitor").init("even|ix","all")>

<cfset runner.each(struct,regexFindVisitor)>
<cfset runner.each(struct,regexFindAllVisitor)>

<cfdump var="#regexFindVisitor.getFound()#" label="regexFindVisitor">
<cfdump var="#regexFindAllVisitor.getFound()#" label="regexFindAllVisitor">

And here’s what you’d see:

regexfindvisitors

This is all made possible because the Visitor object can provide any other functions that it wants to… as long as it has a “visit()”, nothing else matters. In addition, the object can retain state. Thus, as visit is called, it can store stuff inside of itself for later computation and retrieval.

RegExFindVisitor.cfc

<cfcomponent hint="" output="false">      
        
        <cffunction name="init" output="false" access="public" returntype="any" hint="">
                <cfargument name="searchString" type="string" required="true"/>
                <cfargument name="scope" type="string" required="false" default="one" hint="'one' or 'all'. If scope is 'all', returns an array; otherwise, returns a struct"/>
                <cfset StructAppend(variables,arguments)>
                <cfset variables.instance.found = []>
                <cfreturn this>
        </cffunction>
        
        <cffunction name="visit" output="true" access="public" returntype="boolean" hint="">
                <cfargument name="key" type="string" required="true"/>
                <cfargument name="value" type="any" required="true"/>
                <cfset var tmp = "">
                
                <cfif isSimpleValue(value) && reFind(variables.searchString,value)>
                        <cfset tmp = {path=key,value=value,found="true"}>
                        <cfset ArrayAppend(variables.instance.found , tmp)>
                        <cfif variables.scope eq "one">
                                <cfset variables.instance.found = variables.instance.found[1]>
                                <cfreturn false>                  
                        </cfif>
                </cfif>
                
                <cfreturn true>
        </cffunction>
        
        <cffunction name="getFound" returntype="any">
                <cfreturn variables.instance.found>
        </cffunction>
</cfcomponent>

Note that our source structure contains keys that have arrays and queries in them, but we’re only concerned with the keys that have simple values (and any nested structs that have simple values). Consequently, I’ve wrapped the chunk of code that does the work in an “isSimpleValue()” check.

But imagine… what if you wanted to also search through the arrays or queries that are passed into this thing? You could do that by implementing some additional searches inside of here.

Here’s another one I whipped up: Imagine you had a big ugly struct with various and sundry data in it, and you wanted to get just the queries.

Here's how you'd call it:

<cfset queryHarvester = createObject("component","QueryHarvester")>
<cfset runner.each(struct,queryHarvester)>
<cfdump var="#queryHarvester.getQueries()#">

And here's its implementation:

QueryHarvester.cfc

<cfcomponent hint="" output="false">
        <cfset queries = {}>
        <cffunction name="visit" output="true" access="public" returntype="boolean" hint="">
                <cfargument name="key" type="string" required="true"/>
                <cfargument name="value" type="any" required="true"/>
                
                <cfif isQuery(value)>
                        <cfset queries[key] = value>                      
                </cfif>
                
                <cfreturn true>
        </cffunction>
        
        <cffunction name="getQueries" returntype="struct">
                <cfreturn queries>
        </cffunction>
        
</cfcomponent>

And here’s what you’d get

queryharvestingvisitor

How does it work?

This is not sexy code, and it’s certainly not a) optimized or b) complete (for example, I’d like it to loop through arrays of structs as well). But it’s a start, and it’s very simple:

StructEach.cfc

<cfcomponent>
        <cffunction name="each" output="true" access="public" returntype="any" hint="">
                <cfargument name="struct" type="struct" required="true"/>
                <cfargument name="visitor" type="any" required="true"/>
                <cfargument name="path" type="string" required="false" default="" hint="don't touch this, suckers">
                <cfscript>
                var loopkey = "";
                var thisPath = arguments.path;
                        
                for(loopkey in struct){
                        thisPath &= "[ ""#loopkey#"" ]";
                        if(isStruct(struct[loopkey])){
                                each(struct[loopkey],visitor,thisPath);
                        }else if(!visitor.visit(thisPath,struct[loopkey])){
                                return;
                        }                       
                        thisPath = arguments.path;
                }
                </cfscript>
        </cffunction>
</cfcomponent>

Final Thoughts

This code, and even this approach, is purposely simple. There is nothing clever about it. I believe the value in an approach like this is in thinking about tasks in a more generic sense. Think of all the code out there in the world that does this: loops over keys in a structure recursively, and does stuff with the values it encounters. I imagine if you compared all of this code, it’d look remarkably similar in its recursion, but the details in the “else if” clause would be different.

Thus, in this approach, we’re extracting the common stuff into StructEach and providing hooks for implementations to get plugged into that “else if” part, in the form of a Visitor. Code gets much simpler and easier to understand; and code gets much easier to unit test because you’d be testing just your “visit” function, not the looping/recursion.

The downside is where you have recursion conditions that themselves require state external to the visitor; in this case, the StructEach approach wouldn’t work well, and there’s danger in trying to shoehorn an approach just to avoid a few lines of code. So, as with most things OO, you use this approach when it’s warranted, not just when it’s easy.

When you start thinking in more generic terms, you start to see opportunities for code reuse. For example, how much code is out there in the world, do you think, that loops over directories and "does stuff" to the directories and files in those directories? Imagine the dozen or so lines of boilerplate recursion code required to do the looping part, and the one line of code to delete the file. Multiply that by a gazillion. How much code in your own codebase does this looping, with the only difference being "what I do when I hit each file"?

If you think generically, you see there's a better way to do this: you create a DirectoryWalker type of deal that handles the looping, and then you provide a FileVisitor type of implementation that acts on each file it hits. So if you have code that deletes certain files in directories, or "touches" them to make them current, or archives certain files, or makes jpg thumbnails of pdfs that don’t already have thumbnails... you can separate that out into really small, easy-to-test Visitor implementations and just use DirectoryWalker to handle the boilerplate.

Consider the unit test for that last example, the one that generates thumbs. If you were to write a “generateThumbsForDirectory"()” function, that’s hard to test! Seriously… think about what would be required in a unit test to test it. You’re testing two things: 1) your recursion and 2) your thumbnail generation. But the recursion is boilerplate, and testing boilerplate sucks. What you really want to test is your thumb generation. So if you extract that out into a simple function that takes an existing File path, your job is simple: look for an existing thumb for that file, and if one doesn’t exist, create it. That’s a really small unit test comprising two functions: makeThumbnail_should_GenerateThumbnails_When_ThumbnailDoesNotExist() and makethumbnail_shouldNot_GenerateThumbnails_When_ThumbnailDoesExist(). You could write that in maybe 15 lines of code. You trust that the StructEach works because it’s been tested separately, and you can now trust your code because you’ve proven, with your tests, that your code works.

Fewer lines of code + easier-to-test functions = fewer bugs.

7 comments:

Ben Nadel said...

Looks cool. I like the stressing of thinking generically. Excellent point.

Marc Esher said...

Ben, it's extremely similar to your custom tag solution. the only real differences are that it's component-based and that it recurses. But conceptually, they're quite similar.

What I'm trying to get better at when approaching a given problem is trying to see the solution in two ways: solving the specific problem, but also getting at the pattern of the problem, to see if I can solve that problem in such a way that I set myself up for an easier time of solving other problems in the future.

Ben Nadel said...

Most agreed. As you say, how many lines of code are we writing out there that essentially does the same exact thing! Seeing the pattern is what helps us cut down on redundancy.

nmische said...

My thinking on this was very much influenced by the closure based each loops in ruby/groovy, which is why I was using a function in my approach. This definitely has limitations though, and I think your approach of using a an object as the visitor gives a lot more flexibility. Very nice!

John Allen said...

Hot

billy said...

marc, great stuff and right on time! i was just looking into a way to recurse over the mxunit debug array to dump it to plain text. you took care of that and then some ... awesome!

dr-nick said...

Just found this iterator solution via google. Perfect for a problem I'm working on.

I know it's only sample code, but perhaps worth a mention that ReFindVisitor.getFound() returns an empty array when nothing matches, rather than an empty struct. Easy to fix in a couple of lines (I've got code if you want me to email).