“Look, Ma. No SQL!” – MongoDB and ColdFusion Part 3

Sunday, October 25, 2009

Addendum: The code examples below were built on earlier versions of MongoDB and will not work with 1.6+. Also, use Marc Esher's Github fork for the most current version of CFMongoDB.

Querying MongoDB documents with JavaScript and ColdFusion

In the first part of this series I presented my fundamental attraction to MongoDB, and in the last post I talked a bit about some of MongoDB's core concepts. In this part, I'll explore some approaches, challenges and a proposal for reading data stored in MongoDB.  The source code (still in a proof of concept state) is available at Github - feel free to fork it and make comments.

MongoDB's data is persisted as binary form of JSON (BSON) and the admin engine language is JavaScript. (We'll get to ColdFusion shortly) Let's work with the Blog document structure - JSON- presented in Part 1:

{ "_id" :  ObjectId( "65c5b8782745e04afc2abf00"), 
  "AUTHOR" : "bill_1" , 
  "TAGS" : ["Comics","Games","Python","NoSQL","Ruby"] , 
  "PUB_DATE" : "Thu Oct 22 2009 07:42:19 GMT-0400 (EDT)" 
  "BODY" : "Lorem ipsum dolor sit amet ..." , 
  "TS" : 1256177399955 , 
  "TITLE" : "Blog Title No.1 ..." 

The example code generates 1000 such documents. Imagine 1000 items with the author and title values incremented for each one from 0-999.

From MongoDB's admin console, you can query this structure like this:

> db.blog.find()

... which returns all the documents in the blog collection and is semantically equivalent to select * from blog

Here's some other basic examples for querying the blog collection. This should give you a feel for how to get data back from MongoDB:

Retrieve a blog entry by ID:
SQL: select * from blog where id = '65c5b8782745e04afc2abf00'
MongoDB: db.blog.find( {_id: ObjectId( "65c5b8782745e04afc2abf00")} )

Retrieve 10 of the most recent titles:
SQL: select top 10 * from blog order by pub_date desc
MongoDB: db.blog.find().sort( {PUB_DATE:-1} ).limit(10)

Retrieve 10 entries by a given author:
SQL: select top 10 * from blog where author = 'bill_1'
MongoDB: db.blog.find( {author: 'bill_1'} ).limit(10)

To retrieve data from MongoDB you use JavaScript and object notation. An interesting point is the syntax for search criteria. The find(...) method expects a JavaScript object with the criteria; e.g., {author: 'bill_1'}. For basic equality checks you pass in the name of the field you're searching and the value. To do comparative tests you can use the syntax:  db.collection.find( { "field" : { $gt: value } } ) for greater than. There's almost a full range of operators available, including full regular expression searches - Wahoo! Take a quick peak at the docs and come back ... http://www.mongodb.org/display/DOCS/Advanced+Queries . What do you think? That's the command line shell. Very cool. But how can we do that in a ColdFusion webapp? At the root, you wrap ColdFusion around the Java client:

//setup - no necessary for every search 
//create a mongo instance 
mongo = createObject('java', 'com.mongodb.Mongo').init( server_name , server_port ); 
//get a handle to the database 
db = mongo.getDb(db_name); 
//get or create a collection (e.g., Blog)   
collection = db.getCollection(collection_name); 

//build search criteria 
criteria = createObject('java', 'com.mongodb.BasicDBObject').init('TITLE','bill_1.*'); //regular expression 
//execute search 
results = collection.find(criteria); 

It's helpful to note that the primary interface for both writing data and querying data is com.mongodb.DBObject. This extends java.util.HashMap and represents a map of name value pairs that are saved to the datastore and also used as search criteria. In the example above note that you create a BasicDBObject with a name and value. This object is to the find method and used to query the datastor.

As for the syntax, I'd prefer something different. So, I here’s my attempt at a DSL in ColdFusion which looks like this:

results = mongo.collection('blog').startsWith('TITLE','Blog Title No.60').search();

This returns all documents whose title starts with the second argument. This is semantically the same as select * from blog where title LIKE ‘Blog Title No.60%'.

For other string related criteria there’s also endsWith(field,value), exists(field,value), regex(field,expression) and others. Since this is a DSL we can chain these together to better express a search:

results = mongo.collection(‘blog’).
                exists(‘BODY, ‘MongoDB’).
                eq(‘AUTHOR’, ‘bill’).

This is asking for all the blog entries by bill that were posted after September 12, 2009, with Mongo in the body. Now, I think this is clear, but maybe you don’t. That’s cool.  Here’s the proposed syntax. Again, this is all a proof of concept, so feel free to comment.

   Proposed DSL for MongoDB searches:  
  results =  mongo.collection('blog').   //optionally set the collection to search, otherwise it will use current collection 
             startsWith('name','foo').   //string 
             endsWith('title','bar').    //string 
             exists('field','value').    //string 
             regex('field','value').     //string 
             before('field', 'value').   //date 
             after('field', 'value').    //date 
             $eq('field','value').       //numeric 
             $gt('field','value').       //numeric 
             $gte('field','value').      //numeric 
             $lte('field','value').      //numeric 
             $in('field','value').       //array 
             $nin('field','value').      //array 
             search('title,author,date', limit); 

Proposed:    search( keys=list_of_keys_to_return, limit=num, start=num, sort={field=direction} );

The string and date searches all work as expected, but there are issues with the numerics and the conversion from ColdFusion to Java representations. Numeric comparisons need to support all numeric types. The numeric comparison methods are all prefixes with $. This is done to match the MongoDB JavaScript syntax and to get around ColdFusion's reserved keywords. The array searches have been proven, but need work.

One last goodie: $where(…) : You can also do adhoc queries with JavaScript like so:

results = mongo.collection(‘blog’).$where( ‘this.TITLE == ‘Blog Title No.1 || this.AUTHOR == ‘bill’ ’).search();

Important notes:
  • When using $where, you must prefix the item you are seaching for with 'this', which corresponds to the current collection.
  • MongoDB mentions that there is a greater perfomance demand for $where
  • Searches are case sensitive. This may throw some folks, but it might help to think in terms of JavaScript, which is case sensitive.
  • Case sensitivity applies to both keys and values. And note, too, that ColdFusion creates uppercase keys when creating structs, unless you create the struct using this syntax: my_struct["my_key"] = my_value;
  • There is no built in boolean OR, yet, except in $where

Feedback is welcome, of course.

Up Next: Storing Blobs, Indexes, Admin 101

Test and be Happy!


Marc Esher said...

bill, with respect to case sensitivity: can you get around it with the same "hack" you use in Oracle, i.e. where ucase(column) eq 'BILL' ?

bill shelton said...

marc, from the API perspective you could do some acrobatics to iterate over each key to make sure it's persisted in all lower or upper case. then when searching for the key you would need to use the same case. it could be a configuration item ... however, this would not work with the $where(...string...) search method unless you parse the string before sending it to Mongo. --bill

Dan G. Switzer, II said...

Since you're defining the API to simplify usage, I'd dropped the $ in the $gt, $lt, etc methods. I know this matches the names exactly in MangoDB, but dropping the $ seems more natural to CF and saves typing a difficult keystroke (since you have to press two keys to generate the $ character.)

Marc Esher said...

Dan, I believe bill did that since gt, eq, etc are all reserved in CF

Zac Spitzer / Fastmovingtarget said...

interesting stuff, call me crazy, but is there a good reason still not to perhaps use basic SQL syntax as the query language?

bill shelton said...

Zac, i think there "might" be a couple of issues with using a SQL dialect, assuming the presence of a SQL->JavaScript translator. The biggest one i see at the moment is that it possibly implies that the underlying data can be treated as relational. Developers may then expect they can do certain "basic" operations on the data which may or may not be possible or reasonable; e.g., joins. For Bigtable Google implemented GQL, which is a small subset of SQL. I think this is Google's attempt at easing developers into that world, and it certainly sounds reasonable. however, I still think the biggest hurdle for most folks is thinking about the data in a non-relational yet structured way. I wonder if having a SQL dialect perpetuates the thinking rather than guiding towards a different data storage mechanism? -bill