"Look, Ma. No SQL!" - MongoDB and ColdFusion - Part 2

Tuesday, October 20, 2009

Addendum: The code examples below were built on earlier versions of MongoDB and will not work with 1.6+. Also, use Marc Esher's GitHub fork for the most current version of CFMongoDB.

MongoDB Core Concepts

In the last post, I started to scratch the surface of MongoDB and focused on some of its efficiencies. I mentioned that querying would be next, but I think it might be better to briefly discuss some of MongoDB's core concepts first. Note that some of these will also apply to other schemaless datastores, like CouchDB and Google's BigTable. 

The biggest challenge in getting your head around a schemaless datastore is shifting how you think about data persistence. If you're like me, you've spent most of your professional career thinking in relational terms, and maybe have even formally studied set theory, and know what tuples are and who E.F. Codd is. The concepts of tables, rows, relationships, normalization, and structured query language (SQL) might be so ingrained that it's truly difficult to think of data storage any other way. If you want what you got, don't change what you do. But, if you sense that there might be a better way to store and retrieve some kinds of data, you're certainly not alone.

Many of the schemaless/schema-free terms and concepts are either inspired by or borrowed from the relational model, yet there are fundamental differences. I'll touch on a couple of those later. For now, let's look at some of MongoDB's concepts and how they might relate to a relational mindset.

  • Database - Like in a relational model, this is a logical and physical storage area for data, which are Collections of Documents. A MongoDB server can have multiple databases. 
  • Collection - Similar to a table, a Collection is a group of Documents. 
  • Document - Similar to a row in a table, a Document represents an instance of some object in a Collection. In CFML, a Document object is a structure (or an array). A Document can contain other documents by either directly embedding them (see below) or by reference. 
  • Indexes - Like a relational database index, this is a Document attribute registered with MongoDB for query performance and query optimization. 
  • Queries and Cursors - Lastly, like a relational database, you query Collections. The query results are common programming data structures: Iterators or Arrays, but are referred to more generally as Cursors.

Example CFML/Java Implementation:

 // a plain CFML struct; ColdFusion upper-cases unquoted keys (NAME, PROFESSION, AGE)
 person = {
  name = 'bill',
  profession = 'programmer',
  age = 'older than dirt'
 };

 mongo = createObject('java', 'com.mongodb.Mongo').init();

 db = mongo.getDb('my_db');
 collection = db.getCollection('people');
 doc = createObject('java', 'com.mongodb.BasicDBObject').init();
 doc.putAll(person); // copy the struct's keys and values into the document
 collection.insert(doc); //correction! thanks to marc e.
 // the key is 'NAME' because of the upper-casing noted above
 criteria = createObject('java', 'com.mongodb.BasicDBObject').init('NAME','bill');
 query = collection.find(criteria);
 item = query.next();
 dump( item );

Note that the "get" operations above will create the object if it does not exist. This is a true example of schema-freeness.

Embedded Document Example:
This is much simpler than the name might imply. All you need to do is nest one struct within another:

person = {
 name = 'bill',
 profession = 'programmer',
 age = 'older than dirt',
 address = {
   street = 'main st.',
   city = 'anytown',
   planet = {
     name = 'Earth',
     planetary_code = '0xFFB3540A'
   }
 }
};

If you've worked with XML, the nested document structure should be familiar. With MongoDB, Document objects can nest an arbitrary number of other documents. Check out the Schema Design guidelines for some ideas.
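
Since a CFML struct reaches the Java driver as a map, the nesting above can be sketched in plain Java with nested maps. This is only an illustration, not CFMongoDB or driver code: the class EmbeddedDocumentSketch and its getPath helper are hypothetical names, and the lookup runs in memory without a MongoDB server. The dotted path mirrors the dot notation MongoDB uses to reach into embedded documents.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EmbeddedDocumentSketch {

    // Builds the person document from the example above as nested maps.
    static Map<String, Object> buildPerson() {
        Map<String, Object> planet = new LinkedHashMap<>();
        planet.put("name", "Earth");
        planet.put("planetary_code", "0xFFB3540A");

        Map<String, Object> address = new LinkedHashMap<>();
        address.put("street", "main st.");
        address.put("city", "anytown");
        address.put("planet", planet);

        Map<String, Object> person = new LinkedHashMap<>();
        person.put("name", "bill");
        person.put("profession", "programmer");
        person.put("age", "older than dirt");
        person.put("address", address);
        return person;
    }

    // Hypothetical helper: walks a dotted path such as "address.planet.name".
    // Returns null if any segment is missing or is not itself a document.
    static Object getPath(Map<String, Object> doc, String dottedPath) {
        Object current = doc;
        for (String key : dottedPath.split("\\.")) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Object> person = buildPerson();
        System.out.println(getPath(person, "address.city"));
        System.out.println(getPath(person, "address.planet.name"));
    }
}
```

The point of the sketch is that the address and planet travel with the person as one object, so there is nothing to join back together when you read it.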

Key Differences:
From a developer's perspective, the key differences are in how you think about the data and how the data is retrieved. Instead of thinking about your data as a set of related two-dimensional tables with rows and cells, you are free to think about data in terms of object structure or composition (hence the name Document). Storing the data is pretty trivial in MongoDB, and quite a relief compared to writing a complex insert statement. Retrieving data, though, is where stumbling might occur, so give the big picture some thought when you're starting a new project and choosing between an RDBMS and a schemaless datastore. Some people have recommended a general guideline: a schemaless datastore is a good fit when your data is high volume and low value. In other words, ask whether you have very complex data, such as financial data that needs to be frequently aggregated at a very atomic level (high value), or data that is not so complex and maybe there's a lot of it (low value, high volume), e.g., a blog. In the case of the former, maybe a relational database might be better; in the latter, maybe MongoDB.

As for querying the data, this is what you might be used to:

select  p.*, a.*
 from Person p inner join Address a on p.id = a.pid
 where p.id = 'some_id'

In MongoDB's JavaScript shell, given the person Document above, it would look more like this:

person = db.people.find( {_id: some_id} ).limit(1);

Both say the same thing, semantically: "Given a person's ID, return the person and their address."
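
To make the comparison a bit more concrete, here is a small in-memory sketch in plain Java of what the find() call does conceptually: match documents against a criteria document and hand back the whole document, embedded address included, with no join. Everything in it (FindSketch, find, sampleCollection, the sample _id values) is hypothetical and for illustration only. It handles top-level equality and nothing else, and it is not how MongoDB or the Java driver is implemented.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class FindSketch {

    // Hypothetical helper: builds a person document with an embedded address.
    static Map<String, Object> person(String id, String name, String city) {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("city", city);

        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", id);
        doc.put("name", name);
        doc.put("address", address);
        return doc;
    }

    // Made-up sample data standing in for the 'people' collection.
    static List<Map<String, Object>> sampleCollection() {
        List<Map<String, Object>> people = new ArrayList<>();
        people.add(person("id-1", "bill", "anytown"));
        people.add(person("id-2", "marc", "othertown"));
        return people;
    }

    // Returns up to `limit` documents whose top-level fields equal every
    // key/value pair in `criteria` (equality matching only).
    static List<Map<String, Object>> find(List<Map<String, Object>> collection,
                                          Map<String, Object> criteria,
                                          int limit) {
        List<Map<String, Object>> results = new ArrayList<>();
        for (Map<String, Object> doc : collection) {
            if (results.size() >= limit) {
                break;
            }
            boolean matches = true;
            for (Map.Entry<String, Object> c : criteria.entrySet()) {
                if (!doc.containsKey(c.getKey())
                        || !Objects.equals(doc.get(c.getKey()), c.getValue())) {
                    matches = false;
                    break;
                }
            }
            if (matches) {
                results.add(doc);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, Object> criteria = new LinkedHashMap<>();
        criteria.put("_id", "id-2");
        // One lookup returns the person and the embedded address together.
        System.out.println(find(sampleCollection(), criteria, 1));
    }
}
```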

Up next, for real, Querying MongoDB in detail.


Raj said...

Nice post Bill. Is there any limitation like app engine datastore while retrieving the data (max 1000 rows at a time)?

billy said...

Raj, I haven't seen any limitation imposed by MongoDB like in AppEngine. So, I assume you could return as many objects as you wish. But, just as with /any/ data being pushed through the wire, we probably should be very cognisant of the amount of data and the impact.


Sami Hoda said...


Keep up these posts. I'm glued!

Marc Esher said...

hey bill, i gave your code a shot and it wasn't working. turns out, i needed collection.save(doc).

also, before i start looking in the docs to begin the "real learning", how do you associate an "ID" with a document? For example, if I run that example you posted 3 times, I get 3 different inserts of the same data. How do I say "this person is ID X"?

bill shelton said...

marc, you must mean the code snippet on this page ... yes, it's missing the insert statement (or save). Doh! You might want to peep the MongoDB.cfc file on GitHub. That's what I've been using to define the coldFusion API interface.

As for retrieving an object by ID, MongoDB creates an "_id" object when an insert occurs. In the MongoDB.cfc there's a put() method that adds this _id to the struct AND returns the string id which can then be used later on to get the document back. This is the recommended way to retrieve an object as it's very efficient. Below are the put() and getById() methods. They're pretty short and sweet :-)

function put(o){
 var doc = createObject('java', 'com.mongodb.BasicDBObject').init();
 var id = chr(0);
 doc.putAll(o); //copy the struct into the document before inserting
 id = collection.insert(doc).get("_id");
 o._id = id; //add the _id object to the struct
 return id;
}//end function

function getById(id){
var objId = createObject('java','com.mongodb.ObjectId').init(id);
return get("_id", objId);
} //


Marc Esher said...

Awesome, Bill. I'm reading docs now and it's all coming together.

I'll start using the CFC wrapper after I get really comfortable doing it the hard way. I need to understand all the moving parts before I can really a) appreciate an abstraction and b) contribute to an abstraction.

though I can tell right away that I want to suggest or submit an $if() filter to the DSL:


I'd hate to have to break out of a nice chained criteria just to add conditional criteria, which is why an if() is appealing to me. But that's for later.

First, more learning. Great stuff man. It is definitely true that you have to quit thinking relationally, which is very difficult.

bill shelton said...

marc, at first glance i like the $if()! i also felt i had to look at both the JavaScript syntax and the Java client API to try to glean where the designers are coming from. then, move on to how "i" would use it.

it _is_ funny how your thinking gets challenged isn't it? it's like breaking a habit.