Simple Geospatial Queries with MongoDB

Tuesday, November 23, 2010

MongoDB has supported geospatial queries for a while, and in the upcoming 1.7 release it’ll get even better. Let’s take a look at how easy it is to query MongoDB in an idiomatic geospatial manner.

First, as described in the docs, you’ll need your data structured a certain way, and then you’ll need to add a 2d index. In my example, I’ll have documents that contain non-geospatial data (names, dates, other stuff) and then an embedded “LOC” document which contains “LAT” and “LONG” fields. These are arbitrary names… what’s important is that you get the order right. Your “location” field must contain exactly 2 values, and you can store them in two ways:

  1. array
  2. nested document with keys

For example:

{ LOC: [38, -102] } is a valid “location” field because it contains an array with 2 values.

{ LOC: { LAT: 38, LONG: -102 } } is a valid location field because it contains a nested document with two keys, and those keys contain numeric values. 

Note that the order is important! You’ll insert the data in the order in which you’d query the data. In general, stick with lat/long or x/y order.
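To make the two shapes concrete, here’s a minimal plain-JavaScript sketch. It is not MongoDB code, and toPair is a hypothetical helper I made up for illustration. It reduces either form to a two-element pair, on the assumption described above that for the nested form it’s the order of the keys, not their names, that counts:

```javascript
// Hypothetical helper, for illustration only (not part of MongoDB or its drivers).
// Accepts either an array [a, b] or a nested document { K1: a, K2: b }
// and returns the two numbers in the order they were stored.
function toPair(loc) {
  const values = Array.isArray(loc) ? loc : Object.values(loc);
  if (values.length !== 2 || !values.every((v) => typeof v === "number")) {
    throw new Error("a location needs exactly two numeric values");
  }
  return [values[0], values[1]];
}

console.log(toPair([38, -102]));              // array form
console.log(toPair({ LAT: 38, LONG: -102 })); // nested document form
console.log(toPair({ LONG: -102, LAT: 38 })); // keys swapped, so the pair is swapped too
```

The last call is the one to watch: the key names are the same, but the stored order differs, so the resulting pair differs.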

Sample data

[Image: geodata]

As you can see, this “treatment unit” is positioned at roughly [38,-102]

Adding the 2d Index

Let’s add an index:

> db.treatmentunits.ensureIndex( { LOC: "2d" } )

Cool, that was easy.

Querying with geospatial operators

Exact matches are rarely useful, and they become even less so the more precise your lat/long storage becomes. You’ll notice in this example that I have ridiculously precise location data (it’s precise b/c it’s fake). Still, if you want to query for exact matches, you’d do this:

> db.treatmentunits.find( { LOC: [38,-102] } )

Due to the precision of my data, this will yield no results. So let’s widen the net by using the “$near” operator:

> db.treatmentunits.find( { LOC: { $near: [38,-102] } } )

That’s better… lots of results. By default, Mongo will give you the closest 100 results. You probably don’t want that. So, let’s tighten it up by setting some “bounds”, using the $maxDistance operator:

> db.treatmentunits.find( { LOC: { $near: [38,-102], $maxDistance: 5 } } )

You might be thinking: how’s that different from using limit(x)? Simple: limit() is merely a limit on the number of documents returned. If you want only 5, you get only 5. But by using $maxDistance, you’re not specifying how many results you want but rather how close the locations must be to your target location in order to be included in the results. If you want the closest 10 locations that meet a $maxDistance of 5, that’s where you’d use limit:

> db.treatmentunits.find( { LOC: { $near: [38,-102], $maxDistance: 5 } } ).limit(10)
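If the difference between “how close” and “how many” is still fuzzy, here’s a rough plain-JavaScript sketch of the semantics, run against an in-memory array instead of a collection. It is not how MongoDB implements $near. The near function and the sample documents are made up, and I’m assuming flat (planar) distance measured in the same units as the stored coordinates:

```javascript
// Sketch of the semantics only, NOT MongoDB's implementation.
// Assumption: flat (planar) distance in the same units as the coordinates.
function near(docs, target, maxDistance = Infinity, limit = 100) {
  const distance = (p) => Math.hypot(p[0] - target[0], p[1] - target[1]);
  return docs
    .map((doc) => ({ doc, d: distance(doc.LOC) }))
    .filter((entry) => entry.d <= maxDistance) // $maxDistance: how close
    .sort((a, b) => a.d - b.d)                 // closest first
    .slice(0, limit)                           // limit(): how many
    .map((entry) => entry.doc);
}

// Made-up sample documents
const units = [
  { name: "A", LOC: [38.5, -102.5] },
  { name: "B", LOC: [45, -110] },
  { name: "C", LOC: [38.1, -101.9] },
  { name: "D", LOC: [41, -104] },
];

console.log(near(units, [38, -102], 5).map((u) => u.name));    // everything within 5 units
console.log(near(units, [38, -102], 5, 1).map((u) => u.name)); // only the closest of those
```

With this sample data, “B” is more than 5 units away, so no limit value will ever bring it back; the limit only trims what the distance filter lets through.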

Now, picture a map in your head, and draw a box somewhere on that map. Let’s say you want to find results whose location is within that bounding box. You’ll use the $within and $box operators, like so:

> db.treatmentunits.find( { LOC: { $within : { $box : [[40,-120], [48,-108]] } } }  )

Now that’s a big box, covering a large part of the northwestern US, so it’s going to return a lot of records. Tighten up the box to limit your results.
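Conceptually, the box test is nothing more than a range check on each axis. Here’s a small sketch in plain JavaScript; inBox is a hypothetical name, and I’m assuming the box is given as [lowerLeft, upperRight] with the edges counted as inside:

```javascript
// Sketch of a bounding-box test, for illustration only.
// Assumptions: box = [lowerLeft, upperRight], points use the same
// order as the stored pairs, and the box edges count as "inside".
function inBox(point, [lowerLeft, upperRight]) {
  return (
    point[0] >= lowerLeft[0] && point[0] <= upperRight[0] &&
    point[1] >= lowerLeft[1] && point[1] <= upperRight[1]
  );
}

const box = [[40, -120], [48, -108]];
console.log(inBox([44, -115], box)); // true: inside on both axes
console.log(inBox([38, -102], box)); // false: 38 is below the lower bound of 40
```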

Finally, to find records whose location starts at a center point and radiates out, you’ll use the $within and $center operators:

> center = [38, -102]
> radius = 10
> db.treatmentunits.find({ LOC: { $within: { $center: [center, radius] } } } )
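The circle version swaps the range check for a distance check. Again, this is only a plain-JavaScript sketch with a made-up inCircle function, assuming flat distance in the same units as the coordinates:

```javascript
// Sketch of a "within circle" test, for illustration only.
// Assumption: flat (planar) distance in the same units as the coordinates.
function inCircle(point, [center, radius]) {
  const d = Math.hypot(point[0] - center[0], point[1] - center[1]);
  return d <= radius;
}

const circle = [[38, -102], 10];
console.log(inCircle([41, -104], circle)); // true: about 3.6 units from the center
console.log(inCircle([45, -110], circle)); // false: about 10.6 units from the center
```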

You may be wondering at this point, “Why don’t you have any sort() applied?” The answer is: with $near, MongoDB returns results already sorted by distance from your target location, closest first, so there’s no need for an explicit sort().

6 comments:

Drew Noakes said...

What are the units of 'radius'? Are they in lat/lng units, or something like kilometers? If it's the former, then the catchment area becomes increasingly ovular as you near the poles.

bill shelton said...

Sweet, Marc. Will this be wired into CFMongoDB?

bill

Marc Esher said...

Drew, all units are in radians. To your point about becoming increasingly ovular, the MongoDB docs for geospatial indexing indicate that a "spherical" model is coming in 1.7, which I believe addresses your concern. Though I'm admittedly just getting to know GIS/GEO and so I of course could be wrong.

Bill, yeah, we'll add it in there. Basically it'll just be some functions in SearchBuilder that make constructing those structs simpler, something like

mongo.query(coll).near(field="LOC", x=50, y=32, maxDistance=4, query={type='museum'}).search()

I'm still not sure yet, as the geoNear command is appealing b/c it also returns the distance between the target location and the result, for each result. We'll see. Any thoughts?

Henry Ho said...

Is it difficult to use MongoDB w/o CFMongoDB?

Zoramite said...

@henry It's not too difficult to use, and you can actually get the native java objects that cfmongodb is using if you want to work directly with them.

The 'hard' part is converting some of the ColdFusion data types. Definitely check out the codebase to see how it's being done.

Marc Esher said...

Henry, it's not *hard* to use MongoDB and CF with just the java drivers. Mostly, it's merely annoying. You end up doing a LOT of javacasting when you use the Java objects directly, and you end up forgetting to javacast, and that bites you. You do a LOT of createObject calls, and your code ends up not all that attractive. Eventually, you'll write a set of functions that create those objects for you and make it a bit more readable. Then you'll write code to pass all your structs through a function that auto-converts all the datatypes so you quit javacasting. Then you write some generic functions to make searching look nicer in your code. And before you know it, you have a small framework, and chances are it'd look a heck of a lot like cfmongodb.