MongoDB Counting and Grouping

Count

count() simply returns the number of documents in a collection.

Suppose we have a collection where each document represents a pageview. We can get the total number of pageviews like so:

// Javascript
db.pageviews.count()

# Ruby
@db['pageviews'].count

Admittedly more useful would be the number of pageviews from a given month:

// Javascript
db.pageviews.count({date: 
{'$gte': new Date("Nov 1, 2009"), '$lt': new Date("Dec 1, 2009")}})

# Ruby
@db['pageviews'].find({:date => 
{'$gte' => Time.utc(2009, 11), '$lt' => Time.utc(2009, 12)}}).count

Notice that we’ve passed a document selector to the count() method. This ensures that only documents matching the selector are included in the total.

Distinct

Another simple but useful aggregator is distinct(), which returns an array of unique values for a given key in a collection. We can specify a root-level key or a nested key. Here, we request unique ip-addresses and user agents:

// Javascript
db.pageviews.distinct('ip-address')
db.pageviews.distinct({'user.agent'})

# Ruby
@db['pageviews'].distinct('ip-address')
@db['pageviews'].distinct('user.agent')

Group

MongoDB’s group() command provides some of the same functionality as SQL’s GROUP BY, although the syntax is rather different. Like most database commands, group() takes a document whose keys designate the parameters of the operation:

// Group by user-agent, returning a sum for each user-agent.
db.pageviews.group(
{
	key: {'user.agent': true}, 
	initial: {sum: 0}, 
	reduce: function(doc, prev) { prev.sum += 1}
});

// Returns (Note: simplified)
[
	{
		'user.agent': 'Mozilla 5.0 / Gecko'
		'sum': 241
	},
	{
		'user.agent': 'Mozilla 5.0 / Webkit' 
		'sum': 79
	}
]

Looking at the command itself, the first parameter is clear enough: key specifies which key or keys to group by.

The next two parameters, initial and reduce, may be less familiar. Their use is analogous to most programming language implementations of inject.

With initial, we provide a base, aggregator document for each grouped result. Here, we’re just saying that the base document should contain one key, sum, having a value of 0.

// Initial, aggregator document
{sum: 0}

The reduce function take two parameters: 1) the current document being processed, and 2) the document we’re aggregating over (which starts out as the initial document described above).

// Reduce function
function(doc, prev) {
	prev.sum += 1;
}

In this simple case, we don’t even use the doc parameter; we just increment our initial document’s sum key. Since a new document is used for each grouping, the group() function returns all those documents as an array, adding any keys we’re grouping by, so that we’re given a result like this:

// Result (Note: simplified)
[
	{
		'user.agent': 'Mozilla 5.0 / Gecko'
		'sum': 241
	},
	{
		'user.agent': 'Mozilla 5.0 / Webkit' 
		'sum': 79
	}
]

The command we’ve been describing could be run from Ruby as follows:

# Group from Ruby
@db['pageviews'].group(['user.agent'], nil, {'sum' => 0},
	"function(doc, prev) { prev.sum += 1}")

With the second parameter, here nil, we can provide a query selector, so that group() will only operate over a certain subset of the collection. In the next installment, we explore that along with some of @group()@’s more advanced features.