Ayende @ Rahien

My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:


+972 52-548-6969

, @ Q c

Posts: 5,953 | Comments: 44,409

filter by tags archive

Avoid where in a reduce clause

We got a customer question about a map/reduce index that produced the wrong results. The problem was a problem between the conceptual model and the actual model of how Map/Reduce actually works.

Let us take the following silly example. We want to find all the animal owner’s that have more than a single animal. We define an index like so:

// map
from a in docs.Animals
select new { a.Owner, Names = new[]{a.Name} }

// reduce
from r in results
group r by r.Owner into g
where g.Sum(x=>x.Names.Length) > 1
select new { Owner = g.Key, Names = g.SelectMany(x=>x.Names) }

And here is our input:

{ "Owner": "users/1", "Name": "Arava" }    // animals/1
{ "Owner": "users/1", "Name": "Oscar" }    // animals/2
{ "Owner": "users/1", "Name": "Phoebe" }   // animals/3

What would be the output of this index?

At first glance, you might guess that it would be:

{ "Owner": "users/1", "Names": ["Arava", "Oscar", "Phoebe" ] }

But you would be wrong. The actual output of this index… It is nothing. This index actually have no output.

But why?

To answer that, let us ask the following question. What would be the output for the following input?

{ "Owner": "users/1", "Name": "Arava" } // animals/1

That would be nothing, because it would be filtered by the where in the reduce clause. This is the underlying reasoning why this index has no output.

If we feed it the input one document at a time, it has no output. It is only if we give it all the data upfront that it has any output. But that isn’t how Map/Reduce works with RavenDB. Map/Reduce is incremental and recursive. Which means that we can (and do) run it on individual documents or blocks of documents independently. In order to ensure that, we actually always run the reduce function on the output of each individual document’s map result.

That, in turn, means that the index above has no output.

To write this index properly, I would have to do this:

// map
from a in docs.Animals
select new { a.Owner, Names = new[]{a.Name}, Count = 1 }

// reduce
from r in results
group r by r.Owner into g
select new { Owner = g.Key, Names = g.SelectMany(x=>x.Names), Count = g.Sum(x=>x.Count) }

And do the filter of Count > 1 in the query itself.


Derek den Haas

Why not disable the whole where clause in reduce?

I don't see any reason to filter things in reduce itself (what couldn't be filtered in map).

Ayende Rahien

Derek, Because there are actual real scenarios where you want to do that. We don't want to block them at this time.

Comment preview

Comments have been closed on this topic.


No future posts left, oh my!


  1. The RavenDB Comic Strip (3):
    28 May 2015 - Part III – High availability & sleeping soundly
  2. Special Offer (2):
    27 May 2015 - 29% discount for all our products
  3. RavenDB Sharding (3):
    22 May 2015 - Adding a new shard to an existing cluster, splitting the shard
  4. Challenge (45):
    28 Apr 2015 - What is the meaning of this change?
  5. Interview question (2):
    30 Mar 2015 - fix the index
View all series


Main feed Feed Stats
Comments feed   Comments Feed Stats