Ayende @ Rahien


Reduce ^ 2 in RavenDB

An interesting question that keeps popping up is how to re-reduce the results of a map/reduce. That is a really nice feature on the surface, but it has a lot of implications. For example: when and how do you run the second reduce? Can you chain only once, or multiple times? What happens when there are a lot of reduce results?

But most of the time, what people want is to be able to do aggregation on the map/reduce results without too much hassle, and they don’t have a lot of aggregated results or they are fine with waiting for them if they are very large. And we have a really nice solution for that scenario.

You start by defining the base map/reduce operation, like so:

image

Note that we need to also output the fields that we care about reducing further. In this case, we start by reducing to postal code, but we keep the city, region and country options as well.

Then, we define a transformer. Note that this is a special transformer, in that it has a group by in it, and it takes some parameters from outside.

image

Using those two together, we can now get the following results…

Raw map/reduce output:

image

With loc = City, we get:

image

With loc = Country, we get:

image
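The same two steps can be sketched in plain Python. This is only an illustration of the idea, not RavenDB's index or transformer syntax: the sample documents, the field names (postal_code, city, region, country, count) and both functions are assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical source documents; only the location fields matter here.
ORDERS = [
    {"postal_code": "10001", "city": "New York", "region": "NY", "country": "USA"},
    {"postal_code": "10001", "city": "New York", "region": "NY", "country": "USA"},
    {"postal_code": "10002", "city": "New York", "region": "NY", "country": "USA"},
    {"postal_code": "98101", "city": "Seattle", "region": "WA", "country": "USA"},
    {"postal_code": "SW1A", "city": "London", "region": "England", "country": "UK"},
]


def map_reduce_by_postal_code(docs):
    """Base map/reduce: one row per postal code, carrying the coarser fields along."""
    rows = {}
    for doc in docs:
        # The first document seen for a postal code seeds the row with count 0.
        row = rows.setdefault(doc["postal_code"], {**doc, "count": 0})
        row["count"] += 1
    return list(rows.values())


def re_reduce(rows, loc):
    """Transformer-like step: group the already reduced rows by the field named in `loc`."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[loc]] += row["count"]
    return dict(totals)


rows = map_reduce_by_postal_code(ORDERS)
by_city = re_reduce(rows, "city")
by_country = re_reduce(rows, "country")
```

The second step only works because the first one kept city, region and country on every row. Had the base reduce output only the postal code and the count, there would be nothing left to group by.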

Tada, we have further reduced the result of a map/reduce operation. Now, this is subject to the usual limitations of RavenDB paging, in that it will only go through at most 1024 results. That can be a problem, but that is why RavenDB has the Streaming API.

You can use streaming on a map/reduce index with a transformer (and even apply parameters on top of that). That ends up giving you the ability to run a re-reduction on top of a map/reduce index regardless of size.
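To see why streaming lifts the size limit, here is a rough Python sketch. The function stream_rows is a hypothetical stand-in for a streamed query (it is not RavenDB's Streaming API), and the row shape and the 5000 row count are assumptions. The point is that the re-reduce consumes rows one at a time and keeps only the running totals in memory, while a single page stops at 1024 rows.

```python
from collections import defaultdict
from itertools import islice

PAGE_LIMIT = 1024  # the paging limit mentioned in the post


def stream_rows(total):
    """Hypothetical stand-in for a streamed query: yields rows lazily, one at a time."""
    countries = ["USA", "UK", "Israel"]
    for i in range(total):
        yield {"postal_code": f"{i:05d}", "country": countries[i % 3], "count": 1}


def re_reduce_stream(rows, loc):
    """Group by the field named in `loc`, holding only the running totals in memory."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[loc]] += row["count"]
    return dict(totals)


# A single page sees only the first PAGE_LIMIT rows, so its totals are incomplete.
paged = re_reduce_stream(islice(stream_rows(5000), PAGE_LIMIT), "country")

# Streaming walks every row, so the totals cover the whole result set.
streamed = re_reduce_stream(stream_rows(5000), "country")
```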

Of course, on very large result sets, that can take quite a while, but that is expected and usually fine. For that matter, if you need to, you can chain the stream into a bulk insert, and get the re-reduction in that manner.
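Chaining a stream into a bulk insert could look roughly like the Python sketch below. FakeStore and its bulk_insert method are hypothetical stand-ins for a document store, not RavenDB's client API, and the batch size of 2 is arbitrary. Streamed rows are written out in batches as new documents, and a second reduce then runs over those stored documents.

```python
from collections import Counter
from itertools import islice


def batches(rows, size):
    """Split any iterable into lists of at most `size` items without loading it all at once."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


class FakeStore:
    """Hypothetical in-memory target for the bulk insert."""

    def __init__(self):
        self.docs = []
        self.round_trips = 0

    def bulk_insert(self, batch):
        self.docs.extend(batch)
        self.round_trips += 1


# Stand-in for rows streamed out of the map/reduce index.
streamed = (
    {"city": city, "count": count}
    for city, count in [("New York", 2), ("New York", 1), ("Seattle", 1)]
)

store = FakeStore()
for batch in batches(streamed, 2):
    store.bulk_insert(batch)

# The second reduce runs over the stored documents instead of the live stream.
by_city = Counter()
for doc in store.docs:
    by_city[doc["city"]] += doc["count"]
```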


Posted By: Ayende Rahien


Comments

Federico Lois, 03/28/2014 11:23 AM

What I would really like to see in this particular scenario is a way for complex transformations to be stored as usual indexes. We have background processes just to handle that, which is bad (we dislike them a lot). They are complex to manage, complex to upgrade and complex to get right.

I believe we should probably look into bundles, but the deployment story gets more complex, and with the usual number of deployments that we do (more than 5 a week) it is a complexity we are not entirely sure we want to absorb.

Ayende Rahien, 03/28/2014 01:09 PM

Federico, huh? Transformers are handled in the exact same manner that indexes are. Create them in your code, and then they are automatically created via IndexCreation.CreateIndexes.

Judah Gabriel Himango, 03/28/2014 09:20 PM

Nifty!

And hey, that Silverlight Studio sure is looking nice these days. ;-)
