Reduce ^ 2 in RavenDB

Mar 28 2014

Reduce ^ 2 in RavenDB

time to read 3 min | 487 words

An interesting question that keeps popping up is how to re-reduce the results of a map/reduce. That is really nice feature on the surface, but it has a lot of implications, for example, when / how you run the 2nd reduce, can you chain only 1 time, or multiple times , what happens when there are a lot of reduce results, etc.

But most of the time, what people want is to be able to do aggregation on the map/reduce results without too much hassle, and they don’t have a lot of aggregated results or they are fine with waiting for them if they are very large. And we have a really nice solution for that scenario.

You start by defining the base map/reduce operation, like so:

Note that we need to also output the fields that we care about reducing further. In this case, we start by reducing to postal code, but we keep the city, region and country options as well.

Then, we define a transformer. Note that this is a special transformer, in that it has a group by in it, and it takes some parameters from outside.

Using those two together, we can now get the following results…

Raw map/reduce output:

With loc = City, we get:

With loc = Country, we get:

Tada, we have reduced further the result of a map/reduce operation. Now, this is subject to the usual limitations of RavenDB paging, in that it will only go through the only 1024 results. That can be a problem, but that is why RavenDB has the Streaming API.

You can use streaming on a map/reduce index with a transformer (and even apply parameters on top of that). That end up giving you the ability to run a re-reduction on top of a map/reduce index regardless of size.

Of course, on very large result sets, that can take quite a while, but that is expected and usually fine. For that matter, if you need to, you can chain the stream into a bulk insert, and get the re-reduction in that manner.

Tweet Share Share 4 comments

Tags:

raven

Comments

27 Mar 2014
14:24 PM

Kijana Woodard

:-]

28 Mar 2014
11:23 AM

Federico Lois

What I would really would like to see in this particular scenario is the ability to have a way for complex transformations to be stored as usual indexes. We have background processes just to handle that, which is bad (we dislike them a lot). They are complex to manage, complex to upgrade and complex to get them right.

I believe we should probably look into bundles, but the deployment story gets more complex and with the usual amount of deployments that we do (more than 5 a week) it is a complexity we are not entirely sure we want to absorb it.

28 Mar 2014
13:09 PM

Ayende Rahien

Federico, Huh? Transformers are handled in the exact same manner that indexes are. Create them in your code, and then they ar automatically created via IndexCreationCreateIndexes

28 Mar 2014
21:20 PM

Judah Gabriel Himango

Nifty!

And hey, that Silverlight Studio sure is looking nice these days. ;-)

Comment preview

Comments have been closed on this topic.

Oren Eini

Oren Eini

CEO of RavenDB