Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.


Reduce ^ 2 in RavenDB


An interesting question that keeps popping up is how to re-reduce the results of a map/reduce. That is a really nice feature on the surface, but it has a lot of implications. For example: when and how do you run the second reduce? Can you chain only once, or multiple times? What happens when there are a lot of reduce results?

But most of the time, what people want is to be able to do aggregation on the map/reduce results without too much hassle, and they don’t have a lot of aggregated results or they are fine with waiting for them if they are very large. And we have a really nice solution for that scenario.

You start by defining the base map/reduce operation, like so:

image

Note that we need to also output the fields that we care about reducing further. In this case, we start by reducing to postal code, but we keep the city, region and country options as well.

Then, we define a transformer. Note that this is a special transformer, in that it has a group by in it, and it takes some parameters from outside.

image
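Since the definitions in this post are screenshots, here is a rough sketch of the idea in plain Python. It is not RavenDB code and does not use the RavenDB API. The field names (`PostalCode`, `City`, `Region`, `Country`, `Count`), the sample rows and the `re_reduce` helper are all assumptions made up for illustration. What it shows is the shape of the operation: the base reduce emits one row per postal code, and a second grouping step picks its key from an outside parameter (`loc`).

```python
from collections import defaultdict

# Made-up rows shaped like the output of the base map/reduce:
# one row per postal code, carrying the coarser location fields along.
rows = [
    {"PostalCode": "10001", "City": "New York", "Region": "NY", "Country": "USA", "Count": 3},
    {"PostalCode": "10002", "City": "New York", "Region": "NY", "Country": "USA", "Count": 2},
    {"PostalCode": "98101", "City": "Seattle", "Region": "WA", "Country": "USA", "Count": 4},
    {"PostalCode": "SW1A", "City": "London", "Region": "England", "Country": "UK", "Count": 1},
]


def re_reduce(rows, loc):
    """Group already-reduced rows by the field named in `loc` and sum their counts."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[loc]] += row["Count"]
    return dict(totals)


by_city = re_reduce(rows, "City")
by_country = re_reduce(rows, "Country")
```

This only works because the base rows keep the city, region and country fields around. If the first reduce dropped them, the second grouping would have nothing to group on.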

Using those two together, we can now get the following results…

Raw map/reduce output:

image

With loc = City, we get:

image

With loc = Country, we get:

image

Tada, we have further reduced the result of a map/reduce operation. Now, this is subject to the usual limitations of RavenDB paging, in that it will only go through 1024 results. That can be a problem, but that is why RavenDB has the Streaming API.

You can use streaming on a map/reduce index with a transformer (and even apply parameters on top of that). That ends up giving you the ability to run a re-reduction on top of a map/reduce index regardless of size.
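To make the difference concrete, here is a small Python sketch of why the page limit matters for a re-reduction. Again, this is not the RavenDB Streaming API. `reduced_rows` is a hypothetical generator standing in for the map/reduce output, and the `Country` and `Count` fields are assumed names. The point is only that an aggregation over a single page sees part of the data, while one that consumes the whole stream sees all of it.

```python
from collections import Counter
from itertools import islice

PAGE_SIZE = 1024  # the page limit mentioned above


def reduced_rows(n):
    """Hypothetical stand-in for map/reduce output: yields n rows lazily."""
    for i in range(n):
        yield {"Country": "A" if i % 2 == 0 else "B", "Count": 1}


def total_by(rows, loc):
    """Fold rows into per-key totals, one row at a time."""
    totals = Counter()
    for row in rows:
        totals[row[loc]] += row["Count"]
    return dict(totals)


# A single page only covers the first PAGE_SIZE rows.
paged = total_by(islice(reduced_rows(5000), PAGE_SIZE), "Country")

# Consuming the full stream covers every row, without holding them all in memory.
streamed = total_by(reduced_rows(5000), "Country")
```

The fold keeps one running total per group, so its memory use depends on the number of groups rather than on the number of rows going through it.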

Of course, on very large result sets, that can take quite a while, but that is expected and usually fine. For that matter, if you need to, you can chain the stream into a bulk insert, and get the re-reduction in that manner.


Comments

Federico Lois

What I would really like to see in this particular scenario is a way for complex transformations to be stored as usual indexes. We have background processes just to handle that, which is bad (we dislike them a lot). They are complex to manage, complex to upgrade and complex to get right.

I believe we should probably look into bundles, but the deployment story gets more complex, and with the usual number of deployments that we do (more than 5 a week) it is a complexity we are not entirely sure we want to absorb.

Ayende Rahien

Federico, huh? Transformers are handled in the exact same manner that indexes are. Create them in your code, and then they are automatically created via IndexCreation.CreateIndexes.

Judah Gabriel Himango

Nifty!

And hey, that Silverlight Studio sure is looking nice these days. ;-)
