Ayende @ Rahien

Designing a document database: Aggregation


I said that I would speak a bit about aggregations. On the face of it, aggregation looks simple, really simple. Continuing the same thread of design from before, we can have:

image

While this is really nice, it doesn’t really work.

With this approach, we are going to have to recalculate the view over the entire document set that we have, a potentially very expensive operation. Technically, I could try to solve that by rewriting the Linq statement, but that wouldn’t really work either: the code assumes that it knows all the state, and there is no way to regenerate that state in an incremental fashion.
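To make the cost concrete, here is a minimal Python sketch rather than the Linq view syntax. The document shape (`blog_id`, `comments`) and the function name `comment_counts` are assumptions made up for illustration. The point is only that a single-pass aggregate has to see every document each time it runs:

```python
from collections import defaultdict


def comment_counts(docs):
    """Naive aggregate view: total comments per blog, computed in one pass
    over *all* documents (hypothetical document shape)."""
    totals = defaultdict(int)
    for doc in docs:
        totals[doc["blog_id"]] += len(doc["comments"])
    return dict(totals)


docs = [
    {"blog_id": "blogs/1", "comments": ["a", "b"]},
    {"blog_id": "blogs/1", "comments": ["c"]},
    {"blog_id": "blogs/2", "comments": ["d"]},
]
view = comment_counts(docs)

# A single new document arrives. The function has no way to reuse the
# previous result, so the whole document set is scanned again.
docs.append({"blog_id": "blogs/2", "comments": ["e", "f"]})
view = comment_counts(docs)
```

Nothing in `comment_counts` can take the previous `view` as input, which is the sense in which the state cannot be rebuilt incrementally.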

Let us try a better approach:

image

Thanks to Alex Yakunin for helping me simplify this.

What do we have now? We split the problem into two sections, the Map and the Reduce. Note that to simplify things, map and reduce must return objects of the same shape. Because the output of a reduce is then valid input to another reduce, we don’t need an explicit re-reduce phase.
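The same-shape rule can be sketched in Python (again, not the Linq syntax; the field names `blog_id` and `comment_count` and the function names are hypothetical). Map emits one record per document, and reduce groups and sums records, returning records of exactly the same shape:

```python
from itertools import groupby
from operator import itemgetter


def map_doc(doc):
    """Map: emit one {blog_id, comment_count} record per document."""
    return {"blog_id": doc["blog_id"], "comment_count": len(doc["comments"])}


def reduce_records(records):
    """Reduce: group by blog_id and sum the counts.
    Input and output records share the same shape."""
    ordered = sorted(records, key=itemgetter("blog_id"))
    return [
        {"blog_id": key, "comment_count": sum(r["comment_count"] for r in group)}
        for key, group in groupby(ordered, key=itemgetter("blog_id"))
    ]


docs = [
    {"blog_id": "blogs/1", "comments": ["a", "b"]},
    {"blog_id": "blogs/2", "comments": ["c"]},
    {"blog_id": "blogs/1", "comments": ["d", "e", "f"]},
]
mapped = [map_doc(d) for d in docs]

# Reducing everything at once...
all_at_once = reduce_records(mapped)

# ...gives the same answer as reducing two halves separately and then
# feeding those partial results back into the very same reduce function.
first_half = reduce_records(mapped[:2])
second_half = reduce_records(mapped[2:])
re_reduced = reduce_records(first_half + second_half)
```

Since `reduce_records` accepts its own output, the second pass needs no separate function.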

That is much easier to reason about, and it allows us to perform aggregation in a manner that is simple to partition. I am probably going to have another post regarding the actual details of handling aggregations.
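As a rough illustration of why this partitions well, the Python sketch below keeps each partial result as a `collections.Counter` keyed by blog id. This is an assumed representation chosen for brevity, not the design from the post, and the partition layout and names are hypothetical. Each partition is reduced on its own, the partial results are reduced together, and a newly arrived document is folded into the stored result without rescanning anything:

```python
from collections import Counter


def map_doc(doc):
    """Map: one partial count per document, keyed by blog id."""
    return Counter({doc["blog_id"]: len(doc["comments"])})


def reduce_counts(partials):
    """Reduce: merge partial counts. The result is itself a partial count,
    so it can be reduced again."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts per key
    return total


# Two partitions, which could just as well live on different machines.
partitions = [
    [{"blog_id": "blogs/1", "comments": ["a", "b"]}],
    [
        {"blog_id": "blogs/1", "comments": ["c"]},
        {"blog_id": "blogs/2", "comments": ["d"]},
    ],
]

# Reduce each partition independently, then reduce the partial results.
per_partition = [reduce_counts(map_doc(d) for d in part) for part in partitions]
stored = reduce_counts(per_partition)

# Incremental update: only the new document is mapped, and the stored
# result is treated as just another partial input to reduce.
new_doc = {"blog_id": "blogs/2", "comments": ["e", "f"]}
stored = reduce_counts([stored, map_doc(new_doc)])
```

This only works for aggregates that can be merged from partial results, such as counts and sums.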

More posts in "Designing a document database" series:

  1. (17 Mar 2009) What next?
  2. (16 Mar 2009) Remote API & Public API
  3. (16 Mar 2009) Looking at views
  4. (15 Mar 2009) View syntax
  5. (14 Mar 2009) Aggregation Recalculating
  6. (13 Mar 2009) Aggregation
  7. (12 Mar 2009) Views
  8. (11 Mar 2009) Replication
  9. (11 Mar 2009) Attachments
  10. (10 Mar 2009) Authorization
  11. (10 Mar 2009) Concurrency
  12. (10 Mar 2009) Scale
  13. (10 Mar 2009) Storage

Comments

Nathaniel Neitzke

I would also take a look at a merge step. I know this is something I am currently looking into (map-reduce-merge).
portal.acm.org/citation.cfm?doid=1247480.1247602

I think using LINQ is an interesting choice as you mentioned before because of the possibility of tapping into PLINQ. Will have to think about this one.

configurator

So the point is that you generate a map that looks as if it is grouped, and then you gradually reduce it into a smaller number of large groups?

Ayende Rahien

Nathaniel,

That link requires registration. What is the "merge" stage?

Ayende Rahien

configurator,

Yes, that is the idea. See the next installment for the exact process.
