Designing a document databaseAggregation
I said that I would speak a bit about aggregations. On the face of it, aggregation looks simple, really simple. Continuing the same thread of design from before, we can have:
The problem is that while this is really nice, it doesn’t really work.
The problem is that using this approach, we are going to have to recalculate the view for the entire document set that we have, a potentially very expensive operation. Now, technically I can solve the problem by rewriting the Linq statement. The problem is that it wouldn’t really work. While it is possible to do so, it wouldn’t really work because the following code assume that it knows all the state, and there is no way to regenerate that state in an incremental fashion.
Let us try a better approach:
Thanks for Alex Yakunin, for helping me simplify this.
What do we have now? We split the problem into two sections, the Map and the Reduce. Note that to simplify things, map and reduce must return objects in the same shape. That means that we don’t need an explicit re-reduce phase.
That is much easier to reason about, and it allow us to perform aggregation in a very easy manner, allowing us to do aggregation in a manner that is simple to partition. I am probably going to have another post regarding the actual details of handling aggregations.
More posts in "Designing a document database" series:
- (17 Mar 2009) What next?
- (16 Mar 2009) Remote API & Public API
- (16 Mar 2009) Looking at views
- (15 Mar 2009) View syntax
- (14 Mar 2009) Aggregation Recalculating
- (13 Mar 2009) Aggregation
- (12 Mar 2009) Views
- (11 Mar 2009) Replication
- (11 Mar 2009) Attachments
- (10 Mar 2009) Authorization
- (10 Mar 2009) Concurrency
- (10 Mar 2009) Scale
- (10 Mar 2009) Storage
I would also take a look at a merge step. I know this is something I am currently looking into (map-reduce-merge).
I think using LINQ is an interesting choice as you mentioned before because of the possibility of tapping into PLINQ. Will have to think about this one.
So the point is that you generate a map that looks as if it is grouped, and then you gradually reduce it into a smaller number of large groups?
This requires registration, what is "merge" stage?
Yes, that is the idea, see next installment for the exact process.