Data modeling with indexes: Event sourcing–Part I


In this post, I want to take the notion of doing computation inside RavenDB’s indexes to the next stage. So far, we talked only about indexes that work on a single document at a time, but that is just the tip of the iceberg of what you can do with indexes inside RavenDB. What I want to talk about today is the ability to do computations over multiple documents and aggregate them. The obvious example is in the following RQL query:

image

That is easy to understand: it is a simple aggregation of data. But it can get a lot more interesting. To start with, you can add your own aggregation logic here, which opens up some interesting ideas. Event Sourcing, for example, is basically a set of events on a subject that are aggregated into the final model. Probably the classic example of event sourcing is the shopping cart. In such a model, we have the following events:

  • AddItemToCart
  • RemoveItemFromCart
  • PayForCart

Here is what these look like, in document form:

image

We add a couple of items to the cart, remove excess quantity and pay for the whole thing. Pretty simple model, right? But how does this relate to indexing in RavenDB?

Well, the problem here is that we don’t have a complete view of the shopping cart. We know what the actions were, but not what the cart’s current state is. This is where our index comes into play. Let’s see how it works.

The final result of the cart should be something like this:

image

Let’s see how we get there, shall we?

We’ll start by processing the add to cart events, like so:
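The listing for this step is not included in this copy of the post. As a stand-in, here is a minimal sketch of what such a map could look like, written as a plain JavaScript function so it can be run outside RavenDB. The event fields (`Cart`, `ProductId`, `Quantity`, `Price`) and the function name are assumptions for illustration, not the original code. In a RavenDB JavaScript index, a function like this would be handed to `map(...)` for the relevant collection.

```javascript
// Sketch only. Hypothetical AddItemToCart event: { Cart, ProductId, Quantity, Price }.
// The map emits a one-product slice of the final cart model.
function mapAddItemToCart(event) {
  return {
    Cart: event.Cart, // grouping key for the reduce
    Products: [
      { ProductId: event.ProductId, Quantity: event.Quantity, Price: event.Price }
    ]
  };
}

const sampleAdd = { Cart: "carts/1", ProductId: "products/apple", Quantity: 2, Price: 3.5 };
console.log(JSON.stringify(mapAddItemToCart(sampleAdd)));
```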

As you can see, the map phase here builds the relevant parts of the end model directly. But we still need to complete the work by doing the aggregation. This is done in the reduce phase, like so:
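The reduce listing is also missing from this copy, so the following is a minimal sketch under stated assumptions rather than the original code. It assumes each incoming entry has the shape `{ Cart, Products: [{ ProductId, Quantity, Price }] }` and uses a hypothetical function name. In a RavenDB JavaScript index, this logic would sit inside `groupBy(...).aggregate(...)`.

```javascript
// Sketch only. `values` holds map outputs (or earlier reduce outputs) for one cart.
function reduceCart(cart, values) {
  const byProduct = {}; // internal grouping keyed by ProductId
  for (const entry of values) {
    for (const p of entry.Products) {
      const seen = byProduct[p.ProductId];
      if (seen === undefined) {
        byProduct[p.ProductId] = { ProductId: p.ProductId, Quantity: p.Quantity, Price: p.Price };
      } else {
        seen.Quantity += p.Quantity;                // total quantity across add events
        seen.Price = Math.min(seen.Price, p.Price); // lowest price encountered
      }
    }
  }
  // Same shape as the input entries, so the result can be reduced again.
  return { Cart: cart, Products: Object.values(byProduct) };
}

const cartState = reduceCart("carts/1", [
  { Cart: "carts/1", Products: [{ ProductId: "products/apple", Quantity: 2, Price: 3.5 }] },
  { Cart: "carts/1", Products: [{ ProductId: "products/apple", Quantity: 1, Price: 3 }] },
  { Cart: "carts/1", Products: [{ ProductId: "products/milk", Quantity: 1, Price: 1.2 }] }
]);
console.log(JSON.stringify(cartState));
```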

Most of the code here is to deal with merging of products from multiple add actions, but even that should be pretty simple. You can see that there is a business rule here. The customer will be paying the minimum price they encountered throughout the process of building their shopping cart.

Next, let’s handle the removal of items from the cart, which is done in two steps. First, we map the remove events:
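Since the listing for the remove map is not included in this copy, here is a minimal sketch of the idea, assuming a hypothetical `RemoveItemFromCart` event with `Cart`, `ProductId` and `Quantity` fields. The function name and field names are illustrative assumptions.

```javascript
// Sketch only. Hypothetical RemoveItemFromCart event: { Cart, ProductId, Quantity }.
function mapRemoveItemFromCart(event) {
  return {
    Cart: event.Cart,
    Products: [
      {
        ProductId: event.ProductId,
        Quantity: -event.Quantity, // negative, so summing in the reduce subtracts it
        Price: 0                   // a removal carries no price information
      }
    ]
  };
}

const sampleRemove = { Cart: "carts/1", ProductId: "products/apple", Quantity: 1 };
console.log(JSON.stringify(mapRemoveItemFromCart(sampleRemove)));
```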

There are a few things to note here: the quantity is negative and the price is zeroed, which necessitates changes in the reduce as well. Here they are:
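The updated reduce is not reproduced in this copy. The sketch below shows one way those changes could look, assuming the same hypothetical entry shape `{ Cart, Products: [{ ProductId, Quantity, Price }] }`. The helper `cheapestPositive` and the choice to keep negative totals are my assumptions, not the original code.

```javascript
// Zero prices come from remove events, so they never win the comparison.
function cheapestPositive(a, b) {
  if (a <= 0) return b;
  if (b <= 0) return a;
  return Math.min(a, b);
}

function reduceCartWithRemovals(cart, values) {
  const byProduct = {};
  for (const entry of values) {
    for (const p of entry.Products) {
      const seen = byProduct[p.ProductId];
      if (seen === undefined) {
        byProduct[p.ProductId] = { ProductId: p.ProductId, Quantity: p.Quantity, Price: p.Price };
      } else {
        seen.Quantity += p.Quantity;
        seen.Price = cheapestPositive(seen.Price, p.Price);
      }
    }
  }
  // Drop items whose adds and removes cancel out. Negative totals are kept on
  // purpose: one batch may see a removal before its matching add, and a later
  // reduce pass still has to net the two.
  const products = Object.values(byProduct).filter(p => p.Quantity !== 0);
  return { Cart: cart, Products: products };
}

const afterRemoval = reduceCartWithRemovals("carts/1", [
  { Cart: "carts/1", Products: [{ ProductId: "products/apple", Quantity: 3, Price: 3.5 }] },
  { Cart: "carts/1", Products: [{ ProductId: "products/apple", Quantity: -1, Price: 0 }] }
]);
console.log(JSON.stringify(afterRemoval));
```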

As you can see, we now take only the cheapest price above zero, and we remove empty items from the cart. The final step is to handle the payment events. We’ll start with the map first, obviously.

Note that we added a new field to the output. Just as we set the Products field in the pay-for-cart map to an empty array, we need to update the rest of the maps to include a Paid: {} to match the structure. This is because all the maps (and the reduce) in an index must output the same shape.
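To make the "same shape" requirement concrete, here is a minimal sketch of a pay-for-cart map next to an add-to-cart map. The `PayForCart` event fields (`PaymentMethod`, `Amount`) and the way `Paid` is keyed by payment method are assumptions for illustration, since the original documents are only shown as images.

```javascript
// Sketch only. Hypothetical PayForCart event: { Cart, PaymentMethod, Amount }.
function mapPayForCart(event) {
  return {
    Cart: event.Cart,
    Products: [], // no products here, but the field must still be present
    Paid: { [event.PaymentMethod]: event.Amount }
  };
}

// An add-to-cart map, emitting an empty Paid so both maps share one shape.
function mapAddItemToCart(event) {
  return {
    Cart: event.Cart,
    Products: [{ ProductId: event.ProductId, Quantity: event.Quantity, Price: event.Price }],
    Paid: {}
  };
}

// Compare the top-level field names of two map outputs.
function shapeOf(output) {
  return Object.keys(output).sort().join(",");
}

const payShape = shapeOf(mapPayForCart({ Cart: "carts/1", PaymentMethod: "cash", Amount: 5 }));
const addShape = shapeOf(
  mapAddItemToCart({ Cart: "carts/1", ProductId: "products/apple", Quantity: 1, Price: 3.5 })
);
console.log(payShape === addShape);
```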

And now we can update the reduce accordingly. Here is the third version:
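The third version of the reduce is not included in this copy. The sketch below is one possible way to fold payments in, not the original code. It assumes entries shaped like `{ Cart, Products: [...], Paid: { <method>: <amount> } }` and that payments are summed per method, both of which are assumptions.

```javascript
// Sketch only: merges products as before and additionally accumulates Paid.
function reduceCartWithPayments(cart, values) {
  const byProduct = {};
  const paid = {};
  for (const entry of values) {
    for (const p of entry.Products) {
      const seen = byProduct[p.ProductId];
      if (seen === undefined) {
        byProduct[p.ProductId] = { ProductId: p.ProductId, Quantity: p.Quantity, Price: p.Price };
        continue;
      }
      seen.Quantity += p.Quantity;
      // Keep the cheapest price above zero; zero prices come from removals.
      if (p.Price > 0 && (seen.Price <= 0 || p.Price < seen.Price)) {
        seen.Price = p.Price;
      }
    }
    // An empty Paid object contributes nothing to the totals.
    for (const [method, amount] of Object.entries(entry.Paid)) {
      paid[method] = (paid[method] || 0) + amount;
    }
  }
  return {
    Cart: cart,
    Products: Object.values(byProduct).filter(p => p.Quantity !== 0),
    Paid: paid
  };
}

const paidCart = reduceCartWithPayments("carts/1", [
  { Cart: "carts/1", Products: [{ ProductId: "products/apple", Quantity: 2, Price: 3.5 }], Paid: {} },
  { Cart: "carts/1", Products: [], Paid: { cash: 7 } }
]);
console.log(JSON.stringify(paidCart));
```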

This is almost there, but we still need to do a bit more work to get the final output right. To make things interesting, I changed things up a bit and here is how we are paying for this cart:

image

And here is the final version of the reduce:

And the output of this is:

image

You can see that this is a bit different from what I originally envisioned. This is mostly because I’m bad at JavaScript and likely took many shortcuts along the way to make things easy for myself. Basically, it was easier to do the internal grouping using an object than using arrays.

Some final thoughts:

  • A shopping cart is usually going to be fairly small, with a few dozen events in the common case. This method works great for this, but it will also scale nicely if you need to aggregate over tens of thousands of events.
  • A key concept here is that the reduce portion is called recursively on all the items, incrementally building the data until we can’t reduce it any further. That means that the output we get should also serve as the input to the reduce. This takes some getting used to, but it is a very powerful technique.
  • The output of the index is a complete model, which you can use inside your system. In the next post, I’ll discuss how we can more fully flesh this out.
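The point about the reduce output serving as reduce input can be illustrated with a toy example. This is a simplified sketch, not how RavenDB schedules its reduce passes: the batch size, the function names and the `{ ProductId, Quantity }` shape are all assumptions chosen for illustration.

```javascript
// Toy reduce: sums Quantity per ProductId. Input and output share one shape,
// an array of { ProductId, Quantity }, which is what makes re-reducing possible.
function reduceQuantities(entries) {
  const totals = {};
  for (const e of entries) {
    totals[e.ProductId] = (totals[e.ProductId] || 0) + e.Quantity;
  }
  return Object.keys(totals).map(id => ({ ProductId: id, Quantity: totals[id] }));
}

// Reduce each batch on its own, then feed the partial results back into the
// same reduce function to get the final answer.
function reduceInBatches(entries, batchSize) {
  const partials = [];
  for (let i = 0; i < entries.length; i += batchSize) {
    partials.push(...reduceQuantities(entries.slice(i, i + batchSize)));
  }
  return reduceQuantities(partials);
}

const cartEvents = [
  { ProductId: "products/apple", Quantity: 2 },
  { ProductId: "products/milk", Quantity: 1 },
  { ProductId: "products/apple", Quantity: 1 },
  { ProductId: "products/apple", Quantity: -1 },
  { ProductId: "products/milk", Quantity: 2 }
];
const oneShot = JSON.stringify(reduceQuantities(cartEvents));
const batched = JSON.stringify(reduceInBatches(cartEvents, 2));
console.log(oneShot === batched);
```

If the reduce returned a different shape than it accepts, the second call to `reduceQuantities` in `reduceInBatches` would have nothing sensible to work with, which is why the maps and the reduce have to agree on a single output structure.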

If you want to play with this, you can get the dump of the database that you can import into your own copy of RavenDB (or our live demo instance).

More posts in "Data modeling with indexes" series:

  1. (22 Feb 2019) Event sourcing–Part III–time sensitive data
  2. (11 Feb 2019) Event sourcing–Part II
  3. (30 Jan 2019) Event sourcing–Part I
  4. (14 Jan 2019) Predicting the future
  5. (10 Jan 2019) Business rules
  6. (08 Jan 2019) Introduction