Design exerciseComplex data aggregation with RavenDB

time to read 4 min | 749 words

Consider a business that needs to manage leasing apartments to tenants. One of the more important aspects of the business is tracking how much money is due. Because of the highly regulated nature of leasing, there are several interesting requirements that pop up.

The current issue is how do you tackle the baseline for eviction. Let’s say that the region that the business is operating under has the following minimum requirements for eviction:

  • Total unpaid debt (30 days from invoice) that is greater than $2,000.
  • Total overdue debt (30 – 60 days from invoice) that is greater than $1,000.
  • Total overdue debt (greater than 60 days from invoice) that is greater than $500.

I’m using the leasing concept here because it is easy to understand that the date ranges themselves are dynamic. We don’t want to wait for the next first of the month to see the changes.

The idea is that we want to be able to show a grid like this:


The property manager can then take action based on this data. And here is the raw data that we are working on:


It’s easy to see that this customer still has a balance of $175. Note that this balance is as of the July 9th, because we apply payments to the oldest invoice we have. The question now becomes, how can we turn this raw data into the table above?

This turn out to be a bit hard for RavenDB, because it is optimize to answer your queries fast, which means that having to do recalculation on each query (based on the current dates) is not easy. I already shown how to do this kind of task easily enough when we are looking at a single customer. The problem is that we want to have an overall view of the system, not just on a single customer. And ideally without it costing too much.

The key observation to handle this efficiently is RavenDB is to understand that we don’t need to actually generate the table above directly. We just need to get the data to a point where it is trivial to do so. After some thinking, I came up with the following desired output:


There idea here is that we are going to give both overall view on the customer’s account as well as details about its outstanding debts. The important detail that we need to understand is that this customer status is unlikely to grow too big. We aren’t likely to see customers that have debts that spans many years, so the size of this document is naturally bounded. The cost of going from this output to the table above is negligible and the process of doing so is obvious. So the only question now is how do we do this?

We are going to utilize RavenDB’s multi-map/reduce to the fullest here. Let’s first look at the maps:

There isn’t really anything interesting here. We are just outputting the data that we need for the second, more interesting stage, the reduce:

There is a whole bunch of stuff going on here, but leave aside how much JavaScript scares me, let’s dig into this.

The important parameters we have here are:

  • Debt
  • CreditBalance
  • RemainingBalance

We compute the CreditBalance by summing all the outstanding payments for the customer. We then gather up all the debts for the customer and sort them by date ascending. The next stage is to apply the outstanding credits toward each of the debts, erasing them from the list if they have been completely paid off. Along the way, we compute the overall remaining balance as well.

And that is pretty much it. It is important to understand that this code is recursive. In other words, if we have a customer that has a lot of invoices and receipts, we aren’t going to be computing this in one go over everything. Instead, we’ll compute this incrementally, over subsets of the data, applying the reduce function as we go.

Queries on this index are going to be fast, and applying new invoices and receipts is going to require very little effort. You can now also do the usual things you do with indexes. For example, sorting the customers by their outstanding balance or total lifetime value.

More posts in "Design exercise" series:

  1. (01 Aug 2019) Complex data aggregation with RavenDB
  2. (31 Jul 2019) Arbitrary range aggregations in RavenDB
  3. (30 Jul 2019) File system abstraction in RavenDB
  4. (19 Dec 2018) Distributing (consistent) data at scale, answer
  5. (18 Dec 2018) Distributing (consistent) data at scale
  6. (26 Nov 2018) A generic network protocol