Ayende @ Rahien

Oren Eini, aka Ayende Rahien, CEO of Hibernating Rhinos LTD, which develops RavenDB, a NoSQL Open Source Document Database.

You can reach me by:

oren@ravendb.net

+972 52-548-6969


Posts: 6,784 | Comments: 48,886

time to read 3 min | 505 words

Computation during indexing opens up some nice features when we are talking about data modeling and working with your data. In this post, I want to discuss using it to predict the future. Let’s see how we can do that, shall we?

Consider the following document, representing a (simplified) customer model:

[Image: the simplified customer document]

We have a customer that is making monthly payments. This is a pretty straightforward model, right?

We can do a lot with this kind of data. We can obviously compute the lifetime value of a customer, based on how much they paid us. We already did something very similar in a previous post, so that isn’t very interesting.

What is interesting is looking into the future. Let’s start simple, by figuring out what the next charge for this customer will be. For now, the logic is about as simple as it can be: monthly customers pay every month. Here is the index:

[Image: the Linq index computing the next payment]

I’m using Linq instead of JS here because I’m dealing with dates and JS support for dates is… poor.

As you can see, we are simply looking at the date of the last payment and at the subscription, figuring out how much the customer paid the last three times, and using the average of those payments as the expected next payment amount. That allows us to do nice things, such as querying on the future. Finding out how many customers will (probably) pay us more than $100 on the 1st of February is both easy and cheap.
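The computation this index performs can be sketched in plain JavaScript. This is a minimal, standalone sketch, not the Linq index from the post: the customer fields `Subscription`, `Payments`, `At` and `Amount` are hypothetical names, and only the monthly case is handled. It adds one calendar month to the last payment date (working in UTC and clamping to the end of shorter months) and averages up to the last three payments.

```javascript
// Hypothetical customer shape; the field names are assumptions:
// { Subscription: "Monthly", Payments: [{ At: "2018-12-01", Amount: 110 }, ...] }

// Adds whole calendar months in UTC, clamping the day to the target month's
// length (e.g. Jan 31 + 1 month -> Feb 28 in a non-leap year).
function addMonths(date, months) {
  const year = date.getUTCFullYear();
  const month = date.getUTCMonth() + months; // Date.UTC normalizes overflow
  // Day 0 of the following month is the last day of the target month.
  const lastDay = new Date(Date.UTC(year, month + 1, 0)).getUTCDate();
  const day = Math.min(date.getUTCDate(), lastDay);
  return new Date(Date.UTC(year, month, day));
}

// Computes the index entry: next expected payment date and amount.
function nextPayment(customer) {
  // Only monthly subscriptions are handled in this sketch.
  if (customer.Subscription !== "Monthly" || customer.Payments.length === 0) {
    return null;
  }
  // Sort by date so "last" means most recent, whatever the stored order.
  const payments = [...customer.Payments].sort(
    (a, b) => new Date(a.At) - new Date(b.At)
  );
  const lastThree = payments.slice(-3);
  const expected =
    lastThree.reduce((sum, p) => sum + p.Amount, 0) / lastThree.length;
  const lastDate = new Date(payments[payments.length - 1].At);
  return {
    NextPaymentDate: addMonths(lastDate, 1).toISOString().slice(0, 10),
    NextPaymentAmount: expected,
  };
}

const customer = {
  Subscription: "Monthly",
  Payments: [
    { At: "2018-10-01", Amount: 90 },
    { At: "2018-11-01", Amount: 100 },
    { At: "2018-12-01", Amount: 110 },
  ],
};
console.log(nextPayment(customer));
```

For this sample customer the result is an expected payment of 100 on 2019-01-01. Inside the database, a value like `NextPaymentAmount` is what you would filter on to answer "who will pay more than $100 next month".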

We can actually take this further, though. Instead of using a simple index, we can use a map/reduce one. Here is what this looks like:

[Image: the map function of the map/reduce index]

And the reduce:

[Image: the reduce function]

This may seem a bit dense at first, so let’s decipher it, shall we?

We take the last payment date and compute the average of the last three payments, just as we did before. The fun part is that we now compute not just the single next payment, but the next three. The map function then outputs all the payments, both existing (ones that already happened) and projected (ones that will happen in the future). The reduce function is a lot simpler: it just sums up the amounts per month.
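Here is a rough, standalone JavaScript simulation of that map/reduce pair, using the same hypothetical document shape as before (`Payments` with ISO date strings in `At` and numbers in `Amount`). It is not RavenDB’s index API: `mapCustomer` plays the role of the map function and `reduceByMonth` the role of the reduce, with months keyed as `YYYY-MM` strings. The reduce output deliberately has the same shape as the map output (`Month`, `Amount`), so partial results can be reduced again and still give the same totals.

```javascript
// Formats a (possibly overflowing) UTC year/month pair as "YYYY-MM".
function monthKey(year, monthIndex) {
  return new Date(Date.UTC(year, monthIndex, 1)).toISOString().slice(0, 7);
}

// Map: one entry per actual payment plus three projected monthly payments,
// each projected at the average of the last three actual payments.
function mapCustomer(customer) {
  const payments = [...customer.Payments].sort(
    (a, b) => new Date(a.At) - new Date(b.At)
  );
  if (payments.length === 0) return [];
  const lastThree = payments.slice(-3);
  const average =
    lastThree.reduce((sum, p) => sum + p.Amount, 0) / lastThree.length;

  // Existing payments, keyed by the month they happened in.
  const entries = payments.map((p) => ({ Month: p.At.slice(0, 7), Amount: p.Amount }));
  // Projected payments for the next three months.
  const last = new Date(payments[payments.length - 1].At);
  for (let i = 1; i <= 3; i++) {
    entries.push({
      Month: monthKey(last.getUTCFullYear(), last.getUTCMonth() + i),
      Amount: average,
    });
  }
  return entries;
}

// Reduce: sum amounts per month. Summation is associative, so feeding the
// output back in (a re-reduce) yields the same totals.
function reduceByMonth(entries) {
  const totals = new Map();
  for (const e of entries) {
    totals.set(e.Month, (totals.get(e.Month) || 0) + e.Amount);
  }
  return [...totals]
    .map(([Month, Amount]) => ({ Month, Amount }))
    .sort((a, b) => a.Month.localeCompare(b.Month));
}

// Two hypothetical customers, aggregated into expected income per month.
const customers = [
  { Payments: [
      { At: "2018-10-01", Amount: 90 },
      { At: "2018-11-01", Amount: 100 },
      { At: "2018-12-01", Amount: 110 } ] },
  { Payments: [
      { At: "2018-11-15", Amount: 50 },
      { At: "2018-12-15", Amount: 70 } ] },
];
const expectedIncome = reduceByMonth(customers.flatMap((c) => mapCustomer(c)));
console.log(expectedIncome);
```

This prints six months, from 2018-10 to 2019-03. The last three (160 each) are projected income: 100 from the first customer plus 60 from the second.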

This allows us to effectively project data into the future, and this map/reduce index can be used to calculate expected income. Note that this is aggregated across all customers, so we can get a pretty good picture of what is going to happen.

A real system would probably have some uncertainty factor, but that touches on business strategy more than modeling, so I don’t think we need to go into that here.

time to read 4 min | 613 words

In my last post on the topic, I showed how we can define a simple computation during the indexing process. That was easy enough, for sure, but it turns out that this feature has quite a few use cases that go well beyond what you would expect. One of them is defining and working with business rules in our domain.

For example, let’s say that we have some logic that determines whether a product is offered with a warranty (and for how long that warranty is valid). This is an important piece of information, obviously, but it is the kind of thing that changes on a fairly regular basis. Consider the following feature description:

As a user, I want to be able to see the offered warranty on the products, as well as to filter searches based on the warranty status.

Warranty rules are:

  • For new products made in house, full warranty for 24 months.
  • For new products from 3rd parties, parts only warranty for 6 months.
  • Refurbished products made by us, full warranty for half of the new warranty duration.
  • Refurbished 3rd party products, parts only warranty, 3 months.
  • Used products, parts only, 1 month.

Just from reading the description, you can see that this is a business rule, which means that it is subject to many changes over time. We could obviously add a couple of fields to the document to hold the warranty information, but that means that whenever the warranty rules change, we’ll have to go through all the documents again. We’ll also need to ensure that any business logic that touches the document re-runs the warranty computation (to be fair, these sorts of things are usually done as a subscription in RavenDB, which alleviates that need).

Without further ado, here is the index to implement the logic above:
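The warranty rules above map naturally onto a single function. The sketch below is an illustration, not the post’s index: it assumes hypothetical product fields `Condition` (`"New"`, `"Refurbished"` or `"Used"`) and `Origin` (`"InHouse"` or `"ThirdParty"`), reads "half of the new warranty duration" as half of the 24 month in-house warranty, and treats any other condition as carrying no warranty. In a RavenDB JavaScript index, the object it returns is what the map function would emit for each product, making `Warranty` and `Months` queryable index fields.

```javascript
// Hypothetical product fields (assumptions, not from the post):
//   Condition: "New" | "Refurbished" | "Used"
//   Origin:    "InHouse" | "ThirdParty"
const NEW_IN_HOUSE_MONTHS = 24;

// Returns the index fields for one product: warranty type and duration.
function warrantyFor(product) {
  const inHouse = product.Origin === "InHouse";
  switch (product.Condition) {
    case "New":
      return inHouse
        ? { Warranty: "Full", Months: NEW_IN_HOUSE_MONTHS }
        : { Warranty: "PartsOnly", Months: 6 };
    case "Refurbished":
      // Interpretation: in-house refurbished gets half the new in-house term.
      return inHouse
        ? { Warranty: "Full", Months: NEW_IN_HOUSE_MONTHS / 2 }
        : { Warranty: "PartsOnly", Months: 3 };
    case "Used":
      // Used products get the same warranty regardless of origin.
      return { Warranty: "PartsOnly", Months: 1 };
    default:
      // Assumption: any other condition carries no warranty.
      return { Warranty: "None", Months: 0 };
  }
}

console.log(warrantyFor({ Condition: "Refurbished", Origin: "InHouse" }));
```

Keeping the rules in one function with a named constant means a change such as a longer in-house warranty is a one-line edit, and the refurbished duration follows automatically.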

You can now query over the warranty types and their durations, project them from the index, etc. Whenever a document is updated, we’ll re-compute the warranty status and update the index.

This saves you from having additional fields in your model and greatly reduces the cost of queries that need to filter on the warranty or its duration, since the computation isn’t done during the query but only once, during indexing.

If the business rule definition changes, you can update the index definition and RavenDB will effectively roll out your change to the entire dataset. That is nice, but even though I’m writing about cool RavenDB features, there are a few words of caution that I want to mention.

Putting queryable business rules in the database can greatly ease your life, but be wary of putting too much business logic in there. In general, you want your business logic to reside right next to the rest of your application code, not running on a different server in a mode that is much harder to debug, version and diagnose. And if the complexity of the business rule exceeds some level (hard to define, but easy to know when you hit it), you should probably move from defining the business rules in an index to a subscription.

A RavenDB subscription allows you to get all changes to documents and apply your own logic in response. This is a reliable way to process data in RavenDB, and because it runs in your own code, under your own terms, it can enjoy all the usual benefits of… well, being your code, and not mine. You can read more about subscriptions in this post and, of course, in the documentation.

time to read 4 min | 734 words

The title of this post is pretty strange, I admit. Usually, when we think about modeling, we think about our data. If it is a relational database, this mostly means the structure of your tables and the relations between them. When using a document database, this means the shape of your documents. But in both cases, indexes are there merely to speed things up. Oh, a particularly important query may need an index, and that may impact how you lay out the data, but these are relatively rare cases. In relational databases and most non-relational ones, indexes do not play any major role in data modeling decisions.

This isn’t the case for RavenDB. In RavenDB, an index doesn’t exist merely to organize the data in a way that makes it easier for the database to search for it. An index is actually able to modify and transform the data, on the current document or by pulling in data from related documents. A map/reduce index is even able to aggregate data from multiple documents as part of the indexing process. I’ll touch on the last one in more depth later in this series; first, let’s tackle the more obvious parts. Because I want to show off some of the new features, I’m going to use JS for most of the index definitions in these posts, but you can do the same using Linq / C# as well, obviously.

When brainstorming for this post, I got so many good ideas about the kind of non-obvious things that you can do with RavenDB’s indexes that a single post turned into a series, and I have two pages of notes to go through. Almost all of those ideas are basically some form of computation during indexing, applied in novel ways to give you a lot of flexibility and power.

RavenDB prefers to do more work during indexing (which is batched and happens in the background) than during query time. This means that we can push a lot of work into the background and just let RavenDB handle it for us. Let’s start with what is probably the most basic example of computation during indexing: the Order’s Total. Consider the following document:

[Image: the Order document with its line items]

As you can see, we have the Order document and the list of the line items in this order. What we don’t have here is the total order value.

Now, actually computing how much you are going to pay for an order is complex. You have to take into account taxation, promotions, discounts, shipping costs, etc. That isn’t something that you can do trivially, but a simplified version of it makes an excellent example, and similar requirements exist in many fields.

We could obviously add a Total field to the order, but then we have to make sure that we update it whenever we update the order. This is possible, but if we have multiple clients consuming the data, this can be fairly hard to do. Instead, we can place the logic to compute the property in the index itself. Here is what it would look like:

[Image: the Linq index computing the order Total]

The same index in JavaScript is almost identical:
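Here is a sketch of what such a JavaScript computation might look like. It assumes order lines modelled on the Northwind sample data, with `Quantity`, `PricePerUnit` and `Discount` (a fraction between 0 and 1), and it ignores taxes and shipping as discussed above. To keep it runnable outside the database, the computation is a plain function; in a RavenDB JavaScript index, the same object would be returned from the function passed to `map('Orders', ...)`. The sample order and its document IDs are hypothetical.

```javascript
// Hypothetical order shape, modelled on the Northwind sample data:
// { Company, Lines: [{ Product, Quantity, PricePerUnit, Discount }] }

// Sums each line's price after its discount; taxes, promotions and shipping
// are deliberately left out of this simplified total.
function orderTotal(order) {
  return order.Lines.reduce(
    (sum, line) => sum + line.Quantity * line.PricePerUnit * (1 - line.Discount),
    0
  );
}

// The index entry for one order: just the computed Total field.
function mapOrder(order) {
  return { Total: orderTotal(order) };
}

const order = {
  Company: "companies/1-A",
  Lines: [
    { Product: "products/1-A", Quantity: 2, PricePerUnit: 10, Discount: 0 },
    { Product: "products/2-A", Quantity: 1, PricePerUnit: 20, Discount: 0.25 },
  ],
};
console.log(mapOrder(order)); // { Total: 35 }
```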

In this case, they are very similar, but as the complexity grows, I find it is usually easier to express logic as a JavaScript index rather than as a single (complex) Linq expression.

Such an index gives us a computed field, Total, that holds the total value of an order. We can query on this field, sort by it and even project it back (if the field is stored). It allows us to always have the most up-to-date value and have the database take care of computing it.

This basic technique can be applied in many different ways and affects the way we shape and model our data. Currently I have at least three more posts planned for this series, and I would love to hear your feedback, both on the kind of stuff you would like me to talk about and on the kind of indexes you are using in RavenDB and how they impacted your data modeling.


RECENT SERIES

  1. Using TLS with Rust (4):
    17 Jan 2019 - Authentication
  2. Data modeling with indexes (3):
    14 Jan 2019 - Predicting the future
  3. Reminder (9):
    03 Jan 2019 - I’ll be in CodeMash is next week
  4. Production postmortem (24):
    25 Dec 2018 - Handled errors and the curse of recursive error handling
