Data modeling with indexes: Introduction

Jan 08 2019

Data modeling with indexesIntroduction

time to read 4 min | 734 words

The title of this post is pretty strange, I admit. Usually, when we think about modeling, we think about our data. If it is a relational database, this mostly mean the structure of your tables and the relations between them. When using a document database, this means the shape of your documents. But in both cases, indexes are there merely to speed things up. Oh, a particular important query may need an index, and that may impact how you lay out the data, but these are relatively rare cases. In relational databases and most non relational ones, indexes do not play any major role in data modeling decisions.

This isn’t the case for RavenDB. In RavenDB, an index doesn’t exist merely to organize the data in a way that make it easier for the database to search for it. An index is actually able to modify and transform the data, on the current document or full related data from related documents. A map/reduce index is even able aggregate data from multiple documents as part of the indexing process. I’ll touch on the last one in more depth later in this series, first, let’s tackle the more obvious parts. Because I want to show off some of the new features, I’m going to use JS for most of the indexes definitions in these psots, but you can do the same using Linq / C# as well, obviously.

When brain storming for this post, I got so many good ideas about the kind of non obvious things that you can do with RavenDB’s indexes that a single post has transformed into a series and I got two pages of notes to go through. Almost all of those ideas are basically some form of computation during indexing, but applied in novel manners to give you a lot of flexibility and power.

RavenDB prefers to have more work to do during indexing (which is batched and happen on the background) than during query time. This means that we can push a lot more work into the background and just let RavenDB handle it for us. Let’s start from what is probably the most basic example of computation during query, the Order’s Total. Consider the following document:

As you can see, we have the Order document and the list of the line items in this order. What we don’t have here is the total order value.

Now, actually computing how much you are going to pay for an order is complex. You have to take into account taxation, promotions, discounts, shipping costs, etc. That isn’t something that you can do trivially, but it does serve to make an excellent simple example and similar requirements exists in many fields.

We can obvious add an Total field to the order, but then we have to make sure that we update it whenever we update the order. This is possible, but if we have multiple clients consuming the data, this can be fairly hard to do. Instead, we can place the logic to compute the property in the index itself. Here is how it would look like:

The same index in JavaScript is almost identical:

In this case, they are very similar, but as the complexity grow, I find it is usually easier to express logic as a JavaScript index rather than use a single (complex) Linq expression.

Such an index give us a computed field, Total, that has the total value of an order. We can query on this field, sort it and even project it back (if this field is stored). It allow us to always have the most up to date value and have the database take care of computing it.

This basic technique can be applied in many different ways and affect the way we shape and model our data. Currently I have at least three more posts planned for this series, and I would love to hear your feedback. Both on the kind of stuff you would like me to talk about and the kind of indexes you are using RavenDB and how it impacted your data modeling.

0 comments

Tags:

Oren Eini

Oren Eini

CEO of RavenDB