RavenDB – Live projections, or how to do joins in a non relational database

time to read 7 min | 1252 words

This feature has me really excited, because it solves a pretty big problem and it does so in a really elegant fashion.

Let us start from the beginning. Documents are independent, which means that processing a single document should not require loading additional documents. This, in turn, leads to denormalization, so we can keep the data that we need about our associations in the same document.

The problem, of course, is that there are many cases where this denormalization is annoying. In particular, it means that you have to take responsibility for handling the updates to the denormalized data. There are good reasons to want to do that, particularly if you are working on a sharded data store. And yet… many people don’t run in a sharded store, so why make them pay for that?

I found it hard to answer, especially since we already introduced the includes feature. For a while, I thought that just doing the include was enough, but then I got into a discussion with Rob about this. And we had the chance to talk about projections vs. documents. We agreed that we wanted a solution, and we started throwing a lot of crazy ideas (multi sourced, background updated, materialized views – to name one) around. Until we finally realized that we were being really stupid. The data is already there!

It was just that I was too blind to see how we can push it out.

Let us talk in code for a minute, since it would be easier to demonstrate how things work:

using (var s = ds.OpenSession())
    var entity = new User { Name = "Ayende" };
    s.Store(new User { Name = "Oren", AliasId = entity.Id });

This creates two documents and links between them.

Now, let us say that we want to display the following grid:

User Alias
Oren Ayende

Well, we need to query for all the users that has an alias, then include the associated document. Something like this:

var usersWithAliases = from user in session.Query<User>().Include(x=>x.AliasId)
                       where user.AliasId != null
                       select user;

var results = new List<UserAndAlias>();

foreach(var user in usersWithAliases)
        new UserAndAlias
            User = user.Name
            Alias = session.Load<User>(user.AliasId).Name

Here is the deal, this is very efficient in terms of calling the database only once, but it does means that we are passing the full document back, which may be something that we may not want to do.

Not to mention that there is a whole lot of code here.

Okay, so far we have introduced the problem. Let us see how we can solve it. We can do that by applying a live projection at the server side. A live projection transforms the results of a query on the server side, and it has access to other documents as well. Let us see what I mean by that:

public class Users_ByAlias : AbstractIndexCreationTask<User>
    public Users_ByAlias()
        Map =
            users => from user in users
                     select new {user.AliasId};

        TransformResults =
            (database, users) => from user in users
                                 let alias= database.Load<User>(user.AliasId)
                                 select new {Name = user.Name, Alias = alias.Name};

It is important to understand exactly what is going on here. The TransformResults will be executed on the results on the query, which gives it the change to modify, extend or filter them. In this case, it gives you the ability to look at data from another document.

For the DB guys among you, this performs a nested loop join.

Now, we can just write:

var usersWithAliases = 
     (from user in session.Query<User, Users_ByAlias>()
     where user.AliasId != null
     select user).As<UserAndAlias>();

This will query the index, transform the results on the server side, and give us the UserAndAlias colelction that we can just use.

Did I mention that I am really excited about this feature?