RavenDB: Includes


When I set out to build RavenDB, I had a very clear idea about what I wanted to do. I wanted to build an integrated, opinionated solution. Something that would do the right thing most of the time, and let you override that if you really want to.

One of the things that really drove the design was 6 – 7 years of experience in building applications based on RDBMS using ORMs. Let me put it gently, I am… well acquainted with the problems that people may run into when they use an ORM. One of the things that I wanted to avoid was duplicating the possibility of error with RavenDB.

One of the major design decisions that I made was to disallow associations between documents. This is part of the core design of the system.

Let us take the following example:

 image

As you can see, we have two documents, an Order and a Customer. The order references a customer, but unlike in an RDBMS, we use a denormalized reference, with both the Id and the Name of the customer stored inside the Order document. That is advantageous because it allows us to perform most operations on the Order document without having to load the Customer document.

From the C# model, it looks like this:

public class Order
{
    public string Id { get; set; }
    public Address ShippingAddress { get; set; }
    public Address BillingAddress { get; set; }
    public DenormalizedReference Customer { get; set; }
}

public class DenormalizedReference
{
    public string Id { get; set; }
    public string Name { get; set; }
}

public class Customer
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string Email { get; set; }
}

Note that there isn’t a direct reference between the Order and the Customer. Instead, Order holds a DenormalizedReference, which holds the interesting bits from Customer that we need to process requests on Order.

So far so good, but, and this is important, you can’t always set things up this way. There is a set of cases where you do want to be able to access the associated document.

Well, that is easy enough, isn’t it? All we need to do is load it:

var order = session.Load<Order>("orders/9432");
var customer = session.Load<Customer>(order.Customer.Id);

This is simple, easy to read, easy to understand, and makes me want to curl into a ball and weep. The problem is, of course, that this is going to generate two calls to Raven. And if there is one thing that I pay attention to, it is the number of remote calls that I am making.

I started to think about how I can make this scenario work better, and I came up with the following design.

Given the two documents

Then for GET /docs/orders/9432

image

And for GET /docs/orders/9432?include=Customer.Id

image 

Note that in the second case, we get the full customer data merged into the order document.

From an implementation perspective, this would be very easy to do. The problem is how to represent this in the client API. We had a very interesting discussion on the topic on the mailing list.

Let me explain the problem in detail. Given the C# classes above, how do you express this notion of the include? You can’t use the Order model above, because that Customer property is going to be of type DenormalizedReference. We can’t make that property of type Customer, either, because then the Customer data would be embedded inside the Order document, which isn’t what we wanted.

On the mailing list, a lot of proposals were raised. The one that seemed to be the most popular was to drop the 1:1 mapping between the C# model and the document model and move to something like this:

[RootAggregate]
public class Order
{
    public string Id { get; set; }
    public string Name { get; set; }
    public Customer Customer { get; set; }
}

[RootAggregate]
public class Customer
{
    public string Id { get; set; }
    [Denormalized]
    public string Name { get; set; }
    public string Email { get; set; }

}

And then make the client API smart enough to understand the attributes. The model above would generate the same documents as the previous model, but would make it much easier to work on features such as this. This way, we can access the data that is embedded in the document as usual, but also include the associated document when we need it.

There are several problems here:

  • This creates a misleading API, making people think that things are normalized when they aren’t.
  • It is going to bring back ALL the problems associated with lazy loading (worse, it is going to bring back all the problems associated with EF 1.0 lazy loading).
  • It goes directly against the way I believe you should work with a document database.

But I couldn’t think of any other way, nor could anyone else.

Until Frank Schwieterman came to our rescue:

Maybe rather than join the documents into one result, such a request would cause the 'joined' entities to be preloaded instead.
From the client API perspective, I would do the joined load of the user object, and the user is returned in its original form. But now the session has the customer object preloaded, so when I try to load the customer object via the client API no request is made to the server. From the caller's perspective, the only change to usage has been the preload hint passed in the original request.

Yes!

The problem was (from my perspective) never with the way the model is structured; the problem was that the way the documents were modeled caused a performance problem. Frank’s suggestion completely eliminated that issue.

It took some interesting coding to get it to work properly, but essentially, it is just an application of the Future usage for loading large object graphs in NHibernate. Now we can do:

var order = session
    .Include("Customer.Id")
    .Load<Order>("orders/1");

var customer = session.Load<Customer>(order.Customer.Id);

And this code will only go to the server once!
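
To make the preloading idea concrete, here is a minimal sketch of how a session could satisfy the second Load from memory. This is not the actual RavenDB client code. FakeServer, SketchSession and ResolvePath are hypothetical names invented for this illustration, the server is an in-memory dictionary that counts requests, and the include path is resolved with plain reflection, which is only an assumption about how such a path might be walked.

```csharp
using System;
using System.Collections.Generic;

public class DenormalizedReference
{
    public string Id { get; set; }
    public string Name { get; set; }
}

public class Order
{
    public string Id { get; set; }
    public DenormalizedReference Customer { get; set; }
}

public class Customer
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string Email { get; set; }
}

// Hypothetical in-memory stand-in for the server; it only counts round trips.
public class FakeServer
{
    private readonly Dictionary<string, object> _docs = new Dictionary<string, object>();
    public int RequestCount { get; private set; }

    public void Put(string id, object doc) => _docs[id] = doc;

    // One round trip: returns the requested document and, if an include path
    // is given, the document whose id is found at that path.
    public Dictionary<string, object> Get(string id, string includePath)
    {
        RequestCount++;
        var result = new Dictionary<string, object> { [id] = _docs[id] };
        if (includePath != null)
        {
            var includedId = ResolvePath(_docs[id], includePath);
            if (includedId != null && _docs.TryGetValue(includedId, out var included))
                result[includedId] = included;
        }
        return result;
    }

    // Walks a dotted path such as "Customer.Id" over public properties.
    private static string ResolvePath(object doc, string path)
    {
        object current = doc;
        foreach (var part in path.Split('.'))
        {
            if (current == null) return null;
            current = current.GetType().GetProperty(part)?.GetValue(current);
        }
        return current as string;
    }
}

// Hypothetical session: everything a request returns is remembered by id,
// so a later Load for an included document never reaches the server.
public class SketchSession
{
    private readonly FakeServer _server;
    private readonly Dictionary<string, object> _loaded = new Dictionary<string, object>();
    private string _includePath;

    public SketchSession(FakeServer server) { _server = server; }

    public SketchSession Include(string path) { _includePath = path; return this; }

    public T Load<T>(string id) where T : class
    {
        if (_loaded.TryGetValue(id, out var cached)) return (T)cached;
        var result = _server.Get(id, _includePath);
        _includePath = null; // the preload hint applies to a single request
        foreach (var pair in result) _loaded[pair.Key] = pair.Value;
        return (T)_loaded[id];
    }
}

public static class Program
{
    public static void Main()
    {
        var server = new FakeServer();
        server.Put("customers/1", new Customer { Id = "customers/1", Name = "Example Customer", Email = "customer@example.com" });
        server.Put("orders/1", new Order { Id = "orders/1", Customer = new DenormalizedReference { Id = "customers/1", Name = "Example Customer" } });

        var session = new SketchSession(server);
        var order = session.Include("Customer.Id").Load<Order>("orders/1");
        var customer = session.Load<Customer>(order.Customer.Id);

        Console.WriteLine(server.RequestCount); // prints 1
    }
}
```

If you drop the Include call in this sketch, the same two Load calls bump the counter to 2, which is exactly the pair of remote calls that I wanted to avoid.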

We get to keep the separate model, and we can manipulate how we are loading associations easily. I really like this solution.
