Ayende @ Rahien


RavenDB: Includes

When I set out to build RavenDB, I had a very clear idea about what I wanted to do. I wanted to build an integrated, opinionated solution: something that will do the right thing most of the time, and let you override that if you really want to.

One of the things that really drove the design was 6 to 7 years of experience building applications on top of an RDBMS using ORMs. Let me put it gently, I am… well acquainted with the problems that people may run into when they use an ORM. One of the things that I wanted to avoid was reproducing those opportunities for error in RavenDB.

One of the major design decisions that I made was to disallow associations between documents. This is part of the core design of the system.

Let us take the following example:

[image: the Order and Customer documents]

As you can see, we have two documents, an Order and a Customer. The order references a customer, but unlike in an RDBMS, we use a denormalized reference, with both the Id and the Name of the customer stored inside the Order document. That is advantageous because it allows us to perform most operations on the Order document without having to load the Customer document.

From the C# model, it looks like this:

public class Order
{
    public string Id { get; set; }
    public Address ShippingAddress { get; set; }
    public Address BillingAddress { get; set; }
    public DenormalizedReference Customer { get; set; }
}

public class DenormalizedReference
{
    public string Id { get; set; }
    public string Name { get; set; }
}

public class Customer
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string Email { get; set; }
}

Note that there isn’t a direct reference between the Order and the Customer. Instead, Order holds a DenormalizedReference, which holds the interesting bits from Customer that we need to process requests on Order.
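As a rough sketch of how such a reference might be filled in, assuming a hypothetical ToReference helper (my own name for illustration, not part of the RavenDB client API), the denormalized copy could be taken from the Customer when the Order is created. The classes are repeated in trimmed form so the snippet stands on its own:

```csharp
using System;

public class DenormalizedReference
{
    public string Id { get; set; }
    public string Name { get; set; }
}

public class Customer
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string Email { get; set; }
}

// Trimmed version of Order: the address properties are left out for brevity.
public class Order
{
    public string Id { get; set; }
    public DenormalizedReference Customer { get; set; }
}

// Hypothetical helper, for illustration only.
public static class CustomerExtensions
{
    // Copies only the fields that Order needs; Email stays in the Customer document.
    public static DenormalizedReference ToReference(this Customer customer)
    {
        if (customer == null) throw new ArgumentNullException("customer");
        return new DenormalizedReference { Id = customer.Id, Name = customer.Name };
    }
}
```

With something like this, new Order { Id = "orders/9432", Customer = customer.ToReference() } stores the Id and Name inside the order. Keeping that copy up to date when the customer's name changes is a separate concern that this sketch does not address.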

So far so good, but, and this is important, you can’t always set things up this way. There is a set of cases where you do want to be able to access the associated document.

Well, that is easy enough, isn’t it? All we need to do is load it:

var order = session.Load<Order>("orders/9432");
var customer = session.Load<Customer>(order.Customer.Id);

This is simple, easy to read, easy to understand and makes me want to curl into a ball and weep. The problem is, of course, that this is going to generate two calls to Raven. And if there is one thing that I pay attention to, it is the number of remote calls that I am making.

I started to think about how I can make this scenario work better, and I came up with the following design.

Given the two documents

Then for GET /docs/orders/9432

[image: the response to the plain GET]

And for GET /docs/orders/9432?include=Customer.Id

[image: the response to the GET with the include]

Note that in the second case, we get the full customer data merged into the order document.

From an implementation perspective, this would be very easy to do. The problem is how to represent this in the client API. We had a very interesting discussion on the topic on the mailing list.

Let me explain the problem in detail. Given the C# classes above, how do you express this notion of the include? You can’t use the Order model above, because that Customer property is going to be of type DenormalizedReference. We can’t make that property of type Customer, either, because then the Customer data would be embedded inside the Order document, which isn’t what we wanted.

On the mailing list, a lot of proposals were raised. The one that seemed to be the most popular was to drop the 1:1 mapping between the C# model and the document model and move to something like this:

[RootAggregate]
public class Order
{
    public string Id { get; set; }
    public string Name { get; set; }
    public Customer Customer { get; set; }
}

[RootAggregate]
public class Customer
{
    public string Id { get; set; }
    [Denormalized]
    public string Name { get; set; }
    public string Email { get; set; }

}

And then make the client API smart enough to understand the attributes. The model above would generate the same documents as the previous model, but would make it much easier to work on features such as this. This way, we can normally access the data that is embedded in the document, but also include the associated document when we need it.

There are several problems here:

  • This creates a misleading API, making people think that things are normalized when they aren’t.
  • It is going to bring back ALL the problems associated with lazy loading (worse, it is going to bring back all the problems associated with EF 1.0 lazy loading).
  • It goes directly against the way I believe you should work with a document database.

But I couldn’t think of any other way, nor could anyone else.

Until Frank Schwieterman came to our rescue:

Maybe rather than join the documents into one result, such a request would cause the 'joined' entities to be preloaded instead.
From the client API perspective, I would do the joined load of the user object, and the user is returned in its original form.  But now the session has the customer object preloaded, so when I try to load the customer object via the client API no request is made to the server.  From the caller's perspective, the only change to usage has been the preload hint passed in the original request.

Yes!

The problem was (from my perspective) never with the way the model is structured; the problem was that the way the documents were modeled caused a performance problem. Frank’s suggestion completely eliminated that issue.

It took some interesting coding to get it to work properly, but essentially, it is just an application of the Future usage for loading large object graphs in NHibernate. Now we can do:

var order = session
    .Include("Customer.Id")
    .Load<Order>("orders/1");

var customer = session.Load<Customer>(order.Customer.Id);

And this code will only go to the server once!
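To make the mechanics concrete, here is a minimal in-memory sketch of the idea, not the actual RavenDB implementation. FakeServer, SketchSession and Demo are hypothetical names, and the include is passed as an optional lambda argument to Load rather than through the fluent Include(...) call shown above, purely to keep the sketch short. The server stand-in returns the included document in the same response, the session caches everything it receives by id, and a later Load for a cached id never reaches the server.

```csharp
using System;
using System.Collections.Generic;

public class DenormalizedReference { public string Id { get; set; } public string Name { get; set; } }
public class Customer { public string Id { get; set; } public string Name { get; set; } }
public class Order { public string Id { get; set; } public DenormalizedReference Customer { get; set; } }

// Hypothetical stand-in for the server; it counts round trips.
public class FakeServer
{
    private readonly Dictionary<string, object> docs = new Dictionary<string, object>();
    public int RequestCount { get; private set; }

    public void Store(string id, object doc) { docs[id] = doc; }

    // Returns the requested document and, if asked, the document it refers to,
    // both in a single response.
    public Dictionary<string, object> Get(string id, Func<object, string> includeSelector)
    {
        RequestCount++;
        var result = new Dictionary<string, object>();
        result[id] = docs[id];
        if (includeSelector != null)
        {
            var includedId = includeSelector(docs[id]);
            if (includedId != null && docs.ContainsKey(includedId))
                result[includedId] = docs[includedId];
        }
        return result;
    }
}

// Hypothetical session: everything that comes back is remembered by id.
public class SketchSession
{
    private readonly FakeServer server;
    private readonly Dictionary<string, object> loaded = new Dictionary<string, object>();

    public SketchSession(FakeServer server) { this.server = server; }

    public T Load<T>(string id, Func<T, string> include = null)
    {
        object cached;
        if (loaded.TryGetValue(id, out cached))
            return (T)cached; // already preloaded, no remote call

        Func<object, string> selector = null;
        if (include != null)
            selector = doc => include((T)doc);

        foreach (var pair in server.Get(id, selector))
            loaded[pair.Key] = pair.Value;
        return (T)loaded[id];
    }
}

public static class Demo
{
    // Loads an order with its customer included, then loads the customer,
    // and reports how many requests reached the server.
    public static int Run()
    {
        var server = new FakeServer();
        server.Store("customers/1", new Customer { Id = "customers/1", Name = "Oren" });
        server.Store("orders/1", new Order
        {
            Id = "orders/1",
            Customer = new DenormalizedReference { Id = "customers/1", Name = "Oren" }
        });

        var session = new SketchSession(server);
        var order = session.Load<Order>("orders/1", o => o.Customer.Id);
        var customer = session.Load<Customer>(order.Customer.Id);
        return server.RequestCount;
    }
}
```

In this sketch Demo.Run() reports a single request: the second Load is answered from the session's cache because the customer arrived together with the order.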

We get to keep the separate model, and we can manipulate how we are loading associations easily. I really like this solution.

Comments

08/12/2010 07:55 AM by Roy

Don't you mean "?include=Customer" instead of "?include=Company"?

08/12/2010 08:55 AM by Louis Haußknecht

I'm a bit confused by the GET url: GET /docs/users/oren?include=Customer.Id

To retrieve the order with the customer included I'd use

GET /docs/orders/9432?include=Customer.Id

08/12/2010 08:58 AM by Thomas Eyde

Isn't there a way to get rid of the string argument in Include()? It looks funny compared to the typesafe Load<T>().

08/12/2010 08:59 AM by Ayende Rahien

Thomas,

Yes, there is a Lambda option as well

08/12/2010 09:09 AM by Ayende Rahien

Louis,

Damn, I have a lot of typos in this post, fixed, thanks.

08/12/2010 09:26 AM by Adam

Will this work for indexes as well? So I can do

var orders = session
    .Include("Customer.Id")
    .Query<Order>("orders_all");

and get all customers loaded for all orders.

08/12/2010 09:36 AM by Adam

Great feature, love it, thanks

08/12/2010 11:23 AM by Jonty

What is an example of a case where you would need the email from the orders? Wouldn't you create a new document for that purpose? This seems like it might be encouraging bad design decisions.

08/12/2010 11:48 AM by Ayende Rahien

Jonty,

A common case would be if you need to notify the customer about delay in the order

08/12/2010 11:54 AM by tucaz

I'm not a NoSql user yet, but I have a question. If the type of the Customer property in the Order class is DenormalizedReference, is it possible to somehow get the full customer when calling order.Customer?

I understood that you can pre-fetch the Customer and then access it through Session.Load, but then we would need to either 1) keep a reference to the session object in the Order instance, so that a GetCustomer() method can return the full Customer, or 2) let the caller know that order.Customer will never be entirely filled and that if he wants the full customer he will need to call Session.Load(order.Customer.Id).

I'm thinking of a way to both communicate to the user that he has everything available when he needs it and still not leak infrastructure aspects into the domain.

08/12/2010 11:59 AM by Ayende Rahien

Tucaz,

In short, no. See the discussion on the model changes required to make this work, and why I don't like them.

The easy way to handle this is to use an ambient session, which is the general recommendation anyway.

And I like the fact that you need to take an extra step. You don't want to be able to reference stuff outside your own aggregate easily. See the discussion on Root Aggregates in DDD

08/12/2010 01:59 PM by Brian Vallelunga

Any chance we'll be able to include a collection of Ids? I think it might be useful in situations like this:

public class Customer
{
    public string Id { get; set; }
    public string[] OrderIds { get; set; }
}

I'd want to be able to preload all orders here.

08/12/2010 02:23 PM by Ayende Rahien

Brian,

This scenario just works

08/12/2010 02:57 PM by Charles Strahan

Very neat, Ayende. How would you support "including" a chain of references - such as Order.Customer.BestFriend.Etc?

-Charles

08/12/2010 03:07 PM by Benjamin

Hi Ayende,

Good stuff! With reference to Brian Vallelunga's question, how does a query (to preload all orders) look?

Thanks!

08/12/2010 03:09 PM by Ayende Rahien

Benjamin,

Exactly the same.

Include("Orders")

08/12/2010 03:10 PM by Ayende Rahien

Charles,

I wouldn't support it, I can't think of an actual scenario where you need it in a document database.

08/12/2010 04:19 PM by Jonty

"A common case would be if you need to notify the customer about delay in the order"

That sounds like a search screen where the rows would have the email in them already - ie a different document.

08/12/2010 06:32 PM by Ayende Rahien

Jonty,

That might be the case, yes, and it might also be the case that it is a problem with the way we model stuff.

But that is a feature that a lot of people wanted.

08/13/2010 12:41 AM by jdn

Thanks for accepting override patches.

08/13/2010 08:33 AM by cowgaR

First, changing the order of the chained methods would IMO convey much better what you are trying to accomplish.

var order = session.Load<Order>("orders/1").Include("Customer.Id");

Second, if all you're after is hitting the DB just once when retrieving the Customer document for a known Order.Id, why not make the API load the Customer document in one straight call?

Customer customer;
var order = session.Include("Customer.Id", out customer).Load<Order>("orders/1");

Third, I would again change the ordering of the methods, so...

Customer customer;
var order = session.Load<Order>("orders/1").Include("Customer.Id", out customer);

Just an idea, I know you'll prove me wrong :)

08/13/2010 09:36 AM by Ayende Rahien

cowgaR,

Regarding the method ordering.

Sure, I would like that.

Now make it work when you don't include stuff as well.

var person = session.Load(Person)

var person = session.Load(Person).Include("Customer")

Regarding the out param, good idea

08/13/2010 01:34 PM by oharab

If you changed the API slightly, could you have the order that cowgaR suggests?

var person=session.Get(Person).Load();

var person=session.Get(Person).Include("Customer").Load();

08/13/2010 02:35 PM by Ayende Rahien

oharab,

That would make the default case (no includes) much uglier.

08/17/2010 12:19 AM by Daniel Steigerwald

I wonder whether denormalized references are actually a premature optimization. Why should the user care about minified models?

I know, it's stored twice, still...

An aggregate is a class with an Id:

public class Order
{
    public string Id { get; set; }
    public string Name { get; set; }
    public Customer Customer { get; set; }
}

public class Customer
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string Email { get; set; }
}

What's wrong with loading the Order with the whole Customer? I suppose it's explicit enough. It should also work for storing.

Instead of Hibernate's default laziness, Raven can eager load everything.

Maybe it is a stupid idea, but I would like to hear your opinion.

08/17/2010 12:43 AM by Daniel Steigerwald

I hope I understand all the consequences related to SQL joins, especially lazily evaluated ones. I am still a newbie, but as I see it, the Raven document database is about eagerness (as opposed to SQL laziness).

We have to create an index before we can use it. Fine. We can put documents into the db without a schema, super fine. So we should be able to load whole object graphs in one step as well.

This code:

var order = session
    .Include("Customer.Id")
    .Load<Order>("orders/1");

is equivalent to the aforementioned, in case laziness is forbidden.

08/17/2010 12:59 AM by Daniel Steigerwald

|This creates a misleading API, making people think that things are normalized when they aren’t.

I suppose it is an implementation detail. Remembering that an object with an Id contained in another object with an Id is stored twice is easy.

|It is going to bring back ALL the problems associated with lazy loading (worse, it is going to bring back all the problems associated with EF 1.0 lazy loading).

So disallow lazy loading altogether. I don't need it anyway.

|It goes directly against the way I believe you should work with a document database.

What's wrong with hypertext documents? They are still documents.

Include is a nice feature, but soon enough my code will probably be full of includes.

PS: Maybe I overlooked something (or everything) ;) It was just written brainstorming :)

08/17/2010 05:57 AM by Ayende Rahien

Daniel,

And Customer has a reference to Company, which has reference to Products, which has reference to...

In other words, you loaded the entire database.

It isn't premature, it is something that you have to deal with

08/17/2010 05:58 AM by Ayende Rahien

Daniel,

Yes, that is pretty much the point. Because you want to be able to control this for each scenario.

There is no one scenario that fits all.

08/17/2010 06:01 AM by Ayende Rahien

Daniel,

You can't say it is an implementation detail, not when the impact is making remote calls.

And you can't disallow lazy loading, not when I consider this lazy loading as well:

session.Load<Customer>(order.Customer.Id);

Hypertext docs are great, but you only read ONE doc at a time.

With DocDB documents, you may want to access more than that

09/16/2010 03:30 AM by Andres

What about an index-based join?

Like this:

Map:

from doc in docs
where doc["@metadata"]["Raven-Entity-Name"] == "Products" || doc["@metadata"]["Raven-Entity-Name"] == "ProductInputs"
select new {
    Code = doc["@metadata"]["Raven-Entity-Name"] == "Products" ? doc.Code : doc.ProductCode,
    Input = doc["@metadata"]["Raven-Entity-Name"] == "Products" ? doc.FirstInput : doc.Input
};

Reduce:

from result in results
group result by result.Code into g
select new
{
    Code = g.Key,
    Count = g.Count(),
    TotalInputs = g.Sum(x => x.Input ?? 0)
}

09/16/2010 10:58 AM by Ayende Rahien

Andres,

While you can make this work, I am not quite sure what is the purpose. Especially in the context of includes.

09/16/2010 11:32 AM by Andres

That, maybe the includes and other denormalizations can be done by indexes.

09/16/2010 11:41 AM by Ayende Rahien

Why would this be beneficial?

09/16/2010 01:21 PM by Andres

It is faster and simpler than triggers, and than non-intuitive queries like this:

var order = session.Include("Customer.Id").Load<Order>("orders/1");

(a magic string, and how do you know that you are loading a Customer?)

But the Raven index syntax is not expressive enough, is it?

Sorry about my bad English.
