Ayende @ Rahien

My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.

When I set out to build RavenDB, I had a very clear idea about what I wanted to do. I wanted to build an integrated, opinionated solution. Something that will do the right thing most of the time, and let you override that if you really want to.

One of the things that really drove the design was 6 – 7 years of experience in building applications based on RDBMS using ORMs. Let me put it gently, I am… well acquainted with the problems that people may run into when they use an ORM. One of the things that I wanted to avoid was duplicating the possibility of error with RavenDB.

One of the major design decisions that I made was to disallow associations between documents. This is part of the core design of the system.

Let us take the following example:


As you can see, we have two documents, an Order and a Customer. The order references a customer, but unlike in an RDBMS, we use a denormalized reference, with both the Id and the Name of the customer stored inside the Order document. That is advantageous because it allows us to perform most operations on the Order document without having to load the Customer document.

From the C# model, it looks like this:

public class Order
{
      public string Id { get; set; }
      public Address ShippingAddress { get; set; }
      public Address BillingAddress { get; set; }
      public DenormalizedReference Customer { get; set; }
}

public class DenormalizedReference
{
      public string Id { get; set; }
      public string Name { get; set; }
}

public class Customer
{
      public string Id { get; set; }
      public string Name { get; set; }
      public string Email { get; set; }
}


Note that there isn’t a direct reference between the Order and the Customer. Instead, Order holds a DenormalizedReference, which holds the interesting bits from Customer that we need to process requests on Order.

So far so good, but, and this is important, you can’t always set things up this way. There is a set of cases where you do want to be able to access the associated document.

Well, that is easy enough, isn’t it? All we need to do is load it:

var order = session.Load<Order>("orders/9432");
var customer = session.Load<Customer>(order.Customer.Id);

This is simple, easy to read, easy to understand, and makes me want to curl into a ball and weep. The problem is, of course, that this is going to generate two calls to Raven. And if there is one thing that I pay attention to, it is the number of remote calls that I am making.

I started to think about how I can make this scenario work better, and I came up with the following design.

Given the two documents

Then for GET /docs/orders/9432


And for GET /docs/orders/9432?include=Customer.Id


Note that in the second case, we get the full customer data merged into the order document.

From an implementation perspective, this would be very easy to do. The problem is how to represent this in the client API. We had a very interesting discussion on the topic in the mailing list.

Let me explain the problem in detail. Given the C# classes above, how do you express this notion of the include? You can’t use the Order model above, because that Customer property is going to be of type DenormalizedReference. We can’t make that property of type Customer, either, because then the Customer data would be embedded inside the Order document, which isn’t what we wanted.

In the mailing list, a lot of proposals were raised. The one that seemed to be the most popular was to drop the 1:1 mapping between the C# model and the document model and move to something like this:

public class Order
{
    public string Id { get; set; }
    public string Name { get; set; }
    public Customer Customer { get; set; }
}

public class Customer
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string Email { get; set; }
}


And then make the client API smart enough to understand the attribute. The model above would generate the same documents as the previous model, but would make it much easier to work on features such as this. This way, we can normally access the data that is embedded in the document, but also include the associated document when we need it.

There are several problems here:

  • This creates a misleading API, making people think that things are normalized when they aren’t.
  • It is going to bring back ALL the problems associated with lazy loading (worse, it is going to bring back all the problems associated with EF 1.0 lazy loading).
  • It goes directly against the way I believe you should work with a document database.

But I couldn’t think of any other way, nor could anyone else.

Until Frank Schwieterman came to our rescue:

Maybe rather than join the documents into one result, such a request would cause the 'joined' entities to be preloaded instead.
From the client API perspective, I would do the joined load of the user object, and the user is returned in its original form.  But now the session has the customer object preloaded, so when I try to load the customer object via the client API no request is made to the server.  From the caller's perspective, the only change to usage has been the preload hint passed in the original request.


The problem was (from my perspective) never with the way the model is structured; the problem was that the way the documents were modeled caused a performance problem. Frank’s suggestion completely eliminated that issue.

It took some interesting coding to get it to work properly, but essentially, it is just an application of the Future usage for loading large object graphs in NHibernate. Now we can do:

var order = session
    .Include("Customer.Id")
    .Load<Order>("orders/9432");

var customer = session.Load<Customer>(order.Customer.Id);

And this code will only go to the server once!
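To make the preloading idea concrete, here is a minimal, self-contained sketch of how a session could satisfy the second Load from memory. It is not RavenDB's actual implementation: FakeServer, SketchSession, ResolvePath, the customer id "customers/1" and the email address are all hypothetical names and values made up for illustration, and the models are trimmed down to the properties the sketch needs.

```csharp
using System;
using System.Collections.Generic;

public class DenormalizedReference { public string Id { get; set; } public string Name { get; set; } }
public class Order { public string Id { get; set; } public DenormalizedReference Customer { get; set; } }
public class Customer { public string Id { get; set; } public string Name { get; set; } public string Email { get; set; } }

// Hypothetical stand-in for the server; it only counts round trips.
public class FakeServer
{
    private readonly Dictionary<string, object> docs = new Dictionary<string, object>();
    public int Requests { get; private set; }

    public void Put(string id, object doc) { docs[id] = doc; }

    // One round trip: returns the requested document and, if an include path
    // is given, the document whose id is stored at that path.
    public Dictionary<string, object> Get(string id, string includePath)
    {
        Requests++;
        var result = new Dictionary<string, object> { [id] = docs[id] };
        if (includePath != null)
        {
            var includedId = ResolvePath(docs[id], includePath);
            result[includedId] = docs[includedId];
        }
        return result;
    }

    // Walks a dotted property path such as "Customer.Id" using reflection.
    private static string ResolvePath(object doc, string path)
    {
        object current = doc;
        foreach (var part in path.Split('.'))
            current = current.GetType().GetProperty(part).GetValue(current);
        return (string)current;
    }
}

// Hypothetical session: everything a response returns is remembered by id.
public class SketchSession
{
    private readonly FakeServer server;
    private readonly Dictionary<string, object> loaded = new Dictionary<string, object>();
    private string includePath;

    public SketchSession(FakeServer server) { this.server = server; }

    public SketchSession Include(string path) { includePath = path; return this; }

    public T Load<T>(string id)
    {
        if (!loaded.ContainsKey(id))
        {
            // Not in the session yet: go to the server and keep every
            // document in the response, including the preloaded ones.
            foreach (var pair in server.Get(id, includePath))
                loaded[pair.Key] = pair.Value;
        }
        includePath = null; // the include hint applies to a single Load
        return (T)loaded[id];
    }
}

public static class Demo
{
    // Loads an order and then its customer; returns the number of round trips.
    public static int CountRequests(bool useInclude)
    {
        var server = new FakeServer();
        server.Put("customers/1", new Customer { Id = "customers/1", Name = "Oren", Email = "oren@example.com" });
        server.Put("orders/9432", new Order { Id = "orders/9432", Customer = new DenormalizedReference { Id = "customers/1", Name = "Oren" } });

        var session = new SketchSession(server);
        var order = useInclude
            ? session.Include("Customer.Id").Load<Order>("orders/9432")
            : session.Load<Order>("orders/9432");
        var customer = session.Load<Customer>(order.Customer.Id);
        return server.Requests;
    }

    public static void Main()
    {
        Console.WriteLine(CountRequests(false)); // two round trips
        Console.WriteLine(CountRequests(true));  // one round trip
    }
}
```

The Order document and the Order class are untouched in this sketch. The only difference between the two runs is the Include hint, which is the point of Frank's suggestion.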

We get to keep the separate model, and we can manipulate how we are loading associations easily. I really like this solution.




Don't you mean "?include=Customer" instead of "?include=Company"?

Louis Haußknecht

I'm a bit confused by the GET url: GET /docs/users/oren?include=Customer.Id

To retrieve the order with the customer included I'd use

GET /docs/orders/9432?include=Customer.Id

Thomas Eyde

Isn't there a way to get rid of the string argument in Include()? It looks funny compared to the typesafe Load<T>().

Ayende Rahien


Yes, there is a Lambda option as well

Ayende Rahien


Damn, I have a lot of typos in this post, fixed, thanks.


Will this work for indexes as well? So I can do

var orders = session




and get all customers loaded for all orders.


Great feature, love it, thanks


What is an example of a case where you would need the email from the orders? Wouldn't you create a new document for that purpose? This seems like it might be encouraging bad design decisions.

Ayende Rahien


A common case would be if you need to notify the customer about delay in the order


I'm not a NoSql user yet, but I have a question. If the type of the Customer object in the Order class is a DenormalizedReference, is it possible to get the full customer when calling order.Customer somehow?

I understood that you can pre-fetch the Customer and then access it through Session.Load, but then we would need to 1) keep a reference to the session object in the Order instance to encapsulate a GetCustomer() method that returns the full Customer, or 2) let the caller know that order.Customer will never be entirely filled and that if he wants it he will need to call Session.Load(order.Customer.Id).

I'm thinking of a way to both communicate to the user that he has everything available when he needs it and still not leak infrastructure aspects into the domain.

Ayende Rahien


In short, no. See the discussion on the model changes required to make this work, and why I don't like them.

The easy way to handle this is to use an ambient session, which is the general recommendation anyway.

And I like the fact that you need to take an extra step. You don't want to be able to reference stuff outside your own aggregate easily. See the discussion on Root Aggregates in DDD

Brian Vallelunga

Any chance we'll be able to include a collection of Ids? I think it might be useful in situations like this:

public class Customer
{
    public string Id { get; set; }
    public string[] OrderIds { get; set; }
}


I'd want to be able to preload all orders here.

Ayende Rahien


This scenario just works

Charles Strahan

Very neat, Ayende. How would you support "including" a chain of references - such as Order.Customer.BestFriend.Etc?



Hi Ayende,

Good stuff! With reference to Brian Vallelunga's question, how does a query (to preload all orders) look?


Ayende Rahien


Exactly the same.


Ayende Rahien


I wouldn't support it, I can't think of an actual scenario where you need it in a document database.


"A common case would be if you need to notify the customer about delay in the order"

That sounds like a search screen where the rows would have the email in them already - ie a different document.

Ayende Rahien


That might be the case, yes, and it might also be the case that it is a problem with the way we model stuff.

But that is a feature that a lot of people wanted


Thanks for accepting override patches.


First, changing the order of the chained methods would, IMO, give a much better sense of what you are trying to accomplish.

var order = session.Load<Order>("orders/1").Include("Customer.Id");

Second, if all you're after is hitting the DB just once when retrieving the Customer document for a known Order.Id, why not have the API load the Customer document in one straight call?

Customer customer;

var order = session.Include("Customer.Id", out customer).Load<Order>("orders/1");

Third, I would again change the ordering of the methods, so...

Customer customer;

var order = session.Load<Order>("orders/1").Include("Customer.Id", out customer);

Just an idea, I know you'll prove me wrong :)

Ayende Rahien


Regarding the method ordering.

Sure, I would like that.

Now make it work when you don't include stuff as well.

var person = session.Load(Person)

var person = session.Load(Person).Include("Customer")

Regarding the out param, good idea


If you changed the API slightly could you have the order like cowgaR suggests.

var person=session.Get(Person).Load();

var person=session.Get(Person).Include("Customer").Load();

Ayende Rahien


That would make the default case (no includes) much uglier.

Daniel Steigerwald

I am not sure, if denormalized references are not actually premature optimization. Why should user care about minified models?

I know, it's stored twice, still...

Aggregate is class with Id

public class Order
{
    public string Id { get; set; }
    public string Name { get; set; }
    public Customer Customer { get; set; }
}

public class Customer
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string Email { get; set; }
}


What's wrong with loading Order with whole Customer? I suppose it's explicit enough. It should work also for storing.

Instead of Hibernate default laziness, Raven can eager load everything.

Maybe it is stupid idea, but I would like to hear your opinion.

Daniel Steigerwald

I hope I understand all consequences related to sql joins, especially lazy evaluated. I am still newbie, but as I see it, raven document database is about eagerness (as opposite to sql laziness).

We have to create index before we can use it. Fine. We can put documents into db without scheme, super fine. So we should be able to load whole objects graphs in one step as well.

This code:

var order = session




Is equivalent to afore mentioned, in case laziness is forbidden.

Daniel Steigerwald

|This creates a misleading API, making people think that things are normalized when they aren’t.

I suppose it is an implementation detail. Remembering that an object with an id contained in another object with an id is stored twice is easy.

|It is going to bring back ALL the problems associated with lazy loading (worse, it is going to bring back all the problems associated with EF 1.0 lazy loading).

So disallow lazy load at all. I don't need it anyway.

|It goes directly against the way I believe you should work with a document database.

What's wrong with hypertext documents? They are still documents.

Include is nice feature, but soon enough my code probably will be full of includes.

PS: Maybe I overlooked something (or everything) ;) It was just written brainstorming :)

Ayende Rahien


And Customer has a reference to Company, which has reference to Products, which has reference to...

In other words, you loaded the entire database.

It isn't premature, it is something that you have to deal with

Ayende Rahien


Yes, that is pretty much the point. Because you want to be able to control this for each scenario.

There is no one scenario that fits all

Ayende Rahien


You can't say it is an implementation detail, not when the impact is making remote calls.

And you can't disallow lazy loading, not when I consider this lazy loading as well:


Hypertext docs are great, but you only read ONE doc at a time.

With DocDB documents, you may want to access more than that


What about index based join?

Like this:


from doc in docs
where doc["@metadata"]["Raven-Entity-Name"] == "Products" || doc["@metadata"]["Raven-Entity-Name"] == "ProductInputs"
select new
{
    Code = doc["@metadata"]["Raven-Entity-Name"] == "Products" ? doc.Code : doc.ProductCode,
    Input = doc["@metadata"]["Raven-Entity-Name"] == "Products" ? doc.FirstInput : doc.Input
}

from result in results
group result by result.Code into g
select new
{
    Code = g.Key,
    Count = g.Count(),
    TotalInputs = g.Sum(x => x.Input ?? 0)
}


Ayende Rahien


While you can make this work, I am not quite sure what is the purpose. Especially in the context of includes.


That, maybe the includes and other denormalizations can be done by indexes.

Ayende Rahien

Why would this be beneficial?


It is faster and simpler than triggers and than non-intuitive queries like this:

var order = session.Include("Customer.Id").Load<Order>("orders/1");

(magic string, and how do you know that you are loading a Customer?)

But the Raven index syntax is not expressive enough, is it?

Sorry about my bad English.
