Ayende @ Rahien

Refunds available at head office

Modeling hierarchical structures in RavenDB

The question pops up frequently enough and is interesting enough for a post. How do you store a data structure like this in Raven?

The problem here is that we don’t have enough information about the problem to actually give an answer. That is because when we think of how we should model the data, we also need to consider how it is going to be accessed. In more precise terms, we need to define what is the aggregate root of the data in question.

Let us take the following two examples:

[Images: the two example models, Person and Question]

As you can imagine, a Person is an aggregate root. It can stand on its own. I would typically store a Person in Raven using one of two approaches:

Bare references:

{
  "Name": "Ayende",
  "Email": "Ayende@ayende.com",
  "Parent": "people/18",
  "Children": [
    "people/59",
    "people/29"
  ]
}

Denormalized references:

{
  "Name": "Ayende",
  "Email": "Ayende@ayende.com",
  "Parent": { "Name": "Oren", "Id": "people/18" },
  "Children": [
    { "Name": "Raven", "Id": "people/59" },
    { "Name": "Rhino", "Id": "people/29" }
  ]
}

The first option is bare references, just holding the id of the associated document. This is useful if I only rarely need the referenced data. If, however, I also need to show some data from the associated documents (as is common), it is generally better to use denormalized references, which keep the data that we need from the associated document embedded inside the aggregate.
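To make the trade-off concrete, here is a minimal Python sketch. It uses a plain dictionary as a stand-in for the document store, so it is not the RavenDB client API, and `store`, `child_names_bare`, and `child_names_denormalized` are hypothetical names. With bare references, showing the children's names costs one extra lookup per child; with denormalized references, the names are already inside the document.

```python
# In-memory stand-in for a document store: id -> document (an assumption,
# not a RavenDB API).
store = {
    "people/18": {"Name": "Oren"},
    "people/59": {"Name": "Raven"},
    "people/29": {"Name": "Rhino"},
}

bare = {
    "Name": "Ayende",
    "Parent": "people/18",
    "Children": ["people/59", "people/29"],
}

denormalized = {
    "Name": "Ayende",
    "Parent": {"Name": "Oren", "Id": "people/18"},
    "Children": [
        {"Name": "Raven", "Id": "people/59"},
        {"Name": "Rhino", "Id": "people/29"},
    ],
}


def child_names_bare(person, store):
    """Bare references: one extra load per child to get its name."""
    return [store[child_id]["Name"] for child_id in person["Children"]]


def child_names_denormalized(person):
    """Denormalized references: the names travel with the document."""
    return [child["Name"] for child in person["Children"]]


print(child_names_bare(bare, store))
print(child_names_denormalized(denormalized))
```

The price of the second form is that the copied names can go stale when the referenced document changes.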

But the same approach wouldn’t work for Questions. In the Question model, we have utilized the same data structure to hold both the question and the answer. This sort of double utilization is pretty common, unfortunately. For example, you can see it being used in StackOverflow, where both Questions & Answers are stored as posts.

The problem from a design perspective is that in this case a Question is not an aggregate root in the same sense that a Person is. A Question is an aggregate root only if it is an actual question, not if it is a Question instance that holds the answer to another question. I would model this using:

{
   "Content": "How to model relations in RavenDB?",
   "User": "users/1738",
   "Answers" : [
      {"Content": "You can use.. ", "User": "users/92" },
      {"Content": "Or you might...", "User": "users/94" }
   ]
}

In this case, we are embedding the children directly inside the root document.
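As a rough illustration of the difference between the two shapes, the Python sketch below regroups a flat list of "posts" (the double-utilization model) into question documents with embedded answers. The `Id` and `ParentId` fields, the `posts/…` ids, and the `embed_answers` function are assumptions made for the example; they do not come from the post or from any RavenDB API.

```python
def embed_answers(posts):
    """Group flat posts into question documents with embedded answers."""
    questions = {}
    # First pass: every post without a parent is an aggregate root.
    for post in posts:
        if post["ParentId"] is None:
            questions[post["Id"]] = {
                "Content": post["Content"],
                "User": post["User"],
                "Answers": [],
            }
    # Second pass: answers live inside their question, not on their own.
    for post in posts:
        if post["ParentId"] is not None:
            questions[post["ParentId"]]["Answers"].append(
                {"Content": post["Content"], "User": post["User"]}
            )
    return questions


# Hypothetical flat data in the "everything is a post" style.
posts = [
    {"Id": "posts/1", "ParentId": None,
     "Content": "How to model relations in RavenDB?", "User": "users/1738"},
    {"Id": "posts/2", "ParentId": "posts/1",
     "Content": "You can use.. ", "User": "users/92"},
    {"Id": "posts/3", "ParentId": "posts/1",
     "Content": "Or you might...", "User": "users/94"},
]

questions = embed_answers(posts)
print(len(questions["posts/1"]["Answers"]))
```

Only the actual question ends up as a document of its own; the answers have no identity outside it.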

So I am afraid that the answer to that question is: it depends.

Comments

06/23/2010 01:19 PM by Nathan Stott

In CouchDB, you would not want to embed the answers to a question directly in the document because if two people answered the question at about the same time, or if you were using replication and they answered it between replication cycles, then you would get a 409 (conflict). If you add the answers as documents of their own, two people adding at the same time will not cause conflicts.

Would this scenario not be a problem with RavenDB? What about RavenDB makes the proper choice of strategy different?

06/23/2010 01:38 PM by Ayende Rahien

Nathan,

That is a good point. WRT replication, Raven would be in the same situation as CouchDB, but Raven also supports the notion of partial updates, things like: "Add this answer to the Answers array"

Which means that two concurrent updates can both succeed.
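The idea can be sketched in Python with a toy in-memory store. This is only an illustration of the concept: `InMemoryStore`, `append_to_array`, and the `questions/1` id are hypothetical and are not Raven's actual patch API. The point is that the store applies the change to the document it currently holds, so neither writer sends back a stale copy of the whole document.

```python
import copy


class InMemoryStore:
    """Toy document store; NOT the RavenDB client API."""

    def __init__(self):
        self._docs = {}

    def put(self, doc_id, doc):
        # Whole-document write: replaces whatever was stored before.
        self._docs[doc_id] = copy.deepcopy(doc)

    def load(self, doc_id):
        # Callers get a private copy, like a client holding a snapshot.
        return copy.deepcopy(self._docs[doc_id])

    def append_to_array(self, doc_id, field, item):
        # Hypothetical partial update: applied to the stored document
        # itself, so the caller never overwrites someone else's change.
        self._docs[doc_id][field].append(copy.deepcopy(item))


store = InMemoryStore()
store.put("questions/1", {
    "Content": "How to model relations in RavenDB?",
    "User": "users/1738",
    "Answers": [],
})

# Two writers each send only "add this answer", not the whole document.
store.append_to_array("questions/1", "Answers",
                      {"Content": "You can use.. ", "User": "users/92"})
store.append_to_array("questions/1", "Answers",
                      {"Content": "Or you might...", "User": "users/94"})

answers = store.load("questions/1")["Answers"]
print([a["User"] for a in answers])
```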

06/23/2010 01:41 PM by Nathan Stott

How do the partial updates work? Does the app have to specify that it is doing a partial update or does Raven do this behind the scenes? Got a link handy?

06/23/2010 02:30 PM by Brian Vallelunga

I have a similar question to Nathan's. Given the StackOverflow model you presented, if two people answer the question at about the same time, won't you get conflicts storing the data to the db?

I can imagine the following scenario:

1) Person A answers question.

2) Get question document for Person A

3) Append answer A

4) Person B answers question.

5) Get question document for Person B

6) Save Person A's answer to DB.

7) Append answer B

8) Save Person B's answer to DB.

If we let the last-in win, Person A's answer is completely gone. I've actually avoided working a part of my application that requires this sort of modeling because I haven't figured out what to do yet.
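The eight steps above can be replayed in a few lines of Python against a toy store with whole-document, last-write-wins saves. The `db` dictionary, the `get` and `save` helpers, and the `questions/1` id are assumptions for the sketch and say nothing about how any real database behaves.

```python
import copy

# Toy store: id -> document. Whole-document writes, last write wins.
db = {
    "questions/1": {
        "Content": "How to model relations in RavenDB?",
        "Answers": [],
    }
}


def get(doc_id):
    # Each caller works on its own private copy of the document.
    return copy.deepcopy(db[doc_id])


def save(doc_id, doc):
    # Replaces the stored document without checking what changed meanwhile.
    db[doc_id] = copy.deepcopy(doc)


# Steps 1-3: Person A loads the question and appends an answer locally.
copy_a = get("questions/1")
copy_a["Answers"].append({"Content": "Answer A", "User": "users/92"})

# Steps 4-5: Person B loads the question before A has saved.
copy_b = get("questions/1")

# Step 6: A's copy is saved.
save("questions/1", copy_a)

# Steps 7-8: B appends to a stale copy and saves it over A's write.
copy_b["Answers"].append({"Content": "Answer B", "User": "users/94"})
save("questions/1", copy_b)

surviving = [a["Content"] for a in db["questions/1"]["Answers"]]
print(surviving)
```

Only B's answer survives, because B's save was built from a snapshot taken before A's write.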

Obviously storing the answers as entities themselves would help, but we'd almost always want to access the data as one document in this situation. Can you expand on a strategy here?

Thanks

06/23/2010 02:33 PM by Ayende Rahien

Brian,

As I told Nathan, the answer for that is to use Raven's partial document update support, which would resolve the issue.

06/23/2010 03:13 PM by Jason Young

Interesting!

So... for a limitlessly recursive hierarchy (e.g. parent-child relationship), you want each element in its own document, but for depth-limited relationships (e.g. question-answer), you can put all the "children" in a collection in the "parent" document, and "children" need not have documents of their own, correct? If so, that makes sense to me.
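For the recursive case, a small Python sketch shows what "each element in its own document" means in practice: the hierarchy is walked by following `Parent` ids, one lookup per level. The dictionary store, the `ancestors` function, and the `people/1` id for the Ayende document are assumptions made for the example.

```python
# Toy store: id -> person document holding a bare Parent reference.
store = {
    "people/18": {"Name": "Oren", "Parent": None},
    "people/1": {"Name": "Ayende", "Parent": "people/18"},
    "people/59": {"Name": "Raven", "Parent": "people/1"},
}


def ancestors(store, person_id):
    """Follow bare Parent references upward; one load per level."""
    chain = []
    seen = {person_id}  # guards against accidental cycles in the data
    current = store[person_id].get("Parent")
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = store[current].get("Parent")
    return chain


print(ancestors(store, "people/59"))
```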

06/23/2010 03:19 PM by Brian Vallelunga

Ahh, thanks, I see now. I read that as only being available with replication. Reading the mailing list, it seems there is client support at the store level for this. I haven't seen any examples of it though. I'll go ahead and ask on the list.

06/23/2010 03:21 PM by Ayende Rahien

Jason,

Yes...

Although I would put it differently

06/24/2010 03:01 AM by DavidChan

Seems like the client API does not support the "patch" command, right?

06/24/2010 04:39 AM by c# model

Maybe I'm missing something here, but the Person denormalized example saves only the id and name. When you query the model, how do the children and parent convert back to a whole C# Person (with its own parent and children)?

06/24/2010 10:16 AM by Matt Warren

@c# model

You can use the id string and load the document based on that, i.e.

var person = session.Load<Person>("people/59");

06/24/2010 10:18 AM by Matt Warren

Just to add: Load is a generic method that needs to have the type specified as "Person"; the type argument got stripped out when I first posted my answer.

06/25/2010 01:50 PM by Daniel Cohen

@Matt Warren, I get this if you go with the bare reference approach and then in your POCO class you have a string ParentId { get; set; }

But in the denormalized way, what kind of class do you get in return? It's not an id field, nor a full Person class.

btw "c# model" was intended to be the title not the name, a funny mistake :)
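On the question of what a denormalized reference maps to: one plausible shape, sketched here in Python as an assumption rather than anything the post confirms, is a small reference type that carries just the id and the copied name. `PersonReference`, `Person`, and `person_from_document` are hypothetical names for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PersonReference:
    """Neither a bare id nor a full Person: just id plus copied name."""
    id: str
    name: str


@dataclass
class Person:
    name: str
    email: str
    parent: Optional[PersonReference] = None
    children: List[PersonReference] = field(default_factory=list)


def person_from_document(doc):
    """Map the denormalized JSON shape onto the classes above."""
    def to_ref(raw):
        return PersonReference(id=raw["Id"], name=raw["Name"])

    parent = to_ref(doc["Parent"]) if doc.get("Parent") else None
    children = [to_ref(raw) for raw in doc.get("Children", [])]
    return Person(name=doc["Name"], email=doc["Email"],
                  parent=parent, children=children)


doc = {
    "Name": "Ayende",
    "Email": "Ayende@ayende.com",
    "Parent": {"Name": "Oren", "Id": "people/18"},
    "Children": [
        {"Name": "Raven", "Id": "people/59"},
        {"Name": "Rhino", "Id": "people/29"},
    ],
}

person = person_from_document(doc)
print(person.parent, [child.id for child in person.children])
```

Getting the full parent or child would still be a separate load by the reference's id.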

06/28/2010 11:46 AM by sebastien

What would be the cost of updating the name in a partially denormalized reference?

06/28/2010 01:07 PM by Ayende Rahien

Sebastien,

It shouldn't be very expensive.

07/14/2010 10:48 PM by Martin

Is there a way to load only a small part of the Answers for paging (e.g. if there will be hundreds or thousands of them), so the database won't have to send all of them?

07/14/2010 10:55 PM by Martin

... and what happens if a Username is stored for every Answer, as it always needs to be displayed, but the user is allowed to change his Username?

Will I have to loop through all documents in the database where the Username is stored (almost everywhere there is a user action) and update the Username? Will it be a problem?

Thanks for a great blog.

07/20/2010 11:18 AM by Ayende Rahien

Martin,

Yes, you can.

You create an index that projects those out, and then query on that.
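Conceptually, such a projection turns each embedded answer into its own small entry that can be queried and paged independently of the question document. The Python sketch below imitates that idea in memory; `project_answers`, `page_of_answers`, and the entry fields are hypothetical and are not Raven's index definition syntax.

```python
def project_answers(questions):
    """Flatten embedded answers into one entry per answer."""
    entries = []
    for question_id, question in questions.items():
        for position, answer in enumerate(question["Answers"]):
            entries.append({
                "QuestionId": question_id,
                "Position": position,
                "Content": answer["Content"],
                "User": answer["User"],
            })
    return entries


def page_of_answers(entries, question_id, page, page_size):
    """Return one page (zero-based) of a question's answers."""
    matching = [e for e in entries if e["QuestionId"] == question_id]
    start = page * page_size
    return matching[start:start + page_size]


# Hypothetical question with five answers.
questions = {
    "questions/1": {
        "Content": "How to model relations in RavenDB?",
        "Answers": [
            {"Content": f"Answer {n}", "User": f"users/{n}"}
            for n in range(5)
        ],
    }
}

entries = project_answers(questions)
second_page = page_of_answers(entries, "questions/1", page=1, page_size=2)
print([e["Content"] for e in second_page])
```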

07/20/2010 11:18 AM by Ayende Rahien

Martin,

Changing a username is a rare occasion; you can handle that as a background process.
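A background pass of that kind might look roughly like the Python sketch below, which scans question documents and rewrites the copied name wherever the user's id matches. The `UserName` field on each answer, the `update_user_name` function, and the in-memory `questions` dictionary are all assumptions for illustration.

```python
def update_user_name(questions, user_id, new_name):
    """Rewrite the denormalized name on every answer by this user.

    Returns the ids of the documents that had to be changed.
    """
    touched = []
    for question_id, question in questions.items():
        changed = False
        for answer in question["Answers"]:
            if answer["User"] == user_id and answer["UserName"] != new_name:
                answer["UserName"] = new_name
                changed = True
        if changed:
            touched.append(question_id)
    return touched


# Hypothetical documents with a copied UserName next to the user id.
questions = {
    "questions/1": {"Answers": [
        {"Content": "You can use.. ", "User": "users/92", "UserName": "old"},
        {"Content": "Or you might...", "User": "users/94", "UserName": "other"},
    ]},
    "questions/2": {"Answers": [
        {"Content": "It depends", "User": "users/94", "UserName": "other"},
    ]},
}

touched = update_user_name(questions, "users/92", "new")
print(touched)
```

The cost grows with the number of documents that carry the copied name, which is why running it off the request path is attractive.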
