Ayende @ Rahien

Hi!
My name is Ayende Rahien
Founder of Hibernating Rhinos LTD and RavenDB.

Modeling hierarchical structures in RavenDB


The question pops up frequently enough, and is interesting enough, for a post: how do you store a hierarchical data structure in Raven?

The problem is that the question, as asked, doesn’t give us enough information to actually answer it. When we think about how we should model the data, we also need to consider how it is going to be accessed. In more precise terms, we need to define what the aggregate root of the data in question is.

Let us take the following two examples:

[Images: the Person model and the Question model]

As you can imagine, a Person is an aggregate root. It can stand on its own. I would typically store a Person in Raven using one of two approaches:

Bare references:

{
  "Name": "Ayende",
  "Email": "Ayende@ayende.com",
  "Parent": "people/18",
  "Children": [
    "people/59",
    "people/29"
  ]
}

Denormalized references:

{
  "Name": "Ayende",
  "Email": "Ayende@ayende.com",
  "Parent": { "Name": "Oren", "Id": "people/18" },
  "Children": [
    { "Name": "Raven", "Id": "people/59" },
    { "Name": "Rhino", "Id": "people/29" }
  ]
}

The first option is bare references, which just hold the id of the associated document. This is useful if I only rarely need the referenced data. If, however, as is common, I also need to show some data from the associated documents, it is generally better to use denormalized references, which embed the data we need from the associated document inside the aggregate.
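A small Python sketch makes the trade-off concrete. It treats documents as plain dicts inside a hypothetical in-memory `Store` that counts loads. This is not the RavenDB client API, and `child_names_bare` / `child_names_denormalized` are illustrative names. With bare references, showing the children's names costs one extra load per child. With denormalized references, the names are already inside the document.

```python
class Store:
    """Hypothetical in-memory document store that counts how many loads it serves."""

    def __init__(self, docs):
        self.docs = docs
        self.loads = 0

    def load(self, doc_id):
        self.loads += 1
        return self.docs[doc_id]


store = Store({
    "people/18": {"Name": "Oren"},
    "people/59": {"Name": "Raven"},
    "people/29": {"Name": "Rhino"},
})

bare = {"Name": "Ayende", "Parent": "people/18", "Children": ["people/59", "people/29"]}
denormalized = {
    "Name": "Ayende",
    "Parent": {"Name": "Oren", "Id": "people/18"},
    "Children": [{"Name": "Raven", "Id": "people/59"}, {"Name": "Rhino", "Id": "people/29"}],
}


def child_names_bare(store, person):
    # Bare references: each child's name costs one extra load.
    return [store.load(child_id)["Name"] for child_id in person["Children"]]


def child_names_denormalized(person):
    # Denormalized references: the names travel inside the document itself.
    return [child["Name"] for child in person["Children"]]


print(child_names_bare(store, bare), store.loads)  # ['Raven', 'Rhino'] 2
print(child_names_denormalized(denormalized), store.loads)  # ['Raven', 'Rhino'] 2
```

The denormalized form still keeps each `Id`, so the full document can be loaded when it is actually needed.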

But the same approach wouldn’t work for Questions. In the Question model, we have used the same data structure to hold both the question and the answer. This sort of dual use is, unfortunately, pretty common. StackOverflow, for example, stores both Questions & Answers as posts.

The problem from a design perspective is that here a Question is not an aggregate root in the same sense that a Person is. A Question is an aggregate root if it is an actual question, but not if it is a Question instance that holds the answer to another question. I would model this using:

{
  "Content": "How to model relations in RavenDB?",
  "User": "users/1738",
  "Answers": [
    { "Content": "You can use.. ", "User": "users/92" },
    { "Content": "Or you might...", "User": "users/94" }
  ]
}

In this case, we are embedding the children directly inside the root document.
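As a rough illustration, here is a Python sketch of reading that document back into a typed aggregate. The `Question`/`Answer` dataclasses and `question_from_document` are hypothetical names, not part of any RavenDB client. Note that `Answer` has no id of its own. That mirrors the point that an answer is not an aggregate root: one read of the question brings all of its answers along.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Answer:
    content: str
    user: str  # id of the answering user, e.g. "users/92"; no id for the answer itself


@dataclass
class Question:
    content: str
    user: str
    answers: list[Answer] = field(default_factory=list)


def question_from_document(raw: str) -> Question:
    """Build the aggregate from the stored JSON; the embedded answers come in the same read."""
    doc = json.loads(raw)
    return Question(
        content=doc["Content"],
        user=doc["User"],
        answers=[Answer(a["Content"], a["User"]) for a in doc.get("Answers", [])],
    )


raw = """{
  "Content": "How to model relations in RavenDB?",
  "User": "users/1738",
  "Answers": [
    { "Content": "You can use.. ", "User": "users/92" },
    { "Content": "Or you might...", "User": "users/94" }
  ]
}"""

q = question_from_document(raw)
print(len(q.answers), q.answers[0].user)  # 2 users/92
```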

So I am afraid that the answer to that question is: it depends.


Comments

Nathan Stott

In CouchDB, you would not want to embed the answers to a question directly in the document because if two people answered the question at about the same time, or if you were using replication and they answered it between replication cycles, then you would get a 409 (conflict). If you add the answers as documents of their own, two people adding at the same time will not cause conflicts.

Would this scenario not be a problem with RavenDB? What about RavenDB makes the proper choice of strategy different?
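The conflict Nathan describes can be modeled in a few lines of Python. This is a toy model of revision-checked writes, not CouchDB's or RavenDB's API. `RevisionedStore` and `ConflictError` are hypothetical, and the exception only stands in for a 409 response. Two clients read the same revision. The first write succeeds and the second is rejected, so the second answer must be retried instead of being silently lost.

```python
import copy


class ConflictError(Exception):
    """Stands in for an HTTP 409 Conflict response."""


class RevisionedStore:
    """Hypothetical store: every write must name the revision it was based on."""

    def __init__(self):
        self.docs = {}  # id -> (revision, document)

    def get(self, doc_id):
        rev, doc = self.docs[doc_id]
        return rev, copy.deepcopy(doc)  # each client gets its own copy

    def put(self, doc_id, doc, based_on_rev):
        current_rev = self.docs.get(doc_id, (0, None))[0]
        if based_on_rev != current_rev:
            raise ConflictError(f"{doc_id}: expected rev {based_on_rev}, found {current_rev}")
        self.docs[doc_id] = (current_rev + 1, copy.deepcopy(doc))
        return current_rev + 1


store = RevisionedStore()
store.put("questions/1", {"Content": "How to model relations?", "Answers": []}, based_on_rev=0)

# Two people open the question at about the same time.
rev_a, doc_a = store.get("questions/1")
rev_b, doc_b = store.get("questions/1")

doc_a["Answers"].append("answer A")
store.put("questions/1", doc_a, rev_a)  # succeeds: revision 1 -> 2

doc_b["Answers"].append("answer B")
try:
    store.put("questions/1", doc_b, rev_b)  # rev_b is still 1, so this write is rejected
except ConflictError as err:
    print("conflict:", err)  # conflict: questions/1: expected rev 1, found 2
```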

Ayende Rahien

Nathan,

That is a good point. WRT replication, Raven would be in the same situation as CouchDB, but Raven also supports the notion of partial updates, such as: "Add this answer to the Answers array".

Which means that two concurrent updates can both succeed.
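Here is a rough sketch of why a partial update avoids the conflict. It is a hypothetical in-memory Python model, not RavenDB's actual patch API: `PatchableStore` and `patch_add` are illustrative names. Each client sends only the operation "append this answer". The store applies it to whatever the current version of the document is. No client writes back a stale copy, so both appends survive.

```python
import copy
import threading


class PatchableStore:
    """Hypothetical store that applies small patch operations server-side."""

    def __init__(self):
        self.docs = {}
        self.lock = threading.Lock()  # writes to a document are applied one at a time

    def put(self, doc_id, doc):
        with self.lock:
            self.docs[doc_id] = copy.deepcopy(doc)

    def load(self, doc_id):
        with self.lock:
            return copy.deepcopy(self.docs[doc_id])

    def patch_add(self, doc_id, array_name, value):
        # "Add this answer to the Answers array": applied to the current version.
        with self.lock:
            self.docs[doc_id][array_name].append(copy.deepcopy(value))


store = PatchableStore()
store.put("questions/1", {"Content": "How to model relations in RavenDB?", "Answers": []})

# Two users answer at about the same time; each sends only an append operation.
users = [("users/92", "You can use.. "), ("users/94", "Or you might...")]
threads = [
    threading.Thread(
        target=store.patch_add,
        args=("questions/1", "Answers", {"Content": text, "User": user}),
    )
    for user, text in users
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(store.load("questions/1")["Answers"]))  # 2
```

The order of the two answers depends on which thread runs first, but neither is lost.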

Nathan Stott

How do the partial updates work? Does the app have to specify that it is doing a partial update or does Raven do this behind the scenes? Got a link handy?

Brian Vallelunga

I have a similar question to Nathan's. Given the StackOverflow model you presented, if two people answer the question at about the same time, won't you get conflicts when storing the data in the DB?

I can imagine the following scenario:

1) Person A answers question.

2) Get question document for Person A

3) Append answer A

4) Person B answers question.

5) Get question document for Person B

6) Save Person A's answer to DB.

7) Append answer B

8) Save Person B's answer to DB.

If we let the last write win, Person A's answer is completely gone. I've actually avoided working on a part of my application that requires this sort of modeling because I haven't figured out what to do yet.

Obviously storing the answers as entities themselves would help, but we'd almost always want to access the data as one document in this situation. Can you expand on a strategy here?

Thanks
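Brian's steps can be replayed against a hypothetical last-write-wins store in Python. `db`, `get` and `save` are illustrative stand-ins, not a real client API. A save replaces the whole document with no revision check. So B's stale copy overwrites A's answer.

```python
import copy

# Hypothetical last-write-wins store: a save simply replaces the whole document.
db = {"questions/1": {"Content": "How to model relations?", "Answers": []}}


def get(doc_id):
    return copy.deepcopy(db[doc_id])  # each client works on its own copy


def save(doc_id, doc):
    db[doc_id] = copy.deepcopy(doc)  # no revision check: the last writer wins


# Steps 1-3: Person A answers; A's copy is loaded and the answer appended.
doc_a = get("questions/1")
doc_a["Answers"].append("answer A")

# Steps 4-5: Person B answers; B loads the question before A has saved.
doc_b = get("questions/1")

# Step 6: A's answer is saved.
save("questions/1", doc_a)

# Steps 7-8: B appends to the stale copy and saves it.
doc_b["Answers"].append("answer B")
save("questions/1", doc_b)

print(db["questions/1"]["Answers"])  # ['answer B'] (answer A is lost)
```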

Ayende Rahien

Brian,

As I told Nathan, the answer for that is to use Raven's partial document update support, which would resolve the issue.

Jason Young

Interesting!

So... for a limitlessly recursive hierarchy (e.g. parent-child relationship), you want each element in its own document, but for depth-limited relationships (e.g. question-answer), you can put all the "children" in a collection in the "parent" document, and "children" need not have documents of their own, correct? If so, that makes sense to me.
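The unbounded case Jason describes can be sketched in Python. Each person is its own document, and the hierarchy is walked by following `Parent` references, one lookup per level. The ids, the sample names and the `ancestors` helper are hypothetical.

```python
# Hypothetical person documents, each stored on its own and linked by id.
people = {
    "people/1": {"Name": "Grandparent", "Parent": None},
    "people/18": {"Name": "Oren", "Parent": "people/1"},
    "people/7": {"Name": "Ayende", "Parent": "people/18"},
}


def ancestors(doc_id):
    """Follow Parent references upward; depth is unbounded, so each node is a separate document."""
    chain = []
    parent_id = people[doc_id]["Parent"]
    while parent_id is not None:
        chain.append(people[parent_id]["Name"])
        parent_id = people[parent_id]["Parent"]
    return chain


print(ancestors("people/7"))  # ['Oren', 'Grandparent']
```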

Brian Vallelunga

Ahh, thanks, I see now. I read that as only being available with replication. Reading the mailing list, it seems there is client support at the store level for this. I haven't seen any examples of it though. I'll go ahead and ask on the list.

Ayende Rahien

Jason,

Yes...

Although I would put it differently.

DavidChan

It seems like the client API doesn't support the "patch" command, right?

c# model

Maybe I'm missing something here, but the denormalized Person example saves only the id and the name. When you query the model, how do the children and the parent convert back to a whole C# Person (with its own parent and children)?

Matt Warren

@c# model

You can use the id string and load the document based on that, e.g.:

var person = session.Load<Person>("people/59");

Matt Warren

Just to add: Load is a generic method that needs to have the type specified as "Person", but it got stripped out in my answer.

Daniel Cohen

@Matt Warren, I get this if you go with the bare reference approach, where your POCO class has a string ParentId { get; set; }.

But with the denormalized approach, what kind of class do you get back? It's neither an id field nor a full Person class.

btw "c# model" was intended to be the title not the name, a funny mistake :)

sebastien

What would be the cost of updating the name in a partially denormalized reference?

Ayende Rahien

Sebastien,

It shouldn't be very expensive.

Martin

Is there a way to load only a small part of the Answers for paging (e.g. if there are hundreds or thousands of them), so the database won't have to send all of them?

Martin

... and what happens if a Username is stored with every Answer, since it always needs to be displayed, but the user is allowed to change his Username?

Will I have to loop through all the documents in the database where the Username is stored (almost everywhere there is a user action) and update it? Will that be a problem?

Thanks for a great blog.

Ayende Rahien

Martin,

Yes, you can.

You create an index that projects those out, and then query on that.
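The idea of projecting answers out and paging over them can be mimicked in plain Python. This is not RavenDB's index definition syntax. `project_answers` plays the role of the index's map step, `answers_page` plays the role of a paged query, and both names and the sample data are hypothetical. In a real database the server would maintain the index and return only the requested page.

```python
# Hypothetical question document with many embedded answers.
questions = {
    "questions/1": {
        "Content": "How to model relations in RavenDB?",
        "Answers": [{"Content": f"Answer #{i}", "User": f"users/{i}"} for i in range(1, 251)],
    },
}


def project_answers(docs):
    """Map step of a hypothetical index: one entry per answer, tagged with its question."""
    for question_id, doc in docs.items():
        for position, answer in enumerate(doc["Answers"]):
            yield {"QuestionId": question_id, "Position": position, **answer}


# The "index" is built ahead of time, so queries don't need the full documents.
answers_index = list(project_answers(questions))


def answers_page(question_id, page, page_size=20):
    """Return one page (zero-based) of answers for a question from the projected entries."""
    matching = [e for e in answers_index if e["QuestionId"] == question_id]
    start = page * page_size
    return matching[start:start + page_size]


page = answers_page("questions/1", page=2)
print(len(page), page[0]["Content"])  # 20 Answer #41
```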

Ayende Rahien

Martin,

Changing a username is a rare event, so you can handle that as a background process.
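A background rename job could look roughly like this Python sketch, assuming denormalized references shaped like `{ "Id": ..., "Name": ... }` as in the post. The documents, `rename_references` and `rename_user` are hypothetical. This toy version scans every document. A real job would more likely find the affected documents through an index. Either way, the cost grows with the number of documents that hold a copy of the name.

```python
# Hypothetical documents that carry a denormalized copy of a user's name.
docs = {
    "users/92": {"Name": "martin"},
    "questions/1": {
        "Content": "How to model relations in RavenDB?",
        "User": {"Id": "users/1738", "Name": "asker"},
        "Answers": [
            {"Content": "You can use.. ", "User": {"Id": "users/92", "Name": "martin"}},
            {"Content": "Or you might...", "User": {"Id": "users/94", "Name": "someone"}},
        ],
    },
}


def rename_references(node, user_id, new_name):
    """Walk a document recursively and update every {Id, Name} reference to user_id."""
    changed = 0
    if isinstance(node, dict):
        if node.get("Id") == user_id and "Name" in node:
            node["Name"] = new_name
            changed += 1
        for value in node.values():
            changed += rename_references(value, user_id, new_name)
    elif isinstance(node, list):
        for item in node:
            changed += rename_references(item, user_id, new_name)
    return changed


def rename_user(user_id, new_name):
    """Intended to run in the background: update the user, then fix every embedded copy."""
    docs[user_id]["Name"] = new_name
    return sum(rename_references(doc, user_id, new_name) for doc in docs.values())


print(rename_user("users/92", "Martin"))  # 1
```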


Comments have been closed on this topic.
