Ayende @ Rahien

RavenDB: Splitting entities across several documents

There are occasions where it isn’t feasible or desirable to store our entity as a single document in RavenDB. A question that just came up was how to design votes for an entity using RavenDB.

The scenario is simple: we have an entity, Question (think Stack Overflow), which can have Up/Down votes. It would be very easy to design the system using a single document for the entity, like so:

{ //document id: questions/123
   Title: "How to handle Up/Down votes with Raven?",
   Content: "...",
   Votes: [
      { Up: true, User: "users/ayende" },
      { Up: false, User: "users/oren" },
   ]
}

As usual, the problems begin when a question may have a large number of votes, or in the common scenario where you just want to display the vote totals and don't want to pull the entire document to get them.

One option is to split things up. I guess you figured that out from the title of this blog post. The idea is to change the document structure to be:

{ //document id: questions/123
   Title: "How to handle Up/Down votes with Raven?",
   Content: "...",
}

{ //document id: questions/123/votes
   Votes: [
      { Up: true, User: "users/ayende" },
      { Up: false, User: "users/oren" },
   ]
}
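
On the client side, the split could be modeled roughly as follows. This is only a sketch: the class names (Question, Vote, QuestionVotes) and the helper VotesIdFor are hypothetical and not part of RavenDB. The helper simply assumes the id convention shown above, where the votes document id is the question id with "/votes" appended.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical classes mirroring the two documents above; not part of RavenDB.
public class Question
{
    public string Id { get; set; }       // e.g. "questions/123"
    public string Title { get; set; }
    public string Content { get; set; }
}

public class Vote
{
    public bool Up { get; set; }
    public string User { get; set; }     // e.g. "users/ayende"
}

public class QuestionVotes
{
    public string Id { get; set; }       // e.g. "questions/123/votes"
    public List<Vote> Votes { get; set; } = new List<Vote>();
}

public static class SplitIds
{
    // Assumed convention: the votes document id is the question id plus "/votes".
    public static string VotesIdFor(string questionId)
    {
        return questionId + "/votes";
    }
}
```

With a convention like this, the votes document can be addressed from the question id alone, without storing a reference inside the question.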

Note that we have two separate documents here. Now we can load just the question, or the question and its votes. We still have the problem of getting the totals without loading potentially thousands of votes. That is pretty easy to solve, however, using the following index over the votes documents:

from voteDoc in docs.VoteDocs
from vote in voteDoc.Votes
group vote by vote.Up into g
select new { Up = g.Key, Count = g.Count() }

Now we can query the index directly, to get the aggregated results:

session.LuceneQuery<VoteTotals>("Questions/VoteTotals")
            .SelectFields("__document_id", "Up", "Count")
            .ToList();

And if we want to get the votes themselves, they are easily available as well.
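
To see what the index computes, here is a rough in-memory equivalent using plain LINQ to Objects. It does not touch RavenDB at all. VoteDoc and VoteTotals borrow their names from the index and the query above, but their shapes here are assumptions, and VoteTotalsSketch is a made-up name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shapes; only the names VoteDoc and VoteTotals appear in the post.
public class Vote
{
    public bool Up { get; set; }
    public string User { get; set; }
}

public class VoteDoc
{
    public string Id { get; set; }
    public List<Vote> Votes { get; set; } = new List<Vote>();
}

public class VoteTotals
{
    public bool Up { get; set; }
    public int Count { get; set; }
}

public static class VoteTotalsSketch
{
    // Flatten every Votes array, group by the Up flag, and count each group.
    public static List<VoteTotals> Compute(IEnumerable<VoteDoc> voteDocs)
    {
        return (from voteDoc in voteDocs
                from vote in voteDoc.Votes
                group vote by vote.Up into g
                select new VoteTotals { Up = g.Key, Count = g.Count() })
               .ToList();
    }

    // Runs the aggregation over the sample votes document from this post.
    public static List<VoteTotals> Example()
    {
        var docs = new List<VoteDoc>
        {
            new VoteDoc
            {
                Id = "questions/123/votes",
                Votes =
                {
                    new Vote { Up = true, User = "users/ayende" },
                    new Vote { Up = false, User = "users/oren" },
                }
            }
        };
        return Compute(docs);
    }
}
```

In this in-memory version the grouping key is only Up, so the counts span every votes document passed in. Getting totals per question this way would need the document id in the grouping key as well.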

Comments

gandjustas
09/29/2010 10:42 AM by gandjustas

It looks like relational tables and join in view\function\storedproc, isn't it? ;)

Benny Thomas
09/29/2010 10:43 AM by Benny Thomas

Is it just me, or does your index use the unsplit document in this sample?

configurator
09/29/2010 11:33 AM by configurator

A feature I'd like to see is the indexes updating entities, i.e.

{ //document id: questions/123
   Title: "How to handle Up/Down votes with Raven?",
   Content: "...",
}

{ //document id: questions/123/votes
   Votes: [
      { Up: true, User: "users/ayende" },
      { Up: false, User: "users/oren" },
   ]
}

Would be transformed automatically to

{ //document id: questions/123
   Title: "How to handle Up/Down votes with Raven?",
   Content: "...",
   UpVotes: 1,
   DownVotes: 1
}

{ //document id: questions/123/votes
   Votes: [
      { Up: true, User: "users/ayende" },
      { Up: false, User: "users/oren" },
   ]
}

And the UpVotes/DownVotes would be updated whenever the index is. Do you have such a feature?

Torkel
09/29/2010 12:28 PM by Torkel

Yes, I am also a little puzzled.

The index definition seems to use the original document where the Votes were part of the Question document. Was that the intent? Then why show the split?

/confused

Dennis
09/29/2010 12:41 PM by Dennis

How will you deal with the "did I vote" on this without fetching the whole thing in?

fschwiet
09/29/2010 04:29 PM by fschwiet

I am surprised you put all the votes in one document still. In the same situation, I had stored each vote as a document then used a map/reduce index to count the totals.

How would you check if someone already voted? I suppose you can create an index to split out the individual votes, so they can be read one at a time. As people vote, you're going to have concurrency issues that you wouldn't have if the votes are individual documents.

Guy
09/29/2010 06:01 PM by Guy

How is that better / worse than storing the votes in the Question document and having an index which projects a question with only the total votes?

Demis Bellot
09/29/2010 11:56 PM by Demis Bellot

This actually looks quite inefficient. If I needed to implement this in Redis I would store the user ids in 2 server-side sets, one for 'up' and the other for 'down' votes.

Recording a vote can easily be done in a single 'SADD' set operation without needing to serialize/deserialize the entire document. By comparison this looks like it would be orders of magnitude slower.

gandjustas
09/30/2010 06:34 AM by gandjustas

SADD is unsuitable here. You need to store Who voted.

Demis Bellot
09/30/2010 07:28 AM by Demis Bellot

@gandjustas

"SADD is unsuitable here. You need to store Who voted."

My recommendation was storing 'user ids', i.e. who's voted.

Using a set also ensures you only count each user's vote once.

Ayende Rahien
10/01/2010 11:23 AM by Ayende Rahien

Benny,

No, it uses the votes document, not the unsplit one.

Ayende Rahien
10/01/2010 11:24 AM by Ayende Rahien

Configurator,

You can absolutely do that using an index update trigger!

You just have to watch out not to modify the same document that the index is based on (otherwise you create a loop).

Ayende Rahien
10/01/2010 11:25 AM by Ayende Rahien

Torkel,

No, it uses a separate document (//document id: questions/123/votes)

Both documents have a Votes array

Ayende Rahien
10/01/2010 11:25 AM by Ayende Rahien

Dennis,

I would have an additional index, that would output who voted, and I could query that index.

Dennis
10/01/2010 01:38 PM by Dennis

Ayende, wouldn't that easily be an N+1 query?

"Show a list of posts, and then for every post I need to add a javascript tag to see if I already voted on it, so I can get instant feedback in the gui."

configurator
10/01/2010 01:55 PM by configurator

@Ayende: Would it be a loop even if the index result doesn't change? Suppose I define the index B = 2 * A and put a document

{ A = 1 }

It would be changed at some point to be

{ A = 1, B = 2 }

And then when the index is rerun it would be changed to

{ A = 1, B = 2 }

as this is no change, the index doesn't have to be run again. Or does it?

Ayende Rahien
10/03/2010 12:35 PM by Ayende Rahien

Dennis,

Actually, no.

You would simply make two queries.

a) get recent posts

b) get votes where post id is in (...) from the votes index

Ayende Rahien
10/03/2010 12:37 PM by Ayende Rahien

Configurator,

Yes, it would be a loop as long as the index update trigger touches the same document.

We aren't comparing the old/new data when updating a document, and the etag is always updated.
