RavenDB: Splitting entities across several documents
There are occasions where it isn’t feasible or desirable to store our entity as a single document in RavenDB. A question that just came up was how to design votes for an entity using RavenDB.
The scenario is simple: we have our entity, Question (think Stack Overflow), which can have Up/Down votes. It would be very easy to design the system using a single document for the entity, like so:
{ //document id: questions/123
  Title: "How to handle Up/Down votes with Raven?",
  Content: "...",
  Votes: [
    { Up: true, User: "users/ayende" },
    { Up: false, User: "users/oren" },
  ]
}
As usual, the problem begins when you consider questions that may have a large number of votes, or the common scenario where you just want to display the vote totals without pulling the entire document to get them.
One option is to split things up. I guess you figured that out from the title of this blog post. The idea is to change the document structure to be:
{ //document id: questions/123
  Title: "How to handle Up/Down votes with Raven?",
  Content: "...",
}
{ //document id: questions/123/votes
  Votes: [
    { Up: true, User: "users/ayende" },
    { Up: false, User: "users/oren" },
  ]
}
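As a rough sketch of how the split might look on the C# side, the two documents can be modeled as separate classes, with the votes document id derived from the question id. The class names (Question, Vote, QuestionVotes) and the VoteIds.For helper are assumptions made for illustration and are not part of RavenDB's API; only the questions/123/votes id convention comes from the documents above.

```csharp
using System.Collections.Generic;

// Hypothetical classes mirroring the two documents above.
public class Question
{
    public string Id { get; set; } = "";       // e.g. "questions/123"
    public string Title { get; set; } = "";
    public string Content { get; set; } = "";
}

public class Vote
{
    public bool Up { get; set; }
    public string User { get; set; } = "";     // e.g. "users/ayende"
}

public class QuestionVotes
{
    public string Id { get; set; } = "";       // e.g. "questions/123/votes"
    public List<Vote> Votes { get; set; } = new List<Vote>();
}

public static class VoteIds
{
    // Derives the votes document id from the question id (assumed helper).
    public static string For(string questionId) => questionId + "/votes";
}
```

Keeping the id derivable means the votes document can be loaded directly by id whenever the question id is known, without a query.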
Note that we have two separate documents here. Now we can load just the questions, or the questions and the votes. We still have a problem with getting the totals without loading potentially thousands of votes. It is pretty easy to solve this, however, using the following index:
from voteDoc in docs.VoteDocs
from vote in voteDoc.Votes
group vote by vote.Up into g
select new { Up = g.Key, Count = g.Count() }
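To see what this index computes, the same grouping can be run over in-memory objects with LINQ to Objects. This is only a sketch: VoteDoc, Vote, and the VoteTotals helper class are hypothetical names, and RavenDB evaluates the index on the server and stores the results rather than recomputing them on each query. Note that grouping by vote.Up alone, as the index above does, totals votes across all vote documents. The ByQuestion variant, which adds the document id to the group key, is an assumption about how per-question totals could be produced.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Vote
{
    public bool Up { get; set; }
    public string User { get; set; } = "";
}

public class VoteDoc
{
    public string Id { get; set; } = "";
    public List<Vote> Votes { get; set; } = new List<Vote>();
}

public static class VoteTotals
{
    // Same shape as the index above: one (Up, Count) row per direction,
    // counted across every vote document passed in.
    public static List<(bool Up, int Count)> ByDirection(IEnumerable<VoteDoc> docs) =>
        (from voteDoc in docs
         from vote in voteDoc.Votes
         group vote by vote.Up into g
         select (Up: g.Key, Count: g.Count())).ToList();

    // Assumed variant: include the document id in the key to get per-question totals.
    public static List<(string Id, bool Up, int Count)> ByQuestion(IEnumerable<VoteDoc> docs) =>
        (from voteDoc in docs
         from vote in voteDoc.Votes
         group vote by new { voteDoc.Id, vote.Up } into g
         select (Id: g.Key.Id, Up: g.Key.Up, Count: g.Count())).ToList();
}
```

For the two votes shown earlier, ByDirection would yield one row with Up = true and one with Up = false, each with a Count of 1.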
Now we can query the index directly, to get the aggregated results:
session.LuceneQuery<VoteTotals>("Questions/VoteTotals")
    .SelectFields("__document_id", "Up", "Count")
    .ToList();
And if we want to get the votes themselves, they are easily available as well.
Comments
It looks like relational tables and a join in a view/function/stored procedure, doesn't it? ;)
Is it just me, or does your index use the unsplit document in this sample?
A feature I'd like to see is the indexes updating entities, i.e.
{ //document id: questions/123
Title: "How to handle Up/Down votes with Raven?",
Content: "...",
}
{ //document id: questions/123/votes
Votes: [
]
}
Would be transformed automatically to
{ //document id: questions/123
Title: "How to handle Up/Down votes with Raven?",
Content: "...",
UpVotes: 1,
DownVotes: 1
}
{ //document id: questions/123/votes
Votes: [
]
}
And the UpVotes/DownVotes would be updated whenever the index is. Do you have such a feature?
Yes, I am also a little puzzled.
The index definition seems to use the original document where the Votes were part of the Question document. Was that the intent? Then why show the split?
/confused
How will you deal with the "did I vote" on this without fetching the whole thing in?
I am surprised you put all the votes in one document still. In the same situation, I had stored each vote as a document then used a map/reduce index to count the totals.
How would you check if someone already voted? I suppose you can create an index to split out the individual votes, so they can be read one at a time. As people vote, you're going to have concurrency issues that you wouldn't have if the votes are individual documents.
How is that better / worse than storing the votes in the Question document and having an index which projects a question with only the total votes?
This actually looks quite inefficient. If I needed to implement this in Redis I would store the user ids in 2 server-side sets, one for 'up' and the other for 'down' votes.
Recording a vote can easily be done in a single 'SADD' set operation without needing to serialize/deserialize the entire document. By comparison, this looks like it would be orders of magnitude slower.
SADD is unsuitable here. You need to store Who voted.
@gandjustas
My recommendation was storing 'user ids', i.e. who's voted.
Using a set also ensures you only count each user's vote once.
Benny,
No, it uses the votes document, not the unsplit one.
Configurator,
You can absolutely do that using an index update trigger!
You just have to watch out not to modify the same document that the index is based on (otherwise you create a loop).
Torkel,
No, it uses a separate document (//document id: questions/123/votes)
Both documents have a Votes array
Dennis,
I would have an additional index, that would output who voted, and I could query that index.
Ayende, wouldn't that easily become an N+1 query?
"Show a list of posts, and then for every post I need to add a javascript tag to see if I already voted on it, so I can get instant feedback in the GUI."
@Ayende: Would it be a loop even if the index result doesn't change? Suppose I define the index B = 2 * A and put a document
{ A = 1 }
It would be changed at some point to be
{ A = 1, B = 2 }
And then when the index is rerun it would be changed to
{ A = 1, B = 2 }
as this is no change, the index doesn't have to be run again. Or does it?
Dennis,
Actually, no.
You would simply make two queries.
a) get recent posts
b) get votes where post id is in (...) from the votes index
Configurator,
Yes, it would be a loop as long as the index update trigger touches the same document.
We aren't comparing the old/new data when updating a document, and the etag is always updated.