Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.

RavenDB: Splitting entities across several documents

There are occasions where it isn’t feasible or desirable to store our entity as a single document in RavenDB. A question that just came up was how to design votes for an entity using RavenDB.

The scenario is simple: we have our entity, Question (think Stack Overflow), which can have Up/Down votes. It would be very easy to design the system using a single document for the entity, like so:

{ //document id: questions/123
   Title: "How to handle Up/Down votes with Raven?",
   Content: "...",
   Votes: [
         { Up: true, User: "users/ayende" },
         { Up: false, User: "users/oren" },
  ]
}

As usual, the problem begins when you consider questions that may have a large number of votes, or the common scenario where you just want to display the vote totals without pulling the entire document to get them.

One option is to split things up. I guess you figured that out from the title of this blog post. The idea is to change the document structure to be:

{ //document id: questions/123
   Title: "How to handle Up/Down votes with Raven?",
   Content: "...",
}

{ //document id: questions/123/votes
   Votes: [
         { Up: true, User: "users/ayende" },
         { Up: false, User: "users/oren" },
  ]
}
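
On the client side, the split could be pictured as two classes plus a naming convention for the second document's id. This is only a sketch: the class names `Question`, `Vote` and `QuestionVotes`, and the `VoteIds.VotesIdFor` helper, are assumptions made for illustration and are not defined in this post or by RavenDB.

```csharp
using System.Collections.Generic;

// Hypothetical shape of the question document (stored as questions/123).
public class Question
{
    public string Id { get; set; }
    public string Title { get; set; }
    public string Content { get; set; }
}

// A single vote entry inside the votes document.
public class Vote
{
    public bool Up { get; set; }
    public string User { get; set; }
}

// Hypothetical shape of the votes document (stored as questions/123/votes).
public class QuestionVotes
{
    public string Id { get; set; }
    public List<Vote> Votes { get; set; } = new List<Vote>();
}

public static class VoteIds
{
    // Assumed convention: the votes document id is the question id plus "/votes".
    public static string VotesIdFor(string questionId)
    {
        return questionId + "/votes";
    }
}
```

With a convention like this, the id of the votes document can always be derived from the question id, so no extra lookup is needed to find it.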

Note that we have two separate documents here. Now we can load just the questions, or the questions and the votes. We still have a problem with getting the totals without loading potentially thousands of votes. It is pretty easy to solve this, however, using the following index:

from voteDoc in docs.VoteDocs
from vote in voteDoc.Votes
group vote by vote.Up into g
select new { Up = g.Key, Count = g.Count() }
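
To see what this query computes, the same expression can be run in memory with plain LINQ to Objects. This is a sketch and not RavenDB code: the `Vote`, `VoteDoc` and `VoteTotals` classes and the `VoteIndexSketch.Totals` helper are hypothetical names, and a real index is evaluated by the server rather than over a local list.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for the votes document and the index output.
public class Vote
{
    public bool Up { get; set; }
    public string User { get; set; }
}

public class VoteDoc
{
    public List<Vote> Votes { get; set; } = new List<Vote>();
}

public class VoteTotals
{
    public bool Up { get; set; }
    public int Count { get; set; }
}

public static class VoteIndexSketch
{
    // Flattens every Votes array, groups the votes by direction and counts each group.
    public static List<VoteTotals> Totals(IEnumerable<VoteDoc> voteDocs)
    {
        var query =
            from voteDoc in voteDocs
            from vote in voteDoc.Votes
            group vote by vote.Up into g
            select new VoteTotals { Up = g.Key, Count = g.Count() };

        return query.ToList();
    }
}
```

For the sample votes document above, this yields one entry with `Up = true, Count = 1` and one with `Up = false, Count = 1`. Note that grouping by `vote.Up` alone totals the votes across every document passed in; a per-question total would also need the document id in the group key.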

Now we can query the index directly, to get the aggregated results:

session.LuceneQuery<VoteTotals>("Questions/VoteTotals")
            .SelectFields("__document_id", "Up", "Count")
            .ToList();

And if we want to get the votes themselves, they are easily available as well.

Comments

gandjustas

It looks like relational tables and join in view\function\storedproc, isn't it? ;)

Benny Thomas

Is it just me, or does your index use the unsplit document in this sample?

configurator

A feature I'd like to see is the indexes updating entities, i.e.

{ //document id: questions/123
   Title: "How to handle Up/Down votes with Raven?",
   Content: "...",
}

{ //document id: questions/123/votes
   Votes: [
         { Up: true, User: "users/ayende" },
         { Up: false, User: "users/oren" },
   ]
}

Would be transformed automatically to

{ //document id: questions/123
   Title: "How to handle Up/Down votes with Raven?",
   Content: "...",
   UpVotes: 1,
   DownVotes: 1
}

{ //document id: questions/123/votes
   Votes: [
         { Up: true, User: "users/ayende" },
         { Up: false, User: "users/oren" },
   ]
}

And the UpVotes/DownVotes would be updated whenever the index is. Do you have such a feature?

Torkel

Yes, I am also a little puzzled.

The index definition seems to use the original document where the Votes were part of the Question document. Was that the intent? Then why show the split?

/confused

Dennis

How will you deal with the "did I vote" on this without fetching the whole thing in?

fschwiet

I am surprised you put all the votes in one document still. In the same situation, I had stored each vote as a document then used a map/reduce index to count the totals.

How would you check if someone already voted? I suppose you can create an index to split out the individual votes, so they can be read one at a time. As people vote, you're going to have concurrency issues that you wouldn't have if the votes are individual documents.

Guy

How is that better / worse than storing the votes in the Question document and having an index which projects a question with only the total votes?

Demis Bellot

This actually looks quite inefficient. If I needed to implement this in Redis I would store the user ids in 2 server-side sets, one for 'up' and the other for 'down' votes.

Recording a vote can easily be done in a single 'SADD' set operation without needing to serialize/deserialize the entire document. By comparison, this looks like it would be orders of magnitude slower.

gandjustas

SADD is unsuitable here. You need to store Who voted.

Demis Bellot

@gandjustas

"SADD is unsuitable here. You need to store Who voted."

My recommendation was storing 'user ids', i.e. who's voted.

Using a set also ensures you only count each user's vote once.

Ayende Rahien

Benny,

No, it uses the votes document, not the unsplit one.

Ayende Rahien

Configurator,

You can absolutely do that using an index update trigger!

You just have to watch out not to modify the same document that the index is based on (otherwise you create a loop).

Ayende Rahien

Torkel,

No, it uses a separate document (//document id: questions/123/votes)

Both documents have a Votes array

Ayende Rahien

Dennis,

I would have an additional index, that would output who voted, and I could query that index.

Dennis

Ayende, wouldn't that easily be an N+1 query?

"Show a list of posts, and then for every post I need to add a javascript tag to see if I already voted on it, so I can get instant feedback in the GUI."

configurator

@Ayende: Would it be a loop even if the index result doesn't change? Suppose I define the index B = 2 * A and put a document

{ A = 1 }

It would be changed at some point to be

{ A = 1, B = 2 }

And then when the index is rerun it would be changed to

{ A = 1, B = 2 }

as this is no change, the index doesn't have to be run again. Or does it?

Ayende Rahien

Dennis,

Actually, no.

You would simply make two queries.

a) get recent posts

b) get votes where post id is in (...) from the votes index

Ayende Rahien

Configurator,

Yes, it would be a loop as long as the index update trigger touches the same document.

We aren't comparing the old/new data when updating a document, and the etag is always updated.
