Ayende @ Rahien

My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:


+972 52-548-6969

, @ Q c

Posts: 6,128 | Comments: 45,549

filter by tags archive

API DesignSharding Status for failure scenarios

time to read 1 min | 187 words

An interesting question came up recently. How do we want to handle sharding failures?

For example, let us say that I have a 3 nodes clusters of RavenDB, serving posts for a blog (just to give some random example). The way the sharding has been setup, we are doing sharding using Round Robin based on posts (so each post goes to a different machine, and anything related to post goes to the same node as the post). Here is how it can be set:


Now, we want to display the main page, and we would like to show the most recent posts. We can do this using the following code:


The question is, what would happen if the second server if offline?

I’ll give several alternative in the next few posts.

More posts in "API Design" series:

  1. (20 Jul 2015) We’ll let the users sort it out
  2. (17 Jul 2015) Small modifications over a network
  3. (01 Jun 2012) Sharding Status for failure scenarios–Solving at the right granularity
  4. (31 May 2012) Sharding Status for failure scenarios–explicit failure management doesn’t work
  5. (30 May 2012) Sharding Status for failure scenarios–explicit failure management
  6. (29 May 2012) Sharding Status for failure scenarios–ignore and move on
  7. (28 May 2012) Sharding Status for failure scenarios


Phillip Haydon

:( Sometimes I wish your blog posts weren't broken up into 15 posts, by the time it get's to the good stuff I've forgotten the rest and/or lost interest.

Simon Hughes

This should be configurable IMHO. Either: * Return what you can for the above select, or * Return nothing. There should be some sort of DB and Shard monitor that checks to make sure its online and working ok, and warn admins if its off-line via SMS/Email/etc.

Paul Stovell

@Simon, since RavenDB uses HTTP you can probably achieve that monitoring using any HTTP monitoring solution.

Christopher Wright

If a third of the data isn't available, you return the top 20 results of what you still have.

In order to implement a query like this with an unknown shard strategy, each server needs to return 20 results. If you try getting clever, you either need to be really clever (and probably lose out in efficiency in the end, unless you precompute appreciably), or you end up returning stuff in the wrong order sometimes.

If you're using a shard strategy which lines up with your query ordering, and have been since the beginning of time with the same number of nodes (or you rebalanced recently), you can actually query ceil(Take / #nodes) from each node. This doesn't come into play here in any case; PublishedAt is not related to insertion order. It also doesn't matter much if you have a small number of not terribly huge documents -- though here, if you have a very popular blog, you might get several hundred comments per post, so it might be worthwhile.

Andrew Armstrong

I'd imagine for some results you could do with a missing shard server (eg, no specific order or limit), however I assume you may need to indicate to RavenDB that inconsistent results are permitted.

Steve Py

To throw something out there: I'm assuming that session.Query() returns IEnumerable. Extend IEnumerable into something like IIncompleteEnumerable which includes a Reason and/or Exception. Consumers should use methods to consume the results by either IIncompleteEnumerable or IEnumerable with the former handing off to the later once it takes action with the reason. (prompt the user, fire off an e-mail, what-have-you)

Unfortunately I believe you will always run into the case where such a scenario will either be "in your face" in the sense of blocking the application, or easily ignored by accident. Can you have your cake and eat it too?

Ayende Rahien

Steve, Now what happens when you do a Load instead of Query?

Steve Py

Not sure I follow, but if this is a "how do I handle multiple ways to skin a cat" then my response would be "pick one". if you can do the same kind of thing with Load as with Query then the design should standardize on using one or the other.

If Load is used for more direct calls to retrieve single specific records by key, then that's a different scenario and if Session decides it needs an offline shard for the record you want, then feed the caller an exception.

Query results: Here you go, but know that some of the data was not available. Load a record: Sorry, that data is not available.

Comment preview

Comments have been closed on this topic.


  1. The worker pattern - 3 days from now

There are posts all the way to May 30, 2016


  1. The design of RavenDB 4.0 (14):
    26 May 2016 - The client side
  2. RavenDB 3.5 whirl wind tour (14):
    25 May 2016 - Got anything to declare, ya smuggler?
  3. Tasks for the new comer (2):
    15 Apr 2016 - Quartz.NET with RavenDB
  4. Code through the looking glass (5):
    18 Mar 2016 - And a linear search to rule them
  5. Find the bug (8):
    29 Feb 2016 - When you can't rely on your own identity
View all series


Main feed Feed Stats
Comments feed   Comments Feed Stats