Ayende @ Rahien

My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.

API Design: Sharding Status for failure scenarios–Solving at the right granularity


This post concludes this week’s series of API design choices regarding how to handle partial failure scenarios in a sharded cluster. In my previous post, I discussed my issues with a local solution for the problem.

The way we ended up solving this issue is actually quite simple. We apply a global solution to a global problem: we added the ability to inject error handling logic deep into the execution pipeline of the sharding implementation, like this:


In this case, as you can see, we allow requests to fail if we are querying (because we can probably still get something useful from the other servers), but if you are requesting something by id and it generates an error, we will propagate this error. Note that in our implementation, we call a user-defined “NotifyUserInterfaceAboutServerFailure”, which will let the user know about the error.

That way, you probably have some warning in the UI about partial information, but you are still functional. This is the proper way to handle this, because you handle it once, in one place, instead of having to remember to do the right thing everywhere.
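The approach described above can be sketched roughly as follows. This is a language-neutral illustration in Python, not RavenDB's actual API: `ShardRequest`, `ShardAccess`, `on_error`, `apply`, and `fake_shard_call` are hypothetical names invented for the sketch, and `notify_user_interface_about_server_failure` stands in for the user-defined notification mentioned in the post. It assumes, as stated in the comments on this post, that a single handler returning true is enough to swallow a failure and that with no handler registered the exception is thrown.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ShardRequest:
    """Hypothetical description of one request sent to one shard."""
    shard_url: str
    query: Optional[str] = None        # set when the request is a query
    document_id: Optional[str] = None  # set when loading by id


# A handler returns True when the failure may be ignored.
ErrorHandler = Callable[[ShardRequest, Exception], bool]


class ShardAccess:
    """Hypothetical stand-in for the sharding execution pipeline."""

    def __init__(self) -> None:
        self.on_error: List[ErrorHandler] = []

    def apply(self, requests, operation):
        results = []
        for request in requests:
            try:
                results.append(operation(request))
            except Exception as error:
                # Handlers run in registration order; the first one that
                # returns True swallows the error. With no handlers
                # registered, any() is False and the error propagates.
                if not any(h(request, error) for h in self.on_error):
                    raise
        return results


warnings: List[str] = []


def notify_user_interface_about_server_failure(request, error):
    # Stand-in for the user-defined notification: record a warning.
    warnings.append(f"{request.shard_url}: {error}")


def tolerate_failed_queries(request, error):
    if request.query is None:
        return False  # load by id: let the error propagate
    notify_user_interface_about_server_failure(request, error)
    return True       # query: partial results are acceptable


def fake_shard_call(request):
    # Simulated shard access in which the second shard is down.
    if request.shard_url == "http://shard-2":
        raise ConnectionError("shard unreachable")
    return f"result from {request.shard_url}"
```

With `tolerate_failed_queries` registered, a query across both shards returns the results from the healthy shard and records a warning for the failed one, while a load by id against the failed shard still raises. The decision lives in one handler instead of being repeated at every call site.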

More posts in "API Design" series:

  1. (20 Jul 2015) We’ll let the users sort it out
  2. (17 Jul 2015) Small modifications over a network
  3. (01 Jun 2012) Sharding Status for failure scenarios–Solving at the right granularity
  4. (31 May 2012) Sharding Status for failure scenarios–explicit failure management doesn’t work
  5. (30 May 2012) Sharding Status for failure scenarios–explicit failure management
  6. (29 May 2012) Sharding Status for failure scenarios–ignore and move on
  7. (28 May 2012) Sharding Status for failure scenarios



Oren, I agree that adding parameters to data retrieval methods like Load<> is a bad approach, because it breaks the pure repository abstraction. At that level the user should not be able to manage or audit low-level logic. The solution described in this post is probably better, because ShardAccessStrategy is the point responsible for defining sharding behavior. But ShardAccessStrategy belongs to ShardedDocumentStore, which is not specific to a user action (request) the way IDocumentSession is. In a request/response system, we would need to keep some state tied to the DocumentStore, which is no good. I suggest making this session-related:

  1. IDocumentStore.OpenSession(). The usual session factory method.
  2. IDocumentStore.OpenSession(ISessionErrorHandler). An additional factory method.
  3. ShardedDocumentStore{//try-catch on ShardAccessStrategy.Apply(commands) and if ISessionErrorHandler == null then throw;}

I'm not sure this is the best solution.

Ayende Rahien

Vlad, I don't see a reason to try to have different behaviors for different sessions. In most cases, you simply won't have any need for that, and when you do, you can handle that yourself.


Won't this result in the OnError handler growing out of proportion, since in more complex usage scenarios it will have to be extended to analyse each different query to determine whether it is allowed to continue? What about a scenario where the same query can be partial (for example, displaying posts to the user in a webpage) and cannot be partial (for example, the same posts but now for some sort of persistent media)? I'm all for global solutions, but I always try to use them for defaults and allow the dev to override them where needed.

Also, why did you design OnError with a return value instead of using EventArgs? What happens if two handlers return True and one returns False?


Knaģis - I think you may have answered the first part of your question with the second. I presume the return value is meant to represent "handled". If there's one style of query that needs special handling, register it as an earlier event handler, and leave the "generic" handler as the last one subscribed. So "the" event handler doesn't grow uncontrollably, because you register multiple ones.

Of course, I might be completely misreading the intent of this design.

Omer Mor

Oren, I suggest you change the signature of the OnError event to return an enum instead of a bool. This would be more self-documenting.

That way you'll return something like ErrorHandling.Handled or ErrorHandling.RethrowException.

What do you think?

Ayende Rahien

Knagis, You are welcome to handle this in any way you see fit. You can introduce your own logic, build your own decisions, etc. And we only need a single event handler that returns true.


I'm curious. Why return true/false instead of having an IgnoreError property on one of the parameters?

Paul Stovell

+1 to Omer's suggestion of using an enum. True/false is ambiguous here.

Paulo Köch

Very elegant. My hat is off to you. Again. :P

Lucas Ontivero

Question: what if the developer doesn't provide an error handler? Will he get an exception?

Ayende Rahien

Lucas, Then an exception is thrown.

Ayende Rahien

Andreas, Note that you are free to implement local solutions, and you can do that, quite easily. You can give specific error messages, you can change behavior based on the current context, etc.

Alex Beynenson

I like the option that the exception is thrown. To me that's way more elegant than a custom error handling delegate. If you squint just right, it pretty much looks like a try/catch block anyway.

Daniel Lang

Thanks for sharing this useful series. It was really interesting to see how such decisions are made.
