Ayende @ Rahien

API Design: Sharding Status for failure scenarios – Solving at the right granularity

This post concludes this week's series on API design choices for handling partial failure scenarios in a sharded cluster. In my previous post, I discussed my issues with a local solution to the problem.

The way we ended up solving this issue is actually quite simple. We apply a global solution to a global problem: we added the ability to inject error handling logic deep into the execution pipeline of the sharding implementation, like this:

[Image: code sample registering the error handler on the sharding implementation]

In this case, as you can see, we allow requests to fail if we are querying (because we can probably still get something useful from the other servers), but if you are requesting something by id and it generates an error, we propagate that error. Note that in our implementation, we call a user-defined “NotifyUserInterfaceAboutServerFailure”, which lets the user know about the error.

That way, you probably show some warning in the UI about partial information, but you are still functional. This is the proper way to handle this, because you handle it once, in one place, and that means you can handle it properly instead of having to do the right thing everywhere.
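Since the code in the image is not reproduced here, the following is a minimal sketch of the idea in Python, not the actual RavenDB API. The names `ShardRequest`, `apply`, `on_error`, `handle_shard_failure` and `run_on_shard` are all assumptions made up for illustration; only the behavior comes from the post and the discussion below: queries may fail on a shard after the user is notified, a request by id propagates the error, a single handler returning true is enough to continue, and with no handler registered the exception is thrown.

```python
from typing import Any, Callable, Dict, List


class ShardRequest:
    """Hypothetical description of the operation being sent to the shards."""

    def __init__(self, kind: str, target: str) -> None:
        self.kind = kind      # "query" or "load" (assumed labels)
        self.target = target  # what is being asked for


# A handler returns True if the failure may be ignored, False otherwise.
ErrorHandler = Callable[[str, ShardRequest, Exception], bool]


class ShardAccessStrategy:
    """Runs an operation on every shard, consulting the handlers on failure."""

    def __init__(self) -> None:
        self.on_error: List[ErrorHandler] = []

    def apply(self, shards: Dict[str, Any], request: ShardRequest,
              operation: Callable[[Any, ShardRequest], Any]) -> List[Any]:
        results = []
        for name, shard in shards.items():
            try:
                results.append(operation(shard, request))
            except Exception as exc:
                # One handler returning True is enough to continue.
                # With no handlers registered, the error is re-raised.
                if not any(h(name, request, exc) for h in self.on_error):
                    raise
        return results


failed_shards: List[str] = []


def notify_user_interface_about_server_failure(shard_name: str) -> None:
    # Stand-in for the user-defined notification; here we only record it.
    failed_shards.append(shard_name)


def handle_shard_failure(shard_name: str, request: ShardRequest,
                         exc: Exception) -> bool:
    if request.kind != "query":
        return False  # requests by id must propagate the error
    notify_user_interface_about_server_failure(shard_name)
    return True       # partial query results are acceptable


def run_on_shard(shard: Any, request: ShardRequest) -> str:
    # A shard of None simulates a server that is down.
    if shard is None:
        raise ConnectionError("shard is down")
    return f"{request.target} from {shard}"


strategy = ShardAccessStrategy()
strategy.on_error.append(handle_shard_failure)
shards = {"one": "shard-1", "two": None}

# The query survives the failure of shard "two" and returns partial results.
partial = strategy.apply(shards, ShardRequest("query", "posts"), run_on_shard)
```

With this setup, the same `strategy.apply` call with `ShardRequest("load", "posts/1")` raises the `ConnectionError`, because the handler declines to ignore failures for anything other than a query. The decision lives in one handler rather than at every call site.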

Posted By: Ayende Rahien

Comments

04/09/2012 04:15 PM by Vlad

Oren, I agree that adding parameters to data retrieval methods like Load<> is a bad approach, because it breaks the pure repository abstraction. At this stage the user should not be able to manage or audit low-level logic. The solution described in this post is possibly better, because ShardAccessStrategy is the point responsible for defining sharding behavior. But ShardAccessStrategy belongs to ShardedDocumentStore, which is not specific to a user action (request) the way IDocumentSession is. In a request/response system we would need to keep some state tied to the DocumentStore, which is no good. I suggest making this session-related:
1. IDocumentStore.OpenSession(). The usual session factory method.
2. IDocumentStore.OpenSession(ISessionErrorHandler). An additional factory method.
3. ShardedDocumentStore{//try-catch on ShardAccessStrategy.Apply(commands) and if ISessionErrorHandler == null then throw;}
I'm not sure this is the best solution.

04/09/2012 06:20 PM by Ayende Rahien

Vlad, I don't see a reason to have different behaviors for different sessions. In most cases you simply won't need that, and when you do, you can handle it yourself.

06/01/2012 10:19 AM by Knaģis

Won't this result in the OnError handler growing out of proportion, since in more complex usage scenarios it will have to be extended to analyse each different query to determine whether it is allowed to continue? What about a scenario where the same query can be partial (for example, displaying posts to the user in a webpage) and cannot be partial (for example, the same posts but now for some sort of persistent media)? I'm all for global solutions, but I always try to use them for defaults and allow the dev to override them where needed.

Also, why did you design OnError with a return value instead of using EventArgs? What happens if two handlers return true and one returns false?

06/01/2012 10:24 AM by Damien

Knaģis - I think you may have answered the first part of your question with the second. I presume the return value is meant to represent "handled". If there's one style of query that needs special handling, register it as an earlier event handler, and leave the "generic" handler as the last one subscribed. So "the" event handler doesn't grow uncontrollably, because you register multiple ones.

Of course, I might be completely misreading the intent of this design.

06/01/2012 11:09 AM by Omer Mor

Oren, I suggest you change the signature of the OnError event to return an enum instead of a bool. This would be more self-documenting.

That way you'll return something like ErrorHandling.Handled or ErrorHandling.RethrowException.

What do you think?

06/01/2012 11:11 AM by Ayende Rahien

Knagis, You are welcome to handle this in any way you see fit. You can introduce your own logic, build your own decisions, etc. And we only need a single event handler that returns true.

06/01/2012 11:24 AM by Daniel

I'm curious. Why return true/false instead of having an IgnoreError property on one of the parameters?

06/01/2012 03:16 PM by Paul Stovell

+1 to Omer's suggestion of using an enum. True/false is ambiguous here.

06/01/2012 04:12 PM by Paulo Köch

Very elegant. My hat is off to you. Again. :P

06/02/2012 01:19 AM by Lucas Ontivero

Question: what if the developer doesn't provide an error handler? Will he get an exception?

06/02/2012 10:56 AM by Ayende Rahien

Lucas, Then an exception is thrown.

06/02/2012 10:57 AM by Ayende Rahien

Andreas, Note that you are free to implement local solutions, and you can do that, quite easily. You can give specific error messages, you can change behavior based on the current context, etc.

06/04/2012 03:30 PM by Alex Beynenson

I like the option that the exception is thrown. To me that's way more elegant than a custom error handling delegate. If you squint just right, it pretty much looks like a try/catch block anyway.

06/05/2012 09:36 PM by Daniel Lang

Thanks for sharing this useful series. It was really interesting to see how such decisions are made.

Comments have been closed on this topic.