API Design: Sharding Status for failure scenarios–Solving at the right granularity
This post concludes this week’s series on API design choices for handling partial failure scenarios in a sharded cluster. In my previous post, I discussed my issues with a local solution to the problem.
The way we ended up solving this issue is actually quite simple. We apply a global solution to a global problem: we added the ability to inject error-handling logic deep into the execution pipeline of the sharding implementation.
In this case, we allow requests to fail if we are querying (because we can probably still get something useful from the other servers), but if you are requesting something by id and it generates an error, we propagate that error. Note that in our implementation, we call a user-defined NotifyUserInterfaceAboutServerFailure, which lets the user know about the error.
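A minimal Python sketch of the approach described above, under stated assumptions: this is not RavenDB's C# API, and `ShardAccessStrategy`, `Request`, `on_error`, `fetch`, and `notify_user_interface_about_server_failure` are hypothetical names chosen for illustration. A single hook registered on the strategy decides, per failing shard, whether to ignore the failure (queries) or propagate it (loads by id).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Request:
    """Hypothetical request descriptor (not a RavenDB type)."""
    is_query: bool          # True for a query, False for a load by id
    description: str = ""


# Hypothetical hook signature: (shard, request, exception) -> True to ignore the failure.
ErrorHandler = Callable[[str, Request, Exception], bool]


class ShardAccessStrategy:
    """Hypothetical stand-in: runs an operation on every shard, with one global error hook."""

    def __init__(self) -> None:
        self.on_error: Optional[ErrorHandler] = None

    def apply(self, shards: List[str], request: Request,
              operation: Callable[[str, Request], list]) -> list:
        results: list = []
        for shard in shards:
            try:
                results.extend(operation(shard, request))
            except Exception as exc:
                # No handler, or the handler declined: propagate the original error.
                if self.on_error is None or not self.on_error(shard, request, exc):
                    raise
        return results


notifications: List[str] = []


def notify_user_interface_about_server_failure(shard: str, request: Request,
                                               exc: Exception) -> None:
    # Hypothetical UI hook; here it only records a warning message.
    notifications.append(f"shard {shard} failed: {exc}")


def handle_shard_error(shard: str, request: Request, exc: Exception) -> bool:
    if request.is_query:
        # Other shards can still return useful results, so warn and continue.
        notify_user_interface_about_server_failure(shard, request, exc)
        return True
    # Loading by id: the caller must see the real error.
    return False


def fetch(shard: str, request: Request) -> list:
    # Simulated shard call in which shard "s2" is down.
    if shard == "s2":
        raise ConnectionError("server unavailable")
    return [f"{request.description}@{shard}"]


strategy = ShardAccessStrategy()
strategy.on_error = handle_shard_error
print(strategy.apply(["s1", "s2", "s3"], Request(is_query=True, description="posts"), fetch))
# prints ['posts@s1', 'posts@s3']
```

Because the decision lives in one place, the calling code needs no try/except of its own. If no handler is registered, `apply` simply re-raises the shard's exception.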
That way, the UI will probably show some warning about partial information, but the application is still functional. This is the proper way to handle this: you handle it once, in one place, so you can handle it properly, instead of having to remember to do the right thing everywhere.
More posts in "API Design" series:
- (04 Dec 2017) The lack of a method was intentional forethought
- (27 Jul 2016) robust error handling and recovery
- (20 Jul 2015) We’ll let the users sort it out
- (17 Jul 2015) Small modifications over a network
- (01 Jun 2012) Sharding Status for failure scenarios–Solving at the right granularity
- (31 May 2012) Sharding Status for failure scenarios–explicit failure management doesn’t work
- (30 May 2012) Sharding Status for failure scenarios–explicit failure management
- (29 May 2012) Sharding Status for failure scenarios–ignore and move on
- (28 May 2012) Sharding Status for failure scenarios
Comments
Oren, I agree that adding parameters to data-retrieval methods like Load<> is a bad approach, because it breaks the pure repository abstraction. At this stage the user should not be able to manage or audit low-level logic. The solution described in this post is possibly better, because ShardAccessStrategy is the right place to define sharding behavior. But ShardAccessStrategy belongs to ShardedDocumentStore, which is not specific to a user action (request) the way IDocumentSession is. In a request/response system, we would need to keep request-related state on the DocumentStore, which is not good. I suggest making this session-related:
1. IDocumentStore.OpenSession(): the usual session factory method.
2. IDocumentStore.OpenSession(ISessionErrorHandler): an additional factory method.
3. ShardedDocumentStore { // try/catch around ShardAccessStrategy.Apply(commands); if ISessionErrorHandler == null then throw; }
I'm not sure that is the best solution.
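A rough Python sketch of the session-scoped alternative proposed in this comment, assuming hypothetical `ShardedDocumentStore`, `ShardedSession`, and `open_session` names (not a real RavenDB API). Python has no method overloading, so a single optional `error_handler` parameter stands in for both proposed `OpenSession` overloads.

```python
from typing import Callable, List, Optional

# Hypothetical per-session handler: (shard, exception) -> True to ignore the failure.
SessionErrorHandler = Callable[[str, Exception], bool]


class ShardedSession:
    """Hypothetical session that carries its own, optional error handler."""

    def __init__(self, shards: List[str], fetch: Callable[[str], list],
                 error_handler: Optional[SessionErrorHandler]) -> None:
        self._shards = shards
        self._fetch = fetch
        self._error_handler = error_handler

    def query(self) -> list:
        results: list = []
        for shard in self._shards:
            try:
                results.extend(self._fetch(shard))
            except Exception as exc:
                # Step 3 of the proposal: rethrow when the session has no handler
                # (or when its handler declines to ignore the failure).
                if self._error_handler is None or not self._error_handler(shard, exc):
                    raise
        return results


class ShardedDocumentStore:
    """Hypothetical store; one optional parameter replaces the two proposed overloads."""

    def __init__(self, shards: List[str], fetch: Callable[[str], list]) -> None:
        self._shards = shards
        self._fetch = fetch

    def open_session(self,
                     error_handler: Optional[SessionErrorHandler] = None) -> ShardedSession:
        return ShardedSession(self._shards, self._fetch, error_handler)


def fetch(shard: str) -> list:
    # Simulated shard call in which shard "s2" is down.
    if shard == "s2":
        raise ConnectionError("server unavailable")
    return [shard]


store = ShardedDocumentStore(["s1", "s2"], fetch)
lenient = store.open_session(lambda shard, exc: True)  # this session tolerates failures
print(lenient.query())  # prints ['s1']
```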
Vlad, I don't see a reason to have different behaviors for different sessions. In most cases, you simply won't need that, and when you do, you can handle it yourself.
Won't this cause the OnError handler to grow out of proportion? In more complex usage scenarios, it will have to be extended to analyse each different query to determine whether it is allowed to continue. What about a scenario where the same query can be partial (for example, displaying posts to the user on a web page) and cannot be partial (for example, the same posts, but now for some sort of persistent media)? I'm all for global solutions, but I always try to use them as defaults and allow the developer to override them where needed.
Also, why did you design OnError with a return value instead of using EventArgs? What happens if two handlers return True and one returns False?
Knaģis - I think you may have answered the first part of your question with the second. I presume the return value is meant to represent "handled". If there's one style of query that needs special handling, register it as an earlier event handler, and leave the "generic" handler as the last one subscribed. So "the" event handler doesn't grow uncontrollably, because you register multiple ones.
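A minimal Python sketch of the multiple-handler reading suggested in this reply, assuming the return value means "handled" (which the commenter flags as a guess). `ErrorHandlerChain` and the request-kind strings are hypothetical. Note that a plain C# multicast delegate does not behave this way on its own: when a delegate with several subscribers returns a value, the caller only receives the last subscriber's result, so first-handler-wins ordering has to be implemented explicitly.

```python
from typing import Callable, List

# Hypothetical handler signature: (request kind, exception) -> True if it handled the error.
Handler = Callable[[str, Exception], bool]


class ErrorHandlerChain:
    """Tries handlers in registration order; the first one returning True wins."""

    def __init__(self) -> None:
        self._handlers: List[Handler] = []

    def register(self, handler: Handler) -> None:
        self._handlers.append(handler)

    def handle(self, request_kind: str, exc: Exception) -> bool:
        for handler in self._handlers:
            if handler(request_kind, exc):
                return True   # handled; later handlers are not consulted
        return False          # nobody handled it, so the caller should rethrow


def web_page_handler(request_kind: str, exc: Exception) -> bool:
    # Specific handler, registered first: a web page can live with partial results.
    return request_kind == "posts-for-web-page"


def default_handler(request_kind: str, exc: Exception) -> bool:
    # Generic fallback, registered last: everything else propagates.
    return False


chain = ErrorHandlerChain()
chain.register(web_page_handler)
chain.register(default_handler)
print(chain.handle("posts-for-web-page", ConnectionError("down")))  # prints True
print(chain.handle("posts-for-archive", ConnectionError("down")))   # prints False
```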
Of course, I might be completely misreading the intent of this design.
Oren, I suggest you change the signature of the OnError event to return an enum instead of a bool. This would be more self-documenting.
That way you'll return something like ErrorHandling.Handled or ErrorHandling.RethrowException.
What do you think?
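A hypothetical Python sketch of the enum-returning signature suggested above. `ErrorHandling.HANDLED` and `ErrorHandling.RETHROW_EXCEPTION` mirror the suggested `ErrorHandling.Handled` and `ErrorHandling.RethrowException` names; the `on_shard_failure` helper and the handler signature are assumptions for illustration.

```python
from enum import Enum
from typing import Callable, Optional


class ErrorHandling(Enum):
    """Hypothetical enum modelled on the names suggested in the comment."""
    HANDLED = "handled"              # swallow this shard's failure and continue
    RETHROW_EXCEPTION = "rethrow"    # propagate the original exception


# Hypothetical handler signature: (is_query, exception) -> ErrorHandling.
EnumHandler = Callable[[bool, Exception], ErrorHandling]


def query_friendly_handler(is_query: bool, exc: Exception) -> ErrorHandling:
    return ErrorHandling.HANDLED if is_query else ErrorHandling.RETHROW_EXCEPTION


def on_shard_failure(handler: Optional[EnumHandler], is_query: bool,
                     exc: Exception) -> None:
    # With no handler registered, default to rethrowing.
    decision = (handler(is_query, exc) if handler is not None
                else ErrorHandling.RETHROW_EXCEPTION)
    if decision is ErrorHandling.RETHROW_EXCEPTION:
        raise exc


on_shard_failure(query_friendly_handler, True, ConnectionError("down"))  # swallowed
```

A two-member enum carries the same information as a bool, but the handler's return statement now names the decision instead of returning an unexplained `True`.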
Knagis, you are welcome to handle this any way you see fit. You can introduce your own logic, make your own decisions, etc. And we only need a single event handler that returns true.
I'm curious. Why return true/false instead of having an IgnoreError property on one of the parameters?
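A hypothetical Python sketch of the `IgnoreError`-style alternative, similar in spirit to .NET's `CancelEventArgs.Cancel` property: handlers mutate a shared args object instead of returning a value. `ShardErrorEventArgs`, `ShardErrorEvent`, and `allow_partial_queries` are illustrative names, not an existing API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ShardErrorEventArgs:
    """Hypothetical event args; handlers set ignore_error instead of returning a bool."""
    shard: str
    is_query: bool
    exception: Exception
    ignore_error: bool = False


class ShardErrorEvent:
    """Hypothetical event: every subscriber sees (and may modify) the same args."""

    def __init__(self) -> None:
        self._handlers: List[Callable[[ShardErrorEventArgs], None]] = []

    def subscribe(self, handler: Callable[[ShardErrorEventArgs], None]) -> None:
        self._handlers.append(handler)

    def fire(self, args: ShardErrorEventArgs) -> None:
        for handler in self._handlers:
            handler(args)
        # Whatever ignore_error holds after the last subscriber decides the outcome.
        if not args.ignore_error:
            raise args.exception


def allow_partial_queries(args: ShardErrorEventArgs) -> None:
    if args.is_query:
        args.ignore_error = True


event = ShardErrorEvent()
event.subscribe(allow_partial_queries)
event.fire(ShardErrorEventArgs("s2", is_query=True, exception=ConnectionError("down")))  # ignored
```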
+1 to Omer's suggestion of using an enum. True/false is ambiguous here.
Beautiful solution.
Very elegant. My hat is off to you. Again. :P
Question: what if the developer doesn't provide an error handler? Will he get an exception?
Lucas, then an exception is thrown.
Andreas, note that you are free to implement local solutions, and you can do that quite easily. You can give specific error messages, you can change behavior based on the current context, etc.
I like the option that the exception is thrown. To me that's way more elegant than a custom error handling delegate. If you squint just right, it pretty much looks like a try/catch block anyway.
Thanks for sharing this useful series. It was really interesting to see how such decisions are made.