Still going on with the discussion on how to handle failures in a sharded cluster, we are back to the question of how to handle the scenario of one node in a cluster going down. The question is, what should be the system behavior in such a scenario.
In the previous post, we discussed the option of simply ignoring the failure, and the option of simply failing entirely. Both options are unpalatable, because we either transparently hide some data from the user (which reside on the failing node) or we take the entire system down when a single node is down.
Another option that was suggested in the mailing list is to actually expose this to the user, like so:
ShardingStatus status; va recentPosts = session.Query<Post>() .ShardingStatus( out status ) .OrderByDescending(x=>x.PublishedAt) .Take(20) .ToList();
This will give us the status information about potentially failing shards.
I intensely dislike this option, and I’ll discuss the reasons why on the next post. In the meantime, I would like to hear your opinion about this API choice.