Ayende @ Rahien

Refunds available at head office

API Design: Sharding Status for failure scenarios–explicit failure management doesn’t work

Still going on with the discussion on how to handle failures in a sharded cluster, we are back to the question of how to handle the scenario of one node in a cluster going down. The question is, what should be the system behavior in such a scenario.

In my previous post, I discussed one alternative option:

ShardingStatus status;
va recentPosts = session.Query<Post>()
          .ShardingStatus( out status )

I said that I really don’t like this option. But deferred the discussion on exactly why.

Basically, the entire problem boils down to a very simple fact, manual memory management doesn’t work.

Huh? What is the relation between handling failures in a cluster to manual memory management? Oren, did you get your wires crossed again and continued a different blog post altogether?

Well, no. It is the same basic principle. Requiring users to add a specific handler for this result in several scenarios, none of them ideal.

First, what happen if we don’t specify this? We are back to the “ignore & swallow the error” or “throw and kill the entire system”.

Let us assume that we go with the first option, the developer has a way to get the error if they want it, but if they don’t care, we will just ignore this error. The problem with this approach is that it is entirely certain that developers will not add this, at least, not before the first time we have a node fail in production and the system will simply ignore this and show the wrong results.

The other option, throw an exception if the user didn’t ask for the sharding status and we have a failing node, is arguably worse. We now have a ticking time bomb. If a node goes down, the entire system will go down. The reason that I say that this is worse than the previous option is that the natural inclination of most developers is to simply stick the ShardingStatus() there and “fix” the problem. Of course, this is basically the same as the first option, but this time, the API actually let the user down the wrong path.

Second, this is forcing a local solution on a global problem. We are trying to force the user to handle errors at a place where the only thing that they care about is the actual business problem.

Third, this alternative doesn’t handle scenarios where we are doing other things, like loading by id. How would you get the ShardingStatus from this call?


Anything that you come up with is likely to introduce additional complexity and make things much harder to work with.

As I said, I intensely dislike this option. A much better alternative exists, and I’ll discuss this in the next post…


Posted By: Ayende Rahien

Published at

Originally posted at


Jarrett Meyer
05/31/2012 09:30 AM by
Jarrett Meyer

What about returning a IShardedList, instead of an IList? As long as you implement the IList interface, you can add more information about the performance of the query, have a place for messages/failures, etc. Or does something like this add more complexity than you'd prefer?

Ayende Rahien
05/31/2012 09:31 AM by
Ayende Rahien

Jarrett, That brings you back to the optional failure, and you might not notice that you had errors. And it also doesn't deal with things like Load vs. Query.

05/31/2012 10:11 AM by

Why not add an event listener feature (either DocumentStore wide or when creating a session) for the failure of a node? It can be something that the user MUST set when he has a cluster.

Then the user can choose how to handle a failure globally (can choose whether to kill the system or let it run and show a warning. He can log the error/send notification or anything he'd like).

05/31/2012 11:44 AM by

You can try adding a handler on the session:

Something like:

session.OnShardingStatusChange((args) => ...)

Or even can be an event so them can get the notification for more places

session.ShardingStatusChange += (sender, args) => { ... }

In the args you can provide the query that detectes the fail, etc

Just my 2 cents Cheers

05/31/2012 01:08 PM by

Yeah I would go with a system wide Event. It would be useful if node status was stored on a different system which contained the status of each node and the type of data held on each node. You could then query the node status node on system failure, both are unlikely to be down at same time.

05/31/2012 01:58 PM by

Anything that exposes sharding to a query looks like a leaked abstraction to me. Ideally, queries should not care about whether the backing store is sharded or not. That seems to be one of RavenDB's strongest features.

Disclaimer: All of my knowledge comes from following these posts, I haven't actually played around with it myself yet, so take anything I say w/ a grain of salt.

Christopher Wright
05/31/2012 03:13 PM by
Christopher Wright

An event is nice because I can decide to throw an exception. A property is nicer because I can check it once at the end of the unit of work (let's say, a callback in the base controller).

That said, ShardingStatusChange is wrong.

I don't care at all if the status changed. I only care if I executed a query in the current Session that might have been impacted by a shard being down.

ShardingStatusChange should properly be on DocumentStore. Inside a session, it's far more likely to be impacted by an existing outage than to see a new outage. And there's a question of who sees the event, if there's an outage with several concurrent sessions in different threads.

If instead you have a QueryExecutedWithMissingShards event, you can just plug that into your base controller, when it opens a session. It always executes on the current session if you execute a query with missing shards that might be relevant.

It might be useful to have such a thing on DocumentStore as well, for things that are opening new sessions manually. You get more context with an event on Session -- you hook it into the current unit of work -- but if you have multiple sessions per unit of work, then you want something you only need to set once.

And if the event just throws an exception, you should get most of the context you need from the stack trace.

Brian Vallelunga
05/31/2012 03:16 PM by
Brian Vallelunga

If a node is offline, that's a systems concern, not a query concern. Raven should return what it can and notify the DB admin in some out of band way that there's been a failure of a node.

05/31/2012 03:19 PM by

I think, when you say 'sharding', this excludes a replication-like-system, which mirrors writes to one shard to any other shard asynchronously.

The really annoying problem are the write-misses, as your payment scenario indicates. So, why not introducing a transparent write-proxy as an optional layer. The write-proxy manages the health of the shards in the background. If the shards are OK, then fine. If a shard fails, it caches the writes locally until the shard comes back online.

Quite easy to implement, if the concerns of surveying health and caching are separated cleanly.

Matt Johnson
05/31/2012 04:55 PM by
Matt Johnson

This may be way out there, but what about implementing a parity stripe? Similar to the .par2 files that have been in use for distributing files on usenet feeds? Also similar to how RAID5 works.

Basically, each shard would have its own information, and some parity bits about what's on the other shards. If a shard goes down, even permanently, the parity can be used to reconstruct the missing data.

I'm not sure if there is an "easy" way to implement this, but it would certainly solve the problem.

Ayende Rahien
05/31/2012 07:57 PM by
Ayende Rahien

McZ, Sure, easy to implement. Extremely hard to implement right. How do you handle queries? Sorting? What happens if the server restart while you have data in the write cache? What happens if you are in farm, and some requests go to a different server? Etc, etc.

06/01/2012 08:20 AM by

I think the best scenario is getting the status like you mentioned but not displaying them to the user because the user does not care, but instead sending a message(email or some other form) to the administrators or customer service or both that one of the shards is down.

Maybe getting the status in each location we make query is not a good option so an event listener which caches "on query" event would be a great option.

06/01/2012 11:42 AM by

The central question to me is, how we can handle missing writes, which cannot be dispatched to the adequate shard. Missing queries are annoying, but they will not end catastrophic. A missing payment qualifies for the latter.

I've written a write-proxy four or five times, in two different shapes. The first one serializes JSON-data to the local filesystem, the second one dispatches writes to some shard featuring the final shard-address. The second one was even simpler to implement, as it only involved a tweak in the sharding-config (basically both a fallback and resync lambda).

In both cases, a server restart is not a problem. Only if the server would not be restarted anymore would pose a problem, but only in the first implementation.

Handling single requests on a different server is basically? This server will most likely have the same 'missing shard' problem. The second implementation would even account for this, as the missing writes would be transparent to the system as a whole.

Ayende Rahien
06/01/2012 10:57 PM by
Ayende Rahien

McZ, It is actually a big problem. Let us consider the simple case of:

  • Create payment
  • Show list of payments

Write proxy in this case would actually create the payment, but hide it from the user.

Playing with the sharding function for RavenDB to handle that, however, would be a trivial matter, so you would re-direct writes of a down shard to a new one (or to a replica).

Comments have been closed on this topic.