Ayende @ Rahien

My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.

API Design: Sharding Status for failure scenarios – explicit failure management doesn’t work

time to read 4 min | 634 words

Continuing the discussion on how to handle failures in a sharded cluster, we are back to the question of how to handle the scenario of one node in a cluster going down. The question is: what should the system’s behavior be in such a scenario?

In my previous post, I discussed one alternative option:

ShardingStatus status;
var recentPosts = session.Query<Post>()
          .ShardingStatus(out status);

I said that I really don’t like this option, but I deferred the discussion on exactly why.

Basically, the entire problem boils down to a very simple fact: manual memory management doesn’t work.

Huh? What is the relation between handling failures in a cluster to manual memory management? Oren, did you get your wires crossed again and continued a different blog post altogether?

Well, no. It is the same basic principle. Requiring users to add a specific handler for this results in several scenarios, none of them ideal.

First, what happens if we don’t specify this? We are back to “ignore & swallow the error” or “throw and kill the entire system”.

Let us assume that we go with the first option: the developer has a way to get the error if they want it, but if they don’t care, we will just ignore it. The problem with this approach is that it is entirely certain that developers will not add this, at least not until the first time a node fails in production and the system simply ignores it and shows the wrong results.

The other option, throwing an exception if the user didn’t ask for the sharding status and we have a failing node, is arguably worse. We now have a ticking time bomb. If a node goes down, the entire system will go down. The reason that I say this is worse than the previous option is that the natural inclination of most developers is to simply stick the ShardingStatus() there and “fix” the problem. Of course, this is basically the same as the first option, but this time, the API actually leads the user down the wrong path.

Second, this is forcing a local solution on a global problem. We are trying to force the user to handle errors at a place where the only thing that they care about is the actual business problem.

Third, this alternative doesn’t handle scenarios where we are doing other things, like loading by id. How would you get the ShardingStatus from this call?
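To make the point concrete, here is a hypothetical sketch (the plain Load call matches the usual session API; the out-parameter overload is invented here purely to show how awkward it would be):

```csharp
// The normal load by id has no fluent surface to hang the status on:
var post = session.Load<Post>("posts/1");

// We would have to invent a clumsy overload just for this, e.g.:
// var post = session.Load<Post>("posts/1", out ShardingStatus status);
```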


Anything that you come up with is likely to introduce additional complexity and make things much harder to work with.

As I said, I intensely dislike this option. A much better alternative exists, and I’ll discuss this in the next post…

More posts in "API Design" series:

  1. (20 Jul 2015) We’ll let the users sort it out
  2. (17 Jul 2015) Small modifications over a network
  3. (01 Jun 2012) Sharding Status for failure scenarios–Solving at the right granularity
  4. (31 May 2012) Sharding Status for failure scenarios–explicit failure management doesn’t work
  5. (30 May 2012) Sharding Status for failure scenarios–explicit failure management
  6. (29 May 2012) Sharding Status for failure scenarios–ignore and move on
  7. (28 May 2012) Sharding Status for failure scenarios


Jarrett Meyer

What about returning an IShardedList, instead of an IList? As long as you implement the IList interface, you can add more information about the performance of the query, have a place for messages/failures, etc. Or does something like this add more complexity than you'd prefer?
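Jarrett's idea might look something like this (a hypothetical sketch; IShardedList and its members are not an actual RavenDB type):

```csharp
// Hypothetical: query results that also carry sharding metadata.
public interface IShardedList<T> : IList<T>
{
    // Shards that could not be reached while executing the query.
    IReadOnlyList<string> FailedShards { get; }

    // True if any shard failed, i.e. the results may be incomplete.
    bool IsPartial { get; }
}
```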

Ayende Rahien

Jarrett, that brings you back to the optional failure, and you might not notice that you had errors. And it also doesn't deal with things like Load vs. Query.


Why not add an event listener feature (either DocumentStore-wide or when creating a session) for the failure of a node? It can be something that the user MUST set when they have a cluster.

Then the user can choose how to handle a failure globally (whether to kill the system or let it run and show a warning; they can log the error, send a notification, or anything else they'd like).


You can try adding a handler on the session:

Something like:

session.OnShardingStatusChange((args) => ...)

Or it could even be an event, so they can get the notification in more places:

session.ShardingStatusChange += (sender, args) => { ... }

In the args you can provide the query that detected the failure, etc.

Just my 2 cents. Cheers


Yeah, I would go with a system-wide event. It would be useful if node status was stored on a separate system, one which contained the status of each node and the type of data held on it. You could then query that node-status system on failure; both systems are unlikely to be down at the same time.


Anything that exposes sharding to a query looks like a leaky abstraction to me. Ideally, queries should not care about whether the backing store is sharded or not. That seems to be one of RavenDB's strongest features.

Disclaimer: All of my knowledge comes from following these posts, I haven't actually played around with it myself yet, so take anything I say w/ a grain of salt.

Christopher Wright

An event is nice because I can decide to throw an exception. A property is nicer because I can check it once at the end of the unit of work (let's say, a callback in the base controller).

That said, ShardingStatusChange is wrong.

I don't care at all if the status changed. I only care if I executed a query in the current Session that might have been impacted by a shard being down.

ShardingStatusChange should properly be on DocumentStore. Inside a session, it's far more likely to be impacted by an existing outage than to see a new outage. And there's a question of who sees the event, if there's an outage with several concurrent sessions in different threads.

If instead you have a QueryExecutedWithMissingShards event, you can just plug that into your base controller, when it opens a session. It always executes on the current session if you execute a query with missing shards that might be relevant.

It might be useful to have such a thing on DocumentStore as well, for things that are opening new sessions manually. You get more context with an event on Session -- you hook it into the current unit of work -- but if you have multiple sessions per unit of work, then you want something you only need to set once.

And if the event just throws an exception, you should get most of the context you need from the stack trace.
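What Christopher describes might be sketched like this (all of the names here are hypothetical, invented to illustrate the shape of the idea, not actual RavenDB API):

```csharp
// Hypothetical: hooked up once, e.g. when the base controller
// opens the session, instead of at every query site.
session.QueryExecutedWithMissingShards += (sender, args) =>
{
    log.Warn("Query ran with shards down: " +
             string.Join(", ", args.MissingShards));
    if (failFastOnMissingShards)
        throw new MissingShardsException(args.MissingShards);
};
```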

Brian Vallelunga

If a node is offline, that's a systems concern, not a query concern. Raven should return what it can and notify the DB admin in some out of band way that there's been a failure of a node.


I think, when you say 'sharding', this excludes a replication-like system, one which mirrors writes from one shard to other shards asynchronously.

The really annoying problem is the write misses, as your payment scenario indicates. So why not introduce a transparent write proxy as an optional layer? The write proxy manages the health of the shards in the background. If the shards are OK, then fine. If a shard fails, it caches the writes locally until the shard comes back online.

Quite easy to implement, if the concerns of monitoring health and caching are separated cleanly.
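A minimal sketch of such a write proxy, assuming an invented IShardClient abstraction (none of this is RavenDB API):

```csharp
// Hypothetical write proxy: cache writes for a shard that is down
// and replay them once the health check says it is back.
public class WriteProxy
{
    private readonly IShardClient shard;
    private readonly Queue<object> pending = new Queue<object>();

    public WriteProxy(IShardClient shard)
    {
        this.shard = shard;
    }

    public void Write(object document)
    {
        if (!shard.IsHealthy)
        {
            pending.Enqueue(document); // cache locally until recovery
            return;
        }
        while (pending.Count > 0)      // replay anything cached earlier
            shard.Write(pending.Dequeue());
        shard.Write(document);
    }
}
```

Note that the in-memory queue here also shows one of the hard parts: a server restart loses the cached writes unless they are persisted first.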

Matt Johnson

This may be way out there, but what about implementing a parity stripe? Similar to the .par2 files that have long been used for distributing files on Usenet? Also similar to how RAID 5 works.

Basically, each shard would have its own information, and some parity bits about what's on the other shards. If a shard goes down, even permanently, the parity can be used to reconstruct the missing data.

I'm not sure if there is an "easy" way to implement this, but it would certainly solve the problem.

Ayende Rahien

McZ, sure, easy to implement. Extremely hard to implement right. How do you handle queries? Sorting? What happens if the server restarts while you have data in the write cache? What happens if you are in a farm, and some requests go to a different server? Etc., etc.


I think the best scenario is getting the status like you mentioned, but not displaying it to the user, because the user does not care. Instead, send a message (email or some other form) to the administrators or customer service or both, saying that one of the shards is down.

Maybe getting the status at each location where we make a query is not a good option, so an event listener which catches the "on query" event would be a great option.


The central question to me is how we can handle missing writes, ones which cannot be dispatched to the appropriate shard. Missing queries are annoying, but they will not end catastrophically. A missing payment qualifies for the latter.

I've written a write proxy four or five times, in two different shapes. The first one serializes JSON data to the local filesystem; the second one dispatches writes to some other shard, tagged with the final shard address. The second one was even simpler to implement, as it only involved a tweak in the sharding config (basically both a fallback and a resync lambda).

In both cases, a server restart is not a problem. Only a server that never comes back up would pose a problem, and only in the first implementation.

Handling single requests on a different server? That server will most likely have the same 'missing shard' problem anyway. The second implementation even accounts for this, as the missing writes would be transparent to the system as a whole.

Ayende Rahien

McZ, It is actually a big problem. Let us consider the simple case of:

  • Create payment
  • Show list of payments

Write proxy in this case would actually create the payment, but hide it from the user.

Playing with the sharding function for RavenDB to handle that, however, would be a trivial matter: you would redirect writes for a down shard to a new one (or to a replica).
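That redirect might be sketched as a fallback inside the sharding function (illustrative names only, not RavenDB's actual sharding configuration API):

```csharp
// Hypothetical: route writes for a down shard to its replica; a
// background resync can move them back once the primary recovers.
string SelectShardForWrite(string documentId)
{
    var primary = shards[Math.Abs(documentId.GetHashCode()) % shards.Count];
    if (primary.IsHealthy)
        return primary.Name;
    return primary.Replica.Name;
}
```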


