API Design: Sharding Status for failure scenarios–ignore and move on

May 29 2012

In my previous post, I discussed how to handle partial failure in a sharded cluster scenario. This is particularly hard because it is the case where one node out of a cluster of three (or more) nodes is failing, and it is entirely likely that we can still give the user at least partial service.

The most obvious, and probably the easiest, option is to simply catch and log the error for the failing server and not notify the calling application code about it. The nice thing about this option is that if you have a failing server, your entire system doesn’t go down, and you can handle the failure at least partially.
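
To make the option concrete, here is a minimal sketch of "catch, log and move on". It is not RavenDB code: the `query_ignoring_failures` function and the idea of modelling each shard as a plain callable are assumptions made purely for illustration.

```python
import logging
from typing import Callable, Dict, List

logger = logging.getLogger("sharding")

# Hypothetical model: a shard is just a function that returns documents.
Shard = Callable[[], List[dict]]


def query_ignoring_failures(shards: Dict[str, Shard]) -> List[dict]:
    """Query every shard; log and skip the ones that fail."""
    results: List[dict] = []
    for name, query in shards.items():
        try:
            results.extend(query())
        except Exception:
            # The failure is logged, but the caller never learns about it.
            logger.exception("shard %s failed, returning partial results", name)
    return results


def healthy_shard() -> List[dict]:
    return [{"id": "orders/1"}]


def failing_shard() -> List[dict]:
    raise ConnectionError("node is down")


docs = query_ignoring_failures({"a": healthy_shard, "b": failing_shard})
```

Note that `docs` looks exactly like a complete answer. Nothing in the return value tells the caller that shard `b` was skipped.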

The sad part about this option is that there really is a good chance that you won’t notice that some part of the system is down, and that you are actually returning only partial results. That can lead to some really nasty scenarios, such as the case where we “lose” an order, or a payment, and we don’t show this to the user.

That can lead to some really frustrating scenarios where a customer is saying “but I see the payment right here” and the help desk says “I am sorry, but I don’t see it, therefore you don’t get any service, have a good day, bye”.

Still… that is probably the best case scenario, considering that the alternative is the entire system being unavailable if any single node is down.

Or is it… ?

Tags:

raven

Comments

29 May 2012
09:48 AM

Thomas Krause

How about throwing an exception but including the partial results in the exception details?

Like this you force the calling code to handle this scenario. It can then show a message to the user ("due to technical problems some information may be missing") along with the partial results.
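
A rough sketch of what this suggestion could look like, assuming a hypothetical `PartialResultsError` type and a hypothetical `query_or_raise` helper (neither is part of any real client API):

```python
from typing import Callable, Dict, List


class PartialResultsError(Exception):
    """Hypothetical exception carrying whatever the healthy shards returned."""

    def __init__(self, partial_results: List[dict], failed_shards: List[str]):
        super().__init__("shards failed: " + ", ".join(failed_shards))
        self.partial_results = partial_results
        self.failed_shards = failed_shards


def query_or_raise(shards: Dict[str, Callable[[], List[dict]]]) -> List[dict]:
    """Query all shards; raise if any failed, attaching the partial results."""
    results: List[dict] = []
    failed: List[str] = []
    for name, query in shards.items():
        try:
            results.extend(query())
        except Exception:
            failed.append(name)
    if failed:
        raise PartialResultsError(results, failed)
    return results


def down() -> List[dict]:
    raise ConnectionError("node is down")


# The calling code is forced to decide what to do with the partial data.
warning = None
try:
    docs = query_or_raise({"a": lambda: [{"id": "payments/1"}], "b": down})
except PartialResultsError as e:
    docs = e.partial_results
    warning = "Due to technical problems some information may be missing."
```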

29 May 2012
09:53 AM

Ayende Rahien

Thomas, Wow! That is something that I would have never thought of. Interesting approach, and something that I might well utilize in the future. It nicely handles the scenario of getting the data without ignoring the error, but it doesn't actually work in practice. It means that you have to be very careful about what you are doing, and that a single place where you forgot to do this would result in your site being effectively down.

29 May 2012
09:56 AM

alexidsa

What if we put an "AllNodesAreOnline" flag into the statistics? This way we could find out whether the query was executed correctly, similar to how we get the total results count for paging (http://old.ravendb.net/faq/total-results-in-paged-data-set)
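
As an illustration of the idea only, a statistics object returned next to the results might look roughly like this. The `QueryStatistics` class, its fields and `query_with_stats` are assumptions for the sketch, not the actual RavenDB statistics API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class QueryStatistics:
    """Hypothetical statistics returned alongside the query results."""
    total_results: int = 0
    all_nodes_are_online: bool = True
    failed_shards: List[str] = field(default_factory=list)


def query_with_stats(
    shards: Dict[str, Callable[[], List[dict]]]
) -> Tuple[List[dict], QueryStatistics]:
    results: List[dict] = []
    stats = QueryStatistics()
    for name, query in shards.items():
        try:
            results.extend(query())
        except Exception:
            # Record the failure instead of raising or silently dropping it.
            stats.all_nodes_are_online = False
            stats.failed_shards.append(name)
    stats.total_results = len(results)
    return results, stats


def down() -> List[dict]:
    raise ConnectionError("node is down")


docs, stats = query_with_stats({"a": lambda: [{"id": "orders/1"}], "b": down})
```

The caller still has to remember to look at `stats.all_nodes_are_online`, so the error is easy to ignore, but at least it is available.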

29 May 2012
10:01 AM

Thomas Krause

Yes, that is the downside. The alternative would be to return a result always but include some error details in the result. But this makes it very easy to silently ignore the error.

It probably depends on the application which scenario is more desirable.

29 May 2012
10:09 AM

Giedrius

Hm, what about the classic approach of saving all relevant data in a single shard? In such a case the client and all his orders/payments/etc would go to a single shard, and he would get either everything or nothing, but other clients would still be able to access the system.

29 May 2012
12:05 PM

Jonty

Why not make it configurable? Safe by default would probably mean throw the exception (with the results), but give the option to change that behaviour.

29 May 2012
12:33 PM

Ayende Rahien

Jonty, Neither option works. It is too big a tool; turning it on/off globally isn't helpful.

29 May 2012
13:31 PM

Bundermuft

Just an alternative view: use additional passive mirroring, as Greenplum offers, for example. I.e. you have A B C nodes, B mirrors A, C mirrors B and A mirrors C. So if one of the nodes goes offline, the system can continue to work (with integrity).

At the API level this should come as additional metadata of the session, and it should be controlled when initializing the session (fails with an exception, does not fail with an exception). 99% of the time, I can imagine, the app should not know about the node being down; rather, the sysadmin should get the alarm shower.

29 May 2012
18:32 PM

Chris Wright

As a user of ravendb, I want some way I can check the global health of my database without mucking about with exception handling. I want to be able to look in my session at the end of a request and see: was there anything that might have impacted correctness?

If so, I'll throw up an alert message at the top of the page: "We might not be showing you all available data."

That way, my customer support people can see that and give people the benefit of the doubt. They have the most information I can still provide, and also know that they don't have all the information they should.

I don't necessarily need to know this for each query; that might be useful, but it's way too granular for most of what I need to do. I just need to know whether to put up the alert box or not.

In a different context, I might want more granular options. Maybe I want to fail right away, or just check at one or two critical points if I have missing data.

I think the two ways of handling I want are:

- Fail immediately always, by throwing an exception.
- Give me the best results possible always, and record on Session whether there are any failures. That should be a property available for me to check at any time, though if I'm using some sort of batching, I may have started an operation that is doomed and will not be reflected there.
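
The two modes described here could be sketched as follows. This is only an assumption of how such a session might look: `ShardedSession`, `fail_fast` and `has_failures` are hypothetical names, not the RavenDB session API.

```python
from typing import Callable, Dict, List


class ShardedSession:
    """Hypothetical session that either fails fast or records shard failures."""

    def __init__(self, shards: Dict[str, Callable[[], List[dict]]],
                 fail_fast: bool = False):
        self._shards = shards
        self._fail_fast = fail_fast
        self.failures: List[str] = []

    @property
    def has_failures(self) -> bool:
        # Check this once, at the end of the request, to decide on the alert box.
        return bool(self.failures)

    def query(self) -> List[dict]:
        results: List[dict] = []
        for name, query in self._shards.items():
            try:
                results.extend(query())
            except Exception:
                if self._fail_fast:
                    raise  # mode 1: fail immediately, always
                self.failures.append(name)  # mode 2: best effort, remember it
        return results


def down() -> List[dict]:
    raise ConnectionError("node is down")


session = ShardedSession({"a": lambda: [{"id": "orders/1"}], "b": down})
docs = session.query()
banner = ("We might not be showing you all available data."
          if session.has_failures else None)
```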

I'm not sure of anything else that would be useful.

29 May 2012
19:45 PM

Pop Catalin

Either way can be the "only" good way for a certain type of application.

Therefore, how about adding the possibility to set the default globally and allowing it to be overridden locally?

But please go with the safe default ;). Those that want unsafe should opt in, not opt out.
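
One possible shape for "safe default globally, override locally", sketched with made-up names (`Conventions.on_shard_failure` and the `on_shard_failure` parameter are assumptions for illustration):

```python
from typing import Callable, Dict, List, Optional


class Conventions:
    """Hypothetical global settings; the safe behaviour is the default."""
    on_shard_failure = "raise"


def query(shards: Dict[str, Callable[[], List[dict]]],
          on_shard_failure: Optional[str] = None) -> List[dict]:
    # A per-call value wins over the global default.
    mode = on_shard_failure or Conventions.on_shard_failure
    if mode not in ("raise", "ignore"):
        raise ValueError(f"unknown failure mode: {mode}")
    results: List[dict] = []
    for name, run in shards.items():
        try:
            results.extend(run())
        except Exception:
            if mode == "raise":
                raise
            # mode == "ignore": skip the failing shard for this call only
    return results


def down() -> List[dict]:
    raise ConnectionError("node is down")


shards = {"a": lambda: [{"id": "orders/1"}], "b": down}

# Opting in to the unsafe behaviour for a single, non-critical query.
docs = query(shards, on_shard_failure="ignore")
```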

Oren Eini

CEO of RavenDB
