Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:

ayende@ayende.com

+972 52-548-6969


RavenHQ & Amazon EC2 Outage


Update: The issue has been resolved. I’ll post the full details on Sunday.

Some RavenHQ customers may have noticed that they are currently unable to access their RavenHQ databases.

The underlying reason is an outage in Amazon's US-EAST-1 region, which is where the ravenhq.com server and some of the database servers are located.

Customers with replicated plans should see no disturbance of service, since this should trigger an automatic failover to the secondary node.

If you have any questions or require support, please contact us via our support forum: http://support.ravenhq.com/

You can see the status report from Amazon below. We managed to restore some service, but then lost it again (because of EBS timeouts, I suspect).

We are currently hard at work bringing up new servers in additional availability zones, and we hope to restore full functionality as soon as possible.

[Image: Amazon status report]


Comments

James Manning

Needs more chaos monkey!

http://www.codinghorror.com/blog/2011/04/working-with-the-chaos-monkey.html

Fail early, fail often! :)

Edward Spelt

Netflix, Pinterest and Instagram are/were offline too.

Christopher Wright

Have you considered hosting RavenHQ servers in multiple availability zones and regions, and randomizing your non-replicated customers between zones?

It sucks to have a large portion of your customer base experience an outage because of you. By hosting in multiple zones, you can reduce the portion of your customer base that any single incident affects.

Also, when you say "the secondary node", I immediately get worried. Is there a tertiary node? Are all the secondary nodes in the same zone as each other?

Ayende Rahien

Christopher, we are running in multiple availability zones, and we are putting non-replicated clients in multiple zones. I'll post a full discussion of this tomorrow, but the reason you saw a lot of activity about this is that most of our free plans resided in that region.

"The secondary node" is actually a RavenDB term, which refers to the node that you fail over to. Whether you have a tertiary node or more depends on your plan. And each of the N nodes in a replicated plan is in a different availability zone.

Colin Bull

Hummm, wondering whether this was the pesky leap second introduced on 30 June at midnight..

Christopher Wright

@Ayende: Ah, okay, that's a lot better than I initially thought. I should have remembered that you're smart.

I personally would never trust anything important to just two zones, but I understand that a lot of companies are budget-conscious and can take a short outage, especially if they can point at someone else to blame.
