Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:

ayende@ayende.com

+972 52-548-6969


Sites outage


We had an outage that appears to have lasted roughly 12 hours.

The reason it took so long to fix: it happened after business hours, and while we have production support for our clients, we never hooked up our own websites to our own system. A typical case of the barefoot shoemaker.

The reason for the outage? Also pretty typical:

image

The reason for that? We had a remote backup process that wrote some temp files and didn’t clean them up properly. The growth rate was only about 3-6 MB a day, so no one really noticed.
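To illustrate the kind of cleanup that was missing, here is a minimal sketch in Python of a job that deletes stale temp files. It is not the actual backup process or the fix that was applied; the function name `remove_stale_files`, the directory, and the age threshold are all hypothetical.

```python
import time
from pathlib import Path


def remove_stale_files(directory, max_age_days, now=None):
    """Delete regular files directly inside `directory` whose modification
    time is older than `max_age_days`. Returns the list of removed paths.

    Hypothetical helper: the directory and the threshold are assumptions.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    removed = []
    for path in Path(directory).iterdir():
        # Only touch plain files; subdirectories are left alone.
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

Run on a schedule against the backup's temp directory, something like `remove_stale_files(temp_dir, max_age_days=7)` would keep leftovers from piling up unnoticed.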

The fix:

image

All is working now. I'm sorry for the delay in fixing this. We’ll be having some discussions here to see how we can avoid repeating issues like this.


Comments

Christian Seitzer

I would suggest a tool like Nagios or one of its derivatives to monitor your hard disks.

My experience with Icinga and NSClient++ for Windows has been very good.

Jiří Nouza

We use a very simple PowerShell script to check free disk space on our servers.
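A check like that can be very small. The following is a sketch in Python rather than PowerShell, and it is not the commenter's script; the names `is_low` and `free_space_report` and the 10% threshold are assumptions made for illustration.

```python
import shutil


def is_low(free_bytes, total_bytes, min_free_fraction):
    """Return True when the free fraction falls below the threshold."""
    return (free_bytes / total_bytes) < min_free_fraction


def free_space_report(path, min_free_fraction=0.10):
    """Report free space for the volume containing `path`.

    The 10% default threshold is an arbitrary assumption.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return {
        "path": path,
        "free_bytes": usage.free,
        "free_fraction": usage.free / usage.total,
        "low": is_low(usage.free, usage.total, min_free_fraction),
    }


if __name__ == "__main__":
    report = free_space_report(".")
    status = "LOW" if report["low"] else "ok"
    print(f"{report['path']}: {report['free_fraction']:.1%} free ({status})")
```

Hooked up to a scheduler and an email or chat alert, this gives a warning well before the disk fills at a few megabytes a day.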

Jim Geurts

+1 for Nagios or Zabbix... you get lots of other built in metrics like cpu load, etc as well

Judah Gabriel Himango

Heheh, seen this plenty of times. Usually it's the IIS logs that hurt me.

Wyatt Barnett

Protip: if you are doing anything on your OS volume you are probably doing it wrong on a server. 1st setup step here is to move everything IIS related to D.

Ajai

Ayende I am just happy you have 149GB of HibernatingRhinos.Orders :)

Robert

You should also probably set customErrors to On and set defaultRedirect to a nice error page that doesn't leak your stack trace...

Ayende Rahien

Robert, I don't do that on purpose.

Dave

When setting up servers I like to allocate a large file of several gigabytes that can be deleted when this situation occurs. This has saved my bacon a few times when running out of space on source control repositories.
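A minimal Python sketch of that ballast-file idea follows. The function names and chunk size are hypothetical. It writes real zero bytes instead of just setting the file length, on the assumption that a sparse file would not actually reserve space; on volumes with compression or deduplication even written zeros may not reserve the full amount.

```python
import os


def create_ballast(path, size_bytes, chunk_size=1024 * 1024):
    """Write `size_bytes` of zeros to `path` so the space is really allocated."""
    chunk = b"\0" * chunk_size
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(chunk_size, remaining)
            f.write(chunk[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # push the data to disk before returning


def release_ballast(path):
    """Delete the ballast file to free space; return False if it was already gone."""
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        return False
```

When the disk fills up, calling `release_ballast` buys enough room to log in, find the real culprit, and clean up properly.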

Daniel Marbach

Ayende I would like to introduce you to Oren Eini. He is the smart man behind RavenDB. In situations like that I always like to quote his excellence from his workshops: "disk space is cheap" :)
