Sites outage

Oct 29 2013

We had an outage that appears to have lasted roughly 12 hours.

The reason it took so long to fix: it happened after business hours, and while we have production support for our clients, we never hooked up our own websites to our own system. A typical case of the barefoot shoemaker.

The reason for the outage? Also pretty typical: the disk ran out of space.

The reason for that? We had a remote backup process that wrote some temp files and didn’t clean them up properly. The growth rate was about 3-6 MB a day, so no one really noticed.

The fix:

All is working now. I'm sorry for the delay in fixing this. We’ll be having some discussion here to see how we can avoid repeat issues like this.
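
Growth of a few MB a day is exactly the kind of thing nobody notices by eye, so one option is a scheduled free-space check. Below is a minimal Python sketch, not a description of any actual setup: the names `is_low` and `check_disk`, the path being checked, and the 10% threshold are all assumptions made for illustration.

```python
import shutil
from typing import NamedTuple


class DiskStatus(NamedTuple):
    path: str
    free_bytes: int
    total_bytes: int
    low: bool


def is_low(free_bytes: int, total_bytes: int, min_free_ratio: float) -> bool:
    """Return True when the free fraction of the volume is below the threshold."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes < min_free_ratio


def check_disk(path: str, min_free_ratio: float = 0.10) -> DiskStatus:
    """Read usage for the volume containing `path` and flag it if space is low."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    low = is_low(usage.free, usage.total, min_free_ratio)
    return DiskStatus(path, usage.free, usage.total, low)


if __name__ == "__main__":
    # Hypothetical usage: check the current volume; a real job would alert instead of print.
    status = check_disk(".")
    if status.low:
        print(f"LOW DISK SPACE on {status.path}: {status.free_bytes} bytes free")
```

Run from a scheduler, a check like this only helps if the alert goes somewhere a person will see it outside business hours.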

Tags:

bugs

Comments

29 Oct 2013
06:12 AM

Christian Seitzer

I would suggest a tool like nagios or one of its derivatives to monitor your hard disks.

My experience with Icinga and NSClient++ for Windows has been very good.

29 Oct 2013
11:48 AM

Jiří Nouza

We use a very simple PowerShell script to check server disk drive free space.

29 Oct 2013
13:50 PM

Jim Geurts

+1 for Nagios or Zabbix... you get lots of other built in metrics like cpu load, etc as well

29 Oct 2013
14:56 PM

Judah Gabriel Himango

Heheh, seen this plenty of times. Usually it's the IIS logs that hurt me.

29 Oct 2013
15:45 PM

Wyatt Barnett

Protip: if you are doing anything on your OS volume you are probably doing it wrong on a server. 1st setup step here is to move everything IIS related to D.

29 Oct 2013
16:23 PM

Ajai

Ayende I am just happy you have 149GB of HibernatingRhinos.Orders :)

29 Oct 2013
17:31 PM

Robert

You should also probably set customErrors to On and set defaultRedirect to a nice error page that doesn't leak your stack trace...

30 Oct 2013
02:54 AM

Ayende Rahien

Robert, I deliberately don't do that.

30 Oct 2013
21:11 PM

Dave

When setting up servers I like to allocate a large file of several gigabytes that can be deleted when this situation occurs. This has saved my bacon a few times when running out of space on source control repositories.

04 Nov 2013
17:29 PM

Daniel Marbach

Ayende I would like to introduce you to Oren Eini. He is the smart man behind RavenDB. In situations like that I always like to quote his excellence from his workshops: "disk space is cheap" :)

Oren Eini

CEO of RavenDB
