An outage every 30 minutes

time to read 2 min | 254 words

We run a lot of benchmarks internally and sometimes it feels like there is a roaming band of performance focused optimizers that go through the office and try to find under utilized machines. Some people mine bitcoin for fun, in our office, we benchmark RavenDB and try to see if we can either break a record or break RavenDB.

Recently a new machine was… repurposed to serve as a benchmarking server. You can call it a right of passage for most new machines here, I would say. The problem with that machine is that the client would error. Not only would it fail, but at the exact same interval. We tested that from multiple clients and from multiple machines and found that every 30 minutes on the dot, we’ll have an outage that lasted under one second.

Today I come to the office to news that the problem was found:

image

It seems that after 30 minutes of idle time (no user logged in), the machine would turn off the ethernet, regardless of if there are active connections going on. Shortly afterward it would be woken up, of course, but it would be down just enough time for us to notice it.

In fact, I’m really happy that we got an error. I would hate to try to figure out latency spikes because of something like this, and I still don’t know how the team found the root cause.