Lying as a valid coding pattern
Davy Brion has an interesting post about Protecting Your Application From Remote Problems, which contains a really interesting implementation of CircuitBreaker.
One thing, though, bothered me about it. Let us look at the code:
private class OpenState : CircuitBreakerState { private readonly Timer timer; public OpenState(CircuitBreaker circuitBreaker) : base(circuitBreaker) { timer = new Timer(circuitBreaker.timeout.TotalMilliseconds); timer.Elapsed += TimeoutHasBeenReached; timer.AutoReset = false; timer.Start(); } private void TimeoutHasBeenReached(object sender, ElapsedEventArgs e) { circuitBreaker.MoveToHalfOpenState(); } public override void ProtectedCodeIsAboutToBeCalled() { throw new OpenCircuitException(); } }
The code works correctly, but I… don’t like the implementation. Please note that this is based on more gut feeling than anything else.
The part that I don’t like is that it is doing too much work. The use of timer for the timeout seems like a waste to me. What I would rather do is something like this:
private class OpenState : CircuitBreakerState { private readonly DateTime expiry; public OpenState(CircuitBreaker circuitBreaker) : base(circuitBreaker) { expiry = DateTime.UtcNow + circuitBreaker.timeout; } public override void ProtectedCodeIsAboutToBeCalled() { if(DateTime.UtcNow < expiry) throw new OpenCircuitException(); circuitBreaker.MoveToHalfOpenState(); } }
As seen from outside, this behave in exactly the same fashion, while not actually requiring anything from the system when we are actually not touching the code that the circuit breaker protects. It also force this to be a lazy evaluation, so if we are actually doing some reporting on the state of the circuit breaker, we won’t see the status change until something actually make a call to it.
That may or may not be a consideration. But even if it is, I would rather do the state change in the reporting function than in a dedicated timer.
Comments
I guess we should also separate the decision making mechanism from the actual breaker. I mean you caught the exception - how are supposed to know that is caused by the remote service overload?
It's been over two years since Michael Nygard's excellent Release It! book was published. It's good to see some detailed debate and implementations of this pattern in C#.
ayende.com/.../...g-as-a-valid-coding-pattern.aspx
davybrion.com/.../protecting-your-application-f...
timross.wordpress.com/.../implementing-the-circ...
To me this pattern really isn't a 'circuit breaker' in the true sense of the word it's more of a 'retry policy' - I guess 'circuit breaker' sounds better though ;)
But like a circuit breaker it doesn't tell why the problem occurred just that it has...
@Ollie Riches
it's not a retry-policy at all... the code never retries an operation. If a protected resource is known to have failed X amount of times, it simply fails immediately (which opens the circuit) instead of actually calling the protected resource.
Timer or a similar approach is valid, when you do not want to force unwanted side effects of testing if external system is up on your currently executing thread, i.e. UI. If the system is down and timeouts you don't want to wait for like 30 seconds with your UI freezed or web server thread doing nothing but waiting. Going timer + background thread solves this problem by "pinging" external system in the background.
I don't know what a circuit breaker is, but at a glance it would appear the two versions of the code would behave quite differently.
In the first, the ProtectedCodeIsAboutToBeCalled method will through an exception regardless of any "timeout".
In the second, the exception only occurs if the time has expired.
Am I missing something?
My mistake, the second only if not expired
Also in the first example the MoveToHalfOpenState is called automatically a certain time after the ctor executes. In the 2nd example MoveToHalfOpenState is only called if the caller explicitly calls ProtectedCodeIsAboutToBeCalled after the timeout.
Your code does not work the same. In your implementation, after the protected service was working again and the timeout was up, it would take two calls to the service to get the circuit breaker to actually call the service again.
In the original implementation, the circuit breaker would correctly start calling the service again on the first call once the timeout was up.
I can see how the apparent difference in behavior would not occur because of how the class is used in the context of a circuit breaker.
My guess is that the MoveToHalfOpenState doesn't need to happen unless someone is attempting a protected action.
Also, after moving to the HalfOpenState, the ProtectedCode... of the OpenState class would no longer be called.
Just my guess after thinking about it more.
There's also an insidious bug in your code, which I thought I'd point out because many programmers make the same mistake: by using DateTime.Now, the code can malfunction in the fall when the clock goes back and the circuit breaker will get stuck in the closed state for an hour longer than it should. It's perhaps not a big issue in this case, but one which can cause inexplicable behaviour.
Solution: always use DateTime.UtcNow for this.
Mike,
Nice catch
Comment preview