Dependencies management in a crisis

Feb 22 2016

Typically when people talk about dependencies they talk about how easy it is to version them, deploy them, change & replace them, etc. There seems to be a very deep focus on the costs of dependencies during development.

Today I want to talk about another aspect of that: the cost of dependencies when you have a crisis. In particular, consider the case of a 2 AM support call that is rooted in one of your dependencies. What do you do then?

The customer sees a problem in your software, so they call you, and you are asked to resolve it. After you have narrowed the problem down to a particular dependency, you now need to check whether it is your usage of the dependency that is broken, or whether there is a genuine issue with the dependency itself.

Let us take a case in point from a recent support call we had. When running RavenDB on a Windows Cluster with both nodes sharing the same virtual IP, authentication doesn’t work. It took us a while to narrow it down to Windows authentication failing, and that is where we got stuck. Windows authentication is a wonderfully convenient tool, but if there is an error, just finding out what it is requires specialized knowledge and skills. After verifying that our usage of the code looked correct, we ended up writing a minimal reproduction, about 20 lines of code, which showed the same issue.

At that point, we were able to escalate to Microsoft with the help of the customer. Apparently this is a Kerberos issue and you need to use NTLM, and there was a workaround involving some network configuration (check our docs if you really care about the details). But the key point here is that we would have had absolutely no way to figure it out on our own. Our usage of Windows authentication followed the published best practices, but in this scenario you had to do something different to get it to work.

The point here is that if we hadn’t been able to escalate that to Microsoft, we would have been in pretty serious trouble with the customer. “We can’t fix this issue” is something that no one wants to hear.

As much as possible, we try to make sure that any dependencies that we take are either:

Stuff that we wrote and understand.
Open source* components that are well understood.
Covered by a support contract that we can fall back on, with the SLA we require.
Non-essential / able to be disabled without major loss of functionality.

* Just taking an OSS component from some GitHub repo is a bad idea. You need to be able to trust it, which means you need to be sure that you can go into the code and either fix things or understand why they are broken.
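One way to make the four criteria above concrete is to record them per dependency and flag anything that meets none of them. This is only a rough sketch, not how we actually track this: the Dependency class, its field names, and the sample entries are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Dependency:
    # Hypothetical record; each flag mirrors one of the four criteria above.
    name: str
    written_in_house: bool = False
    open_source_and_understood: bool = False
    support_contract_with_sla: bool = False
    can_be_disabled: bool = False


def crisis_fallbacks(dep):
    """Return the criteria this dependency meets; an empty list means no fallback."""
    checks = [
        (dep.written_in_house, "we wrote and understand it"),
        (dep.open_source_and_understood, "open source and well understood"),
        (dep.support_contract_with_sla, "support contract with the SLA we require"),
        (dep.can_be_disabled, "can be disabled without major loss of functionality"),
    ]
    return [reason for met, reason in checks if met]


def risky_dependencies(deps):
    """Names of dependencies that meet none of the criteria."""
    return [dep.name for dep in deps if not crisis_fallbacks(dep)]


if __name__ == "__main__":
    # Illustrative entries only.
    deps = [
        Dependency("windows-auth", support_contract_with_sla=True),
        Dependency("random-github-lib"),
    ]
    for name in risky_dependencies(deps):
        print(f"{name}: nothing to fall back on at 2 AM")
```

The flags are self-reported, so the check is only as honest as whoever fills them in. In particular, "well understood" should mean someone on the team has actually been in that code.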

Comments

22 Feb 2016
11:03 AM

Daniel Marbach

I once wrote a post with guidelines about how to select OSS libraries. Might be relevant to this article as well.

http://www.planetgeek.ch/2010/06/20/how-to-select-open-source-libraries/

22 Feb 2016
12:31 PM

wqweto

At 2AM I might be getting ready to go to bed with my tablet still connected to office. . .

22 Feb 2016
14:33 PM

Chris B

There is a talk from Greg Young (I think it's his "8 Lines of Code" talk) in which he says some very similar things. If you haven't seen it, there is some great stuff in there.

22 Feb 2016
17:45 PM

Steve S

Ahh, you just described our daily nightmare that is Microsoft Azure.

25 Feb 2016
01:47 AM

Fabio Maulo

Was the problem you had really with a piece of OSS?

25 Feb 2016
05:54 AM

Oren Eini

Fabio, No, it was with Windows Auth

Oren Eini

CEO of RavenDB