Debugging security errors
“I’m getting a 403 Forbidden error” is one of the more annoying things to debug. Something, somewhere, in a distributed system, has decided that a request is not authorized and blocked it.
In RavenDB 3.5, we supported OAuth and Windows Authentication to authenticate clients talking to RavenDB. That meant that we had to field support questions exactly like the one above. That was not fun. First of all, there seems to be an inclination in the security community to hide errors really well. “I don’t like you because of the difference between our clocks, let’s send an EPERM to this socket and close it, and if I’m feeling nice, log it to /var/logs/obscure/dev/null”.
I have scars from trying to work out “Why doesn’t Windows Auth work for this user”, which involved talking to a DBA who was neither the domain admin nor even aware of the domain topology in the organization (Fallacies: There is one administrator). At one point, we had a production issue in which RavenDB refused all access because Windows couldn’t validate credentials: a linked domain was down for scheduled maintenance, and the credential cache had expired.
When we designed RavenDB 4.0, we decided that for security, we needed something that would be debuggable. That means:
- Secure – that goes without saying, if it is a security measure that is debuggable but isn’t actually secure…
- Rely only on two parties – the client & server in each connection.
- Provide enough information to solve the problem
We selected TLS / SSL for this, with client certificates as the authentication mechanism. This answers the first requirement, because TLS has been analyzed enough that I’m certain that it is secure. It is also well known and familiar to administrators.
TLS uses PKI, so it isn’t technically a two-party solution; you may have to deal with certificate revocation, trust chains, etc. But those are well understood and you get really good errors on those. On the client certificate authentication side, however, we require that the server hold the actual list of trusted certificates, so there is no need to check anywhere else.
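To make the "actual list of trusted certificates" idea concrete, here is a minimal sketch in Python. It is not RavenDB's implementation. It assumes that certificates are identified by a SHA-256 fingerprint of their DER encoding, and the names `fingerprint` and `is_registered` are hypothetical. The point is only that the decision is a local lookup, with no third party to consult.

```python
import hashlib
from typing import AbstractSet


def fingerprint(cert_der: bytes) -> str:
    """Return the SHA-256 fingerprint of a DER-encoded certificate as hex."""
    return hashlib.sha256(cert_der).hexdigest()


def is_registered(cert_der: bytes, registered: AbstractSet[str]) -> bool:
    """Check the certificate against the server's own explicit list.

    Assumption: the server stores fingerprints of the client certificates
    it trusts, so no domain controller or token issuer is contacted.
    """
    return fingerprint(cert_der) in registered


# Placeholder bytes standing in for real DER-encoded certificates.
known_cert = b"placeholder-der-bytes-for-a-registered-cert"
unknown_cert = b"placeholder-der-bytes-for-some-other-cert"

# The server-side list, populated when an administrator registers a certificate.
registered = {fingerprint(known_cert)}
```

With the list in hand, "why was this client refused?" reduces to whether its fingerprint is in `registered`, which is something an administrator can inspect directly.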
We have also taken debuggable security a couple of steps further. For example, we’ll accept an unsecured connection and let it establish itself. Then send a single message down the line, explaining why we aren’t going to use it.
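The "accept, explain, then close" behavior can be sketched roughly as follows. This is an illustration under assumptions, not RavenDB's wire protocol: the JSON shape, the `status` and `reason` field names, and the helper names `refuse_with_explanation` and `read_refusal` are all made up for the example. A local socket pair stands in for a real client connection.

```python
import json
import socket


def refuse_with_explanation(conn: socket.socket, reason: str) -> None:
    """Send one newline-terminated JSON message explaining the refusal, then close."""
    message = json.dumps({"status": "refused", "reason": reason}) + "\n"
    try:
        conn.sendall(message.encode("utf-8"))
    finally:
        # The connection is never used for real work; it exists only
        # long enough to tell the caller what went wrong.
        conn.close()


def read_refusal(conn: socket.socket) -> dict:
    """Client side: read until the server closes, then parse the single message."""
    chunks = []
    while True:
        data = conn.recv(4096)
        if not data:
            break
        chunks.append(data)
    return json.loads(b"".join(chunks).decode("utf-8"))


# A connected pair of sockets stands in for the server and client ends.
server_side, client_side = socket.socketpair()
refuse_with_explanation(
    server_side, "client certificate is not registered with this server"
)
reply = read_refusal(client_side)
client_side.close()
```

The design choice is that the client always receives a readable reason instead of a silently dropped socket, at the cost of the server briefly holding open a connection it will not serve.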
In practice, this works great. Take a look at this Stack Overflow question, which shows the following error:
Given just the information in the post and the error from RavenDB, it was easy to figure out that the client certificate hasn’t been registered and that this is why RavenDB is refusing access.
I consider this a major success.
Comments
I think it is still experimental, but with Network Error Logging the browser could report back the errors https://scotthelme.co.uk/network-error-logging-deep-dive/
André, that's great, but that is still problematic because you don't have a way to report errors properly. For example, "I successfully got the client cert you sent me but it expired" is something very different from "I got the client cert and it is not familiar to me" vs. "The certificate is known, but you don't have access to this resource".
The problem is that there is no way to return those errors to the caller in a protocol appropriate manner. That is why we accept invalid connections, write the proper error message and then close them.