RavenDB Operations: Production Stack Traces on Windows & Linux
A process running on your system is typically a black box. You don’t have a lot of insight into what is going on inside it. Oh, there are all sort of tools you can use to infer things out (looking at system calls, memory consumption, network connections, etc), but by default… it is a mystery.
RavenDB is a database. It is meant to run unattended for long durations and is designed to mostly run itself. That means that when you look at it, you want to be able to figure out exactly what is going on with the system as soon as possible. To that end, we have included a lot of features inside RavenDB that expose the internal state of the system. From tracing each I/O and its duration to providing detailed statistics about costs and amount of effort invested in various tasks.
These features are invaluable to figure out exactly what is going on in RavenDB at a particular point in time. Of course, nothing beats the ability to open a debugger and inspect the state of the system. But that is something that you can only really do on development. It is not something that can be done on production, obviously. Or can it?
Since RavenDB 3.0, we actually had just this feature, being able to ask RavenDB to capture and display its own state in a format that should be very familiar to developers. When we created RavenDB 4.0, we were able to carry on this feature on Windows (at some cost), but it was a complete non starter on Linux.
On Windows, a process can debug another process if they belong to the same user (somewhat of an over simplification, but good enough). On Linux, the situation is a lot more complex. A process can usually only debug another process if the debugger is running as root or is the parent process of the debugee process.
Another complication was that we are using ClrMD, a wonderful library that allow us to introspect live processes (among many other things). It did not have support for Linux, until about a month ago… as soon as we had the most basic of support there, we jumped into action, seeing how we can bring this feature to Linux as well. A lot of our users are running production systems on Linux, and the ability to look at the system and go: “Hmm. I wonder what this is doing” and then being able to tell is something that we consider a major boost to RavenDB.
It took a lot of fighting and learning a lot more about how debugging permissions work on Linux than I ever wanted to know. But we got it working (details below). You can see how this looks like on a live Linux server:
As you can see, there is an indexing thread here doing some work on spatial data. We are going to enhance this view further with the ability to see CPU times as well as job names. The idea is that this is something that you will look at and get enough insight to not need to check the logs or try to infer what is going on. You could just tell.
Now, for the gory details of how this works. We changed the implementation on both Windows and Linux to use passive attach to process, which is much faster. The first thing we tried, once we moved to passive attach is to debug ourselves.
This is a nice enough feature, and quite elegant. We debug ourselves, pull the stack traces and display the data. Unfortunately, this doesn’t work on Linux. A process cannot debug itself. All debugging in Linux is based on the ptrace() system call, and the permissions to that are as specified. I can’t imagine the security implications of letting a process debug itself. After all, it is already can do anything the process can do, because it is the process. But I guess that this is an esoteric enough scenario that no one noticed and then the reaction was, use a workaround.
The usual workaround is to have a process that would spawn RavenDB and then it would be able to debug it. That is… possible, but it would be a major shift in how we deploy, not something that I wanted to do. There is also the ptrace_scope flag, which is supposed to control this behavior. In my tests, at least, disabling the security checks via this flag did absolutely nothing.
Running as root worked, just fine, of course. And then the process crashed. On Linux, when trying to debug your own process, there seems to be an interesting interaction between the debugger and debuggee if an exception is thrown. To the point where it will corrupt the CoreCLR state and kill the process. That was a fun bug to trace, sort of. Linux has a escape hatch in the form of PR_SET_PTRACER option that can be used. However, you can’t designate your own process, unfortunately. That combined with the hard crashes, made self debugging a non starter.
But I still want this feature, and without changing too much about how we are doing things.
Here is what we ended up doing. We have a separate process just to capture the stack trace. When you ask RavenDB to get its stack trace, it will spawn this process, but ask it to wait. It will then grant the new process the permissions necessary to debug RavenDB and signal it to continue. At this point, the debugger child process will capture the stack trace and send it back to RavenDB. RavenDB will reset the permission, enhance the stack trace with additional information that we can provide from inside the process and display it to the user.
The actual debugger process is also marked with setcap to provide it additional permissions it needs. This separation means that we isolate these permissions to a single purpose tool that can be invoked and closed, without increasing the attack surface of RavenDB.
The end result is that you can walk to a production RavenDB server, running on Windows or Linux, and get better information than if you just attached to it with the debugger.
Comments
Does this work inside Linux docker container with default options?
Sergey, You'll need to use
SYS_PTRACE
when running in a docker container, I'm afraid.Comment preview