Okay, after spending quite a lot of time digging through the leveldb codebase, and with several years of working with RavenDB, I can say with confidence that the CLR makes it extremely hard to build high-performance server-side systems.
Mostly, the issues are related to GC and memory. In particular, not having any way to control memory allocation and/or the GC means that we can’t optimize those scenarios in any meaningful way. At the same time, I do not want to go back to the unmanaged world. As mentioned, I just came back from a very deep dive into a non-trivial C++ codebase, and while I consider that codebase a really good one, that ain’t to say it is a pleasure to always be thinking about all the stuff that the CLR just takes care of for you.
Therefore, I decided that I’m going to do something about it. And so Rattlesnake.CLR was born.
The major features of Rattlesnake.CLR include explicit memory management when required. Let us say that we know that we are going to need some amount of memory for a while, and then all of that can be thrown away. This is extremely common in scenarios such as a web request: pretty much all the memory that you allocate while processing a web request can be safely freed immediately afterward. In RavenDB’s case, the memory we consume during indexing can be freed immediately when we stop indexing. Right now this is a painful process of making sure that we allocate within the same gen0 and hoping that it won’t be too expensive, or that we won’t get a complete halt of the entire server while it is releasing memory. It also makes it really hard to do things like limit the amount of memory your code uses.
Another requirement that I have is that Rattlesnake.CLR should be able to execute existing .NET assemblies without any additional steps, since I don’t fancy porting stuff that already exists.
In order to handle this scenario with the given constraints, we have:
All the code within the using statement is allocated in our own heap. In line 13, we are destroying all of that memory in one fell swoop.
There are a few notes about this that we probably should address:
- By default, memory allocated this way is not subject to any form of GC. The idea is that the whole heap gets released in one go.
- Note the last two parameters to Heap.Create. The first is the initial size of the heap, and the second is the max size. We now have a real way to actually limit the amount of memory a piece of code will use. This is really important in server applications, where avoiding paging is critical.
- For that matter, we can now figure out how much memory a particular piece of code uses, and allocate our resources accordingly.
- You can use multiple heaps at the same time, although only one can be installed as the default allocation at a given point in time.
- There is an explicit heap.GarbageCollect() method that will do GC only on that heap, and which you can schedule at your own convenience. You can have two heaps, and allocate from one while you are GCing the other. And yes, that means that GCs using this method will not stop the process!
- Memory allocated on the heap is obviously only valid as long as the heap is valid. That means that once the heap is destroyed, you can’t access any of the objects that were created there. This has implications for things like caches. We provide a MemoryAllocations.AllocateOnGlobalHeap<T>(args) method that forces the allocation onto the global heap instead, if you want this memory to be always available and subject to GC.
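To make the scoping and size-limit ideas in the notes above a bit more concrete, here is a minimal sketch in plain C#. This is not the Rattlesnake.CLR API. ScopedHeap, Allocate, UsedBytes and ScopedHeapDemo are hypothetical names made up for illustration, and the buffers here still live on the regular managed heap. It only imitates the semantics: allocations belong to a scope, the scope has a hard size limit, and everything is dropped in one go when the scope ends.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration only: a user-space "scoped heap" that hands out
// byte buffers, enforces a maximum size, and drops everything on Dispose.
// The buffers are ordinary managed arrays, so the real GC still owns them.
public sealed class ScopedHeap : IDisposable
{
    private readonly long _maxBytes;
    private readonly List<byte[]> _blocks = new List<byte[]>();
    private bool _disposed;

    // Total bytes handed out by this heap so far.
    public long UsedBytes { get; private set; }

    public ScopedHeap(long maxBytes)
    {
        if (maxBytes <= 0) throw new ArgumentOutOfRangeException(nameof(maxBytes));
        _maxBytes = maxBytes;
    }

    public ArraySegment<byte> Allocate(int size)
    {
        if (_disposed) throw new ObjectDisposedException(nameof(ScopedHeap));
        if (size < 0) throw new ArgumentOutOfRangeException(nameof(size));
        // Enforce the hard limit before allocating anything.
        if (UsedBytes + size > _maxBytes)
            throw new OutOfMemoryException("ScopedHeap limit exceeded");

        var block = new byte[size];
        _blocks.Add(block);
        UsedBytes += size;
        return new ArraySegment<byte>(block);
    }

    public void Dispose()
    {
        // Drop every block at once; nothing from this heap may be used afterward.
        _blocks.Clear();
        UsedBytes = 0;
        _disposed = true;
    }
}

public static class ScopedHeapDemo
{
    // Allocates inside a scope and reports how much the scope used.
    public static long Run()
    {
        using (var heap = new ScopedHeap(maxBytes: 1024))
        {
            heap.Allocate(512);
            // A further heap.Allocate(1024) here would throw, because
            // 512 + 1024 exceeds the 1024-byte limit.
            return heap.UsedBytes;
        }
    }
}
```

The point of the sketch is the shape of the contract: you know exactly how much a piece of code allocated, you can cap it, and anything that has to outlive the scope (a cache entry, say) needs to be allocated somewhere else.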
This is early days yet, but we already see some really interesting performance improvements!
How does this work?
While an early experiment with Rattlesnake.CLR was based on the Mono runtime, I quickly decided that I wanted to keep using the MS CLR. Now, in order to handle this I had to do some unnatural things (to say the least), but I think that I even managed to make this a supported option. Essentially, we are using the CLR Hosting API for this. In particular:
You can use Rattlesnake.CLR like this:
Just for fun, we also allow you to place limits on the default heap, so you can be sure that you aren’t allocating too much there.
.\Rattlesnake.exe Raven.Server.exe --max-default-heap-size=256MB
We are still running some tests, but this is looking really good.