mimalloc is a memory allocator that is small and efficient, at least so the docs say. Which was interesting enough for me to take a look. We have had to do a lot of work in memory allocation inside RavenDB, and looking into how other people are doing that is always interesting. What was really attractive for me here was the fact that this is a small codebase, so I can go over that fairly quickly, and the amount of complexity involved is going to be limited.
As usual, I’m just going over the code, recording my impressions.
I started by looking at the API in mimalloc.h, and while it seems threatening, pretty much all of details here are around making it clear to the compiler what this code is doing, to enable additional optimizations.
If you ignore all of these, this is just a pretty normal function declaration with the usual malloc signature.
The code is very well commented, but it looks like it is going to take a serious inspection to actually figure out what is going on. I started by going over the header files, and they show some tantalizing details, but I’m missing context that I assume that I’ll get when I’ll go over the actual code.
The code talks about segments and blocks. I think that a segment is a chunk of memory that we allocated from the OS, and a block is a chunk of memory that we allocate to users of mimalloc. I like that there is an explicit model here of multi threading. There is a heap per thread, it seems, but you can free memory from another thread as well. This match very closely what we have done with RavenDB’s internal memory management.
I started to read the OS specific parts of the code, and I hit gold (as in, stuff that is really interesting to read) almost immediately:
There is a long discussion that details a lot of really fascinating details here, including certain level of “are you really going to go there?! OMG, you went there!”. For example, in order to patch malloc(), mimalloc need to patch atexit() to ensure that the process can shutdown normally.
I started reading the init.c file, and it isn’t about memory management at all, it is all about integrating mimalloc into the system, and that is quite fascinating on its own.
Here is how the memory is patched on ARM (similar code exists for X86 and X64):
What you see here is building of raw assembly instructions to do a task. I do wonder how this works, given that I would expect usual executable memory to be non writable, but I’ll look at that in a bit.
Then we have the actual list of functions to patch:
The last lines are scary, I have to admit.
And I found how the code deals with the memory protection:
I’m currently doing what is effectively random reads throughout the codebase, mostly because this is close to midnight and I’m not going to really be able to grok anything. I run into this function:
Nothing “special”, but it did lead me to some interesting articles about hashing, and what they are good for, how to build them, etc.
One of the reasons that I love doing these code reviews is that I learn so much more than what I expected to.
For example, the way mimalloc initialize its random is really quite interesting (see: _mi_random_init()) and elegant.