After describing in detail the major refactoring that RavenDB (via Voron, its storage engine) has gone through, there is one question remaining. What’s the point? The code is a lot simpler, of course, but the whole point of this much effort is to allow us to do interesting things.
There is performance, of course, but we haven’t gotten around to testing that yet, because something a lot more interesting came up: disk space management.
Voron allocates disk space from the operating system in batches of up to 1GB at a time. This is done to reduce file fragmentation and allow the file system to optimize the disk layout. It used to be critical; SSDs and NVMe drives have made it a lot less important (though it is still a factor).
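To make the batching idea concrete, here is a minimal sketch (in C, on Linux) of how a storage engine might extend its data file in one large step. This illustrates the technique, not Voron’s actual code; the batch size constant and function name are assumptions for the example.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>

#define GROWTH_BATCH (1024L * 1024 * 1024) /* grow by up to 1GB at a time */

/* Hypothetical helper: extend the file by one large batch so the file
   system can hand back a contiguous extent instead of many small ones. */
int grow_data_file(int fd, off_t current_size)
{
    /* posix_fallocate reserves real blocks, so later writes into the
       reserved range won't fail for lack of disk space. */
    int err = posix_fallocate(fd, current_size, GROWTH_BATCH);
    if (err != 0) {
        fprintf(stderr, "failed to grow file: error %d\n", err);
        return -1;
    }
    return 0;
}
```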
What happens if we have a very large database, but we delete a big collection of documents? This is a case where the user’s expectations and Voron’s behavior diverge. A user who just deleted a few million documents would expect to see a reduction in the size of the database. But Voron will mark the area on the disk as “to-be-used-later” and won’t free the disk space back to the operating system.
There were two reasons for this behavior:
- We designed Voron in an era where it was far more common for systems to have hard disks, where fragmentation was a very serious problem.
- It is really complicated to actually release disk space back to the operating system.
The first reason is no longer that relevant, since most database servers can reasonably expect to run on SSD or NVMe these days, significantly reducing the cost of fragmentation. The second reason deserves a more in-depth answer.
In order to release disk space back to the operating system, you have to do one of three things:
- Store the data across multiple files and delete a file where it is no longer in use.
- Run compaction, basically re-build the database from scratch in a compact form.
- Use advanced features such as sparse files (hole punching) to return space to the file system without changing the file size.
The first option, using multiple files, is possible but pretty complex. Mostly because of the details of how you split the data across multiple files, the fact that a single live entry in an otherwise empty file will prevent its deletion, etc. There are also practical issues, such as limits on the number of open file handles, internal costs at the operating system level, etc.
Compaction, on the other hand, requires that you have enough free space available while it runs. In other words, if your disk is 85% full and you delete 30% of the data, you don’t have the free space to run a compaction: the compacted copy would need room for the remaining ~60% of the data, while only 15% of the disk is actually free.
Another consideration for compaction is that it can be really expensive. Running compaction on a 100GB database, for example, can easily take hours and in the cloud will very quickly exhaust your I/O credits.
RavenDB & Voron have supported compaction for over a decade, but it was always something that you did on very rare occasions. A user had to manually trigger it, and the downsides are pretty heavy, as you can see.
In most cases, I have to say, returning disk space to the operating system is not all that interesting. That free space is managed by RavenDB and will be reused before we allocate any additional space from the OS. However, this is one of those features that keep coming up, because our behavior goes against users’ expectations.
The final option I discussed is using hole punching or sparse files (the two are pretty much equivalent, just different terms on different operating systems). The idea is that we can go to the operating system and tell it that a certain range in the file is not used, and that it can make use of that disk space again. Any future read from that range will return zeroes. If you write to this region, the file system will allocate additional space for those writes.
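To ground that, here is a minimal sketch of hole punching on Linux, using fallocate(2) with FALLOC_FL_PUNCH_HOLE. This shows the general mechanism, not necessarily how Voron invokes it; on Windows, the equivalent is marking the file sparse and issuing FSCTL_SET_ZERO_DATA.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/falloc.h>
#include <stdio.h>

/* Tell the file system that [offset, offset + length) is unused.
   FALLOC_FL_KEEP_SIZE keeps the file length unchanged while the
   underlying blocks are returned to the file system. */
int punch_hole(int fd, off_t offset, off_t length)
{
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  offset, length) != 0) {
        perror("fallocate(PUNCH_HOLE)");
        return -1;
    }
    /* Reads from the hole now return zeroes; a write there forces the
       file system to allocate fresh blocks. */
    return 0;
}
```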
That behavior is problematic for RavenDB, because we used to write to the disk exclusively via memory-mapped I/O. If there isn’t sufficient space to complete a write, memory-mapped I/O will generate a segmentation fault / access violation. Crashing with an access violation because the disk is full is not acceptable to us, so we couldn’t use sparse files. The only option we were able to offer for reducing disk space was full compaction.
You might have noticed that I used the past tense in the last paragraph. That is because I am no longer limited to using just memory-mapped I/O. Using normal I/O for this purpose works even if we run out of disk space: we get the usual disk-full error, which we are already handling anyway.
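As a rough sketch of the difference: a normal write into a punched hole on a full disk fails with an ordinary error code that the caller can inspect, rather than a signal or access violation. The function below is illustrative, not RavenDB’s actual write path.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Illustrative: pwrite reports a full disk as errno == ENOSPC, an
   error we can propagate and handle, instead of faulting the process
   the way a memory-mapped write into an unbacked page would. */
int write_page(int fd, const void *page, size_t size, off_t offset)
{
    ssize_t written = pwrite(fd, page, size, offset);
    if (written < 0) {
        if (errno == ENOSPC)
            fprintf(stderr, "disk full at offset %lld\n", (long long)offset);
        return -1;
    }
    if ((size_t)written != size)
        return -1; /* short write; a real caller would retry the rest */
    return 0;
}
```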
Yes, that means that starting with RavenDB 7.1, we’ll automatically release free disk space directly back to the operating system, matching your likely expectations about the behavior. This is done in increments of 1MB, since we still want to reduce fragmentation and the amount of file metadata that the file system needs to manage.
The 1MB trigger
RavenDB will punch a hole in the file whenever there is a consecutive 1MB of free space. This is important to understand because of fragmentation. If you wrote 100 million documents, each 2KB in size, and then deleted every second document, what do you think will happen? The file would be full of tiny, scattered gaps, and there won’t be any consecutive 1MB range for us to free.
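For intuition, here is a small self-contained sketch of the kind of scan that decides whether anything can be released: look for a run of consecutive free pages adding up to 1MB. The bitmap representation and the 8KB page size are assumptions made for the example, not a description of Voron’s internals.

```c
#include <stdbool.h>
#include <stddef.h>

#define PAGES_PER_MB 128 /* assuming 8KB pages: 128 * 8KB = 1MB */

/* Returns the index of the first page in a free 1MB run, or -1 if the
   free space is too fragmented for any hole to be punched. */
long find_free_megabyte(const bool *page_is_free, size_t page_count)
{
    size_t run = 0;
    for (size_t i = 0; i < page_count; i++) {
        run = page_is_free[i] ? run + 1 : 0;
        if (run == PAGES_PER_MB)
            return (long)(i + 1 - PAGES_PER_MB);
    }
    return -1;
}
```

In the alternating-delete scenario above, every free gap is tiny and isolated, so a scan like this never finds a qualifying range.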
Luckily, that sort of scenario tends to be pretty rare; it is far more common to have clustering of writes and deletes, which allows us to take advantage of locality and free disk space back to the OS automatically.
RavenDB will first use all the free space inside the file, reclaiming sparse regions as needed, before it will request additional disk space from the OS. When we do request additional space, we’ll still get it in large chunks (and without using sparse files). That is because it is far more likely to be immediately used, and we want to avoid giving the file system too much work.
Note that the overall file size is going to stay the same, but the disk space actually used is going to be reduced. We updated the RavenDB Studio to report both numbers, but when browsing the files manually, you need to keep that in mind.
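If you want to check this yourself outside of the Studio, the distinction is visible via stat(2): the apparent size (st_size) stays put, while the allocated blocks (st_blocks) shrink. It is the same difference you see between `ls -l` and `du` on a sparse file.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Print both numbers for a file: the logical length and the space the
   file system has actually allocated for it. */
void report_sizes(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0) {
        perror("stat");
        return;
    }
    printf("apparent size: %lld bytes\n", (long long)st.st_size);
    /* st_blocks is measured in 512-byte units by POSIX convention */
    printf("on disk:       %lld bytes\n", (long long)st.st_blocks * 512);
}
```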
I expect that this will be most noticeable for users who are running on cloud instances, where it is common to size the disks to be just big enough for actual usage.
It Just Works
There is no action that you need to take to enable this behavior, and on first start of RavenDB 7.1, it will immediately release any free space already in the data files.
The work was actually completed and merged in August 2024, but it is going to be released sometime in Q2/Q3 of 2025. You might have noticed that there have been a lot of low-level changes targeted at RavenDB 7.1. We need to run them through the wringer to make sure that everything works as it should.
I’m looking forward to seeing this in action; there are some really nice indications about the sort of results we can expect. I’ll talk about that in more detail in another post, as this one is getting long enough.