Use cases for MADV_DONTNEED in Voron

Nov 14 2018

The rant in this video is an absolutely beautiful one. I ran into it while figuring out how MADV_DONTNEED works, and I thought I would give some context on why the behavior is exactly what I want. In fact, you can read my reasoning directly in the Linux kernel source code.

During a transaction, we need to put the memory pages modified by the transaction somewhere. We put them in temporary storage which we call scratch. Once a transaction is committed, we still use the scratch memory for a short period of time (due to MVCC) and then we flush these pages to the data file. At this point, we are never going to use the data on the scratch pages again. Just leaving them around means that the kernel needs to write them to the file they were mapped from. Under load, that can actually be a large portion of the I/O the system is doing.

We handle that by telling the OS, using MADV_DONTNEED, that we don’t actually need this memory and that it should throw it away rather than write it to disk. We are still checking whether this causes excessive reads (when the kernel tries to re-read the data from disk).
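To make the call concrete, here is a minimal Python sketch of issuing that advice on a file-backed mapping. It is not Voron's actual code: the helper names `discard_scratch_pages` and `demo` are hypothetical, and the temporary file merely stands in for a scratch file. It assumes a Unix-like system and Python 3.8 or later, where `mmap.madvise` is available. What a later read of the range returns depends on the mapping type, so the sketch makes no claim about that.

```python
import mmap
import tempfile

PAGE = mmap.PAGESIZE


def discard_scratch_pages(mm, start, length):
    """Advise the kernel that [start, start + length) is no longer needed.

    Returns True if the advice was issued, False if this Python build
    does not expose MADV_DONTNEED or mmap.madvise.
    """
    advice = getattr(mmap, "MADV_DONTNEED", None)
    if advice is None or not hasattr(mm, "madvise"):
        return False
    # start must be page aligned; the mapping itself stays valid afterwards.
    mm.madvise(advice, start, length)
    return True


def demo(pages=4):
    # A temporary file stands in for a scratch file (hypothetical setup).
    with tempfile.TemporaryFile() as f:
        f.truncate(pages * PAGE)
        with mmap.mmap(f.fileno(), pages * PAGE) as mm:
            mm[:PAGE] = b"x" * PAGE  # simulate a page modified by a transaction
            issued = discard_scratch_pages(mm, 0, PAGE)
            return issued, len(mm)
```

Calling `demo()` returns whether the advice was issued and the size of the mapping, which is unchanged: the advice affects the pages behind the range, not the mapping itself.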

There are options that seem better, though. There is MADV_REMOVE, which will do the same and also (efficiently) zero the data on disk if needed, so it is not likely to cause page faults when reading it back again. The problem is that it is limited to certain file systems. In particular, SMB file systems are really common for containers, so that is something to take into account.

MADV_FREE, on the other hand, does exactly what we want, but will only work on anonymous maps. Our scratch buffers are mapped from actual files, not anonymous maps. This is because we want to give the memory a backing store in case of memory overload (and to avoid the wrath of the OOM killer). So we explicitly define them as files (although temporary ones).
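The difference between the two kinds of mapping can be probed with a small Python sketch. Again, this is an illustration under assumptions, not Voron's code: `try_madv_free` and `compare_mappings` are hypothetical helpers, and the sketch assumes a Unix-like system with Python 3.8 or later. It only reports what the running kernel does. On Linux, the madvise man page says MADV_FREE applies only to private anonymous pages, so the file-backed case is expected to be rejected there, but the sketch does not assume that.

```python
import mmap
import tempfile


def try_madv_free(mm):
    """Try MADV_FREE on a whole mapping and report what happened."""
    advice = getattr(mmap, "MADV_FREE", None)
    if advice is None:
        return "unavailable"  # this platform/build does not expose MADV_FREE
    try:
        mm.madvise(advice)
    except OSError:
        return "rejected"  # the kernel refused the advice for this mapping
    return "accepted"


def compare_mappings(size=mmap.PAGESIZE):
    # Private anonymous mapping: no file behind it.
    with mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS) as anon:
        anon_result = try_madv_free(anon)
    # File-backed mapping over a temporary file, like a scratch file.
    with tempfile.TemporaryFile() as f:
        f.truncate(size)
        with mmap.mmap(f.fileno(), size) as mapped:
            file_result = try_madv_free(mapped)
    return anon_result, file_result
```

`compare_mappings()` returns a pair of outcomes, one for the anonymous mapping and one for the file-backed one, each being "accepted", "rejected", or "unavailable".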

Tags:

development

Comments

15 Nov 2018
13:38 PM

tobi

I enjoyed that rant as well. The Linux kernel indeed has many design atrocities. My favorite is the name for the function that creates a file: "creat". I wonder what kind of mental process led to this name.

16 Nov 2018
07:57 AM

Oren Eini

Tobi, The creat isn't actually Linux's fault. It is a very old Unix syscall that was brought over.

16 Nov 2018
10:45 AM

tobi

OK, I did not know that. But someone is to blame! ;-)

16 Nov 2018
11:41 AM

Aleksander Oven

@tobi, see here. Search for "I'd spell creat with an e". :)

Oren Eini

CEO of RavenDB