Reviewing FASTER: When the data hits the disk

Sep 05 2018

Reviewing FASTERWhen the data hits the disk

time to read 5 min | 817 words

So far, I ignored anything in FASTER about how the data actually hits the disk. Based on my reading of the code and the paper, here is what I think that is going on. FASTER works in segments and in conjunction with its allocator. When you create a new instance, you have to define what would be the log size. From looking at the code, they seem to be favoring 16GB as the default size of the log. This is passed to PersistentMemoryMalloc, which uses pages of 32MB each to manage the memory. Out of these 16GB, 14.4GB are considered to be mutable and 1.6 GB is considered to be read only.

On startup, this class allocates two pages (64MB). Whenever FASTER needs more memory, such as to store a new record, it will eventually call to this method:

Again we have num_slots that actually means size in bytes, but I’ll leave my pet peeves aside.

You can see that this allocates from tail of the page use Reserve, which does an atomic operation. If we run out of space in the 32MB page, the caller need to call NewPage() to handle the new allocation. This plays together with the buffer management and the epoch. In particular, here is how a new page is allocated in the normal case. Assuming we just started and we consumed 64MB of memory, the new entry will allocate the 3rd page. This will also move the section of read only memory when there is a new page allocated and the number of active pages is beyond 14.4 GB.

What this means, in practice, is that FASTER typically has a 14.4GB range in which all operations are working on purely mutable memory. That means that two impressions on the same ad will end up being very close to simply Interlocked.Increment on the relevant value. This is the key for the performance that FASTER exhibits.

What happens once we start going beyond the 14.4 GB? FASTER will begin to move data to the read only section. In this case, it means that the any new modifications to the data will create a new copy of it in the mutable section.

At the same time, PageAlignedShiftReadOnlyAddress() will also starts an async process of writing these readonly pages to disk. Here is how it works:

If the read_only_address was updated, FASTER calls to BumpCurrentEpoch() and will execute OnPagesMarkedReadOnly() when the Epoch moves beyond this range (this works because then it is guaranteed that no one may mutate this memory). That method looks like this:

The notion of read_only_address and safe_readonly_only_address is discussed in the paper quite nicely, by the way. AsyncFlushPages() writes the data to disk, as you would expect and updates various in memory structures.

Note that just having the data written to disk doesn’t remove the in memory copy. It is still available in memory until it is pushed back from the log by new data. Now that we understand how the data goes to the log and then to the disk, I want to figure out how the data is written in to the disk itself. Actual writes are handled here:

What you can see is that the destination offset is used to divide the data on disk to sections. Each section is 1GB in size. In other words, the way FASTER works is to write the data in 1 GB segments that are sequential over time.

This also plays into the expiration policy that FASTER employs. Because it uses a logs based system, it accumulate data over time and will need to handle that. The current way it deals with the problem is to just delete old files, this gets rid of the data in roughly time based order, which is suitable for the use case that the paper talks about. Another alternative is to read the old files, and move the still valid entries to the start. That doesn’t seem to be implemented and I think it will be pretty hard to do and likely consume a lot of resources.

I’ll probably have another summary post about FASTER, but that is pretty much it. I’ve ignored other parts (recovery, several state machines used to handle internal state, etc), but they aren’t important to grokking what it is actually doing. It is an interesting codebase, but it feels… incomplete. But I’ll reserve such thoughts to the summary post.

Tweet Share Share 0 comments

Tags:

reviews

Oren Eini

Oren Eini

CEO of RavenDB