Reviewing FASTER: Digging into the C++ impl
After going over the paper and the managed implementation, I’m ready to start with the C++ implementation. I have higher hopes for this code. As I started browsing the C++ code, it occurred to me that the way the C# implementation handles dynamic code generation is pretty much how templates in C++ work. I wonder if this was the trigger for that.
The C++ code reads a lot more naturally to me. There are some nice tricks that are used there that are a joy to read. For example, take a look at Address manipulation:
The colon syntax is the way to express bitfields in C. But the real fun part for me is the idea of control_. What is it for? Well, it turns out that in addition to Address, they also defined AtomicAddress, whose implementation needs to provide atomic operations on an address. This is implemented in the following manner:
I find this a really elegant way to handle this requirement.
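To make the idea concrete, here is a minimal sketch of the pattern that I wrote from scratch; it is not the FASTER source. The field widths (25 bit offset, 23 bit page), the Fields struct and the accessor names are assumptions for illustration only. Reading control_ after writing the bitfields relies on union type punning, which the mainstream compilers support but the C++ standard does not strictly guarantee.

```cpp
#include <atomic>
#include <cstdint>

// Illustrative only: field names and widths are assumptions, not FASTER's layout.
class Address {
 public:
  Address() : control_{0} {}
  explicit Address(uint64_t control) : control_{control} {}
  Address(uint32_t page, uint32_t offset) : control_{0} {
    fields_.offset = offset;
    fields_.page = page;
  }

  // The whole address viewed as one 64-bit word.
  uint64_t control() const { return control_; }
  uint32_t page() const { return static_cast<uint32_t>(fields_.page); }
  uint32_t offset() const { return static_cast<uint32_t>(fields_.offset); }

 private:
  struct Fields {
    uint64_t offset : 25;    // the colon syntax declares a bitfield width
    uint64_t page : 23;
    uint64_t reserved : 16;  // 25 + 23 + 16 = 64 bits in total
  };
  union {
    Fields fields_;     // structured view
    uint64_t control_;  // raw view of the same 8 bytes
  };
};

static_assert(sizeof(Address) == 8, "Address must fit in one 64-bit word");

// Same value, but every access goes through a single 64-bit atomic.
class AtomicAddress {
 public:
  explicit AtomicAddress(Address address) : control_{address.control()} {}

  Address load() const { return Address{control_.load()}; }
  void store(Address address) { control_.store(address.control()); }

  // On failure, `expected` is updated with the value that was actually found.
  bool compare_exchange_strong(Address& expected, Address desired) {
    uint64_t expected_control = expected.control();
    bool swapped =
        control_.compare_exchange_strong(expected_control, desired.control());
    expected = Address{expected_control};
    return swapped;
  }

 private:
  std::atomic<uint64_t> control_;
};
```

Because the whole address fits in one word, the atomic version never needs a lock: it loads or swaps control_ and rebuilds the structured view from it.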
Another amusing observation is that almost all the code in FASTER is in .h files, because of the heavy use of templates. I wonder how that affects compilation speed and how that would play in larger projects.
It is in faster.h that we start to get into the interesting bits. I first ran into this guy:
This maps pretty closely to what I have seen the C# code do, but in C++ it is a much more natural approach than the dynamic compilation on the fly that the C# version does.
Next we have the constructor, which looks like this:
The epoch_ field is auto initialized by the compiler and is not shown here. This indicates that FASTER can handle up to 2.1 billion entries in total, which seems to be a strange limit for a data store that is expected to handle hundreds of thousands of operations per second. I’m going to jump around the codebase a bit, because I want to follow exactly what is going on when initializing this class. The first place to look is the epoch. The idea of the epoch is described in the paper, so I’m not going to repeat it. The code defines a struct that is 64 bytes in size (cache line sized, to avoid false sharing). It is used to store a thread specific value and to maintain most of the invariants of the epoch.
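As a rough illustration of the cache line sized entry, here is a sketch of how such a struct can be declared. The 64 byte line size, the field names and the kUnprotected marker are my assumptions, not taken from the FASTER source.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCacheLineBytes = 64;  // assumed cache line size
constexpr uint64_t kUnprotected = 0;         // assumed "not in an epoch" marker

// Hypothetical per-thread entry. alignas pads the struct to a full cache
// line, so two threads never write to the same line (no false sharing).
struct alignas(kCacheLineBytes) Entry {
  std::atomic<uint64_t> local_current_epoch{kUnprotected};
  uint32_t reentrant{0};
};

static_assert(sizeof(Entry) == kCacheLineBytes, "one entry per cache line");
static_assert(alignof(Entry) == kCacheLineBytes, "entries start on a line boundary");

// Records the epoch that the owning thread is currently working in.
inline void Protect(Entry& entry, uint64_t current_epoch) {
  assert(current_epoch != kUnprotected);
  entry.local_current_epoch.store(current_epoch);
}
```

The static_assert lines turn the layout requirement into a compile time check, so adding a field that pushes the struct past one cache line fails the build instead of silently hurting performance.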
When switching between epochs, there are actions that need to be run. Here is what this looks like in the code.
I must say, this really messes with my mind, because we have C#’s naming conventions (TryPop, TryPush) in C++ code. It’s like the code couldn’t decide what shape it wanted to be in either language.
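For readers without the source at hand, this is roughly the shape I have in mind for such an action slot. It is a sketch under my own assumptions: the EpochAction class, the kFree / kLocked sentinels and the use of std::function are hypothetical and only borrow the TryPush / TryPop names from the code.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>
#include <limits>
#include <utility>

// Hypothetical slot holding one callback that fires once an epoch is safe.
class EpochAction {
 public:
  // Sentinels are larger than any real epoch (assumed convention).
  static constexpr uint64_t kFree = std::numeric_limits<uint64_t>::max();
  static constexpr uint64_t kLocked = kFree - 1;

  // Claims the slot if it is free and arms it for `epoch`.
  bool TryPush(uint64_t epoch, std::function<void()> callback) {
    assert(epoch < kLocked);
    uint64_t expected = kFree;
    if (!epoch_.compare_exchange_strong(expected, kLocked)) {
      return false;  // slot already taken
    }
    callback_ = std::move(callback);
    epoch_.store(epoch);  // publish: the slot is now armed
    return true;
  }

  // Runs the callback if the slot is armed for an epoch <= safe_epoch.
  bool TryPop(uint64_t safe_epoch) {
    uint64_t expected = epoch_.load();
    if (expected > safe_epoch) {
      return false;  // free, locked, or not yet safe
    }
    if (!epoch_.compare_exchange_strong(expected, kLocked)) {
      return false;  // another thread got there first
    }
    std::function<void()> callback = std::move(callback_);
    callback_ = nullptr;
    epoch_.store(kFree);  // release the slot before running the action
    callback();
    return true;
  }

 private:
  std::atomic<uint64_t> epoch_{kFree};
  std::function<void()> callback_;
};
```

The compare and exchange to kLocked is what makes both operations safe to call from several threads: only the thread that wins the exchange touches callback_.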
The number of threads that can take part is limited by this value:
Right now, this is set to 96, which means that if you need more threads than that, you’ll get a runtime error. This fits nicely with the model FASTER uses of long running threads, but I don’t see how it can play well with actually accepting commands from the network or another location.
As part of its constructor, this method is called, which actually does the real work of setting up the epoch.
I’m not really sure at this point why it is allocating two additional entries beyond the specified size.
When a thread starts running FASTER code, it needs to register itself with the Epoch. This is done in the Protect() call.
Going into the Thread class reveals a simple table of values that are used to give ids to the threads that asked to get an id. This is done in this function:
It took me a couple of readings of the first two lines to understand what is going on here. This is an awesome way to handle scanning a circular buffer. It is very clear and saves a bunch of code (at the cost of a mod operation, which can usually be replaced by a mask if the value is known at compile time and is a power of 2, which in this case it is not). I’m probably going to use this the next time I need to implement scanning through a ring buffer.
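Here is my own reconstruction of the trick, as a sketch rather than the actual FASTER function. The ThreadIdTable class and its member names are hypothetical; only the limit of 96 comes from the code discussed above. The scan starts at a moving index and runs over twice the table size, and the mod folds every position back into the table, so no wrap around branch is needed.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <stdexcept>

constexpr std::size_t kMaxNumThreads = 96;  // not a power of 2, so no cheap mask

// Hypothetical table handing out small integer ids to threads.
class ThreadIdTable {
 public:
  ThreadIdTable() {
    for (auto& flag : in_use_) flag.store(false);
  }

  // Returns a free slot index, or throws if all slots are taken.
  std::size_t Reserve() {
    // Each caller starts one slot further along, which spreads contention.
    std::size_t start = next_index_.fetch_add(1);
    std::size_t end = start + 2 * kMaxNumThreads;
    for (std::size_t id = start; id < end; ++id) {
      std::size_t slot = id % kMaxNumThreads;  // wrap around the ring
      bool expected = false;
      if (in_use_[slot].compare_exchange_strong(expected, true)) {
        return slot;
      }
    }
    throw std::runtime_error("too many threads");
  }

  void Release(std::size_t slot) {
    assert(slot < kMaxNumThreads);
    in_use_[slot].store(false);
  }

 private:
  std::atomic<std::size_t> next_index_{0};
  std::atomic<bool> in_use_[kMaxNumThreads];
};
```

Scanning two full laps instead of one gives a thread a second chance at a slot that was released while it was still scanning.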
Then we have computing the earliest safe epoch:
The first of these methods is elegant: it does a simple read from the table, reading potentially stale values. This doesn’t matter, because the worst thing that can happen is that we’ll keep a previous epoch for longer than is required.
The second one reads wrong to me, but I’ll have to dig deeper into the C++ memory model to be sure. The problem is that this seems to rely on the CPU updating its view of the table somehow, but I don’t see any instruction that would force it to. I think that the store to safe_to_reclaim_epoch (which is std::atomic<uint64_t>) will use memory_order_seq_cst for the operation, but I’m not sure how that would impact reads from the table_.
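I can’t settle the memory model question from here, but one way to sidestep it is to make the per thread slots atomic as well, so that every read in the scan is a well defined atomic load. The sketch below does that. It is not how the FASTER code is written: the EpochTable class, SetLocalEpoch and the choice of 0 as the unprotected marker are my assumptions, and the cache line padding is left out to keep it short.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical epoch table where both the slots and the result are atomic.
class EpochTable {
 public:
  static constexpr std::size_t kNumEntries = 96;
  static constexpr uint64_t kUnprotected = 0;  // assumed "thread is idle" marker

  std::atomic<uint64_t> safe_to_reclaim_epoch{0};

  EpochTable() {
    for (auto& slot : table_) slot.store(kUnprotected);
  }

  // A thread publishes the epoch it works in (or kUnprotected when it leaves).
  void SetLocalEpoch(std::size_t thread, uint64_t epoch) {
    assert(thread < kNumEntries);
    table_[thread].store(epoch);
  }

  // Everything older than the oldest epoch still in use is safe to reclaim.
  uint64_t ComputeNewSafeToReclaimEpoch(uint64_t current_epoch) {
    assert(current_epoch > 0);  // epochs are assumed to start at 1
    uint64_t oldest_ongoing = current_epoch;
    for (const auto& slot : table_) {
      uint64_t local = slot.load();  // sequentially consistent by default
      if (local != kUnprotected && local < oldest_ongoing) {
        oldest_ongoing = local;
      }
    }
    safe_to_reclaim_epoch.store(oldest_ongoing - 1);
    return oldest_ongoing - 1;
  }

 private:
  std::atomic<uint64_t> table_[kNumEntries];
};
```

A stale read here is still harmless for the same reason as before: seeing an old, smaller epoch only delays reclamation, it never allows it too early.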
Also, I want you to pay attention to the variable names here. Private member fields:
Public member fields:
And then we have SpinWaitForSafeToReclaim that uses:
- safe_to_reclaim_epoch – public member field
- safe_to_reclaim_epoch_ – method argument
I’m not sure if this is a common practice in C++, but it is really confusing to me. This is enough for now, I’m going to keep going through the C++ code in my next post. There hasn’t been anything really interesting so far, just digging into the code and getting a feel for how it is put together.
More posts in "Reviewing FASTER" series:
- (06 Sep 2018) Summary
- (05 Sep 2018) When the data hits the disk
- (04 Sep 2018) Reading data from disk
- (03 Sep 2018) The hash structure
- (31 Aug 2018) Working with the file system
- (30 Aug 2018) Digging into the C++ impl
- (29 Aug 2018) Let’s check these numbers
- (28 Aug 2018) Let’s start with managed code
- (27 Aug 2018) Reading the paper
Loving these posts on your exploration of FASTER.
One thing that confused me reading your posts (not having the source code myself): the first union is Address, not AtomicAddress. You don't show AtomicAddress, which the second snippet applies to. It might help readers to see that too, since it should have control_ declared as std::atomic<uint64_t> control_ instead of uint64_t, so that the load/store calls make more sense (they don't apply to uint64_t).