The naïve overflow handling I wrote previously kept me up at night. I really don’t like it. I finally figured out what I could do to handle this in an elegant fashion.
The idea is to:
- Find the furthest non overflow piece from the current one.
- Read its keys and try to assign them to its natural location.
- If successfully moved all non native keys, mark the previous piece as non overlapping.
- Go back to the previous piece and do it all over again.
Maybe it will be better to look at it in code?
There is quite a lot that is going on here, to be frank. We call this method after we deleted a value and go a piece to be completely empty. At this point, we scan the next pieces to see how far we have to go to find the overflow chain. We then proceed from the end of the chain backward. We try to move all the keys in the piece that aren’t native to the piece to their proper place. If we are successful, we mark the previous piece as non overflowing, and then go back one piece and continue working.
I intentionally scan more pieces than the usual 16 limit we use for put, because I want to reduce overflows as much as possible (to improve lookup times). To reduce the search costs, we only search within the current chain, and I know that the worst case scenario for that is 29 in truly random cases.
This should do amortize the cost of fixing the overflows on deletes to a high degree, I hope.
Next, we need to figure out what to do about compaction. Given that we are already doing some deletion book keeping when we clear a piece, I’m going to also do compaction only when a piece is emptied. For that matter, I think it make sense to only do a page level compaction attempt when the piece we just cleared is still empty after an overflow merge attempt. Here is the logic:
Page compaction is done by finding a page’s sibling and seeing if we can merge them together. A sibling page is the page that share the same key prefix with the current page except a single bit. We need to check that we can actually do the compaction, which means that there is enough leaf pages, that the sizes of the two pages are small enough, etc. There are a lot of scenarios we are handling in this code. We verify that even if we have enough space theoretically, the keys distribution may cause us to avoid doing this merge.
Finally, we need to handle the most complex parts. We re-assign the buckets in the hash, then we see if we can reduce the number of buckets and eventually the amount of memory that the directory takes. The code isn’t trivial, but it isn’t really complex, just doing a lot of things:
With this, I think that I tackled the most complex pieces of this data structure. I wrote the code in C because it is fun to get out and do things in another environment. I’m pretty sure that there are bugs galore in the implementation, but that is a good enough proof of concept to do everything that I wanted it to do.
However, writing this in C, there is one thing that I didn’t handle, actually destroying the hash table. As it turns out, this is actually tricky, I’ll handle that in my next post.