In the previous post, I discussed how I can (fairly naively) solve the phone book problem. In this post, I want to take this a bit further. Whenever a developer hears the terms sorted or ordered, I expect them to think about tree. Indeed, trees are the very first thing that I would expect to pop to mind.
Assuming that they aren’t versed with persistent data structures, they are likely going to look at in memory trees and map the idea directly to a file. I decided to take the same approach. For algorithms, I find Python to be the best language for me to grok. Mostly because it looks so much like pseudo code. Searching for AVLTree Python got me to this implementation. One issue with AVLTrees is their code size. Even in Python, the code is about 200 lines of code. Here is the structure of a node, which we’ll need to store on the disk.
You can see the full code here. It is about twice as long as the previous implementation (around 300 lines of code). I’m not going to cover that in depth, mostly because this is an AVL tree, with the only funny thing here is that I’m using that I’m writing the nodes to the file, not holding them in memory.
For example, I have to have some way to find the root node. I do that by writing its position to the end of the file after each write. That means that there are some limits to what I’m doing, but nothing too bad.
I don’t have a way to recover disk space, and updates to the data will use new space, not the old one. That is because we have to take into account that the size of the data may change.
This implementation is also going to be quite wasteful in terms of the disk seeks, given that it is an AVL Tree with a branching factor of 2. One of the reasons that binary search trees aren’t really used with persistent data structures is that the cost of seeking to another location in the file is enormous. B+Tree solves the problem by having a much higher branching factor.
A proper B+Tree, however, is probably going to take about 1,500 lines of code to implement, I think. So if you want to read that code, go ahead and peek into Voron .