Building a phone bookPart III

time to read 2 min | 337 words

In the past two posts in the series, I talked about ways to store phone book records in a file. During the candidates review process, I noticed that many candidates failed to make their lives significantly easier by placing limits on themselves.

For example:

  • Using variable length records.
  • Using a single file.
  • Choosing simple algorithm to do the whole task.

If we force fixed length records, either directly or via record splitting (if each record is 64 bytes, a record that is bigger than that would reside in some multiple of the record size), the task become much easier. I’ve mostly ignored that in my code so far because I’m using binary offsets, but it can really make the code a lot simpler.

Using a single file lead to complications, because you have to do internal space management (where do the records live, where is the metadata?). It also make it much harder to recover used space in many cases.

The last one is probably the most interesting limitation, and not something that I would expect a junior developer to figure out. The use of a single option is typically limiting you to whatever a particular algorithm is providing, but you can extend on that significantly.

Let’s see another approach to building a persistent phone book. I’m going to effectively build an LSM here. You can see the code here.

I called it a pretty horrible LSM (Log Structure Merge), but all the relevant pieces are there. It is just horribly inefficient. The key problem, by the way, is around the number of times it will open a file handle. That can be really slow on Windows and end up being a real issue for any significant size.

There are also probably a lot of other bugs there, but also enough to figure out how this is actually built.

And with this post, I can can say that I explicitly scratched this itch.

A fun task to take this further, by the way, is to try to implement a persistent trie for the phone book.

More posts in "Building a phone book" series:

  1. (02 Apr 2021) Part III
  2. (29 Mar 2021) Part II
  3. (26 Mar 2021) Part I