The Guts n’ Glory of Database Internals: The LSM option
So far, we looked at naïve options for data storage and for building indexes, and we found them lacking. The amount of complexity involved was just too much, and the performance costs were not conductive for good business.
In this post, I want to explore the Log Structure Merge option (LSM). This is a pretty simple solution. Our file format remains pretty simple. It is just a flat list of records, but we add a very small twist. For each collection of data (we can call it a table, an index, or whatever), all the records are going to be sorted inside that file based on some criteria.
In other words, here is our file again:
But what about updates? As we mentioned, adding a user with the username ‘baseball’ will force us to move quite a lot of data. Well, the answer to that is that we are not going to modify the existing file. Indeed, in LSM, once a file has been written out, it can never be changed again. Instead, we are going to create a new file, with the new information.
When we query, we’ll search the files in descending order, so newer files are checked first. That allows us to see the updated information. Such a system also rely on tombstone markers to delete values, and it is even possible to run range searches by scanning multiple files (merge sorting on the fly). Of course, over time, the number of files you are using is going to increases, so any LSM solution also has a merge phase (it is right there in the name), where the data among many files is merged together.
This lead to some interesting challenges. Scanning a file to see if a value is there can be expensive (seeks, again), so we typically will use something like a bloom filter to skip that if possible. Merging files is expensive (a lot of I/O), so we want to be sure that we aren’t doing that too often, and yet not doing that means that we have to do a lot more operations, so there are a lot of heuristics involved.
LSM can be a particularly good solution for certain kinds of data stores. Lucene is actually able to do significant optimizations in the way it works as a result of LSM, because it clears internal data structures during the merge operation. Other databases which uses LSM are LevelDB, RocksDB, Cassandra, etc.
Personally, I don’t like LSM solutions very much, it seems that in pretty much any such solution I saw, the merge heuristics were incredibly capable of schedule expensive merges just when I didn’t want them to do anything. And there is quite a bit of complexity involved with managing potentially large number of files. There is also another issue, it is pretty hard to have physical separation of the data using LSM, you typically have to use separate file for each, which also doesn’t help very much.
A much more elegant solution in my view is the B+Tree, but I’ll keep that for the next post.