So far, we looked at naïve options for data storage and for building indexes, and we found them lacking. The amount of complexity involved was just too much, and the performance costs were not conductive for good business.
In this post, I want to explore the Log Structure Merge option (LSM). This is a pretty simple solution. Our file format remains pretty simple. It is just a flat list of records, but we add a very small twist. For each collection of data (we can call it a table, an index, or whatever), all the records are going to be sorted inside that file based on some criteria.
In other words, here is our file again:
But what about updates? As we mentioned, adding a user with the username ‘baseball’ will force us to move quite a lot of data. Well, the answer to that is that we are not going to modify the existing file. Indeed, in LSM, once a file has been written out, it can never be changed again. Instead, we are going to create a new file, with the new information.
When we query, we’ll search the files in descending order, so newer files are checked first. That allows us to see the updated information. Such a system also rely on tombstone markers to delete values, and it is even possible to run range searches by scanning multiple files (merge sorting on the fly). Of course, over time, the number of files you are using is going to increases, so any LSM solution also has a merge phase (it is right there in the name), where the data among many files is merged together.
This lead to some interesting challenges. Scanning a file to see if a value is there can be expensive (seeks, again), so we typically will use something like a bloom filter to skip that if possible. Merging files is expensive (a lot of I/O), so we want to be sure that we aren’t doing that too often, and yet not doing that means that we have to do a lot more operations, so there are a lot of heuristics involved.
LSM can be a particularly good solution for certain kinds of data stores. Lucene is actually able to do significant optimizations in the way it works as a result of LSM, because it clears internal data structures during the merge operation. Other databases which uses LSM are LevelDB, RocksDB, Cassandra, etc.
Personally, I don’t like LSM solutions very much, it seems that in pretty much any such solution I saw, the merge heuristics were incredibly capable of schedule expensive merges just when I didn’t want them to do anything. And there is quite a bit of complexity involved with managing potentially large number of files. There is also another issue, it is pretty hard to have physical separation of the data using LSM, you typically have to use separate file for each, which also doesn’t help very much.
A much more elegant solution in my view is the B+Tree, but I’ll keep that for the next post.
More posts in "The Guts n’ Glory of Database Internals" series:
- (08 Aug 2016) Early lock release
- (05 Aug 2016) Merging transactions
- (03 Aug 2016) Log shipping and point in time recovery
- (02 Aug 2016) What goes inside the transaction journal
- (18 Jul 2016) What the disk can do for you
- (15 Jul 2016) The curse of old age…
- (14 Jul 2016) Backup, restore and the environment…
- (11 Jul 2016) The communication protocol
- (08 Jul 2016) The enemy of thy database is…
- (07 Jul 2016) Writing to a data file
- (06 Jul 2016) Getting durable, faster
- (01 Jul 2016) Durability in the real world
- (30 Jun 2016) Understanding durability with hard disks
- (29 Jun 2016) Managing concurrency
- (28 Jun 2016) Managing records
- (16 Jun 2016) Seeing the forest for the trees
- (14 Jun 2016) B+Tree
- (09 Jun 2016) The LSM option
- (08 Jun 2016) Searching information and file format
- (07 Jun 2016) Persisting information