Building data stores – Append Only
One of the interesting aspects in building a data store is that you run head on into things that you would generally leave to the infrastructure. By far, most developers deal with concurrency by relegating that responsibility to a database.
When you write your own database, you have to build this sort of thing yourself. In essence, we have two separate issues here:
- Maximizing Concurrency – do readers wait for writers? Do writers wait for readers? Do writers wait for writers?
- Ensuring Consistency – can I read uncommitted data? can I read partially written data?
As I mentioned in my previous post, there are two major options when building a data store: Transaction Log and Append Only. There are probably better names for each, but that is how I know them.
This post is going to focus on append only. An append only store is a very simple idea in both concept and implementation. It requires that you always append to the file and never modify data that has already been written. That makes things a bit finicky with the type of data structures that you can use, since typical persistent data structures rely on being able to modify data on the disk. But once you get over that issue, it is actually very simple.
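To make the data structure issue a bit more concrete, here is a minimal sketch of one common way around it, path copying. This is an assumption for illustration, not necessarily the approach used in any particular store. A plain Python list (`log`) stands in for the file, and the names `insert` and `lookup` are hypothetical. The only operation ever performed on `log` is `append`: an update writes new copies of the nodes on the path to the changed key and hands back the position of the new root.

```python
def insert(log, root, key, value):
    """Append copies of the nodes on the path to `key`; return the new root position."""
    if root is None:
        log.append((key, value, None, None))
    else:
        k, v, left, right = log[root]
        if key == k:
            # "Update" by appending a new version of the node.
            log.append((k, value, left, right))
        elif key < k:
            new_left = insert(log, left, key, value)
            log.append((k, v, new_left, right))
        else:
            new_right = insert(log, right, key, value)
            log.append((k, v, left, new_right))
    return len(log) - 1


def lookup(log, root, key):
    """Walk the tree that starts at `root`; older roots still describe older versions."""
    while root is not None:
        k, v, left, right = log[root]
        if key == k:
            return v
        root = left if key < k else right
    return None


log = []                               # stand-in for the append only file
r1 = insert(log, None, "m", 1)
r2 = insert(log, r1, "c", 2)
r3 = insert(log, r2, "m", 3)           # appends a new "m" node, the old one is untouched
```

Every root position is a complete, immutable view of the data as of that write, so anyone still holding `r2` keeps seeing the old value of `"m"`.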
An append only store works in the following manner:
- On startup, the file is read in reverse to find the last committed transaction.
- That committed transaction contains pointers to locations in the file where the actual data is stored.
- A crash in the middle of a write just means garbage at the end of the file that you have to skip when finding the last committed transaction.
- In memory, the only thing that you have to keep is the last committed transaction.
- A reader with a copy of the last committed transaction can execute independently of any other reader / writer. It will not see any changes that writers make after it started, but it also doesn’t have to wait for any writers.
- Concurrency control is simple:
  - Readers don’t wait for readers
  - Readers don’t wait for writers
  - Writers don’t wait for readers
  - There can be only one concurrent writer
The last one is a natural fallout from the fact that we use the append only model. Only one thread can write to the end of the file at a given point in time. That is actually a performance boost, and not something that would slow the database down, as you might expect.
The reason for that is pretty obvious, once you start thinking about it. Writing to disk is a physical action, and the disk head can only be in a single place at any given point in time. By ensuring that all writes go to the end of the file, we gain a big perf advantage, since we don’t do any seeks.
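Putting the pieces together, here is a minimal sketch of the scheme described in this post. Everything about the on-disk layout is an assumption made for illustration: the `MAGIC` marker, the footer with a length and a CRC32, the JSON index, and the names `recover` and `AppendOnlyStore` are all hypothetical. Each write appends the new values, then a commit record holding the full index of key to (offset, length). Startup scans backwards for the newest commit record whose checksum matches, skipping any garbage left by a crash.

```python
import json
import os
import struct
import threading
import zlib

MAGIC = b"COMMIT!!"            # hypothetical marker that ends every commit record
FOOTER = struct.Struct("<II")  # length of the index, CRC32 of the index


def recover(path):
    """Scan backwards for the newest commit record whose checksum matches."""
    if not os.path.exists(path):
        return {}
    with open(path, "rb") as f:
        data = f.read()        # fine for a sketch; a real store would read in chunks
    end = len(data)
    while True:
        pos = data.rfind(MAGIC, 0, end)
        if pos < FOOTER.size:
            return {}          # nothing was ever committed
        length, crc = FOOTER.unpack_from(data, pos - FOOTER.size)
        start = pos - FOOTER.size - length
        if start >= 0 and length > 0:
            body = data[start:start + length]
            if zlib.crc32(body) == crc:
                return json.loads(body)
        end = pos + len(MAGIC) - 1   # garbage candidate; keep looking further back


class AppendOnlyStore:
    def __init__(self, path):
        self._path = path
        self._write_lock = threading.Lock()  # at most one writer at a time
        self._index = recover(path)          # the last committed transaction

    def snapshot(self):
        # Readers grab the current index without locking; it is never mutated.
        return self._index

    def read(self, snapshot, key):
        offset, length = snapshot[key]
        with open(self._path, "rb") as f:
            f.seek(offset)
            return f.read(length)

    def write(self, changes):
        with self._write_lock:
            index = dict(self._index)        # copy, so old snapshots stay valid
            with open(self._path, "ab") as f:
                f.seek(0, os.SEEK_END)
                for key, value in changes.items():
                    index[key] = [f.tell(), len(value)]
                    f.write(value)
                body = json.dumps(index).encode("utf-8")
                f.write(body + FOOTER.pack(len(body), zlib.crc32(body)) + MAGIC)
                f.flush()
                os.fsync(f.fileno())
            self._index = index              # publish only after the commit is on disk
```

A reader calls `snapshot()` once and passes the result to `read()`, so it keeps seeing the data as of that commit no matter what writers do afterwards. Writing the whole index on every commit is wasteful, of course; it just keeps the sketch short.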