This is how leveldb starts up a new db:
As you can imagine, this is quite useful to find out, since it means that everything we do on startup is recover. I do wonder why we have to take a lock immediately, though. I don't imagine that anything else can try to make use of the db yet, it hasn't been published to the outside world.
Recover is interesting (and yes, I know I wrote that a lot lately).
- It starts by taking a lock on the db (File System lock, by holding to LOCK file).
- If the db does not exists, we call NewDB(), which will create a default MANIFEST-00001 file and a CURRENT file pointing at that.
- If the db exists, we call VersionSet::Recover(), this starts by reading the CURRENT file, which points to the appropriate MANIFEST file.
- This then gives us the latest status of the db versions, in particular, it tells us what files belong to what levels and what they ranges are.
- We check that the comparators we use is identical to the one used when creating the db.
- The code make it looks like there might be multiple version records in the MANIFEST file, but I don't recall that. It might be just the way the API is structured, though. I just checked with the part that write it, and yes, it should have just one entry.
- Once we have our versions, what next? We check that all of the expected files are actually there, because otherwise we might have a corrupted install.
- The final thing we do is look for log files that are later than the latest we have in the MANIFEST. I am not sure if this indicates a recover from a crash or a way to be compatible with an older version. I guess that one way this can happen is if you crashed midway while committing a transaction.
When recovering, we are forcing checksum checks, to make sure that we don't get corrupt data (which might very well be the case, since the data can just stop at any point here. The relevant code here is leveldb::log:Reader, which takes care of reading potentially corrupt log file and reporting on its finding. I already went over how the log file is structured, the log reader just does the same thing in reverse, with a lot of paranoia. While reading from the log file, we build a memtable with the committed & safe changes. Here, too, we need to handle with memtable sizes, so we might generate multiple level 0 files during this process.
And... that is pretty much it.
I'll have another post summarizing this, and maybe another on what I intend to do with this information, but I'll keep that close to the chest for now.
More posts in "Reviewing LevelDB" series:
- (26 Apr 2013) Part XVIII–Summary
- (15 Apr 2013) Part XVII– Filters? What filters? Oh, those filters…
- (12 Apr 2013) Part XV–MemTables gets compacted too
- (11 Apr 2013) Part XVI–Recovery ain’t so tough?
- (10 Apr 2013) Part XIV– there is the mem table and then there is the immutable memtable
- (09 Apr 2013) Part XIII–Smile, and here is your snapshot
- (08 Apr 2013) Part XII–Reading an SST
- (05 Apr 2013) Part XI–Reading from Sort String Tables via the TableCache
- (04 Apr 2013) Part X–table building is all fun and games until…
- (03 Apr 2013) Part IX- Compaction is the new black
- (02 Apr 2013) Part VIII–What are the levels all about?
- (29 Mar 2013) Part VII–The version is where the levels are
- (28 Mar 2013) Part VI, the Log is base for Atomicity
- (27 Mar 2013) Part V, into the MemTables we go
- (26 Mar 2013) Part IV
- (22 Mar 2013) Part III, WriteBatch isn’t what you think it is
- (21 Mar 2013) Part II, Put some data on the disk, dude
- (20 Mar 2013) Part I, What is this all about?