Ayende @ Rahien

My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:


+972 52-548-6969

, @ Q c

Posts: 6,026 | Comments: 44,842

filter by tags archive

Reviewing LevelDBPart XVI–Recovery ain’t so tough?

time to read 3 min | 500 words

This is how leveldb starts up a new db:


As you can imagine, this is quite useful to find out, since it means that everything we do on startup is recover. I do wonder why we have to take a lock immediately, though. I don't imagine that anything else can try to make use of the db yet, it hasn't been published to the outside world.

Recover is interesting (and yes, I know I wrote that a lot lately).

  • It starts by taking a lock on the db (File System lock, by holding to LOCK file).
  • If the db does not exists, we call NewDB(), which will create a default MANIFEST-00001 file and a CURRENT file pointing at that.
  • If the db exists, we call VersionSet::Recover(), this starts by reading the CURRENT file, which points to the appropriate MANIFEST file.
  • This then gives us the latest status of the db versions, in particular, it tells us what files belong to what levels and what they ranges are.
  • We check that the comparators we use is identical to the one used when creating the db.
  • The code make it looks like there might be multiple version records in the MANIFEST file, but I don't recall that. It might be just the way the API is structured, though. I just checked with the part that write it, and yes, it should have just one entry.
  • Once we have our versions, what next? We check that all of the expected files are actually there, because otherwise we might have a corrupted install.
  • The final thing we do is look for log files that are later than the latest we have in the MANIFEST. I am not sure if this indicates a recover from a crash or a way to be compatible with an older version. I guess that one way this can happen is if you crashed midway while committing a transaction.

When recovering, we are forcing checksum checks, to make sure that we don't get corrupt data (which might very well be the case, since the data can just stop at any point here. The relevant code here is leveldb::log:Reader, which takes care of reading potentially corrupt log file and reporting on its finding. I already went over how the log file is structured, the log reader just does the same thing in reverse, with a lot of paranoia. While reading from the log file, we build a memtable with the committed & safe changes. Here, too, we need to handle with memtable sizes, so we might generate multiple level 0 files during this process.

And... that is pretty much it.

I'll have another post summarizing this, and maybe another on what I intend to do with this information, but I'll keep that close to the chest for now.

More posts in "Reviewing LevelDB" series:

  1. (26 Apr 2013) Part XVIII–Summary
  2. (15 Apr 2013) Part XVII– Filters? What filters? Oh, those filters…
  3. (12 Apr 2013) Part XV–MemTables gets compacted too
  4. (11 Apr 2013) Part XVI–Recovery ain’t so tough?
  5. (10 Apr 2013) Part XIV– there is the mem table and then there is the immutable memtable
  6. (09 Apr 2013) Part XIII–Smile, and here is your snapshot
  7. (08 Apr 2013) Part XII–Reading an SST
  8. (05 Apr 2013) Part XI–Reading from Sort String Tables via the TableCache
  9. (04 Apr 2013) Part X–table building is all fun and games until…
  10. (03 Apr 2013) Part IX- Compaction is the new black
  11. (02 Apr 2013) Part VIII–What are the levels all about?
  12. (29 Mar 2013) Part VII–The version is where the levels are
  13. (28 Mar 2013) Part VI, the Log is base for Atomicity
  14. (27 Mar 2013) Part V, into the MemTables we go
  15. (26 Mar 2013) Part IV
  16. (22 Mar 2013) Part III, WriteBatch isn’t what you think it is
  17. (21 Mar 2013) Part II, Put some data on the disk, dude
  18. (20 Mar 2013) Part I, What is this all about?


Blair Davidson

It is unfortunate the code doesnt follow the RAII pattern as the lock is not removed if an exception is thrown so it is not safe by default.


Comment preview

Comments have been closed on this topic.


No future posts left, oh my!


  1. Technical observations from my wife (3):
    13 Nov 2015 - Production issues
  2. Production postmortem (13):
    13 Nov 2015 - The case of the “it is slow on that machine (only)”
  3. Speaking (5):
    09 Nov 2015 - Community talk in Kiev, Ukraine–What does it take to be a good developer
  4. Find the bug (5):
    11 Sep 2015 - The concurrent memory buster
  5. Buffer allocation strategies (3):
    09 Sep 2015 - Bad usage patterns
View all series


Main feed Feed Stats
Comments feed   Comments Feed Stats