Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:

ayende@ayende.com

+972 52-548-6969


Reviewing LevelDB: Part XVI–Recovery ain’t so tough?


This is how leveldb starts up a new db:

[Image: leveldb's startup code]

As you can imagine, this is quite useful to find out, since it means that everything we do on startup goes through Recover(). I do wonder why we have to take a lock immediately, though. I don't imagine that anything else can try to use the db yet; it hasn't been published to the outside world.
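To make the ordering concrete, here is a minimal C++ sketch of open-then-recover. The `Db` type, its `mu` and `recovered` members, and `OpenDb()` are hypothetical stand-ins, not leveldb's real `DBImpl` or `DB::Open`. The sketch shows only one thing: the lock is held for the whole of recovery, and the handle reaches the caller afterwards.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical stand-in for a db implementation; not leveldb's DBImpl.
struct Db {
  std::mutex mu;
  bool recovered = false;

  // Placeholder for the real work: MANIFEST, file checks, log replay.
  bool Recover() {
    recovered = true;
    return true;
  }
};

// Open-then-recover: the mutex is held for the whole of Recover(), and the
// handle is only handed to the caller once recovery has succeeded.
std::unique_ptr<Db> OpenDb() {
  auto db = std::make_unique<Db>();
  {
    std::lock_guard<std::mutex> guard(db->mu);  // unlocked when this scope ends
    if (!db->Recover()) {
      return nullptr;  // guard unlocks first, then db is destroyed
    }
  }
  return db;  // the db is "published" only here
}
```

Because `std::lock_guard` releases the mutex when its scope ends, the early `return nullptr` path cannot leave the lock held.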

Recover is interesting (and yes, I know I've written that a lot lately).

  • It starts by taking a lock on the db (a file system lock, held on the LOCK file).
  • If the db does not exist, we call NewDB(), which creates a default MANIFEST-000001 file and a CURRENT file pointing at it.
  • If the db exists, we call VersionSet::Recover(), which starts by reading the CURRENT file to find the appropriate MANIFEST file.
  • The MANIFEST gives us the latest state of the db versions; in particular, it tells us which files belong to which levels and what their key ranges are.
  • We check that the comparator we use is identical to the one used when the db was created.
  • The code makes it look like there might be multiple version records in the MANIFEST file, but I don't recall that. It might just be the way the API is structured, though. I just checked the part that writes it, and yes, it should have just one entry.
  • Once we have our versions, what's next? We check that all of the expected files are actually there, because otherwise we might have a corrupted install.
  • The final thing we do is look for log files that are later than the latest one recorded in the MANIFEST. I am not sure whether this indicates recovery from a crash or a way to stay compatible with an older version. I guess one way this can happen is if you crashed midway while committing a transaction.
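The decisions in the list above can be sketched in C++17 with `std::filesystem`. This is an illustration under simplifying assumptions, not leveldb's code. Parsing the MANIFEST is skipped, so the expected table numbers and the minimum log number are passed in as parameters. Table files are assumed to use the `.ldb` extension. The helper names (`NumberedFile`, `ParseNumber`, `CreateNewDb`, `ReadCurrent`, `MissingTables`, `LogsToReplay`) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <optional>
#include <set>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Formats a leveldb-style file number: (7, "ldb") -> "000007.ldb".
std::string NumberedFile(uint64_t number, const std::string& ext) {
  std::string digits = std::to_string(number);
  if (digits.size() < 6) digits.insert(0, 6 - digits.size(), '0');
  return digits + "." + ext;
}

// Parses "000123" -> 123; anything that isn't all digits yields nullopt.
std::optional<uint64_t> ParseNumber(const std::string& s) {
  if (s.empty()) return std::nullopt;
  uint64_t n = 0;
  for (char c : s) {
    if (c < '0' || c > '9') return std::nullopt;
    n = n * 10 + static_cast<uint64_t>(c - '0');
  }
  return n;
}

// No db yet: write a MANIFEST-000001 and a CURRENT file naming it.
void CreateNewDb(const fs::path& dir) {
  fs::create_directories(dir);
  std::ofstream manifest(dir / "MANIFEST-000001");  // empty placeholder in this sketch
  std::ofstream current(dir / "CURRENT");
  current << "MANIFEST-000001\n";
}

// Existing db: CURRENT holds the active MANIFEST's name on its first line.
std::optional<std::string> ReadCurrent(const fs::path& dir) {
  std::ifstream in(dir / "CURRENT");
  std::string name;
  if (!in || !std::getline(in, name) || name.empty()) return std::nullopt;
  return name;
}

// Every table the MANIFEST expects must be on disk; returns the missing ones.
std::vector<uint64_t> MissingTables(const fs::path& dir,
                                    const std::set<uint64_t>& expected) {
  std::vector<uint64_t> missing;
  for (uint64_t n : expected) {
    if (!fs::exists(dir / NumberedFile(n, "ldb"))) missing.push_back(n);
  }
  return missing;
}

// Log files numbered at or after the MANIFEST's log number still need replay.
std::vector<uint64_t> LogsToReplay(const fs::path& dir, uint64_t min_log) {
  std::vector<uint64_t> logs;
  for (const auto& entry : fs::directory_iterator(dir)) {
    const fs::path& p = entry.path();
    if (p.extension() != ".log") continue;
    std::optional<uint64_t> n = ParseNumber(p.stem().string());
    if (n && *n >= min_log) logs.push_back(*n);
  }
  std::sort(logs.begin(), logs.end());  // replay oldest first
  return logs;
}
```

A non-empty result from `MissingTables` is the "corrupted install" case; a non-empty result from `LogsToReplay` is the leftover-log case from the last bullet.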

When recovering, we force checksum checks to make sure that we don't get corrupt data (which might very well be the case, since the data can just stop at any point here). The relevant code is leveldb::log::Reader, which takes care of reading a potentially corrupt log file and reporting on its findings. I already went over how the log file is structured; the log reader just does the same thing in reverse, with a lot of paranoia. While reading from the log file, we build a memtable with the committed and safe changes. Here, too, we need to deal with memtable sizes, so we might generate multiple level 0 files during this process.
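A simplified C++ sketch of the two ideas in this paragraph. The first is a reader that verifies a checksum per record and stops at the first torn or corrupt one. The second is a replay loop that flushes the memtable to a new level 0 table whenever it grows past a size limit.

Several parts of the sketch are assumptions rather than leveldb's format:
- the record layout: a 4-byte checksum, a 2-byte length, then the payload, with no block or fragment handling;
- plain CRC-32 instead of leveldb's masked CRC32C;
- a hypothetical `key=value` payload;
- a `std::map` standing in for both the memtable and the level 0 tables.

Stopping at the first bad record is also a simplification of what `leveldb::log::Reader` does.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Plain bitwise CRC-32 (reflected, polynomial 0xEDB88320). leveldb itself
// uses a masked CRC32C; this stand-in keeps the sketch self-contained.
uint32_t Crc32(const std::string& data) {
  uint32_t crc = 0xFFFFFFFFu;
  for (unsigned char b : data) {
    crc ^= b;
    for (int i = 0; i < 8; ++i) crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
  }
  return ~crc;
}

// Hypothetical record layout: 4-byte checksum, 2-byte length (both
// little-endian), then the payload. Payloads must be shorter than 64 KB.
std::string EncodeRecord(const std::string& payload) {
  uint32_t crc = Crc32(payload);
  std::string out;
  for (int i = 0; i < 4; ++i) out.push_back(static_cast<char>((crc >> (8 * i)) & 0xFF));
  out.push_back(static_cast<char>(payload.size() & 0xFF));
  out.push_back(static_cast<char>((payload.size() >> 8) & 0xFF));
  return out + payload;
}

// Paranoid reader: returns only records that were fully written and whose
// checksum matches; it stops at the first truncated or corrupt record.
std::vector<std::string> ReadRecords(const std::string& log) {
  std::vector<std::string> records;
  size_t pos = 0;
  while (log.size() - pos >= 6) {
    auto at = [&](size_t i) {
      return static_cast<uint32_t>(static_cast<unsigned char>(log[pos + i]));
    };
    uint32_t crc = at(0) | (at(1) << 8) | (at(2) << 16) | (at(3) << 24);
    size_t len = at(4) | (at(5) << 8);
    if (log.size() - pos - 6 < len) break;  // record cut off mid-write
    std::string payload = log.substr(pos + 6, len);
    if (Crc32(payload) != crc) break;  // corrupt: trust nothing after this
    records.push_back(payload);
    pos += 6 + len;
  }
  return records;
}

// Replays records into a memtable; once it reaches `limit` bytes it is
// flushed as a new level 0 table, so one log can yield several tables.
struct ReplayResult {
  std::vector<std::map<std::string, std::string>> level0;
  std::map<std::string, std::string> mem;
};

ReplayResult Replay(const std::vector<std::string>& records, size_t limit) {
  ReplayResult r;
  size_t mem_bytes = 0;
  for (const std::string& rec : records) {
    size_t eq = rec.find('=');  // hypothetical "key=value" payload
    if (eq == std::string::npos) continue;
    r.mem[rec.substr(0, eq)] = rec.substr(eq + 1);
    mem_bytes += rec.size();
    if (mem_bytes >= limit) {
      r.level0.push_back(std::move(r.mem));
      r.mem.clear();
      mem_bytes = 0;
    }
  }
  return r;
}
```

A write that was cut off mid-record is simply dropped, while every record before it was fully written and checksummed, so it is safe to replay.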

And... that is pretty much it.

I'll have another post summarizing this, and maybe another on what I intend to do with this information, but I'll keep that close to the chest for now.

More posts in "Reviewing LevelDB" series:

  1. (26 Apr 2013) Part XVIII–Summary
  2. (15 Apr 2013) Part XVII– Filters? What filters? Oh, those filters…
  3. (12 Apr 2013) Part XV–MemTables gets compacted too
  4. (11 Apr 2013) Part XVI–Recovery ain’t so tough?
  5. (10 Apr 2013) Part XIV– there is the mem table and then there is the immutable memtable
  6. (09 Apr 2013) Part XIII–Smile, and here is your snapshot
  7. (05 Apr 2013) Part XI–Reading from Sort String Tables via the TableCache
  8. (04 Apr 2013) Part X–table building is all fun and games until…
  9. (02 Apr 2013) Part VIII–What are the levels all about?
  10. (29 Mar 2013) Part VII–The version is where the levels are
  11. (28 Mar 2013) Part VI, the Log is base for Atomicity
  12. (27 Mar 2013) Part V, into the MemTables we go
  13. (26 Mar 2013) Part IV
  14. (22 Mar 2013) Part III, WriteBatch isn’t what you think it is
  15. (21 Mar 2013) Part II, Put some data on the disk, dude

Comments

Blair Davidson

It is unfortunate that the code doesn't follow the RAII pattern, as the lock is not removed if an exception is thrown, so it is not safe by default.

Blair
