Ayende @ Rahien

My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:


+972 52-548-6969

, @ Q c

Posts: 6,131 | Comments: 45,567

filter by tags archive

World’s smallest No SQL DatabasePersistent data

time to read 3 min | 577 words

The second item to go over with the World’s smallest No SQL database is about persistence, and following that, I/O. Right now, there is no persistence. If you restart the process, all the data is gone. Now, there are actually quite a few real world dbs that behave in this fashion. But they are a rarity. For the most part, if I put data inside a db, I expect it to be there until I do something about it.

And at that point, you are in for quite a bit of complexity. How are you going to persist the data? The easiest way to do it, just create a file per every value in the db is going to be… problematic on most systems. So you need to put a lot of data in a small set of files. Which means that you have to decide how you are going to put the data together. In general, there is either the fixed size option, in which you divide the file(s) into pages and work around that. The good thing about this is that this gives you the ability to reuse space in the file after deletes / updates. The bad thing about that is that it is quite complex. Alternatively, you can just write the data out as needed, but then you can’t really update written data, and would need to run compactions.

And we haven’t talked about searching yet. Some DBs, like Bitcask / Munin, would actually store the keys in memory, and store the position on the disk for retrieving the value. But for the most part, both keys & values tend to be on disk in some form. In CouchDB, they are held inside an append only B+Tree. In LevelDB, they are held in Sorted String Tables. LMDB uses Copy-On-Write B+Tree. Esent use a B+Tree with a Log file.

In each of those cases, the actual semantics for persistent data involve at least two concerns. You need to actually be able to search the data (that usually mean more than just O(1) access, you want to be able to go back & forth on the keys) and you need to be able to do a transactional save. This is so you can recover  in case of a crash, most of the time.

But there are actually a lot more that goes into the selection of the proper persistence format. To start with, how you store the data on disk will have a big effect on your performance. If you store the data as a linked list, for example, you might as well kiss your performance goodbye. Beyond that, we also have issues with things like how is the format going to scale when we have concurrent readers. For example, if you have something that does a lot of seeks, and rely on the seeks always going forward to ensure performance, that is going to be very badly hit the moment that you have concurrent readers doing concurrent reads on different parts of the system. You would be forcing the system to do random seeks.

There are other considerations, as well. For example, if you are using something like B+Tree, it is likely that you’ll be overwriting the same section on the disk multiple times. That is fine with HDD, but SSD would just give up & die on you at some point. And we haven’t started talking about secondary indexes yet…

More posts in "World’s smallest No SQL Database" series:

  1. (16 Aug 2013) Persistent data
  2. (22 Jul 2013) Persistent transaction logs
  3. (18 Jul 2013) ACID & Transactions
  4. (11 Jul 2013) Concurrency
  5. (10 Jul 2013) Memory
  6. (08 Jul 2013) The Devil Is In The Details


Jos van der Til

Just touching on this comment: "That is fine with HDD, but SSD would just give up & die on you at some point."

This is not strictly true, as the SSD will use wear leveling to avoid this. A logical sector does not map 1-to-1 with a physical sector. The SSD will actually spread writes to the same part of the disk across the available space, thus this is not a problem at all.

Ayende Rahien

Jos, You are correct, but that brings back the issues of sequential vs. random writes.

Comment preview

Comments have been closed on this topic.


  1. RavenDB Conference 2016–Slides - 11 hours from now
  2. Proposed solution to the low level interview question - about one day from now

There are posts all the way to Jun 02, 2016


  1. The design of RavenDB 4.0 (14):
    26 May 2016 - The client side
  2. RavenDB 3.5 whirl wind tour (14):
    25 May 2016 - Got anything to declare, ya smuggler?
  3. Tasks for the new comer (2):
    15 Apr 2016 - Quartz.NET with RavenDB
  4. Code through the looking glass (5):
    18 Mar 2016 - And a linear search to rule them
  5. Find the bug (8):
    29 Feb 2016 - When you can't rely on your own identity
View all series



Main feed Feed Stats
Comments feed   Comments Feed Stats