Ayende @ Rahien

Refunds available at head office

World’s smallest No SQL Database: Memory

The first item to go over with the World’s smallest No SQL database is how it uses memory. The codebase is really simple, which is pretty much the point, but can you think about the implications of the current codebase with regards to memory?

All the data is kept in a ConcurrentDictionary<string,byte[]>, and there is an unspoken assumption. Both keys and values are assumed to be small. What happens if that isn’t the case? Well, in .NET, you are going to get memory fragmentation pretty quickly. That means that on 32 bits, you are probably going to be able to store a lot less than the 2 GB of virtual memory that you get from the OS.

We will ignore that as well and move to other aspects of the code. What about how the system actually work in production? There is a lot of work that goes into ensuring that the CPU will be able to access memory efficiently. In particular, most dbs uses some form of an in memory data structure that take advantage of the L1, L2, L3 cache of the CPU. Usually something like a B+Tree. How does the dictionary compare here?

For that matter, what happens when we grow really big? What about when we are paging? How would that work? Can we just go over the keys in the dictionary without having to page through the values as well?

And…

I think that you get my drift by now. And this is literally just me throwing stuff off the top of my head. There are a lot of other things that we need to worry about. For example, if we implement the sensible notion of a buffer pool, we might actually have a buffer that was released by a write but is currently being sent to the client, so we can’t just use buffers, we have to use ref counting as well.

Tags:

Posted By: Ayende Rahien

Published at

Originally posted at

Comments

Greg Young
07/10/2013 09:14 AM by
Greg Young

And ... what happens when keys don't fit in memory :)

Riak had this problem for a while though I think its resolved now.

Ayende Rahien
07/10/2013 10:27 AM by
Ayende Rahien

Greg, IIRC, you are talking about BitCask, and its limitation to having the entire key space in memory. Usually at that point you spill to disk (paging), that is assuming you are running on x64, on x86, you actually have to worry about running out of the address space.

Comments have been closed on this topic.