Ayende @ Rahien

Refunds available at head office

That No SQL Thing - Key/Value stores

The simplest No SQL databases are the Key/Value stores. They are simplest only in terms of their API, because the actual implementation may be quite complex. But let us focus on the API that is exposed to us for a second. Most of the Key/Value stores expose some variation on the following API:

void Put(string key, byte[] data);
byte[] Get(string key);
void Remove(string key);

There are many variations, but that is the basis for everything else. A key value store allows you to store values by key, as simple as that. The value itself is just a blob, as far as the data store is concerned, it just stores it, it doesn’t actually care about the content. In other words, we don’t have a data stored defined schema, but a client defined semantics for understanding what the values are. The benefits of using this approach is that it is very simple to build a key value store, and that it is very easy to scale it. It also tend to have great performance, because the access pattern in key value store can be heavily optimized.

Concurrency – In Key/Value Store, concurrency is only applicable on a single key, and it is usually offered as either optimistic writes or as eventually consistent. In highly scalable systems, optimistic writes are often not possible, because of the cost of verifying that the value haven’t changed (assuming the value may have replicated to other machines), there for, we usually see either a key master (one machine own a key) or the eventual consistency model.

Queries – there really isn’t any way to perform a query in a key value store, except by the key. Even range queries on the key are usually not possible.

Transactions – while it is possible to offer transaction guarantees in a key value store, those are usually only offer in the context of a single key put. It is possible to offer those on multiple keys, but that really doesn’t work when you start thinking about a distributed key value store, where different keys may reside on different machines. Some data stores offer no transaction guarantees.

Schema – key value stores have the following schema Key is a string, Value is a blob :-) beyond that, the client is the one that determines how to parse the data.

Scaling Up – In Key Value stores, there are two major options for scaling, the simplest one would be to shard the entire key space. That means that keys starting in A go to one server, while keys starting with B go to another server. In this system, a key is only stored on a single server. That drastically simplify things like transactions guarantees, but it expose the system for data loss if a single server goes down. At this point, we introduce replication.

Replication – In key value stores, the replication can be done by the store itself or by the client (writing to multiple servers). Replication also introduce the problem of divergent versions. In other words, two servers in the same cluster think that the value of key ‘ABC’ are two different things. Resolving that is a complex issue, the common approaches are to decide that it can’t happen (Scalaris) and reject updates where we can’t ensure non conflict or to accept all updates and ask the client to resolve them for us at a later date (Amazon Dynamo, Rhino DHT).

Usages – Key Value stores shine when you need to access the data by key :-)

More seriously, key based access is actually quite common. Things like user profiles, user sessions, shopping carts, etc. In all those cases, note, we are storing the entire thing as a single value in the data store, that makes it cheap to handle (one request to read, one request to write) easy to handle when you run into concurrency conflict (you only need to resolve a single key).

Because key based queries are practically free, by structuring our data access along keys, we can get significant performance benefit by structuring our applications to fit that need. It turns out that there is quite a lot that you can do with just key/value store. Amazon’s shopping cart runs on a key value store (Amazon Dynamo), so I think you can surmise that this is a highly scalable technique.

Data stores to look at:

  • The Amazon Dynamo paper is one of the best resources on the topic that one can ask.
  • Rhino DHT is a scalable, redundant, zero config, key value store on the .NET platform.

Comments

Ayende Rahien
03/29/2010 01:21 PM by
Ayende Rahien

Tomnaso,

That is an embedded solution, what I would use to build a distributed key value store backend

Demis Bellot
03/29/2010 01:40 PM by
Demis Bellot

Note: FYI Redis does support atomic transactions:

code.google.com/p/redis/wiki/MultiExecCommand

Otherwise great post Oren, I would expand a little on queries which is one of the biggest limitations in NoSql databases. To support queries you effectively 'think in reverse', i.e. you need to know what queries your system will need up-front, then have your writes update these 'redundant query indexes'.

On NoSql Schema's fundamentally they all implement one big hash table, but there are NoSql variants like Redis which also support atomic server-side operations on rich data constructs like Lists, Sets, Sorted Sets and Hashes which are useful in a variety of programming problems.

Ayende Rahien
03/29/2010 03:46 PM by
Ayende Rahien

Thanks,

fixed the issue with Redis

Fabio Maulo
03/30/2010 06:29 AM by
Fabio Maulo

Data mining or, for the example of "shopping-cart", statistics of sold products ?

dz
04/02/2010 07:21 AM by
dz

I'm wondering, where do solutions like Solr fit in between SQL and NoSQL war.

Ayende Rahien
04/04/2010 08:47 AM by
Ayende Rahien

Solr / Lucene are usually used as external indexes for NoSQL stores

Comments have been closed on this topic.