Ayende @ Rahien

Refunds available at head office

JAOO: More on Evolving the Key/Value Programming Model to a Higher Level from Billy Newport

As I already mentioned, this presentation had me thinking. Billy presented a system called Redis, a key/value store intended for attribute-based storage.

That means that storing something like User { Id = 123, Name = “billy”, Email = “billy@example.org”} is actually stored as:

{ "uid:123:name": "billy" } 
{ "uid:123:email": "billy@example.org" }
{ "uname:billy": "123" } 

Each of those lines represents a different key/value pair in the Redis store. According to Billy, this has a lot of implications. On the advantage side, you get no schema and very easy support for just adding stuff as you go along. On the other hand, Redis does not support transactions, and it is easy to “corrupt” the database during development (usually as a result of a programming bug).
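To make the attribute decomposition concrete, here is a minimal sketch of what saving such a user might look like. The RedisClient interface and its set method are stand-ins of my own, not a real client API:

interface RedisClient {
    void set(String key, String value);
}

class UserStore {
    private final RedisClient client;

    UserStore(RedisClient client) { this.client = client; }

    // Each attribute becomes its own key/value pair - three separate
    // remote calls, with nothing tying them together as one record.
    void saveUser(long id, String name, String email) {
        client.set("uid:" + id + ":name", name);
        client.set("uid:" + id + ":email", email);
        client.set("uname:" + name, Long.toString(id)); // reverse lookup
    }
}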

What actually bothered me the most was the implication for the number of remote calls being made. The problem shows itself very well in this code sample (from a twitter clone), which shows inserting a new twit:

// allocate a new post id and store the post, encoded as "userId|time|status"
long postId = R.str_long.incr("nextPostId");
long userId = PageUtils.getUserID(request);
long time = System.currentTimeMillis();
String post = Long.toString(userId) + "|" + Long.toString(time) + "|" + status;
R.c_str_str.set("p:" + Long.toString(postId), post);

// push the new post id onto the timeline of every follower
List<Long> followersList = R.str_long.smembers(Long.toString(userId) + ":followers");
if (followersList == null)
    followersList = new ArrayList<Long>();
HashSet<Long> followerSet = new HashSet<Long>(followersList);
followerSet.add(userId);
long replyId = PageUtils.isReply(status);
if (replyId != -1)
    followerSet.add(replyId);
for (long followerId : followerSet)
    R.str_long.lpush(Long.toString(followerId) + ":posts", postId);

// -1 uid is the global timeline; keep only the latest 200 posts
String globalKey = Long.toString(-1) + ":posts";
R.str_long.lpush(globalKey, postId);
R.str_long.ltrim(globalKey, 200);

I really don’t like the API, mostly because it reminds me of C, but the conventions are pretty easy to figure out.

  • R is the static gateway into the Redis API
  • str_long – store long values
  • c_str_str – store strings and keep them in a nearby cache

The problem with this type of code is the number of remote calls and the locality of those calls. With a typical sharded set of servers, you are going to have lots of calls going all over the place. And when you get to people who have thousands or millions of followers, the application is simply going to die.

A better solution is required. Billy suggested using async programming or sending code to the data store to be executed there.

I have a different variant on the solution.

We will start from the assumption that we really want to reduce remote calls, and that system performance in the face of a large amount of writes (without impacting reads) is important. The benefit of using something like Redis is that it is very easy to get started, very easy to change things around, and great for rapid development. We want to keep that, so I am going to focus on a solution based on the same premise.

The first thing to go is the notion that a key can sit anywhere that it wants. In a key/value store, it is important to be able to control locality of reference. We change the key format so it is now: [server key]@[local key]. What does this mean? It means that the previously mentioned user will now be stored as:

{ "uid:123@name": "billy" } 
{ "uid:123@email": "billy@example.org" }
{ "uname@billy": "123" } 

We use the first part of the key (before the @) to find the appropriate server. This means that everything with a prefix of “uid:123” is known to reside on the same server. This allows you to do things like set multiple keys in a single transactional operation, since they are all on one server.
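Here is a sketch of how a client might apply that routing rule, assuming a fixed server list and simple hash-based selection (all of the names here are mine, for illustration only):

import java.util.List;

class ShardedRouter {
    private final List<String> servers; // e.g. "host:port" entries

    ShardedRouter(List<String> servers) { this.servers = servers; }

    // Only the part before '@' participates in server selection, so
    // "uid:123@name" and "uid:123@email" always land on the same server.
    String serverFor(String key) {
        int at = key.indexOf('@');
        String serverKey = at == -1 ? key : key.substring(0, at);
        int index = (serverKey.hashCode() & 0x7fffffff) % servers.size();
        return servers.get(index);
    }
}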

Once we have that, we can start adding to the API. Instead of getting a single key at a time, we can get a set of values in one remote call. That has the potential of significantly reducing the number of remote calls we will make.
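Such a multi-get might look something like this (again, the names are illustrative, not a real Redis API):

import java.util.Map;

interface ShardedStore {
    // Fetch several local keys sharing one server key in a single
    // round trip. Returns local key -> value for the keys that exist.
    Map<String, String> getAll(String serverKey, String... localKeys);
}

// One remote call instead of two:
// Map<String, String> user = store.getAll("uid:123", "name", "email");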

Next, we need to consider repeated operations. By that I mean anything where we have a loop in which we call the store. That is a killer when you are talking about data of any significant size. We need to find a good solution for this.

Billy suggested sending a JRuby script (or similar) to the server and executing it there, saving the network roundtrips. While this is certainly possible, I think it would be a mistake. I have a much simpler solution: teach the data store about repeated operations. Let us take as a good example the copying of a new twit to all of your followers. Instead of reading the entire list of followers into memory and then writing the status to every single one of them, let us do something different:

Redis.PushToAllListFoundIn("uid:"+user_id+"@followers", status, "{0}@posts");

I am using the .NET conventions here because otherwise I would go mad. As you can see, we instruct Redis to go to a particular list and copy the status that we pass it to all the keys found in that list (after formatting each key with the pattern). This gives the data store enough information to optimize the operation considerably.
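On the server side, the operation amounts to something like the following sketch. The DataStore interface and its methods are hypothetical server internals, not real Redis code:

import java.util.List;

interface DataStore {
    List<String> listMembers(String key);
    void pushToList(String key, String value);
}

class RepeatedOps {
    // Runs on the server: push a value onto every list named by the
    // members of another list, without shipping the members to the client.
    static void pushToAllListFoundIn(DataStore store, String listKey,
                                     String value, String keyPattern) {
        for (String member : store.listMembers(listKey)) {
            // keyPattern is e.g. "{0}@posts"; substitute the member into it
            store.pushToList(keyPattern.replace("{0}", member), value);
        }
    }
}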

With just these few changes, I think that you gain enormously, and you retain the very simple model of using a key/value store.

Comments

10/20/2009 04:49 AM by neo

neo4j rocks

10/20/2009 07:10 PM by tobi

Redis doesn't seem to be well thought out. You could also add batching:

var batch = new RedisBatch();

batch.AddUpdate(...); //repeat for every follower or whatever

batch.Submit();

This seems to be lacking from most K/V stores at the moment, although it would be an obvious performance improvement. I estimate that 10 batched updates would be as costly as 2 serial updates. That's a factor of 5 with this simple change. It's outrageous.

10/21/2009 01:59 PM by brad

I think CouchDB has a better solution. Create your data as JSON, and post it to their REST API. Then you can use great tools already available for building the object, like Json.NET.

10/22/2009 04:27 AM by Billy Newport

Good comments, the @ notation is nice. I see the list push, and that would work well for that kind of scenario, but not when you want to do more complex stuff with each element of the list, which is where I was coming from. Still, your push API is nice and easy.

I'm thinking more and more that KV without first-class list/set support is pretty limited and cumbersome to use.

I don't like the API prototype much either, and that's why it's a prototype :) More work is needed to make it better.

10/22/2009 04:30 AM by Billy Newport

In fact, I'm prototyping the @ notation as I speak; it's very cool.

10/22/2009 04:43 AM by Ezra Zygmuntowicz

If you take a look at my Ruby client library for Redis, you will see it implements consistent ring hashing for working with a set of distributed Redis servers. I have implemented something very similar to what you mention here with regard to sectioning off the key space, so you can be sure a certain set of keys always gets hashed to the same Redis server, with this syntax:

"bob{tags:42}"

"joe{tags:42}"

If you include curly braces in your key, the client library will only hash based on what is inside the {} curly braces in order to find the proper Redis server to connect to. This allows for grouping certain data on certain servers, so you can perform efficient set operations on the server rather than pulling from many servers and doing it on the client.
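The hashing rule Ezra describes might be sketched like this (illustrative Java; his actual library is Ruby):

class HashTagRouter {
    // If the key contains {...}, only the text inside the braces is
    // hashed, so "bob{tags:42}" and "joe{tags:42}" pick the same server.
    static String hashSegment(String key) {
        int open = key.indexOf('{');
        int close = key.indexOf('}', open + 1);
        if (open != -1 && close > open + 1)
            return key.substring(open + 1, close);
        return key; // no tag: hash the whole key
    }
}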

Comments have been closed on this topic.