Patterns for using distributed hash tables: Groups

time to read 6 min | 1064 words

Yesterday's post called them a distributed in-memory cache / storage, but I was reminded that the proper term for what I am talking about is distributed hash tables (DHT).

I presented the problem of dealing with a DHT in this post, namely the fact that we have only key-based access and no way to compose several actions into a single transaction. I'll let you go read that post for all the gory details, and continue on here with some useful patterns for dealing with this issue.

As a reminder, here is the API that we have:

  • SET key, data, expiration
    Set the key value pair in the DHT
  • PUT key, data, expiration
    Will fail if item is already in DHT
  • GET key
    Will return null if item is not in DHT or if expired
  • DEL key
    Delete the key from the DHT
  • UPDATE key, data, version
    Update the item if the version matches
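
For the rest of this series I will treat these as plain function calls. If you want something to experiment with, here is a minimal, single-process sketch of that interface in Python. The DhtClient class and its in-memory dictionary are made up for illustration; a real client talks to remote nodes, but the semantics of the five operations are the ones listed above.

import time

class DhtClient:
	# Hypothetical in-memory stand-in for a real DHT client (illustration only).
	def __init__(self):
		self.store = {}  # key -> (data, version, expires_at)

	def _alive(self, entry):
		return entry[2] is None or entry[2] > time.time()

	def set(self, key, data, expiration=None):
		# SET: always writes, bumping the version. This sketch assumes the
		# expiration is an absolute expiry time.
		version = self.store[key][1] + 1 if key in self.store else 1
		self.store[key] = (data, version, expiration)

	def put(self, key, data, expiration=None):
		# PUT: fails if the item is already in the DHT.
		if key in self.store and self._alive(self.store[key]):
			return False
		self.set(key, data, expiration)
		return True

	def get(self, key):
		# GET: returns None if the item is missing or expired.
		entry = self.store.get(key)
		return entry[0] if entry and self._alive(entry) else None

	def delete(self, key):
		# DEL: removes the key.
		self.store.pop(key, None)

	def update(self, key, data, version):
		# UPDATE: compare-and-swap, succeeds only if the version matches.
		entry = self.store.get(key)
		if entry is None or entry[1] != version:
			return False
		self.store[key] = (data, entry[1] + 1, entry[2])
		return True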

Now, let us explore some of the more useful patterns for dealing with this. We will start with grouping of items.

One of the more annoying properties of key-based access methods is... key-based access. We have no way to perform a query (well, this is not entirely accurate, but we will touch on it later).

Let us assume that we want to get the recent posts from the cache. How are we going to do that? Using SQL, it is very easy:

SELECT TOP 10 * FROM News ORDER BY PublishedDate DESC

We don't have anything similar using the API above. What we have instead is key-based access, and it turns out we can use that to create groups, which are very useful.

A group is simply a key/value pair whose value contains a list of keys. In the case of the query above, we can represent it like this:

newsItems = List()
foreach news_id in GET("most recent news"):
	item = GET(news_id)
	continue if item is null // deleted news item
	newsItems.Add(item)

The group itself would be:

{
  "most recent news" :
	[
		"NewsItem#15",
		"NewsItem#16",
		"NewsItem#17"
	]
}

But wait, if groups hold only keys, don't we violate a basic law of distributed work, SELECT N+1? Remember, each GET is a remote call. In this case, we make 4 remote calls (one for the group, one per item), but what if we have 100 items in the recent news?

A better approach for this would be to put the item data in the group directly, like this:

{ 
	"most recent news" : 
		[ 
			"astounding occurrences in...",
			"amazing revelations...",
			"you are feeling sleepy..."
		]
}

Here, we have a single remote call, and we get all the data that we need. Much better, isn't it?

Well, sort of. The problem that we face now is that we need to track everything we do with regard to the most recent news. Let us say that we need to update an article; we most certainly want this update to be reflected in the most recent news as well. But in order to do that, we need to track the fact that we put the article in the most recent news, and once we extend this sample toward the other groups that we may have in our application, we can see that this quickly becomes an unmanageable problem.

By storing just the keys of other items in the list, we can now handle updates by updating a single key/value pair, instead of updating a lot of separate items. In fact, this problem has a name: normalization vs. de-normalization, and its tradeoffs are very well understood.
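
To make the tradeoff concrete, here is a small sketch of what editing a news item costs under each option. It builds on the hypothetical DhtClient sketch from earlier in this post; the keys and headlines are made up.

client = DhtClient()
client.set("NewsItem#16", {"headline": "amazing revelations..."})
client.set("most recent news", ["NewsItem#15", "NewsItem#16", "NewsItem#17"])

# With a group of keys, correcting an article is a single write; anyone
# following the group will pick up the new copy automatically.
client.set("NewsItem#16", {"headline": "amazing revelations, corrected"})

# With the data embedded in the group, the same correction would also require
# rewriting "most recent news" (and every other group that embeds the item),
# using the compare-and-exchange loop shown further down.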

Actually, I wasn't being entirely honest with you. The worst case scenario for this approach is not N + 1 remote calls, it is min(N, M) + 1, where N is the number of items in the list and M is the number of nodes in the DHT. The reason for that is that DHTs generally support some form of batch GET, so we can issue a single batched call per node. The following piece of pseudo code is a generic way to get a group from a DHT in as performant a way as possible.

def GetGroup(groupName as string):
	items = GET(groupName)
	return EmptyList if items is null
	itemsByNode = GroupKeysByNode(items) // bucket the keys by the node that owns them
	futures = List()
	for kvp in itemsByNode:
		futureGet = GET_ASYNC(kvp.Node, kvp.Items.ToArray()) // one batch GET per node
		futures.Add(futureGet)
	WaitForAll(futures)
	results = List()
	for futureGet in futures:
		for item in futureGet.Results:
			results.Add(item) if item is not null // skip deleted or expired items
	return results

There are other optimizations we can make, but this one hits about as many of them as we can hope to without becoming very complex.
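
The GetGroup pseudo code leans on GroupKeysByNode without defining it. How keys map to nodes is specific to the DHT (usually some form of consistent hashing), but here is a sketch of the idea in Python, assuming a hypothetical hard-coded node list and a simple hash-modulo placement; a real client would use the DHT's own partitioning scheme.

from collections import defaultdict
import zlib

# Hypothetical node list; a real client gets this from the DHT's topology.
NODES = ["cache-01:11211", "cache-02:11211", "cache-03:11211"]

def node_for_key(key):
	# Simplistic placement: hash the key and pick a node by modulo. Real DHTs
	# usually use consistent hashing, so that adding or removing a node does
	# not reshuffle every key.
	return NODES[zlib.crc32(key.encode("utf-8")) % len(NODES)]

def group_keys_by_node(keys):
	# Bucket the keys so we can issue one batch GET per node.
	by_node = defaultdict(list)
	for key in keys:
		by_node[node_for_key(key)].append(key)
	return by_node

With that in hand, GetGroup makes at most one remote call per node that actually holds some of the group's keys, plus the call for the group itself, which is where the min(N, M) + 1 figure above comes from.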

Now, remember the problem statement in the first post in this series: we have multiple clients working against the DHT in parallel. Let us assume that we have two clients that both want to add an item to the most recent items. We want to preserve data integrity, so this is something that we need to think about. The way this is done is very simple:

while true:
	mostRecentItems, mostRecentItemsVersion = GetGroupWithVersion("most recent items")
	mostRecentItems.Add( GenerateCacheId(newItem) )
	result = UPDATE("most recent items", mostRecentItems, mostRecentItemsVersion)
	break if result is SuccessfulUpdate

Basically, we make a compare-and-exchange call, so if someone else snuck in and updated the recent items group while we were working, we simply fetch the list again and retry. A classic case of optimistic concurrency.
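
Since this retry loop shows up for every group mutation, it is worth wrapping once. Here is a sketch of such a helper, again against the hypothetical DhtClient from earlier; get_with_version, modify_group, and the bounded retry count are my own additions for illustration, not part of the API described in this post.

def get_with_version(client, key):
	# A real DHT would return the version alongside GET; here we peek at the
	# sketch client's internals to keep the example self-contained.
	entry = client.store.get(key)
	return (entry[0], entry[1]) if entry else (None, None)

def modify_group(client, group_key, mutate, retries=10):
	# Optimistic concurrency: read, mutate locally, UPDATE with the version we
	# read, and retry if someone else got there first.
	for _ in range(retries):
		items, version = get_with_version(client, group_key)
		items = list(items or [])
		mutate(items)
		if version is None:
			# Group does not exist yet; PUT fails if another client created it
			# in the meantime, in which case we loop and try again.
			if client.put(group_key, items):
				return True
		elif client.update(group_key, items, version):
			return True
	return False

# Usage: append a new item to the group, safely, even with concurrent writers.
client = DhtClient()
modify_group(client, "most recent items", lambda items: items.append("NewsItem#18"))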

The problem occurs when we need to coordinate updates to two or more groups (or just two or more items in the DHT). Now, just using optimistic concurrency will not save us, because we don't have a way to control updates over two items (which may very well reside on different nodes). This will be the subject of my next post.

More posts in "Patterns for using distributed hash tables" series:

  1. (09 Aug 2008) Conclusion
  2. (20 Jul 2008) Locking
  3. (20 Jul 2008) Groups