Rhino Persistent Hash Table


I spoke about the refactoring of Rhino DHT in this post. Now, I want to talk about what I actually did.

Before, Rhino DHT was self-contained, with no dependencies beyond a Windows machine. But because of the need to support replication, I decided to make some changes to the way it is structured. Furthermore, it became very clear that I am going to want to use the notion of a persisted hash table in more than just the DHT.

The notion of a persisted hash table is very powerful, and one that I had already given up on when I found out about Esent. So I decided that it would make sense to make an explicit separation between the notion of a persistent hash table and the notion of a distributed hash table.

So I moved all the code relating to Esent into Rhino.PersistentHashTable, keeping the same semantics as before (allowing conflicts by design and letting the client resolve them, pervasive read caching) while adding some additional features, like multi value keys (bags of values), which are quite useful for a number of things.

The end result is a very small interface, which we can use to persist data with as little fuss as you can imagine:

[image: the PersistentHashTable interface]

This may seem odd at first: why do we need two classes? Here is a typical use case:

using (var table = new PersistentHashTable(testDatabase))
{
	table.Initialize();

	table.Batch(actions =>
	{
		actions.Put(new PutRequest
		{
			Key = "test",
			ParentVersions = new ValueVersion[0],
			Bytes = new byte[] { 1 }
		});

		var values = actions.Get(new GetRequest { Key = "test" });
		actions.Commit();

		Assert.Equal(1, values[0].Version.Number);
		Assert.Equal(new byte[] { 1 }, values[0].Data);
	});
}

There are a few things to note in this code.

  1. We use Batch as a transaction boundary, and we call Commit to, well, commit the transaction. That allows us to batch several actions into a single database operation.
  2. Most of the methods accept a parameter object. I found that this makes versioning the API much easier.
  3. We have only two sets of APIs that we actually care about:
    1. Get/Put/Remove - for working with a single item in the PHT.
    2. AddItem/RemoveItem/GetItems - for adding, removing or querying lists. Unlike single value items, there is no explicit concurrency control over lists, because each of the list operations can easily be made safe for concurrent use. In fact, that is why the PHT provides an explicit API for lists, instead of building them on top of the single value API.
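The list API can be used in much the same shape as the single-value example above. Here is a minimal sketch; note that the request type names (AddItemRequest, GetItemsRequest) and their members are my assumptions for illustration, and the real Rhino.PersistentHashTable signatures may differ:

```csharp
using (var table = new PersistentHashTable(testDatabase))
{
	table.Initialize();

	table.Batch(actions =>
	{
		// Append two values to the bag stored under "events".
		// Note: no ParentVersions here - list operations have no
		// concurrency control, since each is safe for concurrent use.
		actions.AddItem(new AddItemRequest
		{
			Key = "events",
			Data = new byte[] { 1 }
		});
		actions.AddItem(new AddItemRequest
		{
			Key = "events",
			Data = new byte[] { 2 }
		});

		var items = actions.GetItems(new GetItemsRequest { Key = "events" });
		actions.Commit();

		Assert.Equal(2, items.Length);
	});
}
```

As with the single-value API, everything happens inside a Batch, so both additions and the query are part of a single database operation.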

I am planning on making Rhino PHT the storage engine for persistent data in NH Prof.