NHibernate Shards: Progress Report

Since my last post about it, there have been a lot of changes to NHibernate Shards.

Update: I can’t believe I forgot. I was so caught up in how cool this was that I didn’t give the proper credits. Thanks to Dario Quintana and all the other contributors to NHibernate Shards.

The demo actually works :-) You can look at the latest code here: http://nhcontrib.svn.sourceforge.net/svnroot/nhcontrib/trunk/src/NHibernate.Shards/

You can read the documentation for the Java version (most of which is applicable for the .NET version) here: http://docs.jboss.org/hibernate/stable/shards/reference/en/html/

Let us go through how it works, okay?

We have the following class, which we want to shard.

image

The class mapping is almost standard:

image

As you can see, the only new thing is the primary key generator. Because entities are sharded based on their primary key, we have to encode the appropriate shard in the primary key. The easiest way of doing that is using the ShardedUUIDGenerator. This generator generates keys that look like this:

  • 00010000602647468c2ef2f10ded039a
  • 000200006ba74626a564d147dc89f9ad
  • 00030000eb934532b828601979036e3c

The first four characters are reserved for the shard id.
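To make the key layout concrete, here is a minimal sketch of reading the shard id back out of such a key. ShardedKeySketch, ExtractShardId and NewKey are hypothetical names for illustration, not part of NHibernate Shards, and I am assuming the four-character prefix is the shard id written in hexadecimal.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration; not part of NHibernate Shards.
public static class ShardedKeySketch
{
    // Assumption: the four-character prefix is the shard id written in hexadecimal.
    public static int ExtractShardId(string key)
    {
        if (key == null || key.Length < 4)
            throw new ArgumentException("A sharded key needs at least four characters.", nameof(key));

        return int.Parse(key.Substring(0, 4), NumberStyles.HexNumber, CultureInfo.InvariantCulture);
    }

    // Builds a 32-character key: four characters of shard id, then 28 characters
    // taken from a fresh Guid. The real generator's layout after the prefix may differ.
    public static string NewKey(int shardId)
    {
        if (shardId < 0 || shardId > 0xFFFF)
            throw new ArgumentOutOfRangeException(nameof(shardId));

        return shardId.ToString("x4", CultureInfo.InvariantCulture)
             + Guid.NewGuid().ToString("N").Substring(4);
    }
}
```

Calling ShardedKeySketch.ExtractShardId on the first key in the list above gives shard 1, which is all the information needed to route a load by id to a single database.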

Next, we need to specify the configuration for each shard. We can do this normally, but we have to specify the shard id in the configuration.

cfg.SetProperty(ShardedEnvironment.ShardIdProperty, "1"); // SetProperty takes string values

The shard id is an integer that is used to select the appropriate shard. It is also used to allow you to add new shards without breaking the ids of already existing shards.

Next, you need to implement a shard strategy factory:

image

This allows you to configure the shard strategy based on your needs. This is often where you would add custom behaviors. A shard strategy is composed of several components:

image

The Shard Selection Strategy is used solely to select the appropriate shard for new entities. If you shard your entities based on the user name, this is where you’ll implement that, by providing a shard selection strategy that is aware of this. One of the nice things about NH Shards is that it is aware of the graph as a whole, and if you have an association to a sharded entity, it knows that it needs to place you in the appropriate shard, without putting that burden on you.

For new objects, assuming that you haven’t provided your own shard selection strategy, NHibernate Shards will try to spread them evenly between the shards. The most common implementation is the Round Robin Load Balancer, which will give you a new shard for each new item that you save.
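The round robin idea is simple enough to sketch in a few lines. This is not the NHibernate Shards implementation; RoundRobinSketch and NextShardId are hypothetical names, and the sketch only shows the technique of handing out shard ids in rotation.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical illustration of round robin shard selection;
// not the NHibernate Shards load balancer.
public sealed class RoundRobinSketch
{
    private readonly IReadOnlyList<int> shardIds;
    private int counter = -1;

    public RoundRobinSketch(IReadOnlyList<int> shardIds)
    {
        if (shardIds == null || shardIds.Count == 0)
            throw new ArgumentException("At least one shard id is required.", nameof(shardIds));
        this.shardIds = shardIds;
    }

    // Returns the next shard id in rotation.
    public int NextShardId()
    {
        // Interlocked keeps the counter safe when several threads save at once.
        // The cast to uint keeps the index non-negative after the counter overflows.
        uint ticket = unchecked((uint)Interlocked.Increment(ref counter));
        return shardIds[(int)(ticket % (uint)shardIds.Count)];
    }
}
```

With shard ids 1, 2 and 3, four consecutive calls to NextShardId return 1, 2, 3 and then 1 again, which is why three saves in a row end up in three different shards.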

The Shard Resolution Strategy answers a simple question: given an entity type and an entity id, in which shards should we look for it?

image

image

If you are using a sharded id, such as the one that WeatherReport is using, NH Shards will know which shard to talk to automatically. But if you are using a non-sharded id, you have to tell NHibernate how to figure out which shards to look at. By default, if you have a non-sharded id, it will look at all shards until it finds the entity.
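The decision described here can be sketched as a small function. ShardResolutionSketch and CandidateShards are hypothetical names, the idIsSharded flag stands in for whatever the real strategy knows about the id generator, and the hexadecimal prefix is my assumption about the key format.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical illustration of shard resolution; not the NHibernate Shards API.
public static class ShardResolutionSketch
{
    // Returns the shards that have to be searched for the given id.
    public static IReadOnlyList<int> CandidateShards(
        string id, bool idIsSharded, IReadOnlyList<int> allShardIds)
    {
        if (idIsSharded)
        {
            // Assumption: the first four characters are the shard id in hexadecimal.
            int shardId = int.Parse(id.Substring(0, 4), NumberStyles.HexNumber,
                                    CultureInfo.InvariantCulture);
            return new[] { shardId };
        }

        // A non-sharded id carries no hint, so every shard is a candidate.
        return allShardIds;
    }
}
```

A sharded id narrows the search to one database, while a non-sharded id leaves all of them as candidates unless you supply a smarter strategy.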

The shard access strategy specifies how NHibernate Shards talks to the shards when it needs to talk to more than a single shard. NHibernate Shards can do it either sequentially or in parallel. Using the parallel access strategy means that NHibernate will hit all your databases at the same time, potentially saving you quite a bit of time.

The access strategy is also responsible for post-processing the query results, merging them and ordering them as needed.
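The difference between the two access styles can be sketched without any NHibernate types. ShardAccessSketch is a hypothetical name, and the queryShard delegate stands in for "run this query against one shard"; the real strategies work with sessions and criteria rather than delegates.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical illustration of sequential vs. parallel shard access;
// not the NHibernate Shards access strategies.
public static class ShardAccessSketch
{
    // Queries one shard after another and merges the rows in shard order.
    public static List<T> QuerySequentially<T>(
        IEnumerable<int> shardIds, Func<int, IEnumerable<T>> queryShard)
    {
        var merged = new List<T>();
        foreach (int shardId in shardIds)
            merged.AddRange(queryShard(shardId));
        return merged;
    }

    // Starts every shard query at once, then merges the rows in shard order.
    public static List<T> QueryInParallel<T>(
        IEnumerable<int> shardIds, Func<int, IEnumerable<T>> queryShard)
    {
        // ToArray forces all tasks to start before any result is awaited.
        Task<List<T>>[] tasks = shardIds
            .Select(shardId => Task.Run(() => queryShard(shardId).ToList()))
            .ToArray();

        // Reading Result blocks until that shard's query has finished.
        return tasks.SelectMany(task => task.Result).ToList();
    }
}
```

Both methods return the same merged list. The parallel version takes roughly as long as the slowest shard instead of the sum of all of them, at the cost of holding several connections open at once.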

Let us look at the code, okay? As you can see, this is a pretty standard usage of NHibernate.

using (ISession session = sessionFactory.OpenSession())
using (session.BeginTransaction())
{
    session.Save(new WeatherReport
    {
        Continent = "North America",
        Latitude = 25,
        Longitude = 30,
        ReportTime = DateTime.Now,
        Temperature = 44
    });

    session.Save(new WeatherReport
    {
        Continent = "Africa",
        Latitude = 44,
        Longitude = 99,
        ReportTime = DateTime.Now,
        Temperature = 31
    });

    session.Save(new WeatherReport
    {
        Continent = "Asia",
        Latitude = 13,
        Longitude = 12,
        ReportTime = DateTime.Now,
        Temperature = 104
    });

    session.Transaction.Commit();
}

Since we are using the defaults, each of those entities is going to go to a different shard. Here is the result:

image

Our data was saved into three different databases. And obviously we could have saved them to three different servers as well.

But saving the data is only part of the story. What about querying? Well, let us look at the following query:

session.CreateCriteria(typeof(WeatherReport), "weather").List()

This query will give us:

image

Note that we have three different sessions here, each for its own database, each executing a single query. What is really interesting is that NHibernate will take all of those results and merge them together. It can even handle proper ordering across different databases.

Let us see the code:

var reports = session.CreateCriteria(typeof(WeatherReport), "weather")
    .Add(Restrictions.Gt("Temperature", 33))
    .AddOrder(Order.Asc("Continent"))
    .List();

foreach (WeatherReport report in reports)
{
    Console.WriteLine(report.Continent);
}

Which results in:

image

And the following is printed to the console:

Asia
North America

We got the proper ordering, as we specified in the query, but note that we aren’t handling ordering in the database. Because we are hitting multiple sources, it is actually cheaper to do the ordering in memory, rather than getting partially ordered data and then trying to sort it.
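As a rough sketch of what ordering in memory means, here is the merge step reduced to plain LINQ. InMemoryOrderingSketch and MergeAndOrder are hypothetical names, and the real implementation works on entities and criteria orderings rather than bare strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration of merging per-shard results and ordering them
// in memory; not the NHibernate Shards implementation.
public static class InMemoryOrderingSketch
{
    // Each inner sequence holds the unordered rows returned by one shard.
    public static List<string> MergeAndOrder(IEnumerable<IEnumerable<string>> perShardResults)
    {
        return perShardResults
            .SelectMany(rows => rows)                                // merge all shards
            .OrderBy(continent => continent, StringComparer.Ordinal) // one sort over the merged set
            .ToList();
    }
}
```

Feeding it the rows that pass the temperature filter in the demo ("North America" from one shard, nothing from the second, "Asia" from the third) gives Asia first and North America second, matching the console output above.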

Well, that is about it from the point of view of the capabilities.

One of the things that is holding NH Shards back right now is that only the core code paths have been implemented. A lot of the convenience methods are currently not implemented.

image

They are relatively low-hanging fruit, and can be implemented without any deep knowledge of NHibernate or NHibernate Shards. Beyond that, the sharded HQL implementation is still not handling ordering properly, so if you care about ordering you can only query using ICriteria (at the moment).

It isn’t there yet, but it is much closer. You can get a working demo, and probably start working with this and implement things on the fly as you run into things. I strongly urge you to contribute the missing parts, at least the convenience methods, which should be pretty easy.

Please submit patches to our JIRA and discuss the topic at: http://groups.google.com/group/nhcdevs