NHibernate Shards: Progress Report
Since my last post about it, there has been a lot of changes to NHibernate Shards.
Update: I can’t believe I forgot, I was so caught up in how cool this was that I did give the proper credits. Thanks to Dario Quintana and all the other contributors to NHibernate Shards.
The demo actually works :-) You can look at the latest code here: http://nhcontrib.svn.sourceforge.net/svnroot/nhcontrib/trunk/src/NHibernate.Shards/
You can read the documentation for the Java version (most of which is applicable for the .NET version) here: http://docs.jboss.org/hibernate/stable/shards/reference/en/html/
Let us go through how it works, okay?
We have the following class, which we want to shard.
The class mapping is almost standard:
As you can see, the only new thing is the primary key generator. Because entities are sharded based on their primary key, we have to encode the appropriate shard in the shard. The easiest way of doing that is using the SharedUUIDGenerator. This generator generates keys that looks like this:
The first four characters are reserved for the shard id.
Next, we need to specify the configurations for each shard, we can do this normally, but we have to specify the shard id in the configuration.
The shard id is an integer that is used to select the appropriate shard. It is also used to allow you to add new shards without breaking the ids of already existing shards.
Next, you need to implement a shard strategy factory:
This allows you to configure the shard strategy based on your needs. This is often where you would add custom behaviors. A shard strategy is composed of several components:
The Shard Selection Strategy is used solely to select the appropriate shard for new entities. If you shard your entities based on the user name, this is where you’ll implement that, by providing a shard selection strategy that is aware of this. On of the nice things about NH Shards is that it is aware of the graph as a whole, and if you have an association to a sharded entity, it knows that it needs to place you in the appropriate shard, without giving the burden to you.
For new objects, assuming that you haven’t provided your own shard selection strategy, NHibernate Shards will try to spread them evenly between the shards. The most common implementation is the Round Robin Load Balancer, which will give you a new shard for each new item that you save.
The Shard Resolution Strategy is quite simple, given an entity and the entity id, in which shard should we look for them?
If you are using a sharded id, such as the one that WeatherReport is using, NH Shards will know which shard to talk to automatically. But if you are using a non sharded id, you have to tell NHibernate how to figure out which shards to look at. By default, if you have non sharded id, it will look at all shards until it finds it.
The shard access strategy specifies how NHibernate Shards talks to the shards when it needs to talk to more than a single shard. NHibernate Shards can do it either sequentially or in parallel. Using parallel access strategy means that NHibernate will hit all your databases at the same time, potentially saving quite a bit of time for you.
The access strategy is also responsible for handling post processing the queries result, merging them and ordering them as needed.
Let us look at the code, okay? As you can see, this is a pretty standard usage of NHibernate.
using(ISession session = sessionFactory.OpenSession())
Continent = "North America",
Latitude = 25,
Longitude = 30,
ReportTime = DateTime.Now,
Temperature = 44
Continent = "Africa",
Latitude = 44,
Longitude = 99,
ReportTime = DateTime.Now,
Temperature = 31
Continent = "Asia",
Latitude = 13,
Longitude = 12,
ReportTime = DateTime.Now,
Temperature = 104
But saving the data is only part of things, what about querying? Well, let us look at the following query:
This query will give us:
Note that we have three different sessions here, each for its own database, each executing a single query. What is really interesting is that NHibernate will take all of those results and merge them together. It can even handle proper ordering across different databases.
Let us see the code:
var reports =
foreach (WeatherReport report in reports)
Which results in:
And in the following printed to the console:
We got the proper ordering, as we specified in the query, but note that we aren’t handling ordering in the database. Because we are hitting multiple sources, it is actually cheaper to do the ordering in memory, rather than get partially ordered data and they trying to sort it.
Well, that is about it from the point of view of the capabilities.
One of the things that is holding NH Shards back right now is that only core code paths has been implemented. A lot of the connivance methods are currently not implemented.
They are relatively low hanging fruits, and can be implemented without any deep knowledge of NHibernate or NHibernate Shards. Beyond that, the sharded HQL implementation is still not handling order properly, so if you care about ordering you can only query using ICriteria (at the moment).
It isn’t there yet, but it is much closer. You can get a working demo, and probably start working with this and implement things on the fly as you run into things. I strongly urge you to contribute the missing parts, at least the convenience methods, which should be pretty easy.