Ayende @ Rahien

My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by email or phone:


+972 52-548-6969


NHibernate Shards: Progress Report

time to read 9 min | 1774 words

Since my last post about it, there have been a lot of changes to NHibernate Shards.

Update: I can’t believe I forgot. I was so caught up in how cool this was that I didn't give the proper credits. Thanks to Dario Quintana and all the other contributors to NHibernate Shards.

The demo actually works :-) You can look at the latest code here: http://nhcontrib.svn.sourceforge.net/svnroot/nhcontrib/trunk/src/NHibernate.Shards/

You can read the documentation for the Java version (most of which is applicable for the .NET version) here: http://docs.jboss.org/hibernate/stable/shards/reference/en/html/

Let us go through how it works, okay?

We have the following class, which we want to shard.


The class mapping is almost standard:


As you can see, the only new thing is the primary key generator. Because entities are sharded based on their primary key, we have to encode the appropriate shard in the key itself. The easiest way of doing that is using the ShardedUUIDGenerator. This generator generates keys that look like this:

  • 00010000602647468c2ef2f10ded039a
  • 000200006ba74626a564d147dc89f9ad
  • 00030000eb934532b828601979036e3c

The first four characters are reserved for the shard id.
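To make the key layout concrete, here is a minimal C# sketch of encoding and decoding such an id. It is not the ShardedUUIDGenerator itself: `ShardedIdSketch`, `NewId` and `GetShardId` are hypothetical names, and the sketch assumes the four characters hold the shard id written in hex, with the remaining 28 characters taken from a random Guid.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not the NHibernate Shards API. It only illustrates
// the "first four characters hold the shard id" layout described above.
public static class ShardedIdSketch
{
    private const int ShardIdLength = 4;

    // Builds a 32-character hex id whose first four characters encode the shard id.
    public static string NewId(int shardId)
    {
        if (shardId < 0 || shardId > 0xFFFF)
            throw new ArgumentOutOfRangeException(nameof(shardId));

        string random = Guid.NewGuid().ToString("N"); // 32 lowercase hex chars, no dashes
        return shardId.ToString("x4") + random.Substring(ShardIdLength);
    }

    // Reads the shard id back from the first four characters (assumed to be hex).
    public static int GetShardId(string id)
    {
        return int.Parse(id.Substring(0, ShardIdLength),
                         NumberStyles.HexNumber, CultureInfo.InvariantCulture);
    }
}
```

With this layout, `ShardedIdSketch.GetShardId("000200006ba74626a564d147dc89f9ad")` returns 2, so any later load by id can go straight to the right database.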

Next, we need to specify the configuration for each shard. We can do this normally, but we have to specify the shard id in the configuration:

cfg.SetProperty(ShardedEnvironment.ShardIdProperty, 1);

The shard id is an integer that is used to select the appropriate shard. It is also used to allow you to add new shards without breaking the ids of already existing shards.

Next, you need to implement a shard strategy factory:


This allows you to configure the shard strategy based on your needs. This is often where you would add custom behaviors. A shard strategy is composed of several components:


The Shard Selection Strategy is used solely to select the appropriate shard for new entities. If you shard your entities based on the user name, this is where you’ll implement that, by providing a shard selection strategy that is aware of it. One of the nice things about NH Shards is that it is aware of the object graph as a whole: if you have an association to a sharded entity, it knows that it needs to place you in the appropriate shard, without putting that burden on you.
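As an illustration of a selection strategy that is aware of the user name, here is a minimal sketch that maps each user name to a fixed shard. `UserNameShardSelector` is a hypothetical name, not an NH Shards type, and the hashing scheme (an FNV-1a-style hash over the lower-cased name) is an assumption, chosen only because it is stable across processes, unlike `string.GetHashCode` on .NET Core.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical selection strategy: every entity belonging to the same user
// goes to the same shard. Not an NH Shards type.
public static class UserNameShardSelector
{
    public static int SelectShard(string userName, IReadOnlyList<int> shardIds)
    {
        if (string.IsNullOrEmpty(userName))
            throw new ArgumentException("A user name is required.", nameof(userName));
        if (shardIds == null || shardIds.Count == 0)
            throw new ArgumentException("At least one shard is required.", nameof(shardIds));

        // FNV-1a-style hash over the UTF-16 chars of the lower-cased name.
        // string.GetHashCode is randomized per process on .NET Core, so it
        // cannot pick a shard that must stay the same across restarts.
        uint hash = 2166136261;
        unchecked
        {
            foreach (char c in userName.ToLowerInvariant())
            {
                hash ^= c;
                hash *= 16777619;
            }
        }
        return shardIds[(int)(hash % (uint)shardIds.Count)];
    }
}
```

Note that a plain modulo over the shard list moves most users to a different shard when a shard is added, so a real strategy would likely need a more stable mapping.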

For new objects, assuming that you haven’t provided your own shard selection strategy, NHibernate Shards will try to spread them evenly between the shards. The most common implementation is the Round Robin Load Balancer, which will give you a new shard for each new item that you save.
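A round robin selector can be sketched in a few lines. `RoundRobinShardSelector` is a hypothetical stand-in for the library's load balancer, assuming shards are identified by the integer shard ids configured earlier.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical round robin selector; the real NHibernate Shards classes
// have their own names and interfaces.
public sealed class RoundRobinShardSelector
{
    private readonly IReadOnlyList<int> shardIds;
    private int next = -1;

    public RoundRobinShardSelector(IReadOnlyList<int> shardIds)
    {
        if (shardIds == null || shardIds.Count == 0)
            throw new ArgumentException("At least one shard is required.", nameof(shardIds));
        this.shardIds = shardIds;
    }

    // Returns the shard for the next new entity, cycling through all shards.
    // Interlocked keeps the counter consistent if several threads save at once;
    // the uint cast keeps the index non-negative after the counter wraps.
    public int SelectShardForNewObject()
    {
        int n = Interlocked.Increment(ref next);
        return shardIds[(int)((uint)n % (uint)shardIds.Count)];
    }
}
```

With shards 1, 2 and 3, four successive saves go to shards 1, 2, 3 and then 1 again, which is why the three weather reports below end up in three different databases.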

The Shard Resolution Strategy is quite simple: given an entity and the entity id, in which shard should we look for them?



If you are using a sharded id, such as the one that WeatherReport is using, NH Shards will know which shard to talk to automatically. But if you are using a non-sharded id, you have to tell NHibernate how to figure out which shards to look at. By default, with a non-sharded id, it will look at all the shards until it finds the entity.
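The resolution rule above can be sketched as a small function. This is not NH Shards code: `ShardResolutionSketch` is hypothetical, and it assumes sharded ids are 32-character hex strings whose first four characters hold the shard id in hex.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical sketch of the resolution rule described above, not NH Shards code.
public static class ShardResolutionSketch
{
    public static IReadOnlyList<int> ShardsToSearch(object id, IReadOnlyList<int> allShardIds)
    {
        if (id is string text && text.Length == 32 &&
            int.TryParse(text.Substring(0, 4), NumberStyles.HexNumber,
                         CultureInfo.InvariantCulture, out int shardId) &&
            allShardIds.Contains(shardId))
        {
            // Sharded id: only the shard encoded in the key needs to be queried.
            return new[] { shardId };
        }

        // Non-sharded id: every shard is a candidate; the caller tries them
        // in turn until the entity turns up.
        return allShardIds;
    }
}
```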

The shard access strategy specifies how NHibernate Shards talks to the shards when it needs to talk to more than a single shard. NHibernate Shards can do it either sequentially or in parallel. Using the parallel access strategy means that NHibernate will hit all your databases at the same time, potentially saving quite a bit of time for you.

The access strategy is also responsible for post-processing the query results: merging them and ordering them as needed.
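The difference between the two access strategies can be sketched without any database at all. Here each shard query is modeled as a delegate returning that shard's rows; `ShardAccessSketch` and both methods are hypothetical and do not mirror the NHibernate Shards interfaces.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical access strategies: each shard query is a delegate that returns
// that shard's rows. Not the NHibernate Shards interfaces.
public static class ShardAccessSketch
{
    // Sequential: query one shard after another and append the results.
    public static List<T> QuerySequentially<T>(IEnumerable<Func<IList<T>>> shardQueries)
    {
        var merged = new List<T>();
        foreach (var query in shardQueries)
            merged.AddRange(query());
        return merged;
    }

    // Parallel: start every shard query at once and wait for all of them, so the
    // wall-clock time is roughly that of the slowest shard rather than the sum.
    // Results are merged in shard order, regardless of which query finished first.
    public static List<T> QueryInParallel<T>(IEnumerable<Func<IList<T>>> shardQueries)
    {
        Task<IList<T>>[] tasks = shardQueries.Select(q => Task.Run(q)).ToArray();
        Task.WaitAll(tasks);
        return tasks.SelectMany(t => t.Result).ToList();
    }
}
```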

Let us look at the code, okay? As you can see, this is a pretty standard usage of NHibernate.

using (ISession session = sessionFactory.OpenSession())
{
    session.Save(new WeatherReport
    {
        Continent = "North America",
        Latitude = 25,
        Longitude = 30,
        ReportTime = DateTime.Now,
        Temperature = 44
    });

    session.Save(new WeatherReport
    {
        Continent = "Africa",
        Latitude = 44,
        Longitude = 99,
        ReportTime = DateTime.Now,
        Temperature = 31
    });

    session.Save(new WeatherReport
    {
        Continent = "Asia",
        Latitude = 13,
        Longitude = 12,
        ReportTime = DateTime.Now,
        Temperature = 104
    });

    session.Flush(); // write the pending inserts to their shards
}

Since we are using the defaults, each of those entities is going to go to a different shard. Here is the result:

Our data was saved into three different databases. And obviously we could have saved them to three different servers as well.

But saving the data is only part of the story. What about querying? Well, let us look at the following query:

session.CreateCriteria(typeof(WeatherReport), "weather").List()

This query will give us:


Note that we have three different sessions here, each for its own database, each executing a single query. What is really interesting is that NHibernate will take all of those results and merge them together. It can even handle proper ordering across different databases.

Let us see the code:

var reports =
    session.CreateCriteria(typeof(WeatherReport), "weather")
        .Add(Restrictions.Gt("Temperature", 33))
        .List();
foreach (WeatherReport report in reports)
{
    Console.WriteLine(report.Continent);
}

Which results in:


And in the following being printed to the console:

North America

We got the proper ordering, as we specified in the query, but note that we aren’t handling ordering in the database. Because we are hitting multiple sources, it is actually cheaper to do the ordering in memory, rather than get partially ordered data and then try to sort it.
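Ordering in memory amounts to concatenating whatever each shard returned and sorting once. A minimal sketch, assuming the query orders by Temperature; `WeatherReportRow` is a hypothetical stand-in for the WeatherReport entity, and this is not how NH Shards is implemented internally.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for WeatherReport, with only the fields used here.
public sealed class WeatherReportRow
{
    public string Continent { get; set; }
    public int Temperature { get; set; }
}

public static class InMemoryOrderingSketch
{
    // Concatenates the rows returned by every shard (in whatever order the shards
    // answered) and sorts the combined list once, in memory. Ordering by
    // Temperature is an assumption for illustration.
    public static List<WeatherReportRow> MergeAndOrder(
        IEnumerable<IEnumerable<WeatherReportRow>> perShardResults)
    {
        return perShardResults
            .SelectMany(rows => rows)
            .OrderBy(row => row.Temperature)
            .ToList();
    }
}
```

Sorting the combined list once avoids having to merge several partially ordered streams, which is the trade-off described above.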

Well, that is about it from the point of view of the capabilities.

One of the things that is holding NH Shards back right now is that only the core code paths have been implemented. A lot of the convenience methods are currently not implemented.


They are relatively low-hanging fruit, and can be implemented without any deep knowledge of NHibernate or NHibernate Shards. Beyond that, the sharded HQL implementation still does not handle ordering properly, so if you care about ordering you can only query using ICriteria (at the moment).

It isn’t there yet, but it is much closer. You can get a working demo, start working with it, and implement the missing pieces as you run into them. I strongly urge you to contribute the missing parts, at least the convenience methods, which should be pretty easy.

Please submit patches to our JIRA and discuss the topic at: http://groups.google.com/group/nhcdevs

What am I missing? MSMQ Perf Issue

time to read 4 min | 723 words

I am getting some strange perf numbers from MSMQ, and I can’t quite figure out what is going on here.

The scenario is simple: I have a process reading from queue 1 and writing to queue 2. But performance isn’t anywhere near where I think it should be.

In my test scenario, I have queue 1 filled with 10,000 messages, each about 1.5 KB in size. My test code does a no-op move between the queues. Both queues are transactional.

Here is the code:

private static void CopyData()
{
    var q1 = new MessageQueue(@".\private$\test_queue1");
    var q2 = new MessageQueue(@".\private$\test_queue2");
    var sp = Stopwatch.StartNew();
    while (true)
    {
        using (var msmqTx = new MessageQueueTransaction())
        {
            msmqTx.Begin();
            Message message;
            try
            {
                message = q1.Receive(TimeSpan.FromMilliseconds(0), msmqTx);
            }
            catch (MessageQueueException e)
            {
                if (e.MessageQueueErrorCode == MessageQueueErrorCode.IOTimeout)
                    break; // queue 1 is empty, stop copying
                throw;
            }
            q2.Send(message, msmqTx);
            msmqTx.Commit();
        }
    }
    Console.WriteLine("{0:#,#}", sp.ElapsedMilliseconds);
}

Using this code, it takes 236.8 seconds to move 10,000 messages. If I use System.Transactions, instead of MSMQ’s internal transactions, I get comparable speeds.

Just to give you an idea, this is about 40 messages a second. That number is ridiculously low.

Changing the code so each operation is a separate transaction, like this:

private static void CopyData()
{
    var q1 = new MessageQueue(@".\private$\test_queue1");
    var q2 = new MessageQueue(@".\private$\test_queue2");
    var sp = Stopwatch.StartNew();
    while (true)
    {
        Message message;
        try
        {
            message = q1.Receive(TimeSpan.FromMilliseconds(0), MessageQueueTransactionType.Single);
        }
        catch (MessageQueueException e)
        {
            if (e.MessageQueueErrorCode == MessageQueueErrorCode.IOTimeout)
                break; // queue 1 is empty, stop copying
            throw;
        }
        q2.Send(message, MessageQueueTransactionType.Single);
    }
    Console.WriteLine("{0:#,#}", sp.ElapsedMilliseconds);
}

With this version, it takes 16.3 seconds, or about 600 messages per second, which is far closer to what I would expect.

This is on a quad-core machine with 8 GB RAM (4 GB free), so I don’t think that it is the machine that is causing the problem. I see similar results on other machines as well.

Am I missing something? Is there something in my code that is wrong?

Challenge: Write the check-in comment

time to read 2 min | 291 words

In my code base, I have the following commit, which doesn’t have a check-in comment. This is the before:

private Entry GetEntryForSlug(string slug)
{
    var entry = session.CreateCriteria(typeof(Entry))
        .Add(Restrictions.Like("Slug", slug))
        .UniqueResult<Entry>();

    if (entry != null)
        return entry;

    //try load default document
    return session.CreateCriteria(typeof(Entry))
        .Add(Restrictions.Like("Slug", slug + "/default"))
        .UniqueResult<Entry>();
}

And this is the after:

private Entry GetEntryForSlug(string slug)
{
    var entries = session.CreateCriteria(typeof(Entry))
        .Add(
            Restrictions.Like("Slug", slug) ||
            Restrictions.Like("Slug", slug + "/default"))
        .List<Entry>();

    return entries
        .OrderBy(x => x.Slug.Length)
        .FirstOrDefault();
}

What was the silent committer thinking when he made the change? Can you write the check-in comment?

Life altering decisions

time to read 2 min | 202 words

I recently had the chance to reminisce, and I ran into this post. This is me in 2004, talking about my logo:


People frequently ask me, “Why Rhinos?” Here is the answer, from way back:

I thought about using some sort of a symbol to create the logo. I toyed for a while with the Wingdings font, but found that lacking for my purposes; then I thought about using an animal's shadow as the watermark. I thought about using some sort of dragon or a wolf, but those are really banal. I spent some time thinking about it until I finally decided to use a Rhino; I just thought about it and I think that it's cool. Besides, I don't know anyone else that uses Rhinos.

The logo and the general orange theme of the blog were decisions that I just made; there was no deep thinking involved. My living room has a lot of orange in it right now, and I have a Rhino tattoo.

It is funny how such a small decision can have such a significant impact.

Can you imagine Wolf.Mocks?

Git commits as code review?

time to read 1 min | 107 words

I just had to go through a code base where I had a bunch of comments.

Instead of going the usual route of just noting the changes that I think should be made, I decided to do something else. I made each change myself and committed each one individually.

This is what it looks like; each commit is usually less than a single screen of changes (in diff mode).


I wonder if it is something that I can apply more generally.

UI Mockups

time to read 4 min | 680 words

I spoke in the past about the importance of UI mockups. I consider them an essential part of software design. It is often much easier to gather requirements when you are talking in concrete terms about what the software looks like.

It is important to note that I do not care about the look and feel of the mockup. In fact, I don’t want it to look good; it should have a drafty look that makes it clear that this is just a mockup. It is a communication tool as much as anything else. It just allows me to talk about things in concrete terms.

A while ago I found Balsamiq, and I was happy. Balsamiq does just about everything that I want from a UI mockup tool.

Except, it doesn’t do Hebrew :-(.

That prompted me to start looking at several other UI mockup tools, to see if there are any that support Hebrew and that I could tolerate.

I tried a few others, and got varying support for Hebrew. The best ones from a Hebrew support perspective were Mockup Screens and Blend’s SketchFlow.

My test scenario was this UI (in Hebrew):


This is the Balsamiq mockup, which took about 3 minutes to build, the first time I used Balsamiq.

I could get it working in Mockup Screens after about ten minutes of playing with the options. Mockup Screens is nice, but it gives me too many knobs to turn. I felt it especially when I tried to do the table: I had to specify columns & rows, and specifying the data was done in a separate dialog.

It works, but it is awkward.

The next one I tried is Blend’s SketchFlow. I heard a lot of excitement about it, so I gave it a spin. A Silverlight 3 SketchFlow project simply does not support Hebrew. A WPF SketchFlow project does, and that exposed me to a very nasty surprise. I was expecting a way to easily build sketches of the screens that I was interested in.

Basically, something similar to what Balsamiq gives me. It turns out that there are a few simple controls (check box, drop down, etc.), and everything else you actually need to sketch. That is, sit down and draw it.

The problem is that this takes an inordinate amount of time, especially if you are actually trying to do something like the screen above. I just want a stupid table with some data, but this was way too hard to do in SketchFlow. If I am reduced to literally drawing on the screen, I might as well draw it on paper. It is about as easy to manipulate, at that stage.

In the end, I think that what I’ll end up doing is using Excel or Word as the mockup tool. They both have good Hebrew support, and they can do things like tables, drop downs and check boxes easily, which is pretty much all I need. It also makes it perfectly clear that this isn’t something executable.

And if you don’t care for Hebrew support, go and use Balsamiq, it is awesome.


Matt Kellogg has informed me that you can use Balsamiq with Hebrew support; all you need to do is set Use System Fonts:


And once that is done, it took only a few moments to handle this:


Now, it does suffer from all the usual RTL problems, but it is working, and it is still by far the best one out there.

What is up with the Entity Framework vNext?

time to read 2 min | 297 words

Every now and then I do a quick check on the EF blog, just to see what their status is. My latest peek made me gulp. I am not sure where the EF is going with things; I just know that I don’t really like it.

For a start, take a look at the following sample from their Code Only mapping (basically Fluent NHibernate):

e => new {
    manager = e.Manager.Id,
    thisIsADiscriminator = "E"
}

There are several things wrong here. First, “manager” and “thisIsADiscriminator” are strings for all intents and purposes. The compiler isn’t going to check them; they aren’t there to do anything, they are just there to avoid being literal strings. But they are strings.

Worse, “thisIsADiscriminator” is a magic string.

Second, and far more troubling, I am looking at this class definition and I cringe:

public class Category
{
    public int ID { get; set; }
    public string Name { get; set; }
    public List<Product> Products { get; set; }
}

The problem is quite simple: this class has no hooks for lazy loading. You have to load everything here in one go. Worse, you probably need to load the entire graph in one go. That scares me on many levels.

I am not sure how the EF is going to handle it, but short of IL rewriting techniques, which I don’t think the EF is currently using, this is a performance nightmare waiting to happen.
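For contrast, one common way an ORM gets a lazy loading hook is to require virtual members, so that a runtime-generated subclass (a proxy) can intercept them; NHibernate works this way. The sketch below writes such a proxy by hand to show the idea. `LazyCategoryProxy` is hypothetical, and the `loadProducts` delegate stands in for whatever database call a real proxy would make.

```csharp
using System;
using System.Collections.Generic;

public class Product
{
    public virtual string Name { get; set; }
}

// Same shape as the Category class above, but with virtual members,
// which is what gives a proxy-based ORM a place to hook lazy loading.
public class Category
{
    public virtual int ID { get; set; }
    public virtual string Name { get; set; }
    public virtual List<Product> Products { get; set; }
}

// Hand-written stand-in for a runtime-generated proxy: the (hypothetical)
// loader is only called the first time Products is read.
public sealed class LazyCategoryProxy : Category
{
    private readonly Func<List<Product>> loadProducts;
    private List<Product> products;
    private bool loaded;

    public LazyCategoryProxy(Func<List<Product>> loadProducts)
    {
        this.loadProducts = loadProducts;
    }

    public override List<Product> Products
    {
        get
        {
            if (!loaded)
            {
                products = loadProducts();
                loaded = true;
            }
            return products;
        }
        set
        {
            products = value;
            loaded = true;
        }
    }
}
```

With the non-virtual class above, there is nothing to override, so the collection has to be filled up front.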

Disposable servers

time to read 1 min | 93 words

I have been using Amazon EC2 and GoGrid recently, and I found myself spinning up a server to do a specific task (converting a Subversion repository to a git repository, copying a database, trying out VS 2010, etc.).

Usually I do it because the cloud machines have a lot more bandwidth than I have locally, and I can also choose a specific OS + services that match what I care about. I am not used to thinking about servers in this fashion, but I find myself doing it fairly often recently.

Interesting shift.

Review: GoGrid vs. Amazon EC2

time to read 5 min | 935 words

For the last year or so, I have been running my servers on GoGrid. The main reason for that is that I wanted to run on a Windows 2008 server, and there was no good UI for managing EC2 at the time. I have been playing around with EC2, trying to see if it fits my needs better. Well, in reality, I was trying to see if locating a server instance in Europe would give me better latency when I need to administer the servers (it does).

Nitpicker corner: The following is a review of both providers for my scenario. I know that you’ll ignore that, but do try paying attention to the fact that I am talking about my scenario, and not yours.

After playing around with EC2 for several days, I came to the following conclusions. GoGrid & EC2 only look similar. They are actually two different offerings targeting different types of scenarios.

EC2 is built to handle clouds, period. Server instances are meaningless, except maybe by their role. They come & go as they please and they are utterly disposable. That makes it great if what you want is a cloud. It is very hard if what you want is just running server instances.

For my purposes, I don’t want to have a cloud of indistinguishable machines, I want to be able to run a lot of different sites on different machines. Maybe I want to have a set of machines for a single site, but I still want to be able to clearly and easily separate the machines that run nhprof.com and the ones that run ayende.com.

That made working with EC2 really uncomfortable, to tell you the truth. They don’t support anything like naming an instance, which makes sense from their point of view. You should not get attached to an instance; you should get attached to an image of that instance.

That doesn’t work quite that well for my case, however. I was willing to accept this limitation, but I ran into a few others that were deal breakers for me.

  • Amazon EC2 doesn’t support Windows 2008. This is really annoying, both because there are some features that I could have used and because Windows 2003 (their only Windows offering) does not support RDP saved passwords and doesn’t support copy/paste in the login screen. Together, these make for a horrible login experience with the crazy passwords Amazon assigns.
  • No good way to recover an instance. I set up an instance, told it to install security updates, and rebooted it. It didn’t come up. I am not sure what I am supposed to do in this case. As far as I could tell, you have to enter a support contract to recover it (and I wasn’t willing to do so just for the trial). It seems that the response would have been: ‘just recreate the instance’.
  • Long instance provisioning time. Spinning up a new instance in EC2 seems to take about 15 – 30 minutes, which is annoyingly long. Yes, I know about reserved instances; they are not applicable to my scenario. (GoGrid feels faster, but I don’t have any data to say whether it actually is.)

GoGrid, on the other hand, takes a drastically different approach.

Basically, they give you a server farm, you can create instances of machines, name them and manage them as individuals. They do support Windows 2008, and their support is phenomenal.

I have been running nhprof.com there for a long while, and overall, I love what they are doing, their support level and the experience.

Their main disadvantage compared to EC2 was that they didn’t support cloning an image. They fixed that a while ago, which makes me very happy.

This subtle difference, from focusing on set of instances to focusing on an instance, is a huge benefit for my scenario, because it allows me to manage the different parts of my infrastructure directly, instead of indirectly or via brute force.

One thing to note about both of them. They both consider instances to be disposable. If you have problems with a server, you are probably better off just creating a new instance and setting that up.

GoGrid is better than EC2 here, because they will try to salvage a dead instance, but you should still have a backup ready, along with the means to restore it on a new machine. In the end, that is much easier than the alternative.

I am using a backup service (Mozy, if you care) for all my servers, and that takes care of that.

Note that my scenario is that I care about running my existing applications on a server in the cloud, not running in the cloud. That is why I didn’t even consider something like Azure or AppEngine. They don’t matter for this scenario.

If I was building a new application that required scaling, it would probably be a different sort of decision matrix, with potentially a different result.

From a pricing perspective, they seem comparable. GoGrid says that they are cheaper, but they would :-)

