Ayende @ Rahien

My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:


+972 52-548-6969

, @ Q c

Posts: 6,317 | Comments: 46,923

filter by tags archive

Production Errors Best Practices

time to read 2 min | 268 words

I have been practicing deployment lately, and today we have gotten everything squared away very nicely, only to be confronted with a failure when we actually run the app.

The error was: KeyNotFoundException, because we forgot to move some data to the QA database.  We fixed the error, but I also change the error message that we were getting.

Now, we get the following error for this scenario:

Rhino.ServiceBus.Exceptins.InvalidDeploymentException: ERROR_ORDER_ATTRIB_NOT_FOUND: Could not find order attribute with the name: CorrelationId
This is usually an indication that the database has not been properly prepared for the deployment.
You need to add the order attribute to the appropriate table.
Important! If the order attribute 'CorrelationId' is missing, it is likely that other required order attributes are also missing.
Please refer to the deployment guide about the required attributes. If you don't find the required attributes in the deployment guide, please shout at the developer, preferably at 2 AM.

There are a couple of important things that I want to discuss about this error:

  • It is very explicit about what is going on.
  • It contains the information about how to fix the error.
  • It warns about additional problems that this may relate to.
  • It contains a uniquely identifying string that can be used to clearly communicate the error, placed in Google or put in a knowledge base.

Resolving graphics problems with NH Prof

time to read 1 min | 185 words

Christopher has good coverage on how we solved and isolated a problem that we  run into using NH Prof. In essence, the issue was messed up text being reported by some of the users.

To be frank, it was a very frustrating issue, because none of us could come up with a repro, and the only thing that sometimes seemed to help is updating the graphic drivers. Having to update your graphic drivers to use NH Prof isn’t something that I consider appropriate. It reminded me of the time that to deploy a system that I built, we had to replace the organization domain controller. It is… not a place where you want to be.

We had accidently reproduced it in a SharedView session, and I got to see Christopher and Rob get to the root of the issue in a matter of minutes. It is not an area that I have any sort of expertise on, and it was fascinating to watch.

And, of course, we got rid of the problem that NH Prof had.

Reducing the cost of getting a stack trace

time to read 6 min | 1100 words

image I am trying to find ways to reduce the cost of the stack trace used in NH Prof. The access to the stack trace is extremely valuable, but there is a significant cost of using it, so we need a better way of handling this. I decided to run a couple of experiment running this.

All experiments were run 5,000 times, on a stack trace of 7 levels.

  • new StackTrace(true) - ~600ms
  • new StackTrace(false) - ~150ms

So right there, we have a huge cost saving, but let us continue a bit.

  • throwing exception - ~400ms

That is not so good, I have to say.

Well, when in doubt, cheat!

Using reflector and some _really_ nasty stuff, I came up with this:

var stackFrameHelperType = typeof(object).Assembly.GetType("System.Diagnostics.StackFrameHelper");
var GetStackFramesInternal = Type.GetType("System.Diagnostics.StackTrace, mscorlib").GetMethod("GetStackFramesInternal",BindingFlags.Static|BindingFlags.NonPublic);
var method = new DynamicMethod("GetStackTraceFast",typeof(object),new Type[0],typeof(StackTrace),true);
var generator = method.GetILGenerator();
generator.Emit(OpCodes.Newobj, stackFrameHelperType.GetConstructor(new[] { typeof(bool), typeof(Thread) }));
generator.Emit(OpCodes.Call, GetStackFramesInternal);
getTheStackTrace = (Func<object>)method.CreateDelegate(typeof(Func<object>));

Calling getTheStackTrace 5000 times with depth of 7 frames is… 54ms. And now that is a horse of a different color indeed.

And the best part is, I can use the StackFrameHelper as a key into cached stack traces.

And yes, I am aware that if anyone from the CLR team is reading this, a ninja team will be dispatched to… discuss with me the notion of supported operations vs. unsupported operation.

Pull vs. Push models for UI updates

time to read 3 min | 485 words

I thought that I would give this topic a bit of discussion, because it seems to me that a lot of people are making false assumptions here.

Let us consider a push model for an application UI. Assuming that data binding works, and we have implemented INotifyPropertyChanged and INotifyCollectionChanged, push model is actually the easiest to work with. Change the model, and the UI is updated automatically.

Very often, this can look like magic. And it is a damn simple model to work with.

It also breaks down completely when the number of updates that you have to deal with goes up beyond what is humanly perceivable.

Let us imagine that we have to build a system that needs to provide the user with a real time view of a system that generate 5,000 events per second.

Guess what? There is a serious issue in the actual problem definition. No human can deal with 5,000 events per second. More than that, even if you are displaying an aggregation of the data, it is still not going to work. Having to deal with 5,000 UI updates will usually kill the system. It also means that the system is doing way too much work. Just processing those UI events is going to take a lot of time. Time that we could dedicate to actually making the UI work for the user.

A better model in this scenario is to split the work. We have a backend that deals with the data aggregation, and we have the UI poll the backend fora snapshot of the information periodically (every .2 - .5 seconds).

The advantage is that now we have far less UI work to do. More than that, we don't have to go through all the work to show the current state.

Let us assume that we have 5,000 events per second, that we update the UI every .5 seconds and that only the last 500 events are actually relevant for the UI.

Using the push model, we would have to actually process all 5,000 changes. Using the pull model, every time we pull the new snapshot, we get the already processed data. So in this case, we throw away all the data that is not relevant to us (everything not in the last 500 events that showed up since the last batch. Trying to implement the same for a push model require a fairly complex system, especially if you have a rich model, and you don't have a good way of telling what the implications of change would be.

The reason that this matters is that UI operations are expensive, while backend operations tend to be very cheap.

Add that to the human perception time lag, and you get a smoother user experience. And since now the UI work that needs to be done is so much less, and since the polling happens when the UI isn't busy doing other things, we also have a more responsive UI.

Typemock Open AOP

time to read 1 min | 90 words

I just stumbled upon Typemock Open AOP, I know about as little as it is possible to know about this, but I am going to find out more, fast.

It looks like the idea solution for a number of problems that I am facing at the moment. I wanted the power that TypeMock has for a very long time, and it certainly looks like the ideal AOP story. From talking with Eli in 2007, the TypeMock approach impose an overhead of about 5%, which is quite reasonable, in most scenarios.

On Hadoop

time to read 2 min | 207 words

Yesterday or the day before that I read the available chapters for Hadoop in Action. Hadoop is a Map Reduce implementation in Java, and it includes some very interesting ideas.

The concept of Map Reduce isn't new, but I liked seeing the actual code examples, which made it so much easier to follow what is actually going on. As usual, an In Action book has a lot of stuff in it that relates to getting things started, and since I don't usually work in Java, they were of little interest to me. But the core ideas are very interesting.

It does seems to be limited to a very small set of scenarios, needing to, in essence, index large sets of data. Some of the examples in the book made sense as theoretical problems, but I think that I am still missing the concrete "order to cash" scenario, seeing how we take a business problem and turn that into a set of technical challenges that can be resolved by utilizing Map Reduce in some part of the problem.

As I said, only the first 4 chapters are currently available, and I was reading the early access version, so it is likely will be fixed when more chapters are coming in.

Core NHibernatePersistence with NHibernate

time to read 1 min | 96 words

I can't believe that I didn't mention this before. I am partnering with Skills Matter to create a course about using NHibernate.

You can see the full details here.

The first run is in London in May, followed by Norway in June and London again in August. After that I stopped tracking, but it is very frequent.

This is a three days course that is meant for developers who want to get into NHibernate. I am going to assume zero NHibernate knowledge and take you to the spot where you can build NHibernate applications on your own.

Rhino Persistent Hash Table

time to read 3 min | 472 words

I spoke about the refactoring for Rhino DHT in this post. Now, I want to talking about what I actually did.

Before, Rhino DHT was independent, and had no dependencies outside a Windows machine. But because of the need to support replication, I decided to make some changes to the way it is structured. Furthermore, it became very clear that I am going to want to use the notion of a persisted hash table in more than just the DHT.

The notion of a persisted hash table is very powerful, and one that I already gave up on when I found about Esent. So I decided that it would make sense to make an explicit separation between the notion of the persistent hash table and the notion of distributed hash table.

So I moved all the code relating to Esent into Rhino.PersistentHashTable, while keeping the same semantics as before (allowing conflicts by design, and letting the client resolve them, pervasive read caching) while adding some additional features, like multi value keys (bags of values), which you are quite useful for a number of things.

The end result is a very small interface, which we can use to persist data with as little fuss as you can imagine:


This may seem odd to you at first, why do we need two classes for this? Here is a typical use case for this:

using (var table = new PersistentHashTable(testDatabase))

	table.Batch(actions =>
		actions.Put(new PutRequest
			Key = "test",
			ParentVersions = new ValueVersion[0],
			Bytes = new byte[] { 1 }

var values = actions.Get(new GetRequest { Key = "test" });
		Assert.Equal(1, values[0].Version.Number);
		Assert.Equal(new byte[] { 1 }, values[0].Data);

There are a few things to note in this code.

  1. We use Batch as a transaction boundary, and we call commit to well, commit the transaction. That allow us to batch several actions into a single database operation.
  2. Most of the methods access a parameter object. I found that this makes versioning the API much easier.
  3. We have only two set of API that we actually care about:
    1. Get/Put/Remove - for setting a single item in the PHT.
    2. AddItem/RemoveItem/GetItems - for adding, removing or querying lists. Unlike single value items, there is no explicit concurrency control over lists. That is because with lists, there is no concept of concurrency control, since each of the operation can be easily made safe for concurrent use. In fact, that is why the PHT provides an explicit API for this, instead of providing it on top of the single value API.

I am planning on making Rhino PHT the storage engine for persistent data in NH Prof.

NH ProfPerformance implications for the profiled application

time to read 5 min | 928 words

I got a couple of complaints about this, and I think that this is a really important issue. Using NH Prof slow down the profiled application. That is to be expected, since we require the application to do more work, but it is not as good as we can hope it would be.

Here is the test scenario, running against the Northwind database (thanks Davy for supplying it):

class Program
	static void Main(string[] args)
		var sp = Stopwatch.StartNew();

		ISessionFactory sessionFactory = CreateSessionFactory();

		using (var session = sessionFactory.OpenSession())
			var products = session.CreateCriteria(typeof(Product))

		using (var session = sessionFactory.OpenSession())
			var products = session.CreateCriteria(typeof(Product))

		using (var session = sessionFactory.OpenSession())
			var products = session.CreateCriteria(typeof(Product))

		using (var session = sessionFactory.OpenSession())
			var products = session.CreateCriteria(typeof(Product))

		using (var session = sessionFactory.OpenSession())
			var products = session.CreateCriteria(typeof(Product))

		using (var session = sessionFactory.OpenSession())
			var products = session.CreateCriteria(typeof(Product))


	private static ISessionFactory CreateSessionFactory()
		Configuration configuration = new Configuration()

		return configuration.BuildSessionFactory();

With NH Prof active, this takes: 13 - 16 seconds to run.

Without NH Prof running, this takes: 12 seconds to run.

The reason for that difference is probably that NH Prof is doing some work when new events come in, so it steals some CPU time from the other process.

Disabling NH Prof (commenting the NHibernateProfiler.Initialize()), this takes 4 - 6 seconds to run. This is a huge difference.

I decided to see whatever this is something that I can actually do something about, and turned the NHibernateAppender into a no op. So the next run would measure just the time that it takes to NHibernate to generate the event that I care about, and not whatever processing I am doing on the profiled application side.

The result both surprised and thrilled me: 4.8 seconds!

Why was I happy about this?

Because it meant that I was actually at fault here. And if I was actually at fault, it means that I can fix things. I pulled up the trusty dotTrace Profiler, and quickly identified a hot spot:


Well, in this case, you need to understand a bit about how NH Prof get the data from the profiled application. We gather the event stream and batch send it to the actual profiler. Those sends are RPC calls, and it is expected that they would be slow. Except, they are not happening in the same thread as the actual application, so they should have little effect on the actual performance of the test scenario that we have above.

I decided to try it out again, this time, we will change FlushMEssagesToProfiler and SendStatsticInformation to no ops, and see what happens.

Running the application again, we got runtime of 13 seconds. That confirmed my original suspicion, that while those actions takes a long time, they do not contribute to the actual perf problem in the actual code.

I run the code again under the profiler, looking deeper. That led me to this discovery:


Well, I knew that stack trace was slow, but I sincerely thought that I was doing better job in reducing the number of times that I had to call it. Calling it 4,178 times is obviously wrong.

We track the stack trace for the following event sources:

  • NHibernate.SQL
  • NHibernate.Event.Default.DefaultLoadEventListener
  • NHibernate.Cache.StandardQueryCache
  • NHibernate.Transaction.AdoTransaction

The problem is that some things, like DefaultLoadEventListener, are generating a lot of events that I don't really care about. I started approaching the problem by applying filters, so I would only ever create the stack trace for the events I truly care about. That brought the cost down significantly, so now it took 6 - 7 seconds. We shaved 5 - 10 seconds of the total running time, awesome.

But I think that we can still do better than that. Let us see what dotTrace is saying:


dotTrace is saying that we now reduced the time this takes from 31% to 3.5%. We have better things to do than to look at this piece of code again, then.

Let us focus on something that will give us more benefit, then:


This is actually the initial RPC call between the appender and the profiler. It establish the TCP connection, among other things, so it takes a while to process. Almost 0.7 seconds is a lot of time. I changed that code to run on a background thread, further improving the startup time of the application.

Time to execute the code now? 5.8 seconds.

Time to go to dotTrace again, I think.


And we can see that what dominate the actual time for the test is the startup concerns of NHibernate. But I think that in a real scenario, the time for generating a stack trace would again become a meaningful for performance.

There are probably additional things that I can do to optimize this, but I think that this is a  great step. To remind you, the initial costs were:

  • No profiler: 4 - 6 seconds
  • With profiler: 13 - 16 seconds

After the changes, we are with: 5 - 7 seconds, which I consider entirely reasonable.

And for a few hours of performance tuning, reducing the time by 8-9 seconds and by over 60%, I think I did good.

The code pass all the tests and was committed, so in about ten minutes, you would be able to download it from the NH Prof site.


  1. RavenDB Conference videos: Implementing CQRS and Event Sourcing with RavenDB - 9 hours from now
  2. How did the milk get to the fridge? - about one day from now
  3. RavenDB Conference videos: Building Codealike: a journey into the developers analytics world - 4 days from now
  4. Low level Voron optimizations: Transaction lock handoff - 5 days from now
  5. RavenDB Conference Videos: Delving into Documents with Data Subscriptions - 6 days from now

And 7 more posts are pending...

There are posts all the way to Mar 10, 2017


  1. RavenDB Conference videos (12):
    21 Feb 2017 - Zapping ever faster
  2. Low level Voron optimizations (5):
    20 Feb 2017 - Recyclers do it over and over again.
  3. Implementing low level trie (4):
    26 Jan 2017 - Digging into the C++ impl
  4. Answer (9):
    20 Jan 2017 - What does this code do?
  5. Challenge (48):
    19 Jan 2017 - What does this code do?
View all series


Main feed Feed Stats
Comments feed   Comments Feed Stats