Ayende @ Rahien


NHibernate Perf Tricks

I originally titled this post NHibernate Stupid Perf Tricks, but decided to remove that. The purpose of this post is to show some performance optimizations that you can take advantage of with NHibernate. This is not a benchmark; the results aren’t useful for anything except comparing to one another. I would also like to remind you that NHibernate isn’t intended for ETL scenarios. If you need that, you probably want to look into ETL tools rather than an OR/M developed for OLTP scenarios.

There is a wide scope for performance improvements beyond what is shown here. For example, the database was not optimized, and the machine was in use for other things throughout the test runs.

To start with, here is the context in which we are working. It is used to execute all of the scenarios below.

The initial system configuration was:

<hibernate-configuration xmlns="urn:nhibernate-configuration-2.2">
  <session-factory>
    <property name="dialect">NHibernate.Dialect.MsSql2000Dialect</property>
    <property name="connection.provider">NHibernate.Connection.DriverConnectionProvider</property>
    <property name="connection.connection_string">
      Server=(local);initial catalog=shalom_kita_alef;Integrated Security=SSPI
    </property>
    <property name='proxyfactory.factory_class'>
      NHibernate.ByteCode.Castle.ProxyFactoryFactory, NHibernate.ByteCode.Castle
    </property>
    <mapping assembly="PerfTricksForContrivedScenarios" />
  </session-factory>
</hibernate-configuration>

The model used was:

image 

And the mapping for this is:

<class name="User"
			 table="Users">
	<id name="Id">
		<generator class="hilo"/>
	</id>

	<property name="Password"/>
	<property name="Username"/>
	<property name="Email"/>
	<property name="CreatedAt"/>
	<property name="Bio"/>

</class>

And each new user is created using:

public static User GenerateUser(int salt)
{
	return new User
	{
		Bio = new string('*', 128),
		CreatedAt = DateTime.Now,
		Email = salt + "@example.org",
		Password = Guid.NewGuid().ToByteArray(),
		Username = "User " + salt
	};
}

Our first attempt is to simply check serial execution speed, and I wrote the following (very trivial) code to do so.

const int count = 500 * 1000;
var configuration = new Configuration()
	.Configure("hibernate.cfg.xml");
new SchemaExport(configuration).Create(false, true);
var sessionFactory = configuration
	.BuildSessionFactory();

var stopwatch = Stopwatch.StartNew();

for (int i = 0; i < count; i++)
{
	using(var session = sessionFactory.OpenSession())
	using(var tx = session.BeginTransaction())
	{
		session.Save(GenerateUser(i));
		tx.Commit();
	}

}

Console.WriteLine(stopwatch.ElapsedMilliseconds);

Note that we create a separate session for each element. This is probably the slowest way of doing things, since it significantly increases the number of connection open/close cycles and transactions that we need to handle.

To tell you the truth, this is here to give us a baseline for how slow we can make things. Another thing to note is that this is simply serial, which is another way in which this is not a true representation of how things happen in the real world. In real world scenarios, we are usually handling small requests, like the one simulated above, but we do so in parallel. We are also using a local database rather than the far more common remote database, which skews the results even further.

Anyway, the initial approach took: 21.1 minutes, or roughly a row every two and a half milliseconds, about 400 rows / second.

I am pretty sure most of that time went into connection & transaction management, though.

So the first thing to try was to see what would happen if I did the same work using a single session. That removes the cost of opening and closing the connection and of creating lots of new transactions.

The code in question is:

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
	for (int i = 0; i < count; i++)
	{
		session.Save(GenerateUser(i));
	}

	tx.Commit();
}

I expect that this will be much faster, but I have to explain something. It is usually not recommended to use the session for doing bulk operations, but this is a special case. We are only saving new instances, so the flush does no unnecessary work and we only commit once, so the save to the DB is done in a single continuous stream.

This version ran for 4.2 minutes, or roughly 2 rows per millisecond, about 2,000 rows / second.

Now, the next obvious step is to move to a stateless session, which is intended for bulk scenarios. How long would this take?
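
If you do end up pushing a long loop through a regular session, a commonly described mitigation is to flush and clear the session every so often, so the first level cache does not keep a reference to every entity you have saved. Here is a minimal sketch of that idea. `SaveInChunks`, `flushEvery` and the `generate` delegate are names I made up for illustration, and this variant is not one of the runs timed in this post, so I can't tell you how it compares.

```csharp
using System;
using NHibernate;

public static class SingleSessionInsert
{
    // Hypothetical helper, not part of the measured runs in this post.
    // Saves `count` entities through one session and one transaction,
    // flushing and clearing the session every `flushEvery` saves.
    public static void SaveInChunks(
        ISessionFactory sessionFactory,
        int count,
        int flushEvery,
        Func<int, object> generate)
    {
        using (var session = sessionFactory.OpenSession())
        using (var tx = session.BeginTransaction())
        {
            for (int i = 0; i < count; i++)
            {
                session.Save(generate(i));

                if ((i + 1) % flushEvery == 0)
                {
                    session.Flush(); // send the pending INSERTs to the database
                    session.Clear(); // drop the saved entities from the first level cache
                }
            }

            // Commit flushes whatever is left and ends the single transaction.
            tx.Commit();
        }
    }
}
```

With the code from this post, the call would look like `SingleSessionInsert.SaveInChunks(sessionFactory, count, 1000, i => GenerateUser(i));`. The value 1000 is an arbitrary assumption, not a recommendation.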

using (var session = sessionFactory.OpenStatelessSession())
using (var tx = session.BeginTransaction())
{
	for (int i = 0; i < count; i++)
	{
		session.Insert(GenerateUser(i));
	}
	tx.Commit();
}

As you can see, the code is virtually identical, and I expect the performance to be slightly improved but on par with the previous version.

This version ran for 2.9 minutes, about 3 rows per millisecond and close to 2,800 rows / second.

I am actually surprised. I expected it to be slightly faster, but it was much faster.

There are still performance optimizations that we can make, though. NHibernate has a rich batching system that we can enable in the configuration:

<property name='adonet.batch_size'>100</property>

With this change, the same code (using stateless sessions) runs in 2.5 minutes, at about 3,200 rows / second.

This doesn’t show as much improvement as I hoped it would. This is an example of how a real world optimization fails to show its promise in a contrived example. The purpose of batching is to make as few remote calls as possible, which dramatically improves performance. Since we are running on a local database, it isn’t as noticeable.
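
If you would rather set the batch size from code than from the XML file, something along these lines should do it. This is only a sketch: the `BatchingConfig` class and `WithBatching` method are names I made up, while `Configuration.SetProperty` and the `NHibernate.Cfg.Environment.BatchSize` key (which stands for `adonet.batch_size`) are the NHibernate pieces it relies on.

```csharp
using NHibernate.Cfg;

public static class BatchingConfig
{
    // Hypothetical helper: loads the XML configuration shown earlier and
    // overrides adonet.batch_size before the session factory is built.
    public static Configuration WithBatching(int batchSize)
    {
        return new Configuration()
            .Configure("hibernate.cfg.xml")
            // Fully qualified, so it is not confused with System.Environment.
            .SetProperty(NHibernate.Cfg.Environment.BatchSize, batchSize.ToString());
    }
}
```

Usage would be `var sessionFactory = BatchingConfig.WithBatching(100).BuildSessionFactory();`. To the best of my knowledge, insert batching does not kick in for entities that use the identity generator, because NHibernate has to go to the database for each key. The hilo generator in the mapping above does not have that problem.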

Just to give you some idea about the scope of what we did, we wrote 500,000 rows and 160MB of data in a few minutes.
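
For anyone who wants to check the arithmetic behind the rows / second figures, here is a small sketch. The `Throughput` class and its method names are made up for illustration. The elapsed times are the ones reported above, and since they are rounded to a tenth of a minute, the computed values land near the numbers quoted in the text rather than exactly on them.

```csharp
using System;

public static class Throughput
{
    // rows / second for a run that inserted `rows` rows in `minutes` minutes.
    public static double RowsPerSecond(int rows, double minutes)
    {
        return rows / (minutes * 60.0);
    }

    // Prints the throughput of each NHibernate run described in this post.
    public static void PrintAll()
    {
        const int rows = 500 * 1000;
        var runs = new[]
        {
            new { Name = "session per row", Minutes = 21.1 },
            new { Name = "single session", Minutes = 4.2 },
            new { Name = "stateless session", Minutes = 2.9 },
            new { Name = "stateless session + batching", Minutes = 2.5 },
        };

        foreach (var run in runs)
        {
            Console.WriteLine("{0}: {1:F0} rows / second",
                run.Name, RowsPerSecond(rows, run.Minutes));
        }
    }
}
```

Call `Throughput.PrintAll()` from a console application's `Main` to see the four figures side by side.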

Now, remember, those aren’t numbers you can take to the bank. Their only use is to show that, with a few very simple changes, we cut the run time of a really contrived scenario by close to 90% (from 21.1 minutes to 2.5). And yes, there are other tricks that you can use (preparing commands, increasing the batch size, parallelism, etc.). I am not going to try to outline them, though, for the simple reason that this performance should be quite enough for everyone who is using an OR/M. That brings me back to my initial point: OR/Ms are not about bulk data manipulation. If you want to do that, there are better methods.

For the scenario outlined here, you probably want to make use of SqlBulkCopy, or the equivalent for doing this. Just to give you an idea about why, here is the code:

var dt = new DataTable("Users");
dt.Columns.Add(new DataColumn("Id", typeof(int)));
dt.Columns.Add(new DataColumn("Password", typeof(byte[])));
dt.Columns.Add(new DataColumn("Username"));
dt.Columns.Add(new DataColumn("Email"));
dt.Columns.Add(new DataColumn("CreatedAt", typeof(DateTime)));
dt.Columns.Add(new DataColumn("Bio"));

for (int i = 0; i < count; i++)
{
	var row = dt.NewRow();
	row["Id"] = i;
	row["Password"] = Guid.NewGuid().ToByteArray();
	row["Username"] = "User " + i;
	row["Email"] = i + "@example.org";
	row["CreatedAt"] = DateTime.Now;
	row["Bio"] = new string('*', 128);
	dt.Rows.Add(row);
}

using (var connection = ((ISessionFactoryImplementor)sessionFactory).ConnectionProvider.GetConnection())
{
	var s = (SqlConnection)connection;
	var copy = new SqlBulkCopy(s);
	copy.BulkCopyTimeout = 10000;
	copy.DestinationTableName = "Users";
	foreach (DataColumn column in dt.Columns)
	{
		copy.ColumnMappings.Add(column.ColumnName, column.ColumnName);
	}
	copy.WriteToServer(dt);
}

And this ends up in 49 seconds, or about 10,000 rows / second.
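
As a side note, `SqlBulkCopy` is disposable and has a `BatchSize` property that controls how many rows go to the server in each batch. The sketch below wraps the same calls as the code above in a small helper. `BulkLoader`, `Write` and the `batchSize` parameter are names I made up, it expects an already open connection, and I have not timed this variant, so treat it as an illustration and not as a tuned implementation.

```csharp
using System.Data;
using System.Data.SqlClient;

public static class BulkLoader
{
    // Hypothetical helper: copies every row of `table` into the database table
    // with the same name, sending `batchSize` rows to the server per batch.
    // The connection must already be open.
    public static void Write(SqlConnection connection, DataTable table, int batchSize)
    {
        using (var copy = new SqlBulkCopy(connection))
        {
            copy.BulkCopyTimeout = 10000;
            copy.BatchSize = batchSize;
            copy.DestinationTableName = table.TableName;

            // Map columns by name, exactly as in the code above.
            foreach (DataColumn column in table.Columns)
            {
                copy.ColumnMappings.Add(column.ColumnName, column.ColumnName);
            }

            copy.WriteToServer(table);
        }
    }
}
```

With the `dt` table built above, the call would be `BulkLoader.Write((SqlConnection)connection, dt, 10000);` inside the same `using` block that obtains the connection. The batch size of 10000 is an arbitrary assumption.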

Use the appropriate tool for the task.

But even so, getting to 1/3 of the speed of SqlBulkCopy (the absolute top speed you can get to) is nothing to sneeze at.

Comments

08/22/2009 09:46 AM by Mikael Henriksson

Spot on! Use the right tool for the task!!!! Haha this post is sort of the discussion ender. There is nothing more to say on the subject. You can't argue with this. :)

08/22/2009 11:58 AM by Alex Yakunin

Unfortunately, I can:

  • So by this post, Oren, you have confirmed that our tests for NH are near-optimal. We use almost identical code.

  • We show that our performance is 2 times higher, or just 1.5 times slower than SqlBulkCopy. And, as I've mentioned, today I'll explain how to get even higher performance (I expect we'll get at least 15-20% more) in my blog ( http://blog.dataobjects.net ).

  • I think being even 1.5 times slower than SqlBulkCopy is more than good acceptable for complete storage independence.

  • And finally, I explained many many times why we don't test SqlBulkCopy: ormbattle.net/.../...stead-of-common-in-tests.html . Think if this is related to ORM at all.

Finally, I think you must say something exact about a kind of bet I proposed to you here: ormbattle.net/.../...i-dont-believe-oren-eini.html - I won't simply forget this.

08/22/2009 11:59 AM by Alex Yakunin

"good acceptable" => "acceptable".

P.S. Thanks a lot for spending your time to finally investigate this!

08/22/2009 12:37 PM by Mark Nijhof

Since when is bashing the competition a valid sales technique? The only thing is that you lose respect from potential customers. If this new tool is indeed so much better (in all aspects, because I only hear performance arguments, which is absolutely not the most important thing for an ORM) then the public will decide that for itself.

I also don't understand why Ayende is spending so much time on these silly things. Why not ignore it? Then a lot fewer people would even know about it, and it is not like you can ever change these people's minds.

08/22/2009 01:21 PM by Mark Nijhof

@Alex how does that link explain why bashing the competition is good, and how does it show me that you are talking about anything other than performance?

08/22/2009 01:52 PM by Ayende Rahien

Alex Yakunin,

I removed the links comment.

As an aside, please be aware that all the links in the comments include rel="nofollow", so they don't result in any Google juice.

08/22/2009 01:54 PM by Mark Nijhof

@Alex hmm, let me try to rephrase my question: how do these links explain why bashing the competition is good, and how do they show me that you are talking about anything other than performance?

08/22/2009 01:54 PM by Ayende Rahien

Alex Yakunin,

You also seem to have missed the point of this post. Have you seen the picture?

08/22/2009 02:23 PM by Niklas

I get a "Reported Attack Site!" warning when navigating ormbattle.net. Apparently Google was offered a trojan there a few days ago. Gotta love Russia.

08/22/2009 04:48 PM by Alex Yakunin

Yes... It was infected by a virus right after launch - our developers had forgotten to tune up the security properly. We resolved the issue almost immediately, but Google still remembers this, although the site is safe now.

I'd appreciate any ideas on how to fix this.

08/22/2009 04:57 PM by Frank

Alex, is the source code of the benchmark itself publicly available? Even if we can argue all year long about how useful the benchmark itself is, there might be some people interested in actually profiling the frameworks to see where the 'bottleneck' is.

08/22/2009 04:58 PM by Alex Yakunin

@ Oren: Which one? With hummer? Yes. If you're talking about this picture, I can only repeat the same ("I explained many many times why we don't test SqlBulkCopy"). I agree with your point: appropriate tool must be used for bulk insertions. But I wrote many many times in fact we didn't measure perf. of bulk insertions.

Ok, 100 insertions can be considered as bulk insertion operation (think about many-to-many rel. operations)? 10 insertions? Note that exact number does not matter much for the purpose of this test.

Btw, you still didn't answer on my "bet". Sorry for pushing on you, but since you're criticizing me publicly, I think you must follow the same rules here as well.

08/22/2009 05:02 PM by Ayende Rahien

Alex,

Your tests are all about running queries in a loop.

Whenever you call the DB in a loop, it is a bug.

Take a look at the fallacies of distributed computing to understand exactly why.

As for your bet, assuming you mean batching, it is meaningless. NH has this for 4 years.

08/22/2009 05:29 PM by Alex Yakunin

"Alex, is the source code of the benchmark itself publicly available?"

Yes: http://code.google.com/p/ormbattle/downloads/list

If you need access to source code repository (there is most current test code), please write to info @ ormbattle.net.

"As for your bet, assuming you mean batching, it is meaningless. NH has this for 4 years."

Well, I promised to describe our own batching & related techniques, and I'm writing the post about this ;) So I'm talking about the ideas I'm going to share.

I don't care about ADO.NET batching - i.e. obviously, I knew it exists, and I didn't mean it.

Moreover, I also wrote about materialization speed. If current materialization speed of NH is good enough, just say you won't optimize this further, and confirm that our results are meaningless in real life in action.

08/22/2009 06:03 PM by chris

Ayende, I like reading your blog very much, but please, please stop blogging about performance and benchmarks for a while. I am really tired of seeing this one special face here over and over again. I miss posts like the one about the Erlang stuff. I would like to read something about the Axum Incubation Project, F#, or maybe about the SQLAlchemy ORM for Python. It should work with IronPython now. What is possible with an ORM running on the DLR compared to the CLR ones? Or is it more fun to use NH on IronPython? Thanks for your very informative posts. I have been in the IT business for 20 years and I learn something new with every post from you. I really enjoy it ...but not this Alex stuff...

08/22/2009 06:19 PM by Jan

chris +1

I don't think this company and its advertisements are worth your energy.

The washing powder of the advertising company always washes cleaner than the "regular washing powder".

08/22/2009 06:45 PM by Dmitry

I agree with Chris. I don't think anybody who has common sense will be ditching NHibernate for the ORMBattle ORM just because of some basic benchmarks.

I would much rather read about the Macto project progress, non-SQL databases, etc.

08/22/2009 07:49 PM by Ayende Rahien

Alex,

You miss a very important point.

I have no interest in participating in your company's marketing ploys.

08/22/2009 08:32 PM by Olav Rask

I can totally understand why you don't want to take this guy's crap lying down, but I have to +1 as well - this post is a slam dunk. There just is no debating this fool if this doesn't make him realize the fallacy of his premise.

The picture of the "hummer" sums it up pretty nicely :)

08/22/2009 08:39 PM by firefly

I think this post is helpful. Still, is there any caveat emptor in using a stateless session? Would NHibernate Prof help guide me in any way?

On a side note, I personally would like Alex to go away. He isn't adding anything useful to the mix. I vote to ban him from this site.

08/22/2009 09:08 PM by Ray

I don't know what exactly Alex tried to achieve but I can say with most certainty: not me, nor my company will EVER even consider their ORM tool no matter how good it is.

08/22/2009 09:17 PM by Ray

firefly,

IStatelessSession is a really tricky one. I'd say: if you need large batching operations and NHibernate performance doesn't cut it for you, go for raw ADO.NET batching instead as Oren mentioned.

08/22/2009 10:02 PM by Stephen

Alex I'm wondering, even if your orm does these kind of operations faster, the whole point is.. an orm isn't a batch processor, you can get as excited as you want that your orm is king of batching, but put in the ring with batching systems your orm won't stand a chance.

These sort of benchmarks just don't show a reality, maybe the reality is that your orm is faster, but you aren't convincing anyone with those kinda tests.. as you can see you are just alienating yourself, and I don't buy the argument that any press is good, not in such a professional environment.

08/22/2009 10:33 PM by alwin

I really like your screwhammer picture! It really sums up what this whole 'techie soap' is about.

08/23/2009 02:39 AM by pete w

Ayende

There have been times in the past when I have made the mistake of designing "long and open" sessions, and leveraging NHibernate for ETL.

As an API user I assumed these would be realistic things to do. Maybe it was my own ignorance, but do you know of any place that helps describe when NHibernate is NOT a good idea?

08/23/2009 04:43 AM by Ayende Rahien

@Firefly,

IStatelessSession is mostly for just such tasks, when you want to go through a lot of data very fast.

In essence, it is a shortcut through NHibernate, taking care of only very few things in order to speed things up.

NH Prof can certainly track stateless sessions, yes

@Ray,

The fun of using IStatelessSession is that you don't need to worry about hydrating your entities, and you can still take advantage of NH's mapping, database independence, etc.

08/23/2009 04:52 AM by pete w

thanks!

08/23/2009 09:47 PM by Brendan

My thingy is faster than your thingy blah blah blah...

Dont forget mySQL was fast too, until they started implementing features.

08/24/2009 01:42 PM by Alex Yakunin

@ Stephen: I'm curious, what is batching system. Is this related to databases / ORM tools at all?

I wrote about batching because, as tests show, many ORM have this feature, and, argue you or not, it is important. Batching in ORM appeared much earlier than e.g. these tests.

So I don't understand your point.

08/24/2009 01:43 PM by Alex Yakunin

"Dont forget mySQL was fast too, until they started implementing features."

Well, this isn't our case, believe me ;)

08/25/2009 09:04 AM by LEXX

@Alex,

You'd better relax, man. I really think Ayende should just ban you from his blog. The reason he hasn't done this yet is that he is a respectful person. But you simply use other people's blogs for promotion, and that's f..ing dirty. IMHO, nobody will take your product seriously, for a couple of reasons:

  1. No tricks will force me to prefer a well known, open source, time proof, well supported system (NH) to a commercial product done by few guys from Ural. I am not that crazy.

  2. The only company phone number I found on the web site is the cell which belongs to you personally (Megafon - Ural - cheapest russian network) - ridiculous

  3. Your promotion is really dirty, you gain no respect from community. So I think your customer relation is the same.

  4. I doubt that your company has any idea about delivery management. All that I see is that you keep saying - we fix this immediately. So I expect that the product lacks code coverage and so on.

08/25/2009 01:11 PM by J

Not to beat a dead horse to death, but there seems to be some reasonable folks out there doing comparisons.

I just ran across this:

gregdoesit.com/.../nhibernate-vs-entity-framewo...
and basically the bottom line is: choose an ORM for the features, and don't worry about performance, since the tests aren't real world anyways.

08/25/2009 02:50 PM by Benetton

@LEXX

You'd better show at least some respect. Estimating orm tools from the phone numbers and geographical coordinates - unheard stupidity for my money. BTW I hope Ayende doesn't need any support from such emo boys. Anyway - that's ok! What is not ok is that LEXX represents the general level of the community here - aggressive, arrogant and disrespectful to other points of view.

No need to wonder if most of the guys bashing Alex have ever given a try to his tool.

08/25/2009 03:05 PM by Alex Yakunin

Nicer and nicer:

"done by few guys from Ural"

Yep, 6-person team x 2.5 years. Now honestly compare this to NH team.

"The only company phone number I found on the web site"

Phone & fax: +7 343 263 7174 (general), sales: +7 922 222 7300

"which belongs to you personally"

Yes. That's my phone, and I never hide this.

"(Megafon - Ural - cheapest russian network)"

Definitely not cheapest.

"ridiculous"

I'd say, clever. Our working hours significantly differ from Europe and US, that's why I leave my cell phone. And yes, I answer even at 4AM.

"I doubt that your company has any idea about delivery management. All that I see is that you keep saying - we fix this immediately."

So do you see anything that wasn't fixed? ;)

"Estimating orm tools from the phone numbers and geographical coordinates - unheard stupidity for my money."

Actually I couldn't assume such ... even exist.

08/25/2009 03:12 PM by Alex Yakunin

"So I expect that the product lacks code coverage and so on."

~ 75%, I'll provide full info shortly. That's lower than code coverage of NH, but definitely not significantly: stackoverflow.com/.../is-there-any-info-on-nhib...

Btw, NH has really good code coverage. But what about the others? Frankly speaking, I was unable to find any coverage info about other ORM tools at all.

So, @LEXX... Your post is definitely uber-fail here ;)

08/25/2009 04:53 PM by gunteman

Alex, I believe you're a good guy, with honest intentions and good product. But I also believe that you are doing yourself and your product a big disservice by crusading here. You obviously struck a nerve with Ayende, and vice versa, but stop this now, before it's too late.

08/25/2009 07:16 PM by Mr. Loverman

for those of you who still don't know what the only intention of alex here is read this book:

Positioning: The Battle for Your Mind, 20th Anniversary Edition

...Positioning describes a revolutionary approach to creating a "position" in a prospective customer's mind that reflects a company's own strengths and weaknesses as well as those of its competitors...

@ALL: if you keep commenting alex posts he will continue posting and posting. Just stop doing this and in a while he will be forgotten....

08/26/2009 12:17 PM by Alex Yakunin

Although it isn't related to the subject, the book is really good.

08/26/2009 12:18 PM by Alex Yakunin

"Just stop doing this and in a while he will be forgotten..."

I'd be glad, but you comment and comment ;)

09/10/2009 04:19 PM by goku_da_master

Thanks, Ayende, for this blog. NHibernate is giving me performance issues even on the most trivial of tasks, so I hope what you put in this article will help me. Regardless of how powerful an ORM tool is and how easy it can make your life, performance will always be at the top of the list. Great will be the day when an ORM can perform very close to straight ADO.NET. I find all your posts informative. Keep it up.

09/10/2009 04:22 PM by Ayende Rahien

Goku,

That hasn't been my experience. Moreover, I can tell you that NH often outperforms hand rolled data access layers.
