Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

time to read 2 min | 334 words

Davy Brion, one of NHibernate’s committers, is running a series of posts discussing building your own DAL. The reasons for doing so vary; in Davy’s case, he has clients who are adamantly against using anything but the Naked CLR. I have run into similar situations before. Sometimes it is institutional blindness, sometimes it is legal reasons, and sometimes there are actually good reasons for it, although that is rare.

Davy’s approach to the problem was quite sensible. Deprived of his usual toolset, he was unwilling to give up the advantages inherent in it, so he set out to build it himself. I did much the same when I worked on SvnBridge: I couldn’t use one of the existing OSS containers, but I wasn’t willing to build a large application without one, so I created one.

I have touched on this topic in more detail in the past. But basically, a single-purpose framework is significantly simpler than a general-purpose one. That sounds simple and obvious, right? But it is actually quite significant.

You might not get the same richness that you would get with the real deal, but you get enough to get you going. In fact, since you have only a single scenario to cover, it is quite easy to get the features that you need out the door. You are allowed to cheat, after all :-)
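To make that concrete, here is a purely hypothetical sketch (not Davy’s actual design, and not NHibernate’s API) of how small the surface of a single-purpose data access layer can be when the only scenario is loading and saving a handful of entity types by id:

using System;

// Hypothetical sketch of a single-purpose DAL surface; every name here is
// illustrative. A general-purpose OR/M has to offer far more than this.
public interface ISimpleSession : IDisposable
{
	T Get<T>(int id) where T : class;        // load a single entity by its id
	void Save<T>(T entity) where T : class;  // insert or update the entity
	void Commit();                           // flush pending work in one transaction
}

// Because there is only one scenario to cover, the implementation can cheat:
// hard-coded connection string, convention-based table and column names,
// no caching, no lazy loading, no query language.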

In Davy’s case, he made the decision that using POCOs and not having to worry about persistence all over the place were the most important things, and he set out to get them.

I am going to shadow his posts in the series, talking about the implications of the choices he made and the differences between Davy’s approach and NHibernate’s. I think it would be a good way to show the challenges inherent in building an OR/M.

time to read 1 min | 130 words

I know that some people complain about the lack of documentation for NHibernate. That is something I have never truly been able to grasp. Leaving aside the excellent book, right there on NHibernate’s site we have a huge amount of documentation.

I’ll admit that the plot in the documentation is weak, but I think that reading this document is essential. It covers just about every aspect of NHibernate, and more importantly, it allows you to understand the design decisions and constraints that were used when building it.

Next time you wonder about NHibernate documentation, just head over there and check. For that matter, I strongly suggest that you read the whole thing start to finish if you are doing anything with NHibernate. It is an invaluable resource.

time to read 4 min | 781 words

Well, this is going to be a tad different from my usual posts. Instead of a technical post, or maybe an SF book review, I am going to talk about two authors that I really like.

David Weber is the author of the Honor Harrington series, the Prince Roger (in conjunction with Ringo) series, the Dahak series and the Safehold series, as well as other assorted books.

John Ringo is the author of the Prince Roger series, the Posleen series, the Council Wars series and a bunch of other stuff.

Both are really good authors, although I much prefer Weber’s books to Ringo’s. Their Prince Roger series of books was flat-out amazing, and it is only after reading a lot more of their material that I can truly grasp how much each author contributed to it.

Ringo is way better at portraying the actual details of the military, especially marines, SpecOps, etc. Small teams with a lot of mayhem attached. Unfortunately, he seems to concentrate almost solely on having stupid opponents. I am sorry, but fighting enemies whose tactic is to shout Charge! isn’t a complex task. He is also way too attached to fight scenes, and a large percentage of his books is dedicated to them.

Well, he is a military SF writer, after all, but I think that he does not dedicate enough time to the other aspects of war. And his characters are sometimes unbelievable. The entire concept he bases a lot of the Posleen series on is unbelievable in the extreme. No, not because it is SF, but because it goes against human nature to do some of the things he portrays them doing. The end of the Posleen war, for example, was one such case. The fleet comes back home, violating the orders of supposedly friendly alien masters who want to see Earth destroyed by another bunch of aliens.

The problem is not that the fleet comes home in violation of orders; the problem is that it didn’t do so much sooner. Humans are not wired for something like that, especially since it was made clear that long before the actual event the fleet was well aware of what was going on. I spent 4 years in a military prison; orders be damned, I know exactly how far you can stretch that. And you can’t stretch it far at all. Not on a large scale, with psychologically normal humans.

Or take the case where one race of aliens is trying to subvert the war effort to help another race kill more humans. That part is believable. What isn’t believable is that the moment it became widespread knowledge, they weren’t all exterminated. Instead, Ringo made them rulers. It makes for a good story, but I just didn’t find it believable at all. The books are still good, but the suspension of disbelief required to go on with the story is annoying.

On a more personal note, I think Ringo is also a right-wing redneck nutcase. A great author, admittedly, but I sometimes find it hard not to get annoyed at some of the perspectives that I see in his books.

Weber, on the other hand, is great at portraying navies. And I love reading his fight scenes, mostly because he knows where to put them and how much to stretch them. He also puts an amazing amount of depth into the worlds he creates, in surprisingly few brush strokes.

He does have a few themes that I also find fairly annoying. Chief among them, while not as annoying as having stupid enemies (which he has to some extent as well), is having the “good” side enjoy amazingly good information about the other, or having one side significantly better armed than the other. Sure, it makes it easy to have the good guys win, but I like a more realistic scenario.

His recent books in the Honor Harrington universe have portrayed exactly such a scenario, and they have been a pleasure to read. Beyond anything else, he knows how to give depth to his universe, and his characters are well polished and likable. I can’t think of a scenario where a character has behaved in a way that felt wrong for the character.

Weber is currently my favorite author, and I am eagerly waiting for Torch of Freedom in November.

But hands down, the best series is the Prince Roger series, on which they collaborated. It is a tale that has both Weber’s depth in creating a universe and Ringo’s touch for portraying military people. I wish there were more books in it.

time to read 8 min | 1434 words

I originally titled this post NHibernate Stupid Perf Tricks, but decided to remove that. The purpose of this post is to show some performance optimizations that you can take advantage of with NHibernate. This is not a benchmark; the results aren’t useful for anything except comparing to one another. I would also like to remind you that NHibernate isn’t intended for ETL scenarios. If that is what you need, you probably want to look into ETL tools, rather than an OR/M developed for OLTP scenarios.

There is a wide scope for performance improvements outside what is shown here; for example, the database was not optimized, the machine was in use throughout the benchmark, etc.

To start with, here is the context in which we are working. This configuration will be used for the different scenarios that we will run.

The initial system configuration was:

<hibernate-configuration xmlns="urn:nhibernate-configuration-2.2">
  <session-factory>
    <property name="dialect">NHibernate.Dialect.MsSql2000Dialect</property>
    <property name="connection.provider">NHibernate.Connection.DriverConnectionProvider</property>
    <property name="connection.connection_string">
      Server=(local);initial catalog=shalom_kita_alef;Integrated Security=SSPI
    </property>
    <property name="proxyfactory.factory_class">
      NHibernate.ByteCode.Castle.ProxyFactoryFactory, NHibernate.ByteCode.Castle
    </property>
    <mapping assembly="PerfTricksForContrivedScenarios" />
  </session-factory>
</hibernate-configuration>

The model used was:

[Class diagram: a single User entity with Id, Password, Username, Email, CreatedAt, and Bio properties]
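The diagram itself doesn’t survive here, but the entity it shows can be reconstructed from the mapping and the GenerateUser method below; the property types are my inference, not copied from the original post:

using System;

// Reconstructed sketch of the mapped entity; property types are inferred
// from the mapping and from GenerateUser, shown further down.
public class User
{
	public virtual int Id { get; set; }
	public virtual byte[] Password { get; set; }
	public virtual string Username { get; set; }
	public virtual string Email { get; set; }
	public virtual DateTime CreatedAt { get; set; }
	public virtual string Bio { get; set; }
}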

And the mapping for this is:

<class name="User"
			 table="Users">
	<id name="Id">
		<generator class="hilo"/>
	</id>

	<property name="Password"/>
	<property name="Username"/>
	<property name="Email"/>
	<property name="CreatedAt"/>
	<property name="Bio"/>

</class>

And each new user is created using:

public static User GenerateUser(int salt)
{
	return new User
	{
		Bio = new string('*', 128),
		CreatedAt = DateTime.Now,
		Email = salt + "@example.org",
		Password = Guid.NewGuid().ToByteArray(),
		Username = "User " + salt
	};
}

Our first attempt is to simply check serial execution speed, and I wrote the following (very trivial) code to do so:
const int count = 500 * 1000;
var configuration = new Configuration()
	.Configure("hibernate.cfg.xml");
new SchemaExport(configuration).Create(false, true);
var sessionFactory = configuration
	.BuildSessionFactory();

var stopwatch = Stopwatch.StartNew();

for (int i = 0; i < count; i++)
{
	using(var session = sessionFactory.OpenSession())
	using(var tx = session.BeginTransaction())
	{
		session.Save(GenerateUser(i));
		tx.Commit();
	}

}

Console.WriteLine(stopwatch.ElapsedMilliseconds);

Note that we create a separate session for each element. This is probably the slowest way of doing things, since it means that we significantly increase the number of connection opens/closes and transactions that we need to handle.

This is here to give us a baseline on how slow we can make things, to tell you the truth. Another thing to note is that this is purely serial. This is just another example of how this is not a true representation of how things happen in the real world. In real-world scenarios, we are usually handling small requests, like the one simulated above, but we do so in parallel. We are also using a local database vs. the far more common remote DB approach, which skews the results even further.

Anyway, the initial approach took 21.1 minutes, or roughly one row every two and a half milliseconds; about 400 rows / second.

I am pretty sure most of that time went into connection & transaction management, though.

So the first thing to try was to see what would happen if I did this using a single session, which would remove the cost of opening and closing the connection and creating lots of new transactions.

The code in question is:

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
	for (int i = 0; i < count; i++)
	{
		session.Save(GenerateUser(i));
	}

	tx.Commit();
}

I expect that this will be much faster, but I have to explain something. It is usually not recommended to use the session for bulk operations, but this is a special case. We are only saving new instances, so the flush does no unnecessary work, and we only commit once, so the save to the DB is done in a single continuous stream.

This version ran for 4.2 minutes, or roughly 2 rows per millisecond; about 2,000 rows / second.
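As an aside, keeping a single regular session open for half a million saves means the first-level cache holds on to every saved instance. That wasn’t a problem in this run, but a common mitigation, sketched here and not measured in this post, is to flush and clear the session periodically:

// Sketch only: flush and clear every so often so the session cache doesn't
// grow without bound during a long insert run. The interval of 1,000 is an
// arbitrary choice for illustration.
using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
	for (int i = 0; i < count; i++)
	{
		session.Save(GenerateUser(i));
		if (i % 1000 == 0)
		{
			session.Flush(); // push the pending inserts to the database
			session.Clear(); // evict the already-saved instances from the session
		}
	}

	tx.Commit();
}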

Now, the next obvious step is to move to a stateless session, which is intended for bulk scenarios. How long would this take?

using (var session = sessionFactory.OpenStatelessSession())
using (var tx = session.BeginTransaction())
{
	for (int i = 0; i < count; i++)
	{
		session.Insert(GenerateUser(i));
	}
	tx.Commit();
}

As you can see, the code is virtually identical. And I expect the performance to be slightly improved, but on par with the previous version.

This version ran in 2.9 minutes, about 3 rows per millisecond and close to 2,800 rows / second.

I am actually surprised. I expected it to be somewhat faster, but it turned out to be much faster than that.

There are still performance optimizations that we can make, though. NHibernate has a rich batching system that we can enable in the configuration:

<property name="adonet.batch_size">100</property>
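As an aside, if you build the NHibernate configuration in code rather than XML, the same setting can be applied with SetProperty before building the session factory; a minimal sketch:

// Equivalent of the XML property above, applied programmatically.
var configuration = new Configuration()
	.Configure("hibernate.cfg.xml")
	.SetProperty("adonet.batch_size", "100");

var sessionFactory = configuration.BuildSessionFactory();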

With this change, the same code (using stateless sessions) runs in 2.5 minutes, at about 3,200 rows / second.

This doesn’t show as much improvement as I hoped it would. This is an example of how a real-world optimization fails to show its promise in a contrived example. The purpose of batching is to make as few remote calls as possible, which dramatically improves performance. Since we are running against a local database, it isn’t as noticeable.

Just to give you some idea about the scope of what we did, we wrote 500,000 rows and 160MB of data in a few minutes.

Now, remember, those aren’t numbers you can take to the bank; their only usefulness is to show that with a few very simple changes we improved performance in a really contrived scenario by 90% or so. And yes, there are other tricks that you can utilize (preparing commands, increasing the batch size, parallelism, etc.); a rough sketch of a couple of them follows below. I am not going to explore them in depth, though, for the simple reason that this performance should be quite enough for anyone who is using an OR/M. That brings me back to my initial point: OR/Ms are not about bulk data manipulation. If you want to do that, there are better methods.
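As mentioned, here is a hedged sketch of what a couple of those extra tricks look like as configuration settings. prepare_sql and adonet.batch_size are standard NHibernate settings, but the specific values here are arbitrary and were not measured in this post:

// Sketch only, not benchmarked here: prepare the SQL commands and use a
// larger batch size to further reduce the per-statement overhead.
var tunedConfiguration = new Configuration()
	.Configure("hibernate.cfg.xml")
	.SetProperty("prepare_sql", "true")
	.SetProperty("adonet.batch_size", "500");

// Parallelism would mean splitting the work across several threads, each
// with its own session or stateless session; sessions are not thread safe.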

For the scenario outlined here, you probably want to make use of SqlBulkCopy, or its equivalent. Just to give you an idea why, here is the code:

var dt = new DataTable("Users");
dt.Columns.Add(new DataColumn("Id", typeof(int)));
dt.Columns.Add(new DataColumn("Password", typeof(byte[])));
dt.Columns.Add(new DataColumn("Username"));
dt.Columns.Add(new DataColumn("Email"));
dt.Columns.Add(new DataColumn("CreatedAt", typeof(DateTime)));
dt.Columns.Add(new DataColumn("Bio"));

for (int i = 0; i < count; i++)
{
	var row = dt.NewRow();
	row["Id"] = i;
	row["Password"] = Guid.NewGuid().ToByteArray();
	row["Username"] ="User " + i;
	row["Email"] = i + "@example.org";
	row["CreatedAt"] =DateTime.Now;
	row["Bio"] =  new string('*', 128);
	dt.Rows.Add(row);
}

using (var connection = ((ISessionFactoryImplementor)sessionFactory).ConnectionProvider.GetConnection())
{
	var s = (SqlConnection)connection;
	var copy = new SqlBulkCopy(s);
	copy.BulkCopyTimeout = 10000;
	copy.DestinationTableName = "Users";
	foreach (DataColumn column in dt.Columns)
	{
		copy.ColumnMappings.Add(column.ColumnName, column.ColumnName);
	}
	copy.WriteToServer(dt);
}

And this ends up taking 49 seconds, or about 10,000 rows / second.

Use the appropriate tool for the task.

But even so, getting to 1/3 of the speed of SqlBulkCopy (the absolute top speed you can get to) is nothing to sneeze at.

time to read 2 min | 346 words

This post is going to be short, because I don’t think that, in my current state of mind, I should be posting anything that contains any opinions. This is a reply to a blog post from X-tensive titled: Why I don't believe Oren Eini, which is itself a reply to my posts about the ORMBattle.NET nonsense benchmarks.

I will just say that while I was willing to give X-tensive the benefit of the doubt about the reason they ran the benchmarks the way they did, their post has made it all too clear that such doubt shouldn’t be entertained.

Here are a few quotes from the post that I will reply to.

I confirm this is achieved mainly because of automatic batching of CUD sequences. It allows us to outperform frameworks without this feature by 2-3 times on CUD tests. So I expect I will never see this feature implemented in NHibernate.

That is interesting; NHibernate has had batching support since 2006. The tests that X-tensive ran never even bothered to configure NHibernate to use batching. For that matter, they didn’t even bother to do the most cursory check: Googling NHibernate Batching gives you plenty of information about this feature.

Currently (i.e. after all the fixes) NHibernate is 8-10 times slower than EF, Lightspeed and our own product on this test.

Yes, for a test that was explicitly contrived to show that.

Since we track performance on our tests here, we'll immediately see, if any of these conditions will be violated. And if this ever happen, I expect Oren must at least publicly apologize for his exceptional rhino obstinacy ;)

Um, no, I don’t think so. Moreover, not that I think it would do much good, but I still think that personal attacks are NOT cool. And no, a smiley isn’t going to help, nor is something like this:

P.S. Guys, nothing personal.

time to read 2 min | 315 words

One of the nice things about NH Prof is the way it is set up from an infrastructure standpoint. Take a look at the following email:

[Screenshot: an automatically generated NH Prof crash report email]

I got this email, along with an additional one containing the actual crash information (this email is generated from the user’s input about the reason NH Prof crashed).

This isn’t the first time that I have gotten this bug (it is an annoying threading issue), but this time I was finally able to pinpoint the exact place where it happens, trace it back to where it is used, and get to the slap-forehead-OMG-I-am-stupid moment.

Then it was a matter of adding a few locks, refactoring to make it clean, and committing.

At this point, my work is done. We have a CI infrastructure that will compile everything, run the tests and push it out as a released build.

With NH Prof, we take the stance that every single commit is a release. Sometimes we will do some work on branches, but most of the work is done on the trunk, and any commit to the trunk results in a new released build.

Out of the close to 400 builds that we have so far, I believe that we had about 5–10 lemons (builds that fail on users’ machines). In all cases, the problem was fixed within a few hours.

The implications for me are huge. I don’t have to worry about managing things. I set it up once (sometime in December or November, if I recall correctly) and stopped worrying about releasing stuff.

And it means that I can literally fix a bug, commit, and tell the user that they can pick up the changes as soon as the build is done.

time to read 1 min | 158 words

I am intentionally not giving out much information about this problem; I want to see what your opinion is.

I have an application where a certain action invalidates some of the data that the user is shown. It is quite expensive to recalculate that data, so we can’t just recalculate it right then and there, and in many cases it will turn out to be the exact same data the user is currently shown.

The question is, what should we do with this data?

  • Ignore the invalidation and just show the (possibly invalid) data to the user until the application refreshes itself normally.
  • Remove the data altogether. However, the absence of the data is itself meaningful in the application.
  • Put up some notification that the data is invalid.

I have my own opinions, but I would like to hear what you think we should do…

time to read 1 min | 87 words

This is a story from a recent planning meeting that we had. One of the developers was explaining how he is implementing a certain feature, and he went over a potential problem that we might have. It is only a potential problem, because it would show up only if we extend the project in a certain direction, which isn’t currently planned.

My response for that was: “Let the users submit a bug report for that if it happens”.

It is a cruder form of YAGNI.

Git horror

time to read 2 min | 226 words

I am trying to do something that I think is very simple: clone an SVN repository into a local Git repository.

Unfortunately, just about anything that I do ends up in one error or another.

E:\rt\rt>git --version
git version 1.6.4.msysgit.0

E:\rt\rt>cd..

E:\rt>git svn clone https://rhino-tools.svn.sourceforge.net/svnroot/rhino-tools rt.git
Initialized empty Git repository in e:/rt/rt.git/.git/
RA layer request failed: PROPFIND request failed on '/svnroot/rhino-tools': PROPFIND of '/svnroot/rhino-tools': could not connect to server (https://rhino-tools.svn.sourceforge.net) at C:\Program Files\Git/libexec/git-core/git-svn line 1699

E:\rt>svn list https://rhino-tools.svn.sourceforge.net/svnroot/rhino-tools
BDSLiB/
branches/
contrib/
deprecated/
experiments/
tags/
trunk/

I even tried doing this on my Mac, which worked, but only up to revision 2085, after which it refuses to do anything.

I know that I haven’t done much with Git so far, but I don’t think that I am doing anything all that wrong.
