Ayende @ Rahien

Oren Eini, aka Ayende Rahien, is the CEO of Hibernating Rhinos LTD, which develops RavenDB, a NoSQL open source document database.


time to read 1 min | 145 words

This is a bit from the docs for NH Prof, which I am sharing in order to get some peer review.

This warning is raised when the profiler detects that you are writing a lot of data to the database. Similar to the warning about too many calls to the database, the main issue here is the number of remote calls and the time they take.

We can batch together several queries using NHibernate's support for Multi Query and Multi Criteria, but a relatively unknown NHibernate feature is the ability to batch a set of write statements into a single database call.

This is controlled using the adonet.batch_size setting in the configuration. If you set it to a number larger than zero, you will immediately start benefiting from a reduced number of database calls. You can even set this value at runtime, using session.SetBatchSize().
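For example, enabling batching might look like this in hibernate.cfg.xml (a sketch; the batch size value here is arbitrary):

<hibernate-configuration xmlns="urn:nhibernate-configuration-2.2">
  <session-factory>
    <!-- send up to 25 insert/update/delete statements in a single database call -->
    <property name="adonet.batch_size">25</property>
  </session-factory>
</hibernate-configuration>

And to override it for a particular session at runtime:

// a session known to perform bulk writes can use a larger batch
session.SetBatchSize(100);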

time to read 2 min | 353 words

This is a bit from the docs for NH Prof, which I am sharing in order to get some peer review.

One of the most expensive operations that we can do in our applications is to make a remote call. Going beyond our own process is an extremely expensive operation; going beyond the local machine is more expensive still.

Calling the database, whether to query or to write, is a remote call, and we want to reduce the number of remote calls as much as possible. This warning is raised when the profiler notices that a single session is making an excessive number of calls to the database. This is usually indicative of a potential optimization in the way the session is used.

There are several reasons why this can be:

  • A large number of queries as a result of a Select N+1.
  • Calling the database in a loop.
  • Updating (or inserting / deleting) a large number of entities.
  • A large number of (different) queries that we execute to perform our task.

For the first reason, you can see the suggestions for Select N+1. Calling the database in a loop is generally a bug, and should be avoided. Usually you can restructure the code in a way that doesn't require you to do so.
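As a sketch of such restructuring (the entity and variable names here are assumptions for illustration), a per-iteration lookup can often be replaced with a single query:

// one round trip per id - avoid this:
foreach (var id in orderIds)
	DoSomethingWith(session.Get<Order>(id));

// a single round trip for all of them:
var orders = session.CreateQuery("from Order o where o.Id in (:ids)")
	.SetParameterList("ids", orderIds)
	.List<Order>();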

Updating a large number of entities is discussed in Use statement batching, and mainly involves setting the batch size to reduce the number of calls that we make to the database.

The last issue is the most interesting one: we need to get data from several sources, and we issue a separate query for each of them, which has the aforementioned issues.

NHibernate provides a nice way of avoiding this, using Multi Query and Multi Criteria, both of which allow you to aggregate several queries into a single call to the database. If this is your scenario, I strongly recommend that you take a look at Multi Query and Multi Criteria and see how you can use them in your application.
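To illustrate (a sketch; the entity names are assumptions), Multi Query lets you send several HQL queries in one round trip and read each result set separately:

// both queries travel to the database in a single call
var results = session.CreateMultiQuery()
	.Add("select count(*) from Order")
	.Add(session.CreateQuery("from Order o").SetMaxResults(25))
	.List();

// each entry in results is the result set of the corresponding query
var orderCount = (long)((IList)results[0])[0];
var orders = (IList)results[1];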

time to read 2 min | 357 words

This is a bit from the docs for NH Prof, which I am sharing in order to get some peer review.

The excessive number of rows returned warning is generated by the profiler when... a query is returning a large number of rows. The simplest scenario is that we loaded all the rows in a large table, using something like this code:

session.CreateCriteria(typeof(Order))
	.List<Order>();

This is a common mistake when you are binding to a UI component (such as a grid) that performs its own paging. This is a problem on several levels:

  • We tend to want to see only part of the data
  • We just loaded a whole lot of unnecessary data
  • We are sending more data over the network
  • We have a higher memory footprint than we should have
  • In extreme cases, we may crash as a result of an out of memory exception

None of those are good things, and as with the discussion on unbounded result sets, this can be easily prevented by applying a limit at the database level to the number of rows that we will load.
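Applying such a limit is a one-liner; for example (a sketch, reusing the criteria query from above):

session.CreateCriteria(typeof(Order))
	.SetMaxResults(25) // never load more than a page of rows
	.List<Order>();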

But it is not just simple queries without a limit that can cause this issue; another common source of this warning is a Cartesian product when using joins. Let us take a look at this query:

session.CreateCriteria(typeof(Order))
	.SetFetchMode("OrderLines", FetchMode.Join)
	.SetFetchMode("Snapshots", FetchMode.Join)
	.List<Order>();

Assuming that we have ten orders, with ten order lines each and five snapshots each, we are going to load 500 rows from the database. Mostly, they will contain duplicate data that we already have, and NHibernate will reduce the duplication to the appropriate object graph.

The problem is that we still loaded too much data, with the same issues as before. Worse, a Cartesian product doesn't tend to stop at 500 rows; it escalates very quickly to a ridiculous number of rows returned for the trivial amount of data that we actually want.

The solution for this issue is to change the way we query the data. Instead of issuing a single query with several joins, we can split it into several queries, and send them all to the database in a single batch using Multi Query.
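As a sketch of that approach (same entities as above), we can fetch each collection in its own query and batch the two with Multi Criteria, letting NHibernate stitch the results together in the session:

var results = session.CreateMultiCriteria()
	.Add(session.CreateCriteria(typeof(Order))
		.SetFetchMode("OrderLines", FetchMode.Join))
	.Add(session.CreateCriteria(typeof(Order))
		.SetFetchMode("Snapshots", FetchMode.Join))
	.List();

// 10 x 10 + 10 x 5 = 150 rows in one round trip,
// instead of the 500 rows from the single three-way join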

time to read 2 min | 352 words

This is a bit from the docs for NH Prof, which I am sharing in order to get some peer review.

An unbounded result set occurs when we perform a query without explicitly limiting the number of returned results (using SetMaxResults() with NHibernate, or using TOP or LIMIT clauses in the SQL). Usually, this means that the application assumes that a query will only return a few records. That works well in development and testing, but it is a time bomb in production.

The query suddenly starts returning thousands upon thousands of rows, and in some cases, millions. This puts more load on the database server, the application server, and the network. In many cases, it can grind the entire system to a halt, usually ending with the application servers crashing with out of memory errors.

Here is one example of a query that will trigger the unbounded result set warning:

session.CreateQuery("from OrderLines lines where lines.Order.Id = :id")
       .SetParameter("id", orderId)
       .List();

If the order has many line items, we are going to load all of them, which is probably not what we intended. A very easy fix for this issue is to add pagination:

session.CreateQuery("from OrderLines lines where lines.Order.Id = :id")
	.SetParameter("id", orderId)
	.SetFirstResult(0)
	.SetMaxResults(25)
	.List();

Now we are assured that we only need to handle a predictable number of rows, and if we need to work with all of them, we can page through the records as needed. But there is another common occurrence of the unbounded result set: directly traversing the object graph, as in this example:

var order = session.Get<Order>(orderId);
DoSomethingWithOrderLines(order.OrderLines); 

Here, again, we are loading the entire set (in fact, it is identical to the query we issued before) without regard to how big it is. NHibernate provides robust handling of this scenario, using filters:

var order = session.Get<Order>(orderId);
var orderLines = session.CreateFilter(order.OrderLines, "")
	.SetFirstResult(0)
	.SetMaxResults(25)
	.List();
DoSomethingWithOrderLines(orderLines);

This allows us to page through a collection very easily, and saves us from having to deal with unbounded result sets and their consequences.

time to read 2 min | 399 words

This is a bit from the docs for NH Prof, which I am sharing in order to get some peer review.

A common mistake when using a database is to assume that transactions are needed only to orchestrate several write statements. In fact, every operation that the database performs is done inside a transaction. This includes both queries and writes (update, insert, delete).

When we don't define our own transactions, we fall back into implicit transaction mode, in which every statement sent to the database runs in its own transaction, resulting in a higher performance cost (database time to build and tear down transactions) and reduced consistency.

Even if we are only reading data, we want to use a transaction, because using a transaction ensures that we get a consistent result from the database. NHibernate assumes that all access to the database is done under a transaction, and strongly discourages any use of the session without one.

Example of valid code:

using(var session = sessionFactory.OpenSession()) 
using(var tx = session.BeginTransaction()) 
{ 
	// execute code that uses the session 
	tx.Commit(); 
} 

Leaving aside the safety aspect of working with transactions, the assumption that transactions are costly and we need to optimize them away is a false one. As already mentioned, databases always run in a transaction, and they have been heavily optimized to work with transactions. The real question is whether this happens per statement or per batch. There is some amount of work that needs to be done to create and dispose of a transaction, and having to do it per statement is actually more costly than doing it per batch.

It is possible to control the number and type of locks that a transaction takes by changing the transaction isolation level (and indeed, a common optimization is to reduce the isolation level).

NHibernate treats the call to Commit() as the time to flush all changed items from the unit of work to the database; without an explicit Commit(), it has no way of knowing when it should do that. A call to Flush() is possible, but it is generally strongly discouraged, because it is usually a sign that you are not using transactions properly.

I strongly suggest that you use code similar to the one shown above (or another approach to transactions, such as TransactionScope, or Castle's Automatic Transaction Management) in order to handle transactions correctly.
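For reference, a TransactionScope-based version might look like this (a sketch; it assumes your ADO.NET provider enlists in ambient transactions, and keeps the explicit NHibernate transaction so the session still knows when to flush):

using (var scope = new TransactionScope())
using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
	// execute code that uses the session
	tx.Commit();       // flushes the session's unit of work
	scope.Complete();  // commits the ambient transaction
}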

time to read 2 min | 378 words

This is a bit from the docs for NH Prof, which I am sharing in order to get some peer review.

Select N+1 is a data access anti-pattern, in which we access the database in one of the least optimal ways. Let us take a look at a code sample, and then discuss what is going on. I want to show the user all the comments from all the posts, so they can delete all the nasty comments. The naïve implementation would be something like:

// SELECT * FROM Posts
foreach (Post post in session.CreateQuery("from Post").List())
{
    // lazy loading of the comments list causes: SELECT * FROM Comments where PostId = @p0
    foreach (Comment comment in post.Comments)
    {
        // do something with comment
    }
}


In this example, we can see that we are loading a list of posts (the first select) and then traversing the object graph. However, each time we access a lazily loaded collection, NHibernate goes back to the database to fetch that post's comments, one post at a time. This is incredibly inefficient, and the NHibernate Profiler will generate a warning whenever it encounters such a case. The solution for this example is simple: we force an eager load of the collection up front.

Using HQL:

var posts = session
	.CreateQuery("from Post p left join fetch p.Comments")
	.List();

Using the criteria API:
session.CreateCriteria(typeof(Post)) 
	.SetFetchMode("Comments", FetchMode.Eager) 
	.List();


In both cases, we will get a join and only a single query to the database. Note that this is the classic appearance of the problem; it can also surface in other scenarios, such as calling the database in a loop, or more complex object graph traversals. In those cases, it is generally much harder to see what is causing the issue.

NHibernate Profiler will detect those scenarios as well, and give you the exact line in the source code that caused this SQL to be generated. Another option for solving this issue is MultiQuery and MultiCriteria, which are also used to solve the issue of Too Many Queries.
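As a sketch of that option (the paired count query is a hypothetical addition for illustration), the eager-fetch query can itself be batched with other queries in a single round trip:

var results = session.CreateMultiQuery()
	.Add(session.CreateQuery("from Post p left join fetch p.Comments"))
	.Add(session.CreateQuery("select count(*) from Post"))
	.List();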
