NHibernate and the second level cache tips

time to read 3 min | 573 words

I have been tearing my hair our today, because I couldn't figure out why something that should have worked didn't ( second level caching , obviously ).

Along the way, I found out several things that you should be aware of. First of all, let us talk about what the feature is, shall we?

NHibernate is design as an enterprise OR/M product, and as such, it has very good support for running in web farms scenarios. This support include running along side with distributed caches, including immediate farm wide updates.  NHibernate goes to great lengths to ensure cache consistency in these scenarios (it is not perfect, but it is very good). A lot of the things that tripped me today were related to just that, NHibernate was working to ensure cache consistency and I wasn't aware of that.

The way it works, NHibernate keeps three caches.

  • The entities cache - the entity data is disassembled and then put in the cache, ready to be assembled to entities again.
  • The queries cache - the identifiers of entities returned from queries, but no the data itself (since this is in the entities cache).
  • The update timestamp cache - the last time a table was written to.

The last cache is very important, since it ensures that the cache will not serve stale results.

Now, when we come to actually using the cache, we have the following semantics.

  • Each session is associated with a timestamp on creation.
  • Every time we put query results in the cache, the timestamp of the executing session is recorded.
  • The timestamp cache is updated whenever a table is written to, but in a tricky sort of way:
    • When we perform the actual writing, we write a value that is somewhere in the future to the cache. So all queries that hit the cache now will not find it, and then hit the DB to get the new data. Since we are in the middle of transaction, they would wait until we finish the transaction. If we are using low isolation level, and another thread / machine attempts to put the old results back in the cache, it wouldn't hold, because the update timestamp is into the future.
    • When we perform the commit on the transaction, we update the timestamp cache with the current value.

Now, let us think about the meaning of this, shall we?

If a session has perform an update to a table, committed the transaction and then executed a cache query, it is not valid for the cache. That is because the timestamp written to the update cache is the transaction commit timestamp, while the query timestamp is the session's timestamp, which obviously comes earlier.

The update timestamp cache is not updated until you commit the transaction! This is to ensure that you will not read "uncommited values" from the cache.

Another gotcha is that if you open a session with your own connection, it will not be able to put anything in the cache (all its cached queries will have invalid timestamps!)

In general, those are not things that you need to concern yourself with, but I spent some time today just trying to get tests for the second level caching working, and it took me time to realize that in the tests I didn't used transactions and I used the same session for querying as for performing the updates.