Ayende @ Rahien

My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:


+972 52-548-6969

, @ Q c

Posts: 6,128 | Comments: 45,550

filter by tags archive

Early lock release, transactions and errors

time to read 3 min | 519 words

There has been quite a lot of research on the notion of early lock release as a way to improve database concurrency. For a while, Voron actually supported that, mostly because I thought that this is a good idea. Lately, however, I decided that it is anything but for modern databases.

In short, this is how transactions behave:

Standard transaction Early Lock Release Transaction
  • Take locks
  • Perform work
  • Commit transaction in memory
  • Flush transaction to log file
  • Release locks
  • Notify user that the transaction successfully completed
  • Take locks
  • Perform work
  • Commit transaction in memory
  • Flush transaction to log file async
  • Release locks
  • Notify user that the transaction successfully completed when the log flush competes

As you note, the major difference is when we release the locks. In the case of ELR, we release the locks immediately when start flushing the transaction state to log. The idea is that we don’t need to wait for potentially length I/O to allow the next transaction to start running. However, we don’t report the transaction as completed before we flushed the transaction.

There is a tiny complication in there, the next transaction cannot start doing the async flushing to log before our transaction is also over, but that is fine and expected, anyway.

However, what about error conditions? What happens if we’ve started to flush the transaction to disk in an async manner, then we got an error. Well, the obvious thing to do here (and as called out in the paper) is to abort the transaction, and then abort any transaction that has started after the failed transaction released its locks.

This is usually okay, because in general, that is no big deal at all. Aside from something like out of disk space errors (which can be resolved by pre-allocating the data), there aren’t really any non catastrophic disk errors. So usually if the disk give you a hard time, it pretty much time to give up and go home.

However, with the use of cloud computing, it is actually pretty common (however much it is a bad idea) to have a “networked disk”. This means that it can certainly be the case that a I/O request you made to the disk can get lost, delayed and just appear to fail (but actually go through). It is the last scenario in particular that worries me. If you actually wrote to the log, but you think that you didn’t, what is your state now?

And while I can handle that in a case where I can fail the transaction and rollback all the previous state, it is much harder to do that if I’ve already committed the transaction in memory, since we might need to do a memory only rollback, and that isn’t something that we are actually setup to do.

In short, we’ll be rolling back early lock release in Voron, it isn’t worth the complexity involved, especially since we already have better ways to handle concurrency.



I ran into a similar problem at the day job a while back. We could get an error on a distributed write but then read and get the data we had just attempted to write. The ugly solution was to read in order to confirm the write--ugly.

Comment preview

Comments have been closed on this topic.


  1. The worker pattern - about one day from now

There are posts all the way to May 30, 2016


  1. The design of RavenDB 4.0 (14):
    26 May 2016 - The client side
  2. RavenDB 3.5 whirl wind tour (14):
    25 May 2016 - Got anything to declare, ya smuggler?
  3. Tasks for the new comer (2):
    15 Apr 2016 - Quartz.NET with RavenDB
  4. Code through the looking glass (5):
    18 Mar 2016 - And a linear search to rule them
  5. Find the bug (8):
    29 Feb 2016 - When you can't rely on your own identity
View all series


Main feed Feed Stats
Comments feed   Comments Feed Stats