Ayende @ Rahien

Hi!
My name is Ayende Rahien
Founder of Hibernating Rhinos LTD and RavenDB.

The subtle distinction between snapshot isolation and read committed


I am using database transaction isolation levels for a reason here: they make it easier to reason about what is going on.

In short, RavenDB currently supports two storage engine options, Esent and Munin. Esent is what we usually use for production, and Munin is usually used for testing. We wrote Munin as a transactional, fully managed storage engine a while ago, and it has mostly served us well, but Esent is what we usually aim for, since that is the production use case.

We recently made a few changes that resulted in test failures on Munin, in only about one run out of two dozen, while the same tests always passed with Esent.

Naturally, because of the random nature of the problem, I suspected that the issue was a race condition in Munin. That has happened in the past, and obviously race conditions are very hard to root out completely. But after isolating everything down to a simple test case (writing to two “tables” with associated information), I finally figured it out.

Munin is working just fine; it hasn’t got a speck of a problem. It is just that, when we built it, I built it to support the read committed isolation level, while Esent provides the snapshot isolation level. The code assumes snapshot isolation at some pretty deep level. Obviously, this sort of thing shows up as a race condition, and it is extremely hard to debug, as anyone who has ever dealt with those issues in an RDBMS can testify.
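To make the difference concrete, here is a minimal Python sketch of the kind of anomaly involved. It is not Munin's or Esent's code: `ToyStore`, `Reader`, `run`, and the `docs` / `etags` table names are all hypothetical, invented for illustration. A reader reads from one table, a writer commits to both tables, and then the reader reads from the second table.

```python
class ToyStore:
    """Toy multi-version store (hypothetical; not Munin's or Esent's design)."""

    def __init__(self):
        self.commit_id = 0
        # (table, key) -> list of (commit_id, value), oldest first
        self.versions = {}

    def commit(self, writes):
        """Apply {(table, key): value} as a single new commit."""
        self.commit_id += 1
        for cell, value in writes.items():
            self.versions.setdefault(cell, []).append((self.commit_id, value))

    def begin(self, isolation):
        return Reader(self, isolation)


class Reader:
    def __init__(self, store, isolation):
        self.store = store
        self.isolation = isolation
        # Snapshot isolation pins the visible state at transaction start.
        self.start_id = store.commit_id

    def read(self, table, key):
        # Read committed sees whatever is committed at the time of each read.
        if self.isolation == "snapshot":
            limit = self.start_id
        else:
            limit = self.store.commit_id
        history = self.store.versions.get((table, key), [])
        visible = [value for cid, value in history if cid <= limit]
        return visible[-1] if visible else None


def run(isolation):
    store = ToyStore()
    store.commit({("docs", "users/1"): "v1", ("etags", "users/1"): 1})

    reader = store.begin(isolation)
    doc = reader.read("docs", "users/1")

    # A concurrent writer commits to both tables between the two reads.
    store.commit({("docs", "users/1"): "v2", ("etags", "users/1"): 2})

    etag = reader.read("etags", "users/1")
    return doc, etag


print(run("read_committed"))  # ('v1', 2): old document paired with new etag
print(run("snapshot"))        # ('v1', 1): both reads come from the same point in time
```

Under read committed, every individual read returns committed data, yet the two reads do not agree with each other. Code written with snapshot semantics in mind will only trip over this when a write happens to land between the reads, which is why it looks like a race condition.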

So my task now is not to fix a bug in Munin, but to actually implement snapshot isolation. As it turned out, actually moving Munin from read committed isolation to snapshot isolation was a lot easier than finding the problem.

I am torn between being pleased that I found the issue, happy that Munin doesn’t have a bug and pissed that it took me that long.


Comments

Sean Kearon

So, does that move Munin closer to being suitable for production?

John Bloom

"I am torn between being pleased that I found the issue, happy that Munin doesn’t have a bug and pissed that it took me that long."

I think that developers have to deal with this emotion on a regular basis. And "taking a long time" is relative to the task at hand. For some problems, taking 5 minutes is too long, and I kick myself for not figuring it out quicker.

edward

Why have and maintain two storage engines when only one is aimed at production?

Duckie

Munin is written in .NET, while Esent is a part of Windows. The goal, from what I know, is to have an independent storage engine. Munin is just not ready for production. Yet.

http://en.wikipedia.org/wiki/ExtensibleStorageEngine http://ayende.com/blog/4686/raven-munin

Daniel Lang

@Edward: Because Munin can run in-memory, which is actually very handy from time to time (i.e. unit-testing).

Matt Warren

@Edward

One reason is to have a way to run on Linux, although a version running on BDB might be a better option for that (see https://groups.google.com/d/topic/ravendb/WhuwS218-xg/discussion).

But the main reason for Munin at the moment is so that RavenDB can run in-memory, for unit tests.

Edward

@Daniel, @Matt thanks! [P.S. Nice blogs!]
