Transaction hierarchy in RavenDB or, the value of a write

time to read 6 min | 1001 words

RavenDB offers both single node transactions as well as cluster wide transactions. You are free to use either one or even mix them together. That level of freedom, on the other hand, brings with it its own set of challenges. How do you know what to use? What are the scenarios and implications for each operation?Remember, RavenDB is a distributed database that can allow you to make modification on any node in the cluster.

In essence, this boil down to a simple concept, how important is the write that you are making. In detail, this gets complex. It’s easy to say that if for low importance writes, you’ll use single node transactions, and for high value items, you’ll use cluster wide transactions. But that isn’t correct. The primary issue is what you are trying to achieve. I’m afraid that I have no choice but to dig into this topic.

Let’s consider the following scenario: A user clicked on “Add to Cart” in the application. How should we record this fact? There is a “shopping-carts/ayende” document for this user, which represent their current shopping cart. But how should we save it?

Obviously, we never want to lose an item from the shopping cart, right? We can use a cluster wide transaction here to ensure maximum safety! Except… a cluster wide transaction will fail if the node that we reached cannot access the majority of the nodes in the cluster. Going back to the business, I asked them about it. The answer I got was “never lose an item from the shopping cart”. That means that we need to process the write even if we can reach no other node.

That leads us to single node transactions, which will do just that. However, now we have to deal with another issue, what happened if two concurrent transactions modified the same document on different nodes at the same time? Now we have a conflict, and when the nodes will replicate the data to one another, we’ll need to resolve it somehow. RavenDB will default to resolving to latest, meaning that some of the changes will be lost. However, we can setup a resolution script that can merge our changes between multiple versions of a document.

This is confusing, I’m aware. The rule of thumb goes like this:

  1. Use single node transactions by default – if there are errors / conflicts / issues, let RavenDB resolve them to the latest version (a revision exists so you can recover anything lost).
  2. Use single node transactions + conflict resolver script if you actually care about applying any sort of logic to the merging of conflicts. This is rare, the scenario is usually when we have something that can be modified and merged together. Shopping cart is an excellent example of this.
  3. Use a cluster wide transaction when you would rather fail than go forward if you cannot ensure the operation is successful. This is also rare, usually reserved for things such as selling limited amount of some item.

The default recommendation, let RavenDB manage that and accept that it may select the latest version is not something that I make lightly. It is based on quite a bit of experience in how users are actually using RavenDB.  In almost any business context, you are going to have large parts of the model that have only a single reason to change, even in the worst case assumption. A customer changing its billing address, for example, can be reasonably assumed to want to keep the latest version they put in. There is also no real meaning to concurrency in this scenario, the modification to a particular document is done by the relevant customer directly.  Failures are rare (but they do happen, so you have to account for them), so you need to consider what the impact you’ll have. If this is something that doesn’t have multiple concurrent operations going on it normally (and proper document modeling will suggest that this isn’t the case), you can just ignore the problem.

I’m saying ignore the problem because there is the question on what is the meaning of not ignoring the issue? You can try to write your conflict resolution script, but even with knowledge of your model, what are you expected to do with two conflicting versions of a customer, with different billing addresses in each?

And trying to do something generic doesn’t work. It will fail, but because this is rare, it will happen only a year after deployment, when no one recalls what exactly the behavior is and an error on such a case will cause hard failure in production.

For some cases, like the shopping cart, you can meaningful write merge code, and the scenario make sense, I may click on two “Add to Cart” buttons at the same time from different locations and I don’t want to lose any of that.

The last scenario, using cluster wide transaction, is actually the reserve. Usually RavenDB will jump through all sorts of hoops to ensure that it won’t lose a write, but cluster wide transactions are actually going the other way. They need to fail if they can’t ensure that they went through. In this case, you’ll usually be working on something very specific. The classic example is ensuring a unique user name in the system, we want to fail if we can’t absolutely ensure that this username is unique. But that isn’t something that we want to do all the time, updating the LastLogin time on the user’s document is not something that you need to ensure will be consistent (and in this case, selecting the latest is also by definition the right thing to do).

I like to say that you should use a single node transaction to record that you purchased a lottery ticket, and a cluster wide transaction to record who won the lottery. That gives the right mindset about the stakes involved. I never want to lose the record of a sale, but I want to ensure that once the win is awarded, I get it absolutely right.