Ayende @ Rahien

RavenDB–Replication & Master <-> Master support

RavenDB supports master/master replication, with the caveat that such scenarios may result in conflicts. I recently had a long discussion with a customer about deploying a clustered RavenDB server.

Setting it up is as simple as having two RavenDB servers and telling each one to replicate to the other.

One nice aspect of this is that, with the way RavenDB replication works, we also get failover for free. With the setup that we have here, we need to configure the client to fail over both reads and writes (by default, we only fail over reads):

store.Conventions.FailoverBehavior = FailoverBehavior.AllowReadsFromSecondariesAndWritesToSecondaries;

And now we are all set. During normal operations, all writes to the primary would be replicated to the secondary automatically. If the primary goes down for any reason, we will fail over to the secondary transparently. When the primary comes back up, we will switch back to it, and the secondary will replicate all of the missing writes to the primary server.

So far, so good. There is only one thing that we have to worry about: conflicts.

What happens if, during the failure period, we had two writes to the same document, one at the primary and one at the secondary? There is a very slim chance of this happening, but it is something that we have to deal with. From RavenDB’s perspective, that is considered to be a conflict. We have two documents with the same key that have different ancestry chains. At that point, RavenDB will save all the conflicting versions and then create a new conflict document with the document key. Access to the document would result in an error, and give you access to all of the conflicting versions. You now have to resolve the conflict by saving a new version of the document.
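
To make the ancestry check concrete, here is a minimal sketch in C#. It is a simplified model under stated assumptions, not RavenDB's actual implementation: `DocVersion`, `ConflictDetector`, and the version ids are all names invented for illustration. The idea it shows is that two versions of the same document conflict when neither one descends from the other.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified model (an assumption, not RavenDB's real data structure):
// a version has an id and carries the ids of every version it descends from.
public sealed class DocVersion
{
    public string Id { get; }
    public IReadOnlyList<string> Ancestors { get; }

    public DocVersion(string id, params string[] ancestors)
    {
        Id = id;
        Ancestors = ancestors;
    }

    // True when this version was derived, directly or indirectly, from `other`.
    public bool DescendsFrom(DocVersion other) => Ancestors.Contains(other.Id);
}

public static class ConflictDetector
{
    // Two versions conflict when they differ and neither descends from the other.
    public static bool IsConflict(DocVersion local, DocVersion incoming) =>
        local.Id != incoming.Id
        && !local.DescendsFrom(incoming)
        && !incoming.DescendsFrom(local);
}

public static class Program
{
    public static void Main()
    {
        var original = new DocVersion("primary/1");
        // Both servers accepted a write based on the same original version.
        var onPrimary = new DocVersion("primary/2", "primary/1");
        var onSecondary = new DocVersion("secondary/1", "primary/1");

        // A plain update: the new version descends from the old one.
        Console.WriteLine(ConflictDetector.IsConflict(original, onPrimary));    // False
        // Divergent writes: neither version knows about the other.
        Console.WriteLine(ConflictDetector.IsConflict(onPrimary, onSecondary)); // True
    }
}
```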

So far, easy enough, but the problem is that until the conflict is resolved, that document is not accessible. There are other alternatives, though. RavenDB can’t decide which version of the document is accurate, or how to merge the two versions, but you might have enough information to do so, and we provide an extension point for exactly that:

[InheritedExport]
public abstract class AbstractDocumentReplicationConflictResolver
{
    public abstract bool TryResolve(string id, RavenJObject metadata, RavenJObject document, JsonDocument existingDoc);
}

In fact, something that I suggested and might very well be the conflict resolution strategy is to select one of the documents (arbitrarily) and use that as the “merged” document, but record elsewhere that a conflict has occurred. The reasoning behind that is that we select one of the document versions, and then we have a human that gets to decide if there is anything that needs to be merged back to the live document.

This is useful because the chances of this actually happening are fairly small, we ensure that there is no data loss even if it does happen, but more importantly, we don’t waste developer cycles on something that we currently don’t know how to handle. If/when we have a real failover, and if/when that results in a conflict, that is when we can start planning for having the next conflict automatically solved. And the best part of that? We don’t even need to solve all of the potential conflicts. We can solve the ones that we know about, and rest assured that anything that we aren’t familiar with would go through the same process of manual resolution that we already set up.
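
The strategy described above (handle the conflicts you know about, otherwise pick one version and record the conflict for a human) can be sketched as follows. This is an illustration only: it does not use the RavenDB extension point, and `ConflictRecord`, `ConflictPipeline`, `TryMerge`, and the `counters/` merge rule are hypothetical names and rules made up for the example. Documents are modeled as plain strings.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record of a conflict that still needs a human decision.
public sealed class ConflictRecord
{
    public string DocumentId { get; }
    public string Kept { get; }
    public string SetAside { get; }

    public ConflictRecord(string documentId, string kept, string setAside)
    {
        DocumentId = documentId;
        Kept = kept;
        SetAside = setAside;
    }
}

// A merge rule for a conflict we already know how to handle.
public delegate bool TryMerge(string id, string existing, string incoming, out string merged);

public sealed class ConflictPipeline
{
    private readonly List<TryMerge> _knownMerges = new List<TryMerge>();
    private readonly List<ConflictRecord> _pendingReview = new List<ConflictRecord>();

    public IReadOnlyList<ConflictRecord> PendingReview => _pendingReview;

    public void AddKnownMerge(TryMerge merge) => _knownMerges.Add(merge);

    // Returns the version that becomes the live document.
    public string Resolve(string id, string existing, string incoming)
    {
        // First, try the conflicts we have learned how to solve automatically.
        foreach (var merge in _knownMerges)
        {
            if (merge(id, existing, incoming, out var merged))
                return merged;
        }

        // Otherwise keep the existing version (an arbitrary choice) and
        // record the other one, so nothing is lost and a human can merge later.
        _pendingReview.Add(new ConflictRecord(id, existing, incoming));
        return existing;
    }
}

public static class KnownMerges
{
    // Example rule (an assumption): counter documents hold an integer, keep the larger.
    public static bool KeepLargerCounter(string id, string existing, string incoming, out string merged)
    {
        merged = null;
        if (!id.StartsWith("counters/", StringComparison.Ordinal)) return false;
        if (!int.TryParse(existing, out var a) || !int.TryParse(incoming, out var b)) return false;
        merged = Math.Max(a, b).ToString();
        return true;
    }
}

public static class Program
{
    public static void Main()
    {
        var pipeline = new ConflictPipeline();
        pipeline.AddKnownMerge(KnownMerges.KeepLargerCounter);

        Console.WriteLine(pipeline.Resolve("counters/1", "3", "5"));          // 5
        Console.WriteLine(pipeline.Resolve("orders/1", "order-v1", "order-v2")); // order-v1
        Console.WriteLine(pipeline.PendingReview.Count);                      // 1
    }
}
```

New merge rules can be added one at a time as real conflicts show up; anything no rule recognizes still lands in `PendingReview`.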

All in all, it is a fairly nice system, even if I do say so myself.

Posted By: Ayende Rahien

Comments

Phillip Haydon (11/23/2011 10:22 AM)

When the primary server comes back online, can you make it the slave, rather than switching the master back over and dealing with possible conflicts?

You just make the failed master, a slave, and the fail-over stays the master.

This was one of the big changes that happened to SQL Server 2008 (or was it 2005?)

Ayende Rahien (11/23/2011 10:29 AM)

Phillip, what happens if you have multiple nodes, and you bring one online just after the original master recovered? Also, how do you (operationally speaking) know which one is the actual master?

tobi (11/23/2011 10:44 AM)

I think a very practical and simple resolution strategy would be to choose the newest doc according to the time on the server that generated the version. With synchronized clocks this should be a convenient default.

Ayende Rahien (11/23/2011 10:46 AM)

Tobi, the T-1 version is an order for 1 million dollars. The T version is a modification of the user's name.

Which one of them do you want to keep? How do you make that determination without actually knowing the domain?

That is why we give you a way in, but don't make a decision ourselves.

tobi (11/23/2011 10:52 AM)

Ayende,

I fully agree. But there are two things that make this simple strategy a credible choice: a) The time window where conflicts can occur is very small because of replication speed (low chance of conflicts). b) Many documents are of low value individually (think of blog comments, forum posts, ...).

Rafal (11/23/2011 12:11 PM)

Tobi, usually you set up replication and failover because your documents are of high value to you :) Why bother if they aren't?

tobi (11/23/2011 02:09 PM)

Rafal, for load-balancing reasons. If you don't need load balancing, why use master-master? I would use master-slave with failover.
