Ayende @ Rahien

My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:


+972 52-548-6969

, @ Q c

Posts: 6,124 | Comments: 45,475

filter by tags archive

How not to deal with Replication Lag

time to read 3 min | 480 words

Because RavenDB replication is async in nature, there is a period of time between a write has been committed on the master and until it is visible to the clients.

A user has requested that we would provide a low latency way to provide a solution to that. The idea was that the master server would report to the secondaries that a write happened, and then they would mark all reads from them for those documents as dirty, until replication caught up.

Implementation wise, this is all ready to happen. We have the Changes API, which is an easy way to get changes from a db. We have the ability to return a 204 Non Authoritative response, so it looks easy.

In theory, it sounds reasonable, but this idea just doesn’t hold water. Let us talk about normal operations. Even with the “low latency” notifications (and replication is about as low latency as it already get), we have to deal with a window of time between the write completing on the master and the notification arriving on the secondaries. In fact, it is the exact same window as with replication. Sure, if you have a high replication load, that might be different, but those tend to be rare (high write load, very big documents, etc).

But let us assume that this is really the case. What about failures?

Let us assume Server A & B and client C. Client C makes a write to A, A notifies B and when C reads from B, it would get 204 response until A replicates to B. All nice & dandy. But what happens when A can’t talk to B ? Remember a server being down is the easiest scenario, the hard part is when both A & B are operational, but can’t talk to one another. RavenDB  is designed to gracefully handle network splits and merges, so what would happen in this case?

Client C writes to A, but A can’t notify B or replicate to it. Client C reads from B, but since B got no notification about a change, it return 200 Ok response, which means that this is the latest version. Problem.

In this case, this is actually a bigger problem than you might consider. If we support the notifications under the standard scenario, user will make assumptions about this. They will have separate code paths for non authoritative responses, for example. But as we have seen, we have a window of time where the reply would say it is authoritative even though it isn’t (a very short one, sure, but still) and under failure scenarios we will out right lie.

It is better not to have this “feature” at all, and let the user handle that on his own (and there are ways to handle that, reading from the master for important stuff, for example).


Chris Marisic

I agree with your conclusion, don't support features that led developers shoot themselves in the face.

Vlad Kosarev

Seriously, http://en.wikipedia.org/wiki/CAP_theorem (agree with Chris B).

Sadly we can't do magic just yet in computer science. Oren, I'm sure if there's one person to finally disprove CAP theorem it would be you :)

Ayende Rahien

Vlad, I am pretty confident of my abilities, but being able to break something as fundamental as that isn't going to happen any time soon.

Comment preview

Comments have been closed on this topic.


  1. The design of RavenDB 4.0: Making Lucene reliable - 4 hours from now
  2. RavenDB 3.5 whirl wind tour: I’ll find who is taking my I/O bandwidth and they SHALL pay - about one day from now
  3. The design of RavenDB 4.0: Physically segregating collections - 2 days from now
  4. RavenDB 3.5 Whirlwind tour: I need to be free to explore my data - 3 days from now
  5. RavenDB 3.5 whirl wind tour: I'll have the 3+1 goodies to go, please - 6 days from now

And 13 more posts are pending...

There are posts all the way to May 30, 2016


  1. RavenDB 3.5 whirl wind tour (14):
    02 May 2016 - You want all the data, you can’t handle all the data
  2. The design of RavenDB 4.0 (13):
    28 Apr 2016 - The implications of the blittable format
  3. Tasks for the new comer (2):
    15 Apr 2016 - Quartz.NET with RavenDB
  4. Code through the looking glass (5):
    18 Mar 2016 - And a linear search to rule them
  5. Find the bug (8):
    29 Feb 2016 - When you can't rely on your own identity
View all series



Main feed Feed Stats
Comments feed   Comments Feed Stats