Geo distribution and high availability in RavenDB
A customer asks on the mailing list:
Due to data protection requirements, we have to store a user's data closest to where they signed up. For example, if I sign up and I'm in London, my data should be stored in the EU.
Given this, how do we ensure when replicating (we will have level 4 redundancy eventually), that any data originally written to a node within say the EU does not get replicated to a node in the states?
The answer here is to use two features of RavenDB together: sharding and replication. It is a good thing that they are orthogonal and can work together seamlessly.
Here is what it looks like:
The London-based user will be sharded to the Ireland server. This server will replicate to another Ireland-based server (or to other servers in the EU). The data never leaves the EU (satisfying the data protection rules), but we get the high availability that we desire.
At the same time, Canadian customers will be served from nearby US-based servers, and those, too, will replicate to other nearby servers.
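To make the routing idea concrete, here is a minimal sketch of choosing a shard from the user's sign-up location. It is written in plain Python and does not use the RavenDB client API. The country table, the region names, the server URLs and the `shard_for_user` function are all assumptions made up for illustration.

```python
# Hypothetical lookup tables; none of this is part of the RavenDB API.
COUNTRY_TO_REGION = {
    "GB": "eu",  # the London user from the question
    "IE": "eu",
    "DE": "eu",
    "US": "na",
    "CA": "na",  # Canadian users are served from nearby US servers
}

# One primary shard server per region (made-up URLs).
REGION_TO_SHARD = {
    "eu": "http://ireland-1.example.com:8080",
    "na": "http://us-east-1.example.com:8080",
}


def shard_for_user(signup_country: str) -> str:
    """Return the shard URL that should own a user's data.

    The decision is based only on where the user signed up, so the
    data stays in that region for its whole lifetime.
    """
    region = COUNTRY_TO_REGION.get(signup_country.upper())
    if region is None:
        # Refuse to guess: storing data in the wrong region is the
        # exact problem we are trying to avoid.
        raise ValueError(f"no region configured for {signup_country!r}")
    return REGION_TO_SHARD[region]
```

Failing loudly for an unknown country is a deliberate choice in this sketch, since silently falling back to a default shard could put data in the wrong jurisdiction.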
From a deployment standpoint, what we need to do is the following:
- Set up a geo-distributed sharded cluster using the user's location.
- Each shard would be a separate server, which is replicating to N other servers in the nearby geographical area.
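The two steps above can be sketched as a small topology description plus a sanity check that no shard replicates outside its own region. This is only an illustration in plain Python, not RavenDB configuration. The `Server` and `Shard` types, the `cross_region_replicas` helper and all URLs are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Server:
    url: str
    region: str  # e.g. "eu" or "na"


@dataclass
class Shard:
    primary: Server
    replicas: list = field(default_factory=list)  # list of Server


def cross_region_replicas(shards):
    """Return (primary_url, replica_url) pairs that cross a region boundary.

    An empty result means every shard replicates only to servers in
    the same region as its primary.
    """
    return [
        (shard.primary.url, replica.url)
        for shard in shards
        for replica in shard.replicas
        if replica.region != shard.primary.region
    ]


# Hypothetical deployment: one shard per region, each replicating nearby.
eu_shard = Shard(
    primary=Server("http://ireland-1.example.com:8080", "eu"),
    replicas=[Server("http://ireland-2.example.com:8080", "eu")],
)
na_shard = Shard(
    primary=Server("http://us-east-1.example.com:8080", "na"),
    replicas=[Server("http://us-east-2.example.com:8080", "na")],
)
topology = [eu_shard, na_shard]

violations = cross_region_replicas(topology)
```

A check like this could run before any replication destination is configured, so a mistaken EU-to-US destination is caught at deployment time rather than after data has moved.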
And that is pretty much it…