Ayende @ Rahien

Refunds available at head office

Geo distribution and high availability in RavenDB

A customer asks on the mailing list:

Due to data protection requirements, we have to store a user's data closest to where they signed up. For example, if I sign up and I'm in London, my data should be stored in the EU.

Given this, how do we ensure when replicating (we will have level 4 redundancy eventually) that any data originally written to a node within, say, the EU does not get replicated to a node in the States?

The answer here is to use two features of RavenDB together: sharding and replication. It is a good thing that they are orthogonal and can work together seamlessly.

Here is what it looks like:

image

The London-based user will be sharded to the Ireland server. This server will replicate to another Ireland-based server (or to other servers in the EU). The data never leaves the EU (satisfying the data protection rules), but we get the high availability that we desire.

At the same time, Canadian customers will be served from nearby US-based servers, and those, too, will replicate to nearby servers.

From a deployment standpoint, what we need to do is the following:

  • Set up a geo-distributed sharded cluster that shards on the user's location.
  • Each shard would be a separate server, which replicates to N other servers in the nearby geographical area.
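The two steps above can be sketched in a few lines. This is a minimal, library-agnostic illustration in Python, not RavenDB's client API: the `Server` and `Shard` types, the server names, the `COUNTRY_TO_REGION` table, and the `shard_for` helper are all hypothetical. It only shows the two rules at work: a user is routed to a shard by sign-up location, and a shard refuses replication targets outside its own region.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Server:
    name: str    # hypothetical server name
    region: str  # hypothetical region label, e.g. "EU" or "US"


@dataclass
class Shard:
    primary: Server
    replicas: List[Server] = field(default_factory=list)

    def add_replica(self, server: Server) -> None:
        # Refuse replication targets outside the shard's own region,
        # so data written in the EU is never copied to the US.
        if server.region != self.primary.region:
            raise ValueError(
                f"{server.name} is in {server.region}, "
                f"but this shard must stay in {self.primary.region}"
            )
        self.replicas.append(server)


# Assumed mapping from sign-up country to shard region (illustrative only).
COUNTRY_TO_REGION: Dict[str, str] = {
    "UK": "EU",
    "IE": "EU",
    "US": "US",
    "CA": "US",
}


def build_cluster() -> Dict[str, Shard]:
    """One shard per region, each replicating only to nearby servers."""
    eu = Shard(primary=Server("ireland-1", "EU"))
    eu.add_replica(Server("ireland-2", "EU"))

    us = Shard(primary=Server("us-1", "US"))
    us.add_replica(Server("us-2", "US"))

    return {"EU": eu, "US": us}


def shard_for(signup_country: str, shards: Dict[str, Shard]) -> Shard:
    # Route on where the user signed up, not where they are right now.
    return shards[COUNTRY_TO_REGION[signup_country]]
```

With this layout, `shard_for("UK", build_cluster())` returns the shard whose primary is `ireland-1`, and trying to add a US server as a replica of the EU shard raises `ValueError`.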

And that is pretty much it…

Posted By: Ayende Rahien


Comments

Phillip Haydon
07/23/2014 09:30 AM

If a user from the UK flies to Canada for a week, and attempts to authenticate with the system, how does that work?

Does he read/write to the Ireland location after authenticating?

If Canada-based users are not bound by a requirement to store data in a certain geographical location, as the UK user is, is the US data replicated to both the US and Ireland?

Ayende Rahien
07/23/2014 09:34 AM

Phillip, that depends on the system setup. Usually a system is set up so that the user's data is tied to his home location (where he signed up), not his current location.

And the CA data might (or might not) be replicated to the UK, probably not. You'll have multiple data centers in the states for failover.

Phillip Haydon
07/23/2014 09:48 AM

Hmmm, would you store only personal data in specific locations but replicate all other data (assuming you're using the same database and not separating user info into a different database)?

I.e., if you have an online store, all US personal data is stored in the US and UK data in the UK, but all products are replicated between both?

Or would you just separate this stuff into separate databases?

(probably no0b questions, I've never worked on a system that needs to store data in two different locations)

Ayende Rahien
07/23/2014 10:21 AM

Phillip, sure, that is easy to do. You have the products database, which is replicated, and the user info database, which isn't.

amin
07/23/2014 01:11 PM

I wish RavenDB did the sharding server-side rather than in the client, something like RethinkDB. Not all data is geo-split. Assume you have one type of data in big numbers, like a product table with 100 billion records (I don't mean exactly 100 billion; I mean more records than one machine can process). In this situation the developer has to do much more work if sharding is handled in the client than with a server-side solution.

Ayende Rahien
07/24/2014 08:36 AM

Amin, there is nothing that prevents you from handling such a scenario in RavenDB. We have several customers who deal with tens of millions of items in collections that are spread over multiple machines. It works, it is flexible, and it gives the users a LOT of control when they need it.
