Using RavenDB Replication in Gossip mode
The most common network topology for RavenDB replication is a full mesh. For example, if you have three nodes in your cluster and a database that reside on all three nodes, you’ll have a replication topology that will look like this:
This works great when the number of nodes that you have in your cluster is reasonably small. However, we recently got a customer question about a different kind of topology. They have a bunch of nodes, in the order of a few dozens, which cooperate to perform some non trivial task. A key part of this is that the nodes are transient and identical. So a new node may pop up, live for a while (days, weeks, months) and then go away. At any given time you might have a few dozen nodes. That kind of environment won’t really work with a full mesh topology. If we would try, it would look something like that (fully connected network with 40 nodes):
This has a total of 780 connections(!) in it. You can create a topology like that, but a lot of the processing power in the network is going to be dedicated to just maintaining these connections. And you don’t actually need it. RavenDB’s replication algorithm is actually a gossip algorithm, and as you grow the number of nodes that take part in the replication, the less connection you need between nodes. In this case, we can take each of the live nodes and connect each of them to four other (random) nodes. The result would look like so:
Remember, each of the nodes is actually connected to a random four other nodes. RavenDB’s replication will ensure that a change to any document in any of the nodes under these conditions will propagate to all the other nodes efficiently.
This approach will also transparently handle any intermediary failures and be robust for nodes coming and leaving on the fly. RavenDB doesn’t implement gossip membership, mostly because that is very heavily dependent on the application and deployment pattern, but once you tell a node who its neighbors are, everything will proceed on its own.
Comments
This sounds like good stuff, but if the nodes only connect to a strictly random set of other nodes, then how do you ensure that you don't get two islands of intra-connected, but not inter-connected nodes (i.e. a partition)? I assume there's some kind of command-and-control above this layer that ensures that the path-length between nodes isn't infinite (and maybe even optimizes mean path), but surely it can't be strictly random - since that could result in a partition?
DBCANON, The random property here is important. Given random selection on each node, the chance that you'll have two partitions is astonishingly small. That said, small happens, quite often.
The way you usually handle that is to split the gossip and the topology layers. RavenDB is responsible for gossiping about the documents, not about the topology layer. For the topology, you'll typically use something like HyParView. See previous post on the topic: https://ayende.com/blog/169441/gossip-much-the-gossip-epidemic-and-other-issues-in-polite-society
Comment preview