The most common network topology for RavenDB replication is a full mesh. For example, if you have three nodes in your cluster and a database that reside on all three nodes, you’ll have a replication topology that will look like this:
This works great when the number of nodes that you have in your cluster is reasonably small. However, we recently got a customer question about a different kind of topology. They have a bunch of nodes, in the order of a few dozens, which cooperate to perform some non trivial task. A key part of this is that the nodes are transient and identical. So a new node may pop up, live for a while (days, weeks, months) and then go away. At any given time you might have a few dozen nodes. That kind of environment won’t really work with a full mesh topology. If we would try, it would look something like that (fully connected network with 40 nodes):
This has a total of 780 connections(!) in it. You can create a topology like that, but a lot of the processing power in the network is going to be dedicated to just maintaining these connections. And you don’t actually need it. RavenDB’s replication algorithm is actually a gossip algorithm, and as you grow the number of nodes that take part in the replication, the less connection you need between nodes. In this case, we can take each of the live nodes and connect each of them to four other (random) nodes. The result would look like so:
Remember, each of the nodes is actually connected to a random four other nodes. RavenDB’s replication will ensure that a change to any document in any of the nodes under these conditions will propagate to all the other nodes efficiently.
This approach will also transparently handle any intermediary failures and be robust for nodes coming and leaving on the fly. RavenDB doesn’t implement gossip membership, mostly because that is very heavily dependent on the application and deployment pattern, but once you tell a node who its neighbors are, everything will proceed on its own.