Ayende @ Rahien

Oren Eini, aka Ayende Rahien, is the CEO of Hibernating Rhinos LTD, which develops RavenDB, a NoSQL Open Source Document Database.

You can reach me by:

oren@ravendb.net

+972 52-548-6969

time to read 3 min | 454 words

If you are tracking the nightlies of RavenDB, the Pull Replication feature has fully landed. You now have three options to choose from when you define replication in your systems.

[Screenshot: the three replication options in the Studio]

External Replication is meant to go from the current database to another database (usually in a different cluster). It is a way to share data with another location. The owner of the replication is the current database, which initiates the connection and sends the data to the other side.
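
For comparison, here is a rough sketch of setting External Replication up from code rather than from the Studio, using the Java client. The class and operation names (RavenConnectionString, PutConnectionStringOperation, ExternalReplication, UpdateExternalReplicationOperation), the package paths and the example URLs are my assumptions about how the client exposes this feature; verify them against the client version you are actually running.

```java
import net.ravendb.client.documents.DocumentStore;
import net.ravendb.client.documents.operations.connectionStrings.PutConnectionStringOperation;
import net.ravendb.client.documents.operations.etl.RavenConnectionString;
import net.ravendb.client.documents.operations.replication.ExternalReplication;
import net.ravendb.client.documents.operations.replication.UpdateExternalReplicationOperation;

public class ExternalReplicationSketch {
    public static void main(String[] args) {
        // Sketch only: package, class and setter names are assumed - check your client version.
        try (DocumentStore store = new DocumentStore(
                new String[]{"https://source.example.com"}, "Orders")) { // hypothetical URL and database
            store.initialize();

            // A connection string pointing at the other side (usually a different cluster).
            RavenConnectionString target = new RavenConnectionString();
            target.setName("central");
            target.setDatabase("Orders");
            target.setTopologyDiscoveryUrls(new String[]{"https://central.example.com"});
            store.maintenance().send(new PutConnectionStringOperation<>(target));

            // External Replication: this database owns the task, initiates the
            // connection and pushes its data to the other side.
            ExternalReplication task = new ExternalReplication("Orders", "central");
            store.maintenance().send(new UpdateExternalReplicationOperation(task));
        }
    }
}
```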

Pull Replication reverses this behavior. The first thing you'll need to do to get Pull Replication working is to define the Pull Replication Hub.

[Screenshot: defining a Pull Replication Hub]

As you can see, there isn’t much to do here. We give the hub a name and minimal configuration (how far back this should go, basically). In this case, we are allowing sinks to get the data from the database, with a 20 minute delay built into the loop. You can also export the sink configuration from this view. We also define a certificate that grants access to this Pull Replication Hub; that certificate allows access only to this hub and grants no additional permissions. In this way, you may have one certificate that provides access to a delayed public stock ticker and another that provides immediate access to the data.
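
If you would rather script the hub than click through the Studio, here is a minimal sketch of the same definition. The same caveat applies: the names (PullReplicationDefinition, PutPullReplicationAsHubOperation, the delay and certificate setters) and the thumbprint/key values are assumptions and placeholders, to be checked against the client you use.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;

import net.ravendb.client.documents.DocumentStore;
import net.ravendb.client.documents.operations.replication.PullReplicationDefinition;
import net.ravendb.client.documents.operations.replication.PutPullReplicationAsHubOperation;

public class DefineHubSketch {
    public static void main(String[] args) {
        // Sketch only: operation and setter names are assumed - check your client version.
        try (DocumentStore store = new DocumentStore(
                new String[]{"https://central.example.com"}, "Stocks")) { // hypothetical URL and database
            store.initialize();

            PullReplicationDefinition hub = new PullReplicationDefinition();
            hub.setName("public-stock-ticker");                 // the hub name shown in the Studio
            hub.setDelayReplicationFor(Duration.ofMinutes(20)); // sinks see the data 20 minutes late

            // This certificate grants access only to this hub, nothing else on the database.
            Map<String, String> allowedCertificates = new HashMap<>();
            allowedCertificates.put("<thumbprint>", "<base64 public key>"); // placeholders
            hub.setCertificates(allowedCertificates);

            store.maintenance().send(new PutPullReplicationAsHubOperation(hub));
        }
    }
}
```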

The next step is to go to the other side, the sink. There, we either manually define the hub's details or, more likely, import the configuration. The sink will then connect to the hub and start pulling the data from it. Here is what this looks like:

[Screenshot: defining a Pull Replication Sink]

The idea is that you are very likely to have a lot more sinks than hubs. That is why we make it easy to define the sink just by importing the configuration (although in practical terms we expect that this will just be part of a shared image that is deployed many times).
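
For completeness, here is what the sink side could look like from code, again under the caveat that the names (PullReplicationAsSink, UpdatePullReplicationAsSinkOperation, the connection string and certificate setters), the URLs and the certificate values are assumptions and placeholders; verify them against the client you are using.

```java
import net.ravendb.client.documents.DocumentStore;
import net.ravendb.client.documents.operations.connectionStrings.PutConnectionStringOperation;
import net.ravendb.client.documents.operations.etl.RavenConnectionString;
import net.ravendb.client.documents.operations.replication.PullReplicationAsSink;
import net.ravendb.client.documents.operations.replication.UpdatePullReplicationAsSinkOperation;

public class DefineSinkSketch {
    public static void main(String[] args) {
        // Sketch only: operation and setter names are assumed - check your client version.
        try (DocumentStore store = new DocumentStore(
                new String[]{"https://edge.example.com"}, "Stocks")) { // hypothetical edge node
            store.initialize();

            // Where the hub lives; the sink is the one that initiates the connection.
            RavenConnectionString toHub = new RavenConnectionString();
            toHub.setName("central-hub");
            toHub.setDatabase("Stocks");
            toHub.setTopologyDiscoveryUrls(new String[]{"https://central.example.com"});
            store.maintenance().send(new PutConnectionStringOperation<>(toHub));

            PullReplicationAsSink sink = new PullReplicationAsSink();
            sink.setHubDefinitionName("public-stock-ticker");  // must match the hub's name
            sink.setConnectionStringName("central-hub");
            // The certificate the hub was told to accept, with its private key.
            sink.setCertificateWithPrivateKey("<base64 pfx>"); // placeholder
            sink.setCertificatePassword("<password>");         // placeholder

            store.maintenance().send(new UpdatePullReplicationAsSinkOperation(sink));
        }
    }
}
```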

Once we have defined the Sink Pull Replication, it will connect to the Hub and start accepting data. You can track how this works from the studio:

[Screenshot: tracking the pull replication on the sink in the Studio]

On the other side, you can track the connected sinks on the Hub:

[Screenshot: the connected sinks shown on the Hub]

And this is all you need to set up Pull Replication yourself.

time to read 4 min | 603 words


The flagship feature for RavenDB 4.2 is graph queries, but there are a lot of other features that also deserve attention. One of the more prominent second-string features is pull replication.

A very common deployment pattern for RavenDB is to have it deployed to the edge. A great example is shown in this webinar, which talks about deploying RavenDB to 36,000 locations and over half a million instances. To my knowledge, this is one of the largest single deployments of RavenDB, and this deployment model is frequent among our users.

In the past few months I talked with users who use RavenDB on the edge for the following purposes:

  • Ships at sea, where RavenDB is used to track cargo and ongoing manifest updates. The ships do not have any internet connection while at sea, but connect to headquarters when they dock.
  • Clinics of health care providers, where each clinic has a RavenDB instance and can operate completely independently if the network is down, but communicates with the central data center during normal operations.
  • Industrial robots, where each robot holds its own data and communicates occasionally with a central location.
  • Using RavenDB as the backend for an application running on tablets used out in the field, which only have a connection to the central database when back in the office.

We call such deployments the hub & spoke model and distinguish between the types of nodes that we have: edge nodes and the central node.

Now, to be clear, both the edge and the central can be either a single node or a full cluster; it doesn’t matter for our discussion.

Pull replication in RavenDB allows you to define the replication definition on the central node once. On each of the edge nodes, you define the pull replication definition and that is pretty much it. Each edge node will connect to the central location and start pulling all the data from that database. On the face of it, it seems like a pretty simple process and not much different from external replication, which we already have in RavenDB.

The difference is that external replication is defined on the central node, for each of the nodes on the edges. Pull replication is defined once on the central node and then defined on each of the edges. The idea here is that deploying a new edge node shouldn’t have any impact on the central database. It is pretty common for users to deploy a new location, and you don’t want to have to go and update the central server whenever that happens.

There are a few other aspects of this feature that matters greatly. The most important of them is that it is the edge that initiates the connection to the central node, not the central to the edge. This means that the edge can be behind NAT and you don’t have to worry about tunneling, etc.

The second is about security. Pull replication has its own security measures. When you define a pull replication on the central node, you also set up the certificates that are allowed to utilize it. Those certificates are completely separate from the certificates that are used to access the database in general. So your edge nodes don’t have any access to the database at all; all they can do is set up the channel for the central node to send them the data.
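
To make the separation concrete, here is a small fragment reusing the hub and sink objects from the assumed-API sketches earlier: the hub definition stores only the public part of the certificate, and that entry is the certificate's entire scope, while the edge keeps the private key. Neither side registers it as a regular client certificate for the database, so it grants no general access.

```java
// Fragment only, reusing the 'hub', 'sink' and 'store' objects (and the assumed
// class names) from the sketches above.

// Central node: the hub definition holds only the public part of the certificate.
// It is not added to the server's regular client certificates, so it opens
// access to this hub and nothing else.
Map<String, String> allowed = new HashMap<>();
allowed.put("<edge thumbprint>", "<base64 public key>"); // placeholders
hub.setCertificates(allowed);
store.maintenance().send(new PutPullReplicationAsHubOperation(hub));

// Edge node: the sink holds the matching private key and uses it only to open
// the replication channel toward the hub.
sink.setCertificateWithPrivateKey("<base64 pfx>"); // placeholder
sink.setCertificatePassword("<password>");         // placeholder
store.maintenance().send(new UpdatePullReplicationAsSinkOperation(sink));
```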

This is going to make edge deployments and topologies a lot easier to manage and work with in the future.
