RavenDB Feature Request Analysis: Filtered Replication ain’t what you’re looking for

Every so often we get a request for filtered replication: “I want to replicate to this node, but only those documents.” We explain that replication is a whole-database kind of thing; you can’t just pick & choose what you want. That isn’t actually true. We have facilities to do filtering, and it would be fairly easy to expose them.

We don’t intend to do so. The reason is that the customer asking the question is usually starting from the middle. He read about replication and thought that it would be a good fit for a particular scenario, if only it had that feature. Except that this is completely the wrong feature to use for the scenario at hand, and it usually takes a little back & forth to figure out what the scenario actually is.

For the most part, the scenarios for this feature are all about synchronizing data between two nodes*. In particular, that is often a use case for: “I have a mobile client and I want to replicate some of the data to that laptop”, or some such.

And this is where things get complex. To start with, you say, let us just filter the data where CustomerId = “customers/5”. Except that you need to apply this logic for each entity type in the database, and they usually have different rules. For example, you may have common reference data that you would want to replicate, even though it doesn’t belong to customers/5. And invoices may have a CustomerId property, but customers do not, so you need to define that for customers, it is the Id that you want to filter by, etc.
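To make the per-entity problem concrete, here is a minimal sketch of what such rules might look like. This is not RavenDB's API; the `FILTER_RULES` table, the `should_sync` function and the "Countries" collection are hypothetical names made up for illustration.

```python
from typing import Callable, Dict

Document = dict

# Hypothetical per-collection rules. Every collection needs its own answer
# to the question "does this document belong to that customer?".
FILTER_RULES: Dict[str, Callable[[Document, str], bool]] = {
    # Invoices point at their customer through a CustomerId property.
    "Invoices": lambda doc, customer_id: doc.get("CustomerId") == customer_id,
    # Customers have no CustomerId, so the document Id itself is the key.
    "Customers": lambda doc, customer_id: doc.get("Id") == customer_id,
    # Shared reference data (assumed collection) goes to everyone.
    "Countries": lambda doc, customer_id: True,
}


def should_sync(collection: str, doc: Document, customer_id: str) -> bool:
    """Return True if the document should be sent to the given customer's device."""
    rule = FILTER_RULES.get(collection)
    # In this sketch, a collection without a rule is never synced.
    return rule is not None and rule(doc, customer_id)
```

Even in this toy version, three collections need three different rules, and someone has to decide what happens to every collection that nobody wrote a rule for.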

To make things even more interesting, you need to consider the case where the sync filter has changed (this user now has access to “customers/5” and “customers/6”). At that point, you pretty much have to go through the entire data set again.
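A rough sketch of why a filter change is expensive, assuming the simplest possible model where every document carries a CustomerId. The `rescan_after_filter_change` function is a hypothetical helper for illustration, not something RavenDB provides.

```python
from typing import Iterable, List, Set, Tuple


def rescan_after_filter_change(
    all_docs: Iterable[dict],
    old_allowed: Set[str],
    new_allowed: Set[str],
) -> Tuple[List[str], List[str]]:
    """Walk the whole data set and work out what the client gains and loses."""
    added = new_allowed - old_allowed      # customers the user can now see
    removed = old_allowed - new_allowed    # customers the user can no longer see
    to_send: List[str] = []
    to_drop: List[str] = []
    # There is no shortcut here: every document has to be looked at again.
    for doc in all_docs:
        owner = doc.get("CustomerId")
        if owner in added:
            to_send.append(doc["Id"])
        elif owner in removed:
            to_drop.append(doc["Id"])
    return to_send, to_drop
```

Note that the loop runs over all documents, not just the ones that changed since the last sync, and that the client also has to be told which documents it should no longer have.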

Then we move to the question of updates: how are those handled? What about conflicts? How do you handle disconnected clients that may move between addresses and IPs all the time? Who maintains this operation? The client? The server? How about disconnected updates?

In short, it is a very different discussion that you need to have, and just exposing the replication filters won’t cover it.

* Nitpicker corner: yes, I know about MS Sync.