Migration strategies considerations for Dev–>UAT—>Production
Part of the reason for RavenDB was that I wanted a database that actually took into account how it is being used into account, and provided good support for common usage scenarios. Making the process of moving between Dev –> UAT –> Production easier is a big part of that.
Typically, databases don’t handle that directly, but let the users figure it out. You can see the plethora of SQL Schema deploying and versioning options that you have to deal with.
With RavenDB, for indexes in particular, we made the process very easy. Indexes are defined in code, deployed along side the application and are versioned in the exact same manner, in the exact same system.
But as the feature set of RavenDB grows, we need to consider the deployment scenario in additional places. We recently started talking about the development cycle of ETL processes, Subscriptions, Backups and external replication. The last two are fairly rare in development / UAT scenarios, so we’ll ignore them for now. They are typically only ever setup & used in production. Sometimes you test them on a dedicated instance, but it doesn’t make sense to deploy a backup configuration in most cases. External replication is basically just destination + credentials, so there isn’t really all that much to track or deploy.
ETL Processes and Subscriptions, on the other hand, can contain quite a bit of logic in them. An ETL process that feed into a reporting database might be composed of several distinct pieces, each of them feeding some part of the data to the reporting db. If the reporting needs change, we’ll likely need to update the ETL process as well, which means that we need to consider exactly how we’ll do that. Ideally, we want a developer to be able to start working on the ETL process on their own machine, completely isolated. Once they are done working, they can checkin their work into the code repository and move on to other tasks. At some future time, this code will get deployed, which will setup the right ETL process in production.
That is a really nice story, and how we are dealing with indexes, but it doesn’t actually work for ETL processes. The problem is that ETL is typically not the purview of the application developer, it is in the hand of the operations team or maybe it is owned by the team that owns the reports. Furthermore, changes to the ETL process are pretty common and typically happen outside the release cycle of the application itself. That means that we can’t tie this behavior to the code, unlike indexes, which have a pretty tight integration with the code that is using them, ETL is a background kind of operation, with little direct impact. So it can’t be tied to the application code like indexes is. Even with indexes, we have measures in place that prevent (lock) the index definition, so an administrator can update the index definition on the fly without the application overwriting it with the old version of the index.
Subscriptions are more of a middle ground. A subscription is composed of a client side application that process the data and some server side logic related to filtering and shaping it. On the one hand, it make a lot of sense for the subscribing application to control its subscription, but an admin that wants to update the subscription definition is a very likely scenario. Maybe as a result of a data change, or need input from the business. We can update the server side code without re-deployment, and that is usually a good idea.
To make matters a bit more complex, we also have to consider secrets management. ETL processes, in particular, can contain sensitive information (connection strings). So we need to figure out a way to have the connection string, but not have the connection string . In other words, if I write a new ETL process and deploy it to production, I need to be sure that I don’t need to remember to update the connection string from my local machine to the production database. Or, much worse, if I’m taking the ETL from production, I don’t want to accidently also get the production connection string. That means that we need to use named connection strings, and rely on the developer / admin to set it up properly across environments.
I would really appreciate any feedback you have about how to handle this scenario.
Both ETL processes and Subscriptions are just JSON documents of not too much complexity, so actually moving them around between servers isn’t hard, it is the process of doing so that we are trying to flesh out. I should also mention that we are probably just going to make sure that there is a process to handle that, not something that is mandated, because some companies have very different deployment models that we need to fit into. This is us trying to think about the best way to handle the most common scenario with as little friction as possible.