Migration strategies considerations for Dev–>UAT—>Production

Jun 08 2017

Migration strategies considerations for Dev–>UAT—>Production

time to read 5 min | 866 words

Part of the reason for RavenDB was that I wanted a database that actually took into account how it is being used into account, and provided good support for common usage scenarios. Making the process of moving between Dev –> UAT –> Production easier is a big part of that.

Typically, databases don’t handle that directly, but let the users figure it out. You can see the plethora of SQL Schema deploying and versioning options that you have to deal with.

With RavenDB, for indexes in particular, we made the process very easy. Indexes are defined in code, deployed along side the application and are versioned in the exact same manner, in the exact same system.

But as the feature set of RavenDB grows, we need to consider the deployment scenario in additional places. We recently started talking about the development cycle of ETL processes, Subscriptions, Backups and external replication. The last two are fairly rare in development / UAT scenarios, so we’ll ignore them for now. They are typically only ever setup & used in production. Sometimes you test them on a dedicated instance, but it doesn’t make sense to deploy a backup configuration in most cases. External replication is basically just destination + credentials, so there isn’t really all that much to track or deploy.

ETL Processes and Subscriptions, on the other hand, can contain quite a bit of logic in them. An ETL process that feed into a reporting database might be composed of several distinct pieces, each of them feeding some part of the data to the reporting db. If the reporting needs change, we’ll likely need to update the ETL process as well, which means that we need to consider exactly how we’ll do that. Ideally, we want a developer to be able to start working on the ETL process on their own machine, completely isolated. Once they are done working, they can checkin their work into the code repository and move on to other tasks. At some future time, this code will get deployed, which will setup the right ETL process in production.

That is a really nice story, and how we are dealing with indexes, but it doesn’t actually work for ETL processes. The problem is that ETL is typically not the purview of the application developer, it is in the hand of the operations team or maybe it is owned by the team that owns the reports. Furthermore, changes to the ETL process are pretty common and typically happen outside the release cycle of the application itself. That means that we can’t tie this behavior to the code, unlike indexes, which have a pretty tight integration with the code that is using them, ETL is a background kind of operation, with little direct impact. So it can’t be tied to the application code like indexes is. Even with indexes, we have measures in place that prevent (lock) the index definition, so an administrator can update the index definition on the fly without the application overwriting it with the old version of the index.

Subscriptions are more of a middle ground. A subscription is composed of a client side application that process the data and some server side logic related to filtering and shaping it. On the one hand, it make a lot of sense for the subscribing application to control its subscription, but an admin that wants to update the subscription definition is a very likely scenario. Maybe as a result of a data change, or need input from the business. We can update the server side code without re-deployment, and that is usually a good idea.

To make matters a bit more complex, we also have to consider secrets management. ETL processes, in particular, can contain sensitive information (connection strings). So we need to figure out a way to have the connection string, but not have the connection string Smile . In other words, if I write a new ETL process and deploy it to production, I need to be sure that I don’t need to remember to update the connection string from my local machine to the production database. Or, much worse, if I’m taking the ETL from production, I don’t want to accidently also get the production connection string. That means that we need to use named connection strings, and rely on the developer / admin to set it up properly across environments.

I would really appreciate any feedback you have about how to handle this scenario.

Both ETL processes and Subscriptions are just JSON documents of not too much complexity, so actually moving them around between servers isn’t hard, it is the process of doing so that we are trying to flesh out. I should also mention that we are probably just going to make sure that there is a process to handle that, not something that is mandated, because some companies have very different deployment models that we need to fit into. This is us trying to think about the best way to handle the most common scenario with as little friction as possible.

Tweet Share Share 2 comments

Tags:

Comments

08 Jun 2017
19:42 PM

HdS

For the etl process, I would love two things: First, some way of storing different versions of the damn script. Most of the time, I don't have an easy connection to our git, so I'm stuck with "copy & paste" versioning, which is annoying as hell. Maybe you could use a feature from the new visual studio team services build process: Allow a "draft" version of the etl process which can be tested and approved without changing the current process. Also, each "checkin" can have a comment, which can be very nice. Absolutely great would be a way of comparing the results of the draft and the old version.

Second would be some way to attach other scripts to this process. Most of the time, I deal with diverse etl scripts (somebody has written a part in SPSS / SAS, then the next in python..you get the idea). I would like to have the abillity to keep all scripts in a single place.

08 Jun 2017
22:36 PM

Oren Eini

HdS, The other problem is that we don't want to turn this into a half ass SCM system. You can just call the ETL process "CustomersETL/draft", and that should handle. But checkins, diffing, comments, history etc are out of scope for this. That is why you have a SCM, we don't need to invent it.

Comment preview

Comments have been closed on this topic.

Oren Eini

Oren Eini

CEO of RavenDB