RavenDB Migrations: Rolling Updates
There are several ways to handle schema changes in RavenDB. When I talk about schema changes, I mean changing the format of documents in production. RavenDB doesn’t have a “schema”, of course, but if the previous version of your application had a Name property on a customer, and the new version has FirstName and LastName, you need some way of handling that.
Please note that in this case I am explicitly talking about a rolling migration, not something that you need to do immediately.
We will start with the following code bases:
Version 1.0:

```csharp
public class Customer
{
    public string Name { get; set; }
    public string Email { get; set; }
    public int NumberOfOrders { get; set; }
}
```

Version 2.0:

```csharp
public class Customer
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string CustomerEmail { get; set; }
    public bool PreferredCustomer { get; set; }
}
```
As I said, there are several approaches, depending on exactly what you are trying to do. Let us enumerate them in order.
Removing a property – NumberOfOrders
As you can see, NumberOfOrders was removed from v1 to v2. In this case, there is absolutely no action required of us. The next time this customer is loaded, the NumberOfOrders property will not be bound to anything; RavenDB will note that the document has changed (a property is missing) and save it without the now invalid property. It is self cleaning.
Adding a property – PreferredCustomer
In this situation, we have a new property, and we need to provide a value for it. If there isn’t a value for the property in the stored JSON, it won’t be set, which means that the default value (or the one set in the constructor) will be the one actually used. Again, RavenDB will note that the document has changed (it has an extra property) and save it with the new property. It is self healing.
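For example, if v2 should treat every customer saved before the property existed as preferred, a constructor default takes care of it (a sketch; the business rule here is an assumption for illustration):

```csharp
public class Customer
{
    public Customer()
    {
        // The constructor runs before deserialization binds values, so a
        // document that lacks the property keeps the value set here.
        PreferredCustomer = true; // assumed business default
    }

    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string CustomerEmail { get; set; }
    public bool PreferredCustomer { get; set; }
}
```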
Modifying properties – Email -> CustomerEmail, Name -> FirstName, LastName
This is where things get annoying. We can’t rely on the default behavior to resolve this. Luckily, we have extension points to help us.
```csharp
using System.Linq;
using Raven.Client.Listeners;
using Raven.Json.Linq;

public class CustomerVersion1ToVersion2Converter : IDocumentConversionListener
{
    public void EntityToDocument(object entity, RavenJObject document, RavenJObject metadata)
    {
        Customer c = entity as Customer;
        if (c == null)
            return;

        metadata["Customer-Schema-Version"] = 2;
        // Preserve the old Name property, for now.
        document["Name"] = c.FirstName + " " + c.LastName;
        document["Email"] = c.CustomerEmail;
    }

    public void DocumentToEntity(object entity, RavenJObject document, RavenJObject metadata)
    {
        Customer c = entity as Customer;
        if (c == null)
            return;
        if (metadata.Value<int>("Customer-Schema-Version") >= 2)
            return;

        var nameParts = document.Value<string>("Name").Split();
        c.FirstName = nameParts.First();
        c.LastName = nameParts.Last();
        c.CustomerEmail = document.Value<string>("Email");
    }
}
```
Using this approach, we can easily convert between the two versions, including keeping the old properties in place in case we still need to be compatible with the old schema.
Pretty neat, isn’t it?
More posts in "RavenDB Migrations" series:
- (26 Aug 2011) Rolling Updates
- (25 Aug 2011) When to execute?
Comments
Very nice :)
How would I register an IDocumentConversionListener? It would be nice to have a link to the relevant documentation.
Daniel, documentStore.RegisterListener(...)
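For reference, registration happens once, on the document store (a sketch; the URL is an assumption):

```csharp
using Raven.Client.Document;

var documentStore = new DocumentStore { Url = "http://localhost:8080" };
documentStore.Initialize();

// Registered listeners apply to every session opened from this store.
documentStore.RegisterListener(new CustomerVersion1ToVersion2Converter());
```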
Wouldn't DocumentToEntity break on a Name that only has one word? Or would we get the same word on both FirstName and LastName?
Jose, Maybe. This is code that is specific to a single use case, and as such can make a lot of assumptions.
I didn't mean to nit-pick and I understand the scope of the code above. My point is that given enough data rolling updates can be a nightmare and dangerous. But yes, RavenDB tackles it in a very elegant way.
Seems pretty straightforward for "trash" fields. But what about indices on top of changed fields? I guess if you need to, say, search by that field, you'd want your database to migrate all documents to the latest format version. What if RavenDB bundled a tool that would let you register the same converters in RavenDB and let it chew through documents in the background? :)
How would you handle the situation when a property moved to two new documents and the other way around?
Y, That is why RavenDB has support for set operations
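For readers who haven't seen them, set operations let you patch every document matched by an index on the server side; something along these lines (a sketch; the index name and patched value are assumptions):

```csharp
using Raven.Abstractions.Data;
using Raven.Json.Linq;

// Server-side, set-based patch: no documents are loaded into a session.
documentStore.DatabaseCommands.UpdateByIndex(
    "Customers/All",  // assumed index covering all customers
    new IndexQuery(), // empty query matches every document in the index
    new[]
    {
        new PatchRequest
        {
            Type = PatchCommandType.Set,
            Name = "PreferredCustomer",
            Value = new RavenJValue(false)
        }
    },
    false); // do not allow patching against a stale index
```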
Dmytrii, I don't understand the question, can you give an example?
Sure. Let's say we have a Company with Address, company number etc.
We change the model so that the company no longer has an address. Instead, the address is stored in a separate document: Branch.
And then how would you merge Branch back into Company?
Do you see what I mean?
I don't use ravendb - these are just general data migration questions
Is the support for rollback explicitly missing, i.e. is it assumed that if I have made a mistake, I code my way forward and do a new release? Assume that new data was created between the release and the discovery of the need to roll back to the last stable version, and I want to convert that data back to the old format (for which I have the code, written at the time I wrote the forward conversion). I want to roll back fast and reliably; no from-dusk-to-dawn, caffeine-powered coding sessions. What can I do?
How is the multiple changes scenario supported - there are sites I only use once every 6 months, but I'm sure they have multiple releases during that time. So when I log in, my records might be at version 1, while the current version could be 19. Is it one class per version, and each checks which version number it belongs to? Based on the snippet above, there is no framework support for that, or is there? Or one would just schedule a batch run outside the peak hours, forcing the not yet updated records to be loaded (and thus updated)?
Once you have many different types, all handlers need to be called. They can never be removed because some entities might still not be upgraded. This could become a perf problem (calling 100 listeners for every loaded object). I would have the listeners be registered for a specific type (and optionally for all types).
Hmm, the automatic nature of self healing and self cleaning could be troublesome behaviour in cases where developers didn't know better. Granted, in a perfect world everyone would be fully versed in the capabilities of their tools, paying due diligence to their changes, plus testing those changes thoroughly. But if someone is scoping out changes to a large document store, with changes across dozens of documents, under pressure, the tool isn't helping them catch situations they may have missed; it's actively hiding them.
Case in point: if you ran that scenario through without the listener, the self cleaning would erase "Email" and "Name" and add the new fields with default values, would it not?
My point is that the tool doesn't know whether you want to discard or translate old data, and it seems rather dangerous to have it pick a behaviour arbitrarily. My personal preference would be for a tool to detect such changes and require deliberate rules for the specific change. (discard, or translate.)
Do I need a listener if I move the Customer class to a different namespace?
Steve, It's just the default behavior of the Json serializer: it tries to do its best, ignoring schema differences and supplying default values for missing properties. I wouldn't call it self healing or self cleaning, because as you have said, sometimes it's just self destructing. It would be much better to have some schema validation mechanism that is an integral part of the database and that can't be easily bypassed (by not setting up the client correctly). As an example, please have a look at how it has been solved in Persevere: http://www.sitepen.com/blog/2008/11/17/evolving-schemas-with-persevere/
Dmytrii, That requires creating a separate document, probably by just replacing that with the id of the new branch. Although, I would probably do stuff like that as a one time process, since this is a pretty radical change, and not something that you can usually slip in as a gradual transformation
1) Rollback? Just the same way as the forward motion, only in reverse. Do the exact same thing, but reverse the steps. 2) You usually do this sort of thing for one version back, which means that at the next release you can do the big "check & modify" pass over the entire db, so you don't have to deal with the version from 3 versions back. 2.1) Or you can just keep all of those around and run the checks, in order, when you need them, based on the version of the entity.
Tobi, Have a MigrationStoreListener that would forward the call based on the type of the entity.
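A sketch of what that forwarding listener could look like; the MigrationStoreListener class itself is hypothetical, only IDocumentConversionListener comes from RavenDB:

```csharp
using System;
using System.Collections.Generic;
using Raven.Client.Listeners;
using Raven.Json.Linq;

public class MigrationStoreListener : IDocumentConversionListener
{
    // One converter per entity type, so a load only ever invokes the
    // single listener relevant to it instead of every registered one.
    private readonly Dictionary<Type, IDocumentConversionListener> listeners =
        new Dictionary<Type, IDocumentConversionListener>();

    public void Register<T>(IDocumentConversionListener listener)
    {
        listeners[typeof(T)] = listener;
    }

    public void EntityToDocument(object entity, RavenJObject document, RavenJObject metadata)
    {
        IDocumentConversionListener listener;
        if (listeners.TryGetValue(entity.GetType(), out listener))
            listener.EntityToDocument(entity, document, metadata);
    }

    public void DocumentToEntity(object entity, RavenJObject document, RavenJObject metadata)
    {
        IDocumentConversionListener listener;
        if (listeners.TryGetValue(entity.GetType(), out listener))
            listener.DocumentToEntity(entity, document, metadata);
    }
}
```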
Alex, No, it would resolve that automatically
Ayende, you are right.
Oren,
With the radical changes, when and how would you run the migration?
Doing it as a one-time process is not good enough. I need to be able to run this kind of migration in multiple environments.
Cheers.
Dmytrii, If you need to do that, then you don't do radical changes.
I don't argue that it is a radical change. I wonder how it would be handled with RavenDB.
For example, with SQL database I would create a migration (using Migrator.net or similar) that would change the schema accordingly and them migrate the data.
This process would be automated and easily repeatable.
Dmytrii, And you would do pretty much the same thing in RavenDB. But that isn't the scope of this post; it is about rolling updates, not point-in-time updates. For those sorts of updates, you don't do radical changes, you make things change slowly, across deployments.
That makes sense. But I was curious to see how you would do that (split radical changes into smaller ones?).
It would be amazing to see a write-up about doing this kind of stuff with RavenDB (analogy of Migrator.Net and similar).
Dmytrii, https://github.com/ayende/RaccoonBlog/tree/master/src/RaccoonBlog.Migrations
Thanks :) "Just roll your own" sounds easy enough here.
Very neat, but if you don't need a listener, is the schema version recorded anyway?
Mike, I don't understand the question
Not sure if you are still checking these comments, but I had a question.
Say you need to do a rolling update, you write the DocumentConversionListener above, and your objects start converting as you encounter them. So this is great for commonly accessed objects, but what about old/archived stuff? Say you have 1,000 Customers and 800 of them get updated in a week but 200 of them are fairly inactive. You don't want to turn off that converter until they are, but you don't want to shutdown the app just for some old data conversions either.
Do you recommend just running a script against the DB to manually convert the data and then pulling out the converter? Or is there another way.
Ryan, Yes, at that point you'll probably run a script that converts everything. The alternative is to just keep the converter in place for all time, which is also an option.
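Something like the following could serve as that script, assuming documentStore is the initialized store with the converter still registered (a sketch):

```csharp
using System.Linq;

int processed = 0;
while (true)
{
    using (var session = documentStore.OpenSession())
    {
        var customers = session.Query<Customer>()
            .Skip(processed)
            .Take(128) // arbitrary page size
            .ToList();

        if (customers.Count == 0)
            break;

        processed += customers.Count;

        // Loading ran DocumentToEntity; the session's change tracking sees
        // that the entities no longer match the stored JSON, so SaveChanges
        // writes them back in the v2 format.
        session.SaveChanges();
    }
}
```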
Great, thanks. I'm about to get involved in a project in its infancy that is currently being built on Raven, so I'm trying to familiarize myself with it more. I didn't like the idea of having to leave these converters all over the place every time the object model changed, so I just wanted to make sure there was a way to phase them out.