Ayende @ Rahien

RavenDB Migrations: Rolling Updates

There are several ways to handle schema changes in RavenDB. When I am talking about schema changes, I am talking about changing the format of documents in production. RavenDB doesn’t have a “schema”, of course, but if the previous version of your application had a Name property for a customer, and your new version has FirstName and LastName, you need to have some way of handling that.

Please note that in this case I am explicitly talking about a rolling migration, not something that you need to do immediately.

We will start with the following code bases:

Version 1.0:

public class Customer
{
    public string Name { get; set; }
    public string Email { get; set; }
    public int NumberOfOrders { get; set; }
}

Version 2.0:

public class Customer
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string CustomerEmail { get; set; }
    public bool PreferredCustomer { get; set; }
}

As I said, there are several approaches, depending on exactly what you are trying to do. Let us enumerate them in order.

Removing a property – NumberOfOrders

As you can see, NumberOfOrders was removed from v1 to v2. In this case, there is absolutely no action required of us. The next time this customer is loaded, the NumberOfOrders property will not be bound to anything. RavenDB will note that the document has changed (it is missing a property) and save it without the now invalid property. It is self-cleaning.

Adding a property – PreferredCustomer

In this situation, we have a new property, and we need to provide a value for it. If there isn’t any value for the property in the stored JSON, it won’t be set, which means that the default value (or the one set in the constructor) will be the one actually used. Again, RavenDB will note that the document has changed (it has an extra property) and save it with the new property. It is self-healing.
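If the implicit default (false, for a bool) is not what existing documents should get, a value set in the constructor is picked up the same way. A minimal sketch; the business rule in the comment is made up for illustration:

```csharp
public class Customer
{
    public Customer()
    {
        // Hypothetical rule, for illustration only: treat customers stored
        // before v2 (which have no PreferredCustomer value) as preferred.
        PreferredCustomer = true;
    }

    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string CustomerEmail { get; set; }
    public bool PreferredCustomer { get; set; }
}
```

When the stored JSON does contain the property, deserialization overwrites the constructor value, so only old documents see the default.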

Modifying properties – Email –> CustomerEmail, Name –> FirstName, LastName

This is where things get annoying. We can’t rely on the default behavior to resolve this. Luckily, we have the extension points to help us.

public class CustomerVersion1ToVersion2Converter : IDocumentConversionListener
{
    public void EntityToDocument(object entity, RavenJObject document, RavenJObject metadata)
    {
        Customer c = entity as Customer;
        if (c == null)
            return;

        metadata["Customer-Schema-Version"] = 2;
        // preserve the old Name property, for now.
        document["Name"] = c.FirstName + " " + c.LastName;
        document["Email"] = c.CustomerEmail;
    }

    public void DocumentToEntity(object entity, RavenJObject document, RavenJObject metadata)
    {
        Customer c = entity as Customer;
        if (c == null)
            return;
        if (metadata.Value<int>("Customer-Schema-Version") >= 2)
            return;

        c.FirstName = document.Value<string>("Name").Split().First();
        c.LastName = document.Value<string>("Name").Split().Last();
        c.CustomerEmail = document.Value<string>("Email");
    }
}

Using this approach, we can easily convert between the two versions, including keeping the old properties in place, in case we still need to be compatible with the old schema.
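The listener still needs to be wired up with the document store at startup. A minimal sketch, using the client API of this era; the URL and document id are placeholders:

```csharp
// Sketch: register the converter once, at application startup.
using (var documentStore = new DocumentStore { Url = "http://localhost:8080" })
{
    documentStore.Initialize();
    documentStore.RegisterListener(new CustomerVersion1ToVersion2Converter());

    // Any session opened from this store will now run the converter
    // when loading or storing Customer documents.
    using (var session = documentStore.OpenSession())
    {
        var customer = session.Load<Customer>("customers/1");
        // FirstName / LastName are populated from the old Name property
    }
}
```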

Pretty neat, isn’t it?

Posted By: Ayende Rahien

Comments

Nabil
07/21/2011 12:57 PM by
Nabil

Very nice :)

Daniel Lidström
08/26/2011 10:03 AM by
Daniel Lidström

How would I register an IDocumentConversionListener? It would be nice to have a link to the relevant documentation.

Ayende Rahien
08/26/2011 10:15 AM by
Ayende Rahien

Daniel, documentStore.RegisterListener(...)

Jose
08/26/2011 10:32 AM by
Jose

Wouldn't DocumentToEntity break on a Name that only has one word? Or would we get the same word on both FirstName and LastName?

Ayende Rahien
08/26/2011 10:36 AM by
Ayende Rahien

Jose, Maybe. This is code that is specific to a single use case and, as such, can make a lot of assumptions.

Jose
08/26/2011 10:56 AM by
Jose

I didn't mean to nit-pick and I understand the scope of the code above. My point is that given enough data rolling updates can be a nightmare and dangerous. But yes, RavenDB tackles it in a very elegant way.

Y
08/26/2011 11:26 AM by
Y

Seems pretty straightforward for "trash" fields. But how about indices on top of changed fields? I guess if you need to, i.e., search by that field, you'd want your database to migrate all documents to the latest format version. What if RavenDB bundled a tool that would let you register the same converters in RavenDB and let it chew documents in the background? :)

Dmytrii Nagirniak
08/26/2011 01:22 PM by
Dmytrii Nagirniak

How would you handle the situation when a property moved to two new documents and the other way around?

Ayende Rahien
08/26/2011 01:41 PM by
Ayende Rahien

Y, That is why RavenDB has support for set operations

Ayende Rahien
08/26/2011 01:41 PM by
Ayende Rahien

Dmytrii, I don't understand the question, can you give an example?

Dmytrii Nagirniak
08/26/2011 01:49 PM by
Dmytrii Nagirniak

Sure. Let's say we have a Company with Address, company number etc.

We change the model so that the company no longer has an address. Instead the address is stored in a separate document - Branch.

And then how would you merge Branch back into Company?

Do you see what I mean?

Péter Zsoldos
08/26/2011 03:35 PM by
Péter Zsoldos

I don't use ravendb - these are just general data migration questions

  1. Is the support for rollback explicitly missing, i.e.: is it assumed that if I have made a mistake, then I code my way forward and do a new release? Assume that new data was created between the release and the discovery of the need for rollback to the last stable version, and I want to convert that data back to the old format (for which I have the code, written at the time I wrote this forward conversion). I want to roll back fast and reliably - no from-dusk-to-dawn caffeine-powered coding sessions. What can I do?

  2. How is the multiple changes scenario supported - there are sites I only use once every 6 months, but I'm sure they have multiple releases during that time. So when I log in, my records might be at version 1, while the current version could be 19. Is it one class per version, and each checks which version number it belongs to? Based on the snippet above, there is no framework support for that, or is there? Or one would just schedule a batch run outside the peak hours, forcing the not yet updated records to be loaded (and thus updated)?

tobi
08/26/2011 06:44 PM by
tobi

Once you have many different types, all handlers need to be called. They can never be removed because some entities might still not be upgraded. This could become a perf problem (calling 100 listeners for every loaded object). I would have the listeners be registered for a specific type (and optionally for all types).

Steve Py
08/27/2011 12:42 AM by
Steve Py

Hmm, the automatic nature of self healing and self cleaning could be troublesome behaviour in cases where developers didn't know better. Granted, in a perfect world everyone should be fully versed in the capabilities of their tools, paying due diligence to their changes, plus testing those changes thoroughly. But if someone is scoping out changes to large document stores, with changes across dozens of documents, under pressure, the tool isn't helping catch situations they may have missed, it's actively hiding them.

Case in point, if you ran that scenario through without the listener, the self cleaning would erase "email" and "Name", and add the new fields with default values, would it not?

My point is that the tool doesn't know whether you want to discard or translate old data, and it seems rather dangerous to have it pick a behaviour arbitrarily. My personal preference would be for a tool to detect such changes and require deliberate rules for the specific change. (discard, or translate.)

Alex Vilela
08/27/2011 09:04 AM by
Alex Vilela

Do I need a listener if I move the Customer class to a different namespace?

Rafal
08/27/2011 09:07 AM by
Rafal

Steve, It's just the default behavior of the Json serializer - it tries to do its best ignoring schema differences and supplying default values for missing properties. I wouldn't call it self healing or self cleaning because, as you have said, sometimes it's just self destructing. It would be much better to have some schema validation mechanism that is an integral part of the database and that can't be easily bypassed (by not setting up the client correctly). As an example, please have a look at how it has been solved in Persevere: http://www.sitepen.com/blog/2008/11/17/evolving-schemas-with-persevere/

Ayende Rahien
08/28/2011 05:50 AM by
Ayende Rahien

Dmytrii, That requires creating a separate document, probably by just replacing that with the id of the new branch. Although, I would probably do stuff like that as a one time process, since this is a pretty radical change, and not something that you can usually slip in as a gradual transformation

Ayende Rahien
08/28/2011 05:52 AM by
Ayende Rahien

1) Rollback? Just the same way as the forward motion, in reverse. Do the exact same thing, but reverse the steps.

2) You usually do this sort of thing for one version back, which means that at the next release, you can do the big "check & modify" for the entire db, so you don't have to deal with versions from 3 releases back.

2.1) Or you can just keep all of those around and apply the checks, when you need them, in order, based on the version of the entity.
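The "keep all of those around" option (2.1) could look something like this sketch; the per-release upgrade steps and the metadata key are placeholders, following the pattern from the post:

```csharp
// Hypothetical sketch of option 2.1: one listener that applies each
// upgrade step in order, based on the version stored in the metadata.
public class CustomerUpgradeListener : IDocumentConversionListener
{
    public void DocumentToEntity(object entity, RavenJObject document, RavenJObject metadata)
    {
        var customer = entity as Customer;
        if (customer == null)
            return;

        var version = metadata.Value<int>("Customer-Schema-Version");
        if (version < 2)
            UpgradeV1ToV2(customer, document); // e.g. split Name, rename Email
        if (version < 3)
            UpgradeV2ToV3(customer, document); // whatever the next release changed
    }

    public void EntityToDocument(object entity, RavenJObject document, RavenJObject metadata)
    {
        if (entity is Customer)
            metadata["Customer-Schema-Version"] = 3; // always save at the latest version
    }

    // Placeholders for the per-release conversion steps.
    private void UpgradeV1ToV2(Customer c, RavenJObject doc) { /* ... */ }
    private void UpgradeV2ToV3(Customer c, RavenJObject doc) { /* ... */ }
}
```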

Ayende Rahien
08/28/2011 05:53 AM by
Ayende Rahien

Tobi, Have a MigrationStoreListener that would forward the call based on the type of the entity.
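Something along these lines; the class and the Register method are made-up names, dispatching to per-type converters so only the relevant migration code runs for each entity:

```csharp
// Hypothetical sketch of the MigrationStoreListener idea: a single listener
// registered with the store that forwards calls based on the entity type.
public class MigrationStoreListener : IDocumentConversionListener
{
    private readonly Dictionary<Type, IDocumentConversionListener> listeners =
        new Dictionary<Type, IDocumentConversionListener>();

    public void Register<T>(IDocumentConversionListener listener)
    {
        listeners[typeof(T)] = listener;
    }

    public void DocumentToEntity(object entity, RavenJObject document, RavenJObject metadata)
    {
        IDocumentConversionListener listener;
        if (listeners.TryGetValue(entity.GetType(), out listener))
            listener.DocumentToEntity(entity, document, metadata);
    }

    public void EntityToDocument(object entity, RavenJObject document, RavenJObject metadata)
    {
        IDocumentConversionListener listener;
        if (listeners.TryGetValue(entity.GetType(), out listener))
            listener.EntityToDocument(entity, document, metadata);
    }
}
```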

Ayende Rahien
08/28/2011 05:54 AM by
Ayende Rahien

Alex, No, it would resolve that automatically

tobi
08/28/2011 09:16 AM by
tobi

Ayende, you are right.

Dmytrii Nagirniak
08/28/2011 05:01 PM by
Dmytrii Nagirniak

Oren,

With the radical changes, when and how would you run the migration?

Doing it as one-time process is not good enough. I need to be able to run such kind of migration in multiple environments.

Cheers.

Ayende Rahien
08/28/2011 06:48 PM by
Ayende Rahien

Dmytrii, If you need to do that, then you don't do radical changes.

Dmytrii Nagirniak
08/28/2011 06:52 PM by
Dmytrii Nagirniak

I don't argue that it is a radical change. I wonder how it would be handled with RavenDB.

For example, with a SQL database I would create a migration (using Migrator.net or similar) that would change the schema accordingly and then migrate the data.

This process would be automated and easily repeatable.

Ayende Rahien
08/28/2011 06:56 PM by
Ayende Rahien

Dmytrii, And you would do pretty much the same thing in RavenDB. But that isn't the scope of this post, it is about rolling update, not point in time update. For those sort of updates, you don't do radical changes, you make things change slowly, across deployments.

Dmytrii Nagirniak
08/28/2011 07:07 PM by
Dmytrii Nagirniak

That makes sense. But I was curious to see how you would do that (split radical changes into smaller ones?).

It would be amazing to see a write-up about doing this kind of stuff with RavenDB (analogy of Migrator.Net and similar).

Ayende Rahien
08/28/2011 07:10 PM by
Ayende Rahien

Dmytrii, https://github.com/ayende/RaccoonBlog/tree/master/src/RaccoonBlog.Migrations

Dmytrii Nagirniak
08/28/2011 07:15 PM by
Dmytrii Nagirniak

Thanks :) Just roll your own here sounds easy enough.

Mike
09/09/2011 09:30 AM by
Mike

Very neat, but if you don't need a listener, is the schema version recorded anyway?

Ayende Rahien
09/09/2011 10:50 AM by
Ayende Rahien

Mike, I don't understand the question

Ryan
11/09/2011 04:52 PM by
Ryan

Not sure if you are still checking these comments, but I had a question.

Say you need to do a rolling update: you write the DocumentConversionListener above, and your objects start converting as you encounter them. This is great for commonly accessed objects, but what about old/archived stuff? Say you have 1,000 Customers and 800 of them get updated in a week, but 200 of them are fairly inactive. You don't want to turn off that converter until they are converted, but you don't want to shut down the app just for some old data conversions either.

Do you recommend just running a script against the DB to manually convert the data and then pulling out the converter? Or is there another way.

Ayende Rahien
11/09/2011 08:17 PM by
Ayende Rahien

Ryan, Yes, at that point, you'll probably run a script that would convert everything. The alternative is to just keep the converter in place for all time, which is also an option

Ryan
11/09/2011 11:39 PM by
Ryan

Great, thanks. I'm about to get involved in a project in its infancy that is currently being built on Raven, so I'm trying to familiarize myself with it more. I didn't like the idea of having to leave these converters all over the place every time the object model changed, so I just wanted to make sure there was a way to phase them out.

Comments have been closed on this topic.