Porting MVC Music Store to Raven: Migrations
In my last post, I mentioned that we need to add a CountSold property to all the albums. In most SQL systems, something like that can be pretty painful. The syntax for adding a new column is easy, but actually getting it done, deployed, and versioned is pretty hard. With Raven, if you add a new property, it will automatically be added to your document the next time you save it. No action is required on your part. The same, by the way, happens when you remove a property: Raven will clean it up for you.
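For reference, the change itself is just a new property on the document class. A sketch, assuming the MVC Music Store model (the other properties shown here are illustrative and may differ from the actual sample):

```csharp
public class Album
{
    public string Id { get; set; }
    public string Title { get; set; }
    public decimal Price { get; set; }

    // The new property: existing documents simply gain it the next
    // time they are saved; there is no schema migration step.
    public int CountSold { get; set; }
}
```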
The question is what happens when we want to set that value to something other than the default? We need to provide that logic somehow, and here is a simple way of doing so:
using (var documentStore = new DocumentStore { Url = "http://localhost:8080" })
{
    documentStore.Initialise();
    using (var session = documentStore.OpenSession())
    {
        IDictionary<string, int> albumToSoldCount = new Dictionary<string, int>();
        int count = 0;
        do
        {
            var results = session.Query<SoldAlbum>("SoldAlbums")
                .Take(128)
                .Skip(count)
                .ToArray();
            if (results.Length == 0)
                break;
            count += results.Length;
            foreach (var soldAlbum in results)
            {
                albumToSoldCount[soldAlbum.Album] = soldAlbum.Quantity;
            }
        } while (true);

        count = 0;
        do
        {
            var albums = session.Query<Album>()
                .Skip(count)
                .Take(128)
                .ToArray();
            if (albums.Length == 0)
                break;
            foreach (var album in albums)
            {
                int value;
                albumToSoldCount.TryGetValue(album.Id, out value);
                album.CountSold = value;
            }
            count += albums.Length;
            session.SaveChanges();
            session.Clear();
        } while (true);
    }
}
For those of you who haven't bothered to read the code: it reads the index that we previously created and remembers its values, then reads the albums in batches and updates their counts. All in all, it is quite simple.
An additional nice property of this script is that it is safe to run multiple times.
More posts in "Porting MVC Music Store to Raven" series:
- (31 May 2010) StoreManagerController, part 2
- (29 May 2010) StoreManagerController
- (28 May 2010) Porting the checkout process
- (25 May 2010) StoreController
- (24 May 2010) Advanced Migrations
- (23 May 2010) Migrations
- (22 May 2010) Porting the HomeController, the Right Way
- (21 May 2010) Porting the HomeController, the map/reduce way
- (20 May 2010) Data migration
- (19 May 2010) Setting up the application
- (18 May 2010) The data model
Comments
It may be simple but it doesn't look natural or easy. It might be one of these unnatural acts on source code where you try to make C# perform some data manipulation tricks and it's like teaching an elephant to climb trees.
In a multi-server deployment scenario, and when doing hot updates, you will still get old-style documents (without the count field) written to the db, and you have no definitive point at which to run the migration. Also, considering a large dataset, that migration will cost you downtime, and there go the hot updates.
What you can do, is to have the read call check for document validity (i.e. countSold.HasValue), and compute it if it fails.
This way you can start deploying the new client code, then start running the migration, and have the "cleanup" code deal with the pieces that fell through the cracks.
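The read-time fixup being described might look something like this; a sketch, assuming CountSold is made nullable so an un-migrated document is detectable, and using a hypothetical ComputeCountSold helper:

```csharp
public Album GetAlbum(IDocumentSession session, string id)
{
    var album = session.Load<Album>(id);
    if (album.CountSold == null) // old-style document, fix it up on read
    {
        // ComputeCountSold is a hypothetical helper that queries
        // the SoldAlbums index for this album's total.
        album.CountSold = ComputeCountSold(session, album.Id);
        session.SaveChanges();
    }
    return album;
}
```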
Ken,
You are correct when you state that the right way to do that is to do the fixups in the app code while reading.
You'll probably want to run code like this to do fixups on the entire thing, but that is not strictly necessary.
The code above is something that I needed for MVC Music Store while I was working on it and adding features that required additional data.
I guess everyone has a different way to write 'while(true)' (my preference), you use 'do / while' while others prefer the 'for (;;)'.
What is the whole reason for the Take/Skip batching? Is it because you have a session timeout? 128 seems like a pretty small number; even when dealing with an RDBMS I find myself doing batches of 1000. Also, why would you batch the reads? You're not dealing with them as a batch/stream, i.e. you're just reading them all up front, so I'm not seeing the benefit of batching them.
I'm not sure if it makes a difference in Raven? but when you're batching in an RDBMS you need to order by primary key (or something similar) so you maintain consistent ordering.
The part I find curious here is that you are doing all the batching within a single session scope. So if your session has a timeout, it's going to time out regardless of the batch size. Is your session equivalent to a transaction? E.g. what is the state of the data if the network goes out half-way through an import?
Lastly, I think your code would be even more readable if you had an extension method to abstract away the batch logic. I use an extension method with a signature like:
IEnumerable<T> InBatches<T>(this IQueryable<T> linqSource, int batchSize)
which lets me iterate over the albums like a normal collection while hiding the complexity of batching underneath so you could replace a lot of the above with:
foreach (var album in session.Query<Album>().InBatches(128))
{
}
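The extension method itself isn't shown in the comment; a minimal sketch (the name and signature are the commenter's, the implementation is an assumption) might look like:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class BatchingExtensions
{
    // Streams items one at a time while fetching them from the
    // queryable in pages of batchSize, hiding the Skip/Take plumbing.
    public static IEnumerable<T> InBatches<T>(this IQueryable<T> linqSource, int batchSize)
    {
        int skipped = 0;
        while (true)
        {
            var batch = linqSource.Skip(skipped).Take(batchSize).ToArray();
            if (batch.Length == 0)
                yield break;
            skipped += batch.Length;
            foreach (var item in batch)
                yield return item;
        }
    }
}
```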
Isn't the first Skip/Take the wrong way round?
@Demis Bellot - That would indeed make the first scenario a lot cleaner. However, the second time actions are taken per-batch (session save and clear) rather than per-item in the batch. That would be slightly more complex to capture, but still possible. For instance, the extension method could yield a batch (i.e. an IEnumerable) rather than each item in the batch.
foreach (var batch in session.Query<Album>().InBatchesOf(128)) {
}
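A batch-yielding variant along the lines ICR describes might be sketched as follows (again, the name and shape are assumed):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class BatchingExtensions
{
    // Yields whole pages rather than individual items, so per-batch
    // work (SaveChanges, Clear, a TransactionScope) can run between pages.
    public static IEnumerable<T[]> InBatchesOf<T>(this IQueryable<T> linqSource, int batchSize)
    {
        int skipped = 0;
        while (true)
        {
            var batch = linqSource.Skip(skipped).Take(batchSize).ToArray();
            if (batch.Length == 0)
                yield break;
            skipped += batch.Length;
            yield return batch;
        }
    }
}
```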
@[ICR]
Yep, that's exactly the other extension method I have, called 'GetBatches()', which returns a list of batches. I use it when my code can take advantage of the separation. I think I'm just thrown by the use of the same session here, as I'm not exactly sure what's happening under the covers. At work we use a well-defined TransactionScope to define our database transaction boundary, so the above code would look something like:
foreach (var batch in session.Query<Album>().GetBatches(128)) {
using (var scope = new TransactionScope()) {
}
}
Demis,
The idea is to make the change transactional.
Every SaveChanges operation is transactional.
This is single-use code; there is no point in writing abstractions for it.
ICR,
Not sure what you mean; Skip & Take with Raven are instructions, they are not executed immediately.
Where would you put this script? At least with SQL I can have a standalone script that knows only the specific schema I have. For the concrete case of adding this extra column we are talking a total of 3 queries, after which everything is updated and transactionally correct.
With the above you need an obscene amount of code to do the same thing, and you need a dependency on your SoldAlbum type, which means it cannot be anything standalone. And if you have users of the db at the same time, you cannot really be sure whether your update was correct or not. Since your index is even built slightly after the actual update, you can't even know whether your index is up to date or not.
Unfortunately, the more examples you are showing of Raven, the less I want to use it :(
I think ICR is right.
The first loop always takes the first 128 docs and skips the rest (for no reason). Written as it is (Take and then Skip instead of Skip and then Take), I think it is a never-ending loop; the first 128 docs will always be returned.
// Ryan
@Ryan,
I'm pretty sure Ayende has both Skip and Take implemented lazily. I.E. They don't actually do anything until you enumerate the result with ToArray(). So it knows at that time that it needs to do both and will skip/take appropriately.
@Ryan,
Since RavenDB is accessed over HTTP REST, I imagine it's working in very much the same way that Linq2Sql does (as @Harry suggested), where the expression before '.ToArray()' is not executed right away but is instead serialized over HTTP and executed on the RavenDB server, where processing a Skip/Take is not order dependent. After processing, the results are returned and de-serialized into an Album[].
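For contrast, in plain LINQ to Objects the two orderings are not equivalent, which is why the code looks suspicious at first glance. A quick illustration (self-contained, not Raven-specific):

```csharp
using System.Linq;

class SkipTakeDemo
{
    static void Main()
    {
        var source = System.Linq.Enumerable.Range(0, 300).ToArray();

        // Skip first, then Take: a moving window, as paging expects.
        var page = source.Skip(100).Take(128).ToArray(); // items 100..227

        // Take first, then Skip: only ever sees the first 128 items.
        var truncated = source.Take(128).Skip(100).ToArray(); // items 100..127

        System.Console.WriteLine(page.Length);      // 128
        System.Console.WriteLine(truncated.Length); // 28
    }
}
```

So if the server really normalizes the order, both spellings page correctly; in eager LINQ, only Skip-then-Take does.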
Skip/Take implicitly depends on an ordered set to be useful.
When the set is ordered, the order in which Skip/Take is executed will give different results. Still with me?
Ok, if Skip/Take gives the same result as Take/Skip, then the order of execution is predefined on the server: "always Skip first and then Take", right?
That in itself is very nice, since a dev cannot easily make a mistake, but what if I want to Take first, then Skip, and then Take again for some reason: ".Take(32).Skip(64).Take(32)"?
Will the server respond with 'I cannot comply'? Ayende?
// Ryan
Ryan,
If you want to do stuff like that, you make two queries.
@Ayende
Wouldn't ".Take(32).Skip(64).ToArray().Take(32)" also provide the desired result? Although it looks like in this case the last Take is redundant, as the array will only have 32 items to begin with.
Demis,
Yes, it would