Porting MVC Music Store to RavenPorting the HomeController, the Right Way
As I mentioned, we can solve the GetTopSellingAlbums() problem using map/reduce, but that isn’t really a good way of doing it. The problem with doing that (aside from the scared looks and pained sounds that you get when you mention it) is that it is trying to solve the problem in a relational way. Indeed, the previous solution was an near duplication of how a relational database would process that query. So, what is the doc db approach for solving this issue?
The answer is quit simple, remember that documents are independent, and think about the question. What we are asking is what are the top selling albums. If we add a CountSold property to the album, we would suddenly find it so much easier to handle this problem. This means that we would need to update all the albums that are part of a given order when an order is submitted, but that is acceptable (this exact same operation is commonly done in SQL databases as well).
For now, let us waive how we create the CountSold property and fill it with the right values (I’ll discuss it in my next post), for now, assume that this happened, how can we GetTopSellingAlbums() problem?
Well, that is easy enough. All we need to do is define an index for CountSold.
// AlbumsByCountSold from album in docs.Albums select new { album.CountSold };
With that, we can implement GetTopSellingAlbums like this:
And now it is done, very simple, very efficient and quit elegant, eve if I say so myself.
More posts in "Porting MVC Music Store to Raven" series:
- (31 May 2010) StoreManagerController, part 2
- (29 May 2010) StoreManagerController
- (28 May 2010) Porting the checkout process
- (25 May 2010) StoreController
- (24 May 2010) Advanced Migrations
- (23 May 2010) Migrations
- (22 May 2010) Porting the HomeController, the Right Way
- (21 May 2010) Porting the HomeController, the map/reduce way
- (20 May 2010) Data migration
- (19 May 2010) Setting up the application
- (18 May 2010) The data model
Comments
So much simple that I am still wondering how I couldn't think about it earlier lol
As a side note I want to point out that this can be done with an indexed view automatically and at the cost of only one addition disk seek. The view gets updated at write time, too, but automatically.
Many people do not know that you can index a query like this:
select AlbumID, count_big(*) as Count
from OrderDetails
join Albums on ...
group by AlbumID
That said, SQL Server currently limits us to a single SFJWGO query. Ravens approach is more general.
Tobi,
SFJWGO ?
select from join ...
Ayende, I'd love to see a post that 'drills in' the way you should be thinking about Document DB queries when you're use to relational ones.
A few examples one after each other (common SQL queries?) would rock and home the point in (for me at least!) :)
Andrew,
This is pretty much what this series is about :-)
Very elegant and succinct again Oren.
By comparison, in the Redis Way you would achieve a similar result by using a Sorted Set and incrementing the score (i.e. quantity) of the Album each time it is sold. From there to achieve the above query you could select the desired range and order in a single operation (or 2 if you only store Id's in the sorted set).
Are all custom indexes maintained in Lucene? And do you expose any way to pass a raw Lucene query, i.e. for full-text searches, etc?
Do you handle new custom indexes that are added transparently? i.e. do you populate them with existing data retrospectively?
Demis,
Yes, all indexes are in Lucene, yes, you pass the raw text to Lucene so you have the full power, yes, we handle retoractive indexing.
In this example, I assume an expression is sent and processed in Raven as opposed to processing the query in the client code?
Tim,
This happens entirely on the server, yes
Demis Bellot.
That's not a good thing if you want to display Popular Albums over the last 30 days.
Ayende, I would like to see your approach on providing "TopSellingAlbums" for X period of time.
The current approach is good if you wanted to show the top selling time of all time. But on the MVC Music Store (which i regret to say Ive never downloaded and had a look) assuming it was a real store, you would want to show the top selling albums for the past 30 days.
Would you query the past 30 days of sales and calculate the total for each album and display the top 10 or so?
NC,
If I wanted that, I would have gone with the map/reduce way.
@NC
If I wanted to show popular albums over the last 30 days in Redis I would store an 'Album Order' (i.e. OrderId / AlbumId) in a sorted set using DateTime.Ticks as the score. I can then select the range of all 'Album Orders' for the last 30 days. After that on the client I can just count all the occurrence (i.e. quantity) of AlbumIds giving me the most popular ids. The same 'batch get' call can be fetch all the Album metadata by Ids (So again only 2 operations required).
Incidentally storing the most popular albums for the month would be even simpler as you can just maintain a new SortedSet (using qty as the score) every month in a key like 'urn:Album:Sales:2010/05' so all you would need is a GetAll in descending order.
How is this different from doing the exact same thing in SQL?
And how do you solve all of the problems that there are associated with keeping the same piece of information the same place? Atleast in SQL, most engines have an optimization for col = col + num. In your case you have to pull out all of the album at purchase time and write to it.
If you have a popular item, then you have suddenly added a congestion point that you didnt have before.
I really fail to see how this was actually an improvement in anything, except perhaps writing some very specific code a little bit faster.
@Dennis
Ayende is just showing how you would solve real world problems with Raven. He's not showing how its better in this situation - just that it can be done easily.
I believe this is just hitting an index so should still be referencing the same 'master album' if that was your concern here.
I'm not seeing any congestion point here and don't see how a popular data item could ever be the cause of congestion? I could be wrong but I don't think Ayende is putting locks around data items - I know that Redis doesn't.
We'll it could be a matter of preference but if this was in a high traffic website or web service I don't think you want to be doing a table scan every time. In RDBMS land you could be put this behind a cache to increase perf but then you're delivering stale data (as opposed to real-time) for most of the time. My personal preference is to optimize my writes so my reads are in-expensive. When optimized, this call is effectively a NoOp in many NoSQL solutions.
Dennis,
The point isn't to show "look at how everything is better than SQL", the point is to show how you do stuff with Raven.
Raven supports PATCHing a document (updating a single element, including doing an INC).
My problem is that you are presenting this as "We tried to solve the problem in a relational way", and then you make a completely different implementation that is ALSO possible in a normal relational way and many rational people would choose to do the same in a relation database because it saves on reads.
I really dislike this solution, It require updating Albums (which teoretically can lead to concurency conflicts) and you must have two implementations of same functionality - one from all time seling albums and second for topselingalbums in date range (more code => more bugs)
I'm a little confused by this implementation, mainly because I'm still trying to grasp this without thinking SQL.
So whenever an order is placed, we also update the corresponding album document, incrementing the CountSold property? I know you said that documents are independent of each other, but this is the way I'm reading your solution.
Thanks,
cc
Surely OrderBy should be performed before Take?
Corey,
Yes, that is how it would be done
Keith,
Those are all lazily evaluated, there is no meaning to the order.
That seems like a very odd design decision, and is at odds with how other LINQ providers work. Regardless of where and when the query takes place, the semantic meaning of the sequence of operations should be preserved.
Keith,
There really isn't any other way, considering what you are translating this to.
Comment preview