Ayende @ Rahien

It's a girl

Porting MVC Music Store to Raven: Porting the HomeController, the Right Way

As I mentioned, we can solve the GetTopSellingAlbums() problem using map/reduce, but that isn’t really a good way of doing it. The problem with doing that (aside from the scared looks and pained sounds that you get when you mention it) is that it is trying to solve the problem in a relational way. Indeed, the previous solution was an near duplication of how a relational database would process that query. So, what is the doc db approach for solving this issue?

The answer is quit simple, remember that documents are independent, and think about the question. What we are asking is what are the top selling albums. If we add a CountSold property to the album, we would suddenly find it so much easier to handle this problem. This means that we would need to update all the albums that are part of a given order when an order is submitted, but that is acceptable (this exact same operation is commonly done in SQL databases as well).

For now, let us waive how we create the CountSold property and fill it with the right values (I’ll discuss it in my next post), for now, assume that this happened, how can we GetTopSellingAlbums() problem?

Well, that is easy enough. All we need to do is define an index for CountSold.

// AlbumsByCountSold
from album in docs.Albums
select new { album.CountSold };

With that, we can implement GetTopSellingAlbums like this:

image

And now it is done, very simple, very efficient and quit elegant, eve if I say so myself.

Tags:

Posted By: Ayende Rahien

Published at

Originally posted at

Comments

Roland
05/22/2010 11:24 AM by
Roland

So much simple that I am still wondering how I couldn't think about it earlier lol

tobi
05/22/2010 12:20 PM by
tobi

As a side note I want to point out that this can be done with an indexed view automatically and at the cost of only one addition disk seek. The view gets updated at write time, too, but automatically.

Many people do not know that you can index a query like this:

select AlbumID, count_big(*) as Count

from OrderDetails

join Albums on ...

group by AlbumID

That said, SQL Server currently limits us to a single SFJWGO query. Ravens approach is more general.

tobi
05/22/2010 12:37 PM by
tobi

select from join ...

Andrew
05/22/2010 03:19 PM by
Andrew

Ayende, I'd love to see a post that 'drills in' the way you should be thinking about Document DB queries when you're use to relational ones.

A few examples one after each other (common SQL queries?) would rock and home the point in (for me at least!) :)

Ayende Rahien
05/22/2010 04:26 PM by
Ayende Rahien

Andrew,

This is pretty much what this series is about :-)

Demis Bellot
05/22/2010 05:52 PM by
Demis Bellot

Very elegant and succinct again Oren.

By comparison, in the Redis Way you would achieve a similar result by using a Sorted Set and incrementing the score (i.e. quantity) of the Album each time it is sold. From there to achieve the above query you could select the desired range and order in a single operation (or 2 if you only store Id's in the sorted set).

Are all custom indexes maintained in Lucene? And do you expose any way to pass a raw Lucene query, i.e. for full-text searches, etc?

Do you handle new custom indexes that are added transparently? i.e. do you populate them with existing data retrospectively?

Ayende Rahien
05/22/2010 06:16 PM by
Ayende Rahien

Demis,

Yes, all indexes are in Lucene, yes, you pass the raw text to Lucene so you have the full power, yes, we handle retoractive indexing.

Tim
05/22/2010 06:38 PM by
Tim

In this example, I assume an expression is sent and processed in Raven as opposed to processing the query in the client code?

Ayende Rahien
05/22/2010 06:56 PM by
Ayende Rahien

Tim,

This happens entirely on the server, yes

NC
05/23/2010 04:00 AM by
NC

Demis Bellot.

That's not a good thing if you want to display Popular Albums over the last 30 days.

Ayende, I would like to see your approach on providing "TopSellingAlbums" for X period of time.

The current approach is good if you wanted to show the top selling time of all time. But on the MVC Music Store (which i regret to say Ive never downloaded and had a look) assuming it was a real store, you would want to show the top selling albums for the past 30 days.

Would you query the past 30 days of sales and calculate the total for each album and display the top 10 or so?

Ayende Rahien
05/23/2010 07:43 AM by
Ayende Rahien

NC,

If I wanted that, I would have gone with the map/reduce way.

Demis Bellot
05/23/2010 09:19 AM by
Demis Bellot

@NC

If I wanted to show popular albums over the last 30 days in Redis I would store an 'Album Order' (i.e. OrderId / AlbumId) in a sorted set using DateTime.Ticks as the score. I can then select the range of all 'Album Orders' for the last 30 days. After that on the client I can just count all the occurrence (i.e. quantity) of AlbumIds giving me the most popular ids. The same 'batch get' call can be fetch all the Album metadata by Ids (So again only 2 operations required).

Incidentally storing the most popular albums for the month would be even simpler as you can just maintain a new SortedSet (using qty as the score) every month in a key like 'urn:Album:Sales:2010/05' so all you would need is a GetAll in descending order.

Dennis
05/23/2010 01:51 PM by
Dennis

How is this different from doing the exact same thing in SQL?

And how do you solve all of the problems that there are associated with keeping the same piece of information the same place? Atleast in SQL, most engines have an optimization for col = col + num. In your case you have to pull out all of the album at purchase time and write to it.

If you have a popular item, then you have suddenly added a congestion point that you didnt have before.

I really fail to see how this was actually an improvement in anything, except perhaps writing some very specific code a little bit faster.

Demis Bellot
05/23/2010 09:29 PM by
Demis Bellot

@Dennis

Ayende is just showing how you would solve real world problems with Raven. He's not showing how its better in this situation - just that it can be done easily.

And how do you solve all of the problems that there are associated with keeping the same piece of information the same place?

I believe this is just hitting an index so should still be referencing the same 'master album' if that was your concern here.

If you have a popular item, then you have suddenly added a congestion point that you didnt have before.

I'm not seeing any congestion point here and don't see how a popular data item could ever be the cause of congestion? I could be wrong but I don't think Ayende is putting locks around data items - I know that Redis doesn't.

I really fail to see how this was actually an improvement in anything, except perhaps writing some very specific code a little bit faster.

We'll it could be a matter of preference but if this was in a high traffic website or web service I don't think you want to be doing a table scan every time. In RDBMS land you could be put this behind a cache to increase perf but then you're delivering stale data (as opposed to real-time) for most of the time. My personal preference is to optimize my writes so my reads are in-expensive. When optimized, this call is effectively a NoOp in many NoSQL solutions.

Ayende Rahien
05/23/2010 10:36 PM by
Ayende Rahien

Dennis,

The point isn't to show "look at how everything is better than SQL", the point is to show how you do stuff with Raven.

Raven supports PATCHing a document (updating a single element, including doing an INC).

Dennis
05/24/2010 07:03 AM by
Dennis

My problem is that you are presenting this as "We tried to solve the problem in a relational way", and then you make a completely different implementation that is ALSO possible in a normal relational way and many rational people would choose to do the same in a relation database because it saves on reads.

Petr
05/24/2010 07:34 AM by
Petr

I really dislike this solution, It require updating Albums (which teoretically can lead to concurency conflicts) and you must have two implementations of same functionality - one from all time seling albums and second for topselingalbums in date range (more code => more bugs)

Corey Coogan
05/24/2010 02:30 PM by
Corey Coogan

I'm a little confused by this implementation, mainly because I'm still trying to grasp this without thinking SQL.

So whenever an order is placed, we also update the corresponding album document, incrementing the CountSold property? I know you said that documents are independent of each other, but this is the way I'm reading your solution.

Thanks,

cc

Keith
05/24/2010 05:58 PM by
Keith

Surely OrderBy should be performed before Take?

Ayende Rahien
05/25/2010 06:34 PM by
Ayende Rahien

Corey,

Yes, that is how it would be done

Ayende Rahien
05/25/2010 06:34 PM by
Ayende Rahien

Keith,

Those are all lazily evaluated, there is no meaning to the order.

Keith
05/25/2010 07:14 PM by
Keith

That seems like a very odd design decision, and is at odds with how other LINQ providers work. Regardless of where and when the query takes place, the semantic meaning of the sequence of operations should be preserved.

Ayende Rahien
05/25/2010 07:20 PM by
Ayende Rahien

Keith,

There really isn't any other way, considering what you are translating this to.

Comments have been closed on this topic.