Document based modeling: Auctions & Bids
In my previous post, we dealt with how to model Auctions and Products. This time, we are going to look at how to model Bids.
Before we can do that, we need to figure out how we are going to use them. As I mentioned, I am going to use eBay as the source for “application mockups”. So I went to eBay and took a couple of screenshots.
Here is the actual auction page:
And here is the actual bids page.
This tells us several things:
- Bids aren’t really accessed for the main page.
- There is a strong likelihood that the number of bids is going to be small for most items (less than a thousand).
- Even for items with a lot of bids, we only care about the most recent ones for the most part.
This is the Auction document as we have last seen it:
{
  "Quantity": 15,
  "Product": {
    "Name": "Flying Monkey Doll",
    "Colors": [ "Blue & Green" ],
    "Price": 29,
    "Weight": 0.23
  },
  "StartsAt": "2011-09-01",
  "EndsAt": "2011-09-15"
}
The question is, where do we put the Bids? One easy option would be to put all the Bids inside the Auction document, like so:
{
  "Quantity": 15,
  "Product": {
    "Name": "Flying Monkey Doll",
    "Colors": [ "Blue & Green" ],
    "Price": 29,
    "Weight": 0.23
  },
  "StartsAt": "2011-09-01",
  "EndsAt": "2011-09-15",
  "Bids": [
    { "Bidder": "bidders/123", "Amount": 0.1, "At": "2011-09-08T12:20" }
  ]
}
The problem with this approach is that we are now forced to load the Bids whenever we want to load the Auction, but in the main scenario we just need the Auction details, not all of the Bid details. In fact, we only need the count of Bids and the Winning Bid. This approach also fails to properly handle the High Interest Auction scenario, an Auction that has a lot of Bids.
That leaves us with a few options. One of them assumes that we don’t really care about Bids and Auctions as a time-sensitive matter. As long as we are accepting Bids, we don’t really need to give you immediate feedback. Indeed, this is how most Auction sites work. They give you a cached view of the data, refreshing it every 30 seconds or so. The idea is to reduce the cost of actually accepting a new Bid to the minimum necessary. Once the Auction is closed, we can figure out who actually won and notify them.
A good design for this scenario would be a separate Bid document for each Bid, and a map/reduce index to get the Winning Bid Amount and Bid Count. Something like this:
{ "Bidder": "bidders/123", "Amount": 0.1, "At": "2011-09-08T12:20", "Auction": "auctions/1234" }
{ "Bidder": "bidders/234", "Amount": 0.15, "At": "2011-09-08T12:21", "Auction": "auctions/1234" }
{ "Bidder": "bidders/123", "Amount": 0.2, "At": "2011-09-08T12:22", "Auction": "auctions/1234" }
And the index:
// Map: emit one entry per Bid
from bid in docs.Bids
select new { Count = 1, bid.Amount, bid.Auction }

// Reduce: aggregate the entries per Auction
from result in results
group result by result.Auction into g
select new
{
    Count = g.Sum(x => x.Count),
    Amount = g.Max(x => x.Amount),
    Auction = g.Key
}
As you can imagine, due to the nature of RavenDB’s indexes, we can cheaply insert new Bids without having to wait for the indexing to complete. And we can always display the last calculated value for the Auction, including the point in time up to which that value is current.
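To make the shape of that index a bit more concrete, here is a minimal in-memory sketch of the same map and reduce steps using plain LINQ to Objects. This is not RavenDB code and nothing here is indexed in the background; the Bid record, the BidStats class and the sample data are my own assumptions, modeled on the Bid documents above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for a single Bid document.
public record Bid(string Bidder, decimal Amount, string Auction);

public static class BidStats
{
    // Computes the bid count and highest amount per auction.
    public static Dictionary<string, (int Count, decimal Amount)> Compute(IEnumerable<Bid> bids)
    {
        // Map: one entry per bid, each counting as 1.
        var mapped = from bid in bids
                     select new { Count = 1, bid.Amount, bid.Auction };

        // Reduce: group the entries by auction, sum the counts, keep the max amount.
        var reduced = from result in mapped
                      group result by result.Auction into g
                      select new
                      {
                          Count = g.Sum(x => x.Count),
                          Amount = g.Max(x => x.Amount),
                          Auction = g.Key
                      };

        return reduced.ToDictionary(r => r.Auction, r => (r.Count, r.Amount));
    }
}

public static class Program
{
    public static void Main()
    {
        var bids = new[]
        {
            new Bid("bidders/123", 0.10m, "auctions/1234"),
            new Bid("bidders/234", 0.15m, "auctions/1234"),
            new Bid("bidders/123", 0.20m, "auctions/1234"),
        };

        var stats = BidStats.Compute(bids)["auctions/1234"];
        Console.WriteLine($"{stats.Count} bids, winning amount {stats.Amount}");
    }
}
```

The difference in the real thing is that the reduce result is maintained by the index in the background, so reading it costs a lookup rather than a scan over all the Bids.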
That is one model for an Auction site, but another one would be a much stricter scenario, where you can’t just accept any Bid. It might be a system where you are charged per bid, so accepting a known invalid bid (one that was outbid in the meantime) is not allowed. How would we build such a system? We could still use the previous design and just defer the actual billing to a later stage, but let us assume that this is a strong constraint on the system.
In this case, we can’t rely on the indexes, because we need immediately consistent information, and we need it to be cheap. With RavenDB, we have the document store, which is fully ACID. So we can do the following: store all of the Bids for an Auction in a single document:
{
  "Auction": "auctions/1234",
  "Bids": [
    { "Bidder": "bidders/123", "Amount": 0.1, "At": "2011-09-08T12:20", "Auction": "auctions/1234" },
    { "Bidder": "bidders/234", "Amount": 0.15, "At": "2011-09-08T12:21", "Auction": "auctions/1234" },
    { "Bidder": "bidders/123", "Amount": 0.2, "At": "2011-09-08T12:22", "Auction": "auctions/1234" }
  ]
}
And we modify the Auction document to be:
{
  "Quantity": 15,
  "Product": {
    "Name": "Flying Monkey Doll",
    "Colors": [ "Blue & Green" ],
    "Price": 29,
    "Weight": 0.23
  },
  "StartsAt": "2011-09-01",
  "EndsAt": "2011-09-15",
  "WinningBidAmount": 0.2,
  "BidsCount": 3
}
Adding the BidsCount and WinningBidAmount to the Auction means that we can very cheaply show them to the users. Because RavenDB is transactional, we can actually do it like this:
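using (var session = store.OpenSession())
{
    // Make SaveChanges fail if either document changed since we loaded it
    session.Advanced.UseOptimisticConcurrency = true;

    var auction = session.Load<Auction>("auctions/1234");
    var bids = session.Load<Bids>("auctions/1234/bids");

    bids.AddNewBid(bidder, amount);   // throws if the bid is not the highest
    auction.UpdateStatsFrom(bids);    // refresh WinningBidAmount and BidsCount

    session.SaveChanges();            // both documents are saved in one transaction
}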
We are now guaranteed that this will either succeed completely (and we have a new winning bid), or it will fail utterly, leaving no trace. Note that AddNewBid will reject a bid that isn’t the highest (by throwing an exception), and if we have two concurrent modifications, RavenDB will throw on that as well. Both the Auction and its Bids are treated as a single transactional unit, just the way they should be.
The final question is how to handle a High Interest Auction, one that gathers a lot of bids. We didn’t worry about it in the previous model, because that was left for RavenDB to handle. In this case, since we are using a single document for the Bids, we need to take care of that ourselves. There are a few things that we need to consider here:
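I didn’t show AddNewBid and UpdateStatsFrom, so here is a rough sketch of what they might look like. Everything in it is an assumption on my part: the class shapes, the Items property name (a C# member can’t share the name of its enclosing type, so it can’t be called Bids), the choice of InvalidOperationException, and the rule that a new bid must be strictly higher than the current highest. It runs purely in memory, without a RavenDB session.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of a single bid inside the Bids document.
public record Bid(string Bidder, decimal Amount, DateTime At);

// Hypothetical shape of the "auctions/1234/bids" document.
public class Bids
{
    public string Auction { get; set; } = "";
    public List<Bid> Items { get; set; } = new List<Bid>();

    // Rejects any bid that does not beat the current highest bid.
    public void AddNewBid(string bidder, decimal amount)
    {
        var highest = Items.Count == 0 ? 0m : Items.Max(b => b.Amount);
        if (amount <= highest)
            throw new InvalidOperationException(
                $"Bid of {amount} does not beat the current highest bid of {highest}.");

        Items.Add(new Bid(bidder, amount, DateTime.UtcNow));
    }
}

// Only the two denormalized fields of the Auction document are modeled here.
public class Auction
{
    public decimal WinningBidAmount { get; set; }
    public int BidsCount { get; set; }

    // Assumes every bid for the auction still lives in this one Bids document.
    public void UpdateStatsFrom(Bids bids)
    {
        BidsCount = bids.Items.Count;
        WinningBidAmount = bids.Items.Count == 0 ? 0m : bids.Items.Max(b => b.Amount);
    }
}

public static class Program
{
    public static void Main()
    {
        var auction = new Auction();
        var bids = new Bids { Auction = "auctions/1234" };

        bids.AddNewBid("bidders/123", 0.10m);
        bids.AddNewBid("bidders/234", 0.15m);
        auction.UpdateStatsFrom(bids);

        try
        {
            bids.AddNewBid("bidders/123", 0.12m); // lower than 0.15, so it is rejected
        }
        catch (InvalidOperationException e)
        {
            Console.WriteLine(e.Message);
        }

        Console.WriteLine($"{auction.BidsCount} bids, winning amount {auction.WinningBidAmount}");
    }
}
```

Because the rejection happens before anything is added, a failed bid leaves both objects untouched, which is the in-memory equivalent of never reaching SaveChanges.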
- Bids that lost are usually of little interest.
- We probably need to keep them around, just in case, nevertheless.
Therefore, we will implement splitting for the Bids document. What does this mean?
Whenever the number of Bids in the Bids document reaches 500, we split the document. We take the oldest 250 Bids and move them to a Historical Bids document, and then we save.
That way, we have a set of historical documents with 250 Bids each that no one is ever likely to read, but we need to keep, and we have the main Bids document, which contains the most recent (and relevant) Bids. A High Interest Auction might end up looking like:
- auctions/1234 <- Auction document
- auctions/1234/bids <- Bids document
- auctions/1234/bids/1 <- historical bids #1
- auctions/1234/bids/2 <- historical bids #2
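The splitting rule itself is small enough to sketch. This is an in-memory illustration only, under my own assumptions: the Bid record, the BidSplitter class and the SplitIfNeeded name are all made up for the example, and storing the returned list as the next auctions/1234/bids/N document (in the same SaveChanges call) is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of a single bid inside the Bids document.
public record Bid(string Bidder, decimal Amount, DateTime At);

public static class BidSplitter
{
    public const int SplitThreshold = 500; // split once the main document holds this many bids
    public const int MoveCount = 250;      // how many of the oldest bids to move out

    // Removes the oldest bids from 'current' when the threshold is reached and
    // returns them; returns an empty list when no split is needed.
    public static List<Bid> SplitIfNeeded(List<Bid> current)
    {
        if (current.Count < SplitThreshold)
            return new List<Bid>();

        // OrderBy is a stable sort, so bids with equal timestamps keep their order.
        var ordered = current.OrderBy(b => b.At).ToList();
        var historical = ordered.Take(MoveCount).ToList();

        current.Clear();
        current.AddRange(ordered.Skip(MoveCount));
        return historical;
    }
}

public static class Program
{
    public static void Main()
    {
        var start = new DateTime(2011, 9, 8, 12, 0, 0, DateTimeKind.Utc);
        var current = Enumerable.Range(0, 500)
            .Select(i => new Bid($"bidders/{i}", 0.1m + i * 0.05m, start.AddSeconds(i)))
            .ToList();

        var historical = BidSplitter.SplitIfNeeded(current);

        // prints "250 moved, 250 remain"
        Console.WriteLine($"{historical.Count} moved, {current.Count} remain");
    }
}
```

Note that after a split the main Bids document no longer holds every bid, so the BidsCount on the Auction can’t simply be the length of that list anymore.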
And that is enough for now, I think. This post went on a little longer than I intended, but hopefully I was able to explain both the final design decisions and the process used to reach them.
Thoughts?
More posts in "Document based modeling" series:
- (30 Sep 2011) Auctions & Bids
- (06 Sep 2011) Auctions
Comments
How would you handle a bid that needs feedback sooner than 30 seconds? In the scenario of Auto Bidding where the current bid is say $100, but the user has an Auto Bid to a maximum value of $180.
If the incremented bids are $5 for example, and a new user bids $105, the new user needs immediate feedback that their bid was outbid by an autobid, and the new highest bid is $110.
I don't know about eBay because I rarely use it, since it's crap. But http://www.trademe.co.nz/ has the concept of auto-bidding.
The traffic would be a fraction of what is on eBay, but regardless, how would you handle such a scenario?
I assume it would be the same as your second scenario.
Regarding the performance choices, I think all that you save by not including the bids in the auction document is probably lost when you have to use a transaction to update two documents on each bid. Maybe you could choose a third approach: keep the last N most important bids in the auction and all older bids in separate documents? This way you would be updating only the auction document on each bid, and every N/2 bids you would throw away the oldest bids into a new 'bids' document.
Ayende,
"The problem with such an approach is that we are now forced to load the Bids whenever we want to load the Auction, but the main scenario is that we just need the Auction details, not all of the Bids details."
Can't we use live projections here?
What about paging bids for high interest auctions? That's basically what bid history is, a page of bids, but what is gained by placing them into a separate historical bids document?
I do not like to do it explicitly: auction.UpdateStatsFrom(bids); I mean it is completely related to this: bids.AddNewBid(bidder, amount); Is there a way to update "WinningBidAmount" and "BidsCount" automatically? Or maybe to use an Auction model that contains only "Quantity", "Product", "StartsAt" and "EndsAt" and to get a compounded model with "WinningBidAmount" and "BidsCount" from an index?
A little off the subject, but not really since we are talking about data modeling, but what ever happened to Matco?
It's been 45 days (8-17-11) since the last Matco post and I was really enjoying the series.
Thanks Ayende
Why would you create a separate document for history bids? Can't you use paging to get only the top X of the bids?
Phillip, Auto bidding scenario would be handled by the auction. As part of the transaction that saves the new bid, all auto bids would fire and new bids would be added.
Rafal, There is very little extra cost in updating two documents in the same transaction vs. just one.
Chanan, Yes, we can, but I like to work with the model directly for most things. The main problem is also what would happen if you had 5,000 bids? It would make the Auction document very large.
Jason, You split them into a model that allows very easy paging, and zero work on any part of the system.
Andres, Yes, you can do that with an index.
Nadav, Yes, you can use paging for this, but since you rarely if ever need old bids, it is easier to just shove them out of the way.
If you modelled this in an object based system stored in SQL (for example) would it be any different? I would see 3 classes (Product, Auction, and Bids) that would eventually be stored in 3 dbs with a relationship between Bids -> Auction. So it makes sense to me to separate out bids (I agree that updating two documents instead of one is a better strategy vs. loading all the bids into an auction).
Even if you do lazy loading, you're probably going to want the aggregated value in the Auction class because you don't want to be doing a query to find out the bids count.
I'm curious if the object model would be that much different in C# for SQL storage vs. a document store. Would a NoSQL store be any different (or is a document store considered NoSQL)?
Bil, For the Aggregates, yes, it is the same. The real saving is in the things that you can embed, like the product itself, comments, buying information, etc.
This article has cleared a lot of things up for me.. two questions, though:
Given that the domain has essentially been de-normalized, how would the developer (or RavenDB, for that matter) handle deleting an aggregate or flagging it as no longer available? Would the delete have to cascade across all documents affected, or would a flag have to be included in each document that references it (thus avoiding the N+1 problem)?
This is probably OT. This morning, I had an "ah ha!" moment regarding DDD and contextual security. Basically, if you have aggregate roots, you don't have to worry about contextual security. However, in a system where, say, a user has more than one role, I am not sure you could cohesively represent multiple aggregates in different ways from the same root. (This question might be better answered in another format, but I was just thinking about it today).
Bobby, I don't think that the model has been significantly denormalized. In particular, the product copy was done to make sure that it is immutable. So changes wouldn't affect it. In such a system, you don't delete a product. Also, see this: http://www.udidahan.com/2009/09/01/dont-delete-just-dont/
I don't understand the question about roles, and it is probably better suited for the mailing list.
Great Example! I'm currently trying to wrap my head around this exact concept.
One question: I see in the example you retrieved Auction 1234 and its bids. How would you go about retrieving Auction 1234 and its "bidders"? Would it be as simple as saying var bidders = session.Load<Bids>("auctions/1234/bidders"); although this seems more realistic: var bidders = session.Load<Bids>("auctions/1234/bids/bidders"); //Not sure how to make it distinct?
I'm COMPLETELY new to Raven and NoSQL, so forgive me if it seems like a dumb question.
Mauricia, Why would I want the bidders? In general, tertiary associations aren't needed in most document models, because they exist on either the primary or secondary documents.