Creating a leader board with RavenDB
This question was raised in Twitter, and I thought it was quite interesting. In SQL, you can use the rank() function to generate this value, but if you are working on a large data set and especially if you are sorting, you will probably want to avoid this.
Microsoft has a reference architecture for the leader board problem where they recommend running a separate process to recompute the ranking every few minutes and cite about 20 seconds to run the query on a highly optimized scenario (with 1.6 billion entries in a column store).
RavenDB doesn’t have a rank() function, but that you cannot implement a leader board. Let’s see how we can build one, shall we? We’ll start with looking at the document representing a single game:
You’ll probably have a lot more data in your use case, but that should be sufficient to create the leader board. The first step we need to do is to create an index to aggregate those values into total score for the gamers. Here is what the index looks like:
This is a fairly trivial index, which will allow us to compute the total score of a gamer across all games. You might want to also add score per year / month / week / day, etc. I’m not going to touch that since this is basically the same thing.
RavenDB’s map/reduce indexes will process the data and aggregate it across all games. A new game coming in will not require us to recompute the whole dataset, only the gamer that was changed will be updated, and even so, RavenDB can optimize it even further in many cases and touch only some of the data for that gamer to compute the new total.
However, there is a problem here. How do we generate a leader board here? To find the top 20 gamers is easy:
That is easy enough, for sure. But a leader board has more features that we want to have. For example, if I’m not in the top 20, I might want to see other gamers around my level. How can we do that?
We’ll need to issue a few queries for that. First, we want to find what the gamer actual score is:
And then we will need to get the gamers that are a bit better from the current gamer:
What this does is to ask to get the 4 players that are better than the current gamer. And we can do the same for those that are a bit worse:
Note that we are switching the direction of the filter and the order by direction in both queries. That way, we’ll have a list of ten players that are ranked higher or lower than the current gamer, with the current one strictly in the middle.
I’m ignoring the possibility of multiple gamers with the same score, but you can change the > to >= to take them into account. Whatever this is important or not depends on how you want to structure your leader board.
The final, and most complex, part of the leader board is finding the actual rank of any arbitrary gamer. How can we do that? We don’t want to go through the entire result set to get it. The answer to this question is to use facets, like so:
What this will do is ask RavenDB to provide a count of all the gamers whose score are higher or lower than the current gamer. The output looks like this:
And you can now compute the score of the gamer.
Facets are optimized for these kind of queries and are going to operate over the results of the index, so they are already going to operate over aggregated data. Internally, the data is actually stored in a columnar format, so you can expect very quick replies.
There you have it, all the components required to create a leader board in RavenDB.
Comments
Comment preview