Ayende @ Rahien

My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.

NuGet Perf, Part VIII: Correcting a mistake and doing aggregations


I hope this is the last one, because I can never recall what the next Latin number is.

At any rate, it has been pointed out to me that I made an error in importing the data. I assumed that the DownloadCount field that I got from the NuGet API is the download count for the specific package version, but it appears that this is the total download count, across all versions of this package. The actual download number for a specific version is VersionDownloadCount.

That changes things a bit, because the way NuGet sorts things is based on the total download count, not the download count for a specific version. The reason this complicates things is that we aren’t going to store the total download count in all the version documents. First, let us see the sort of query we need to write. In SQL, it would look like this:

select top 30 skip 30 
    (select sum(VersionDownloadCount) from Packages all where all.PackageId = p.PackageId) as TotalDownloadsCount
from Packages p
where IsPrerelease = 0
order by TotalDownloadsCount desc, Created

This is a much simplified version of the real query, and most probably something that you can’t actually write this simply in SQL. But it gets the point across.

Note that in order to process this query, the RDBMS would have to first aggregate all of the data (for each row, mind), then do the paging, then give you the results. Sure, you can keep a counter for all the downloads for a package, but considering the fact that downloads are highly parallel and happen all the time, you would end up waiting for writers to finish doing their updates.

Instead, with RavenDB, we are going to use a map/reduce index and query on that.


This should be fairly simple to follow. In the map we go over all the packages, and output their package id, whether they are a pre-release, the specific version download count and the date it was created.

In the reduce, we group by the package id and whether it was pre-released or not (I am assuming that we usually don’t want to show the pre-release stuff there).

Finally, we sum up all of the individual package downloads and we output the oldest created date. Using all of that, we can now move to the next step, and actually query that:
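To make the shape of the index and the query concrete, here is a rough in-memory sketch in Python. This is not RavenDB's API, and it is not the index definition from this post; it only mimics the map, the reduce and the paged query described above. The class name, the helper functions and the sample data are all made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class PackageVersion:
    # One document per package version. The class is hypothetical;
    # the fields mirror the ones discussed in the post.
    package_id: str
    version: str
    is_prerelease: bool
    version_download_count: int
    created: date


def map_version(doc):
    # Map: emit the grouping key plus the values we want to aggregate.
    return {
        "package_id": doc.package_id,
        "is_prerelease": doc.is_prerelease,
        "download_count": doc.version_download_count,
        "created": doc.created,
    }


def reduce_entries(entries):
    # Reduce: group by (package id, pre-release flag), sum the downloads
    # and keep the oldest created date. The output has the same shape as
    # the input, so the reduce can be run again on its own results.
    groups = defaultdict(list)
    for entry in entries:
        groups[(entry["package_id"], entry["is_prerelease"])].append(entry)
    return [
        {
            "package_id": package_id,
            "is_prerelease": is_prerelease,
            "download_count": sum(e["download_count"] for e in group),
            "created": min(e["created"] for e in group),
        }
        for (package_id, is_prerelease), group in groups.items()
    ]


def query(index, is_prerelease=False, skip=0, take=30):
    # Query the already aggregated entries: filter, sort by total
    # downloads (descending) and then by created date, then page.
    rows = [r for r in index if r["is_prerelease"] == is_prerelease]
    rows.sort(key=lambda r: (-r["download_count"], r["created"]))
    return rows[skip:skip + take]


# Made-up sample data, for illustration only.
docs = [
    PackageVersion("A", "1.0", False, 100, date(2012, 1, 1)),
    PackageVersion("A", "1.1", False, 50, date(2012, 3, 1)),
    PackageVersion("A", "2.0-beta", True, 5, date(2012, 6, 1)),
    PackageVersion("B", "1.0", False, 120, date(2012, 2, 1)),
]

index = reduce_entries(map_version(d) for d in docs)
top = query(index)                              # stable packages, first page
top_prerelease = query(index, is_prerelease=True)
```

The point of the sketch is that the aggregation happens once, when the index is built, so the query only has to filter, sort and page over one entry per package instead of summing per row at query time.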


There is a small bug here, since I don’t see RavenDB in the results, but I guess I’ll have to wait until I get the updated data from NuGet.

Actually, that is not quite true. For pre-released software, we are pretty high up:


That explains much, RavenDB 1.2 is pretty awesome.


Paul Stovell

The real question is when will RavenDB 1.2 become 'stable'? Or is it the Duke Nukem Forever version of RavenDB? :)

Andreas Kroll

Hi Ayende,

a lot of people will for sure agree that they'll happily help you count in roman numbers if you continue this interesting series of posts we can indeed learn a lot from.

So as grega_g already posted:


But you also could look at http://www.novaroma.org/via_romana/numbers.html which explains the numbers and has a handy converter on the right side :-)

Thanks for the entertaining and informative content so far

Chris Eldredge

When you query the NuGet feed, each result contains the DownloadCount aggregated across all package versions. For example, this query:


How would you combine the map/reduce query with a search query to accomplish this same goal?

Ayende Rahien

Chris, Wait for it, I have it in a future post.

Ayende Rahien

Paul, We have been actively working on 1.2, you can get it right now. It hasn't even been 6 months, I don't think that the comparison is appropriate.

Paul Stovell

@Ayende, sorry, no offence intended, I know it's available on the pre-release channels. I'm just excited for it to come to the stable channel so I can start using the features.

While it has the 'unstable' or 'pre-release' tags, I'm hesitant to switch to it in case it causes my customer's computers to explode and I get blamed for using something clearly labelled 'unstable' (even though I know it's far more stable than most software out there).

Alexei K

Hey Ayende, any chance you can tag your series of posts with a per-series tag? Like tagging this series as "nuget-perf" or something. Right now most posts just have "raven" as a tag... that is so very useless for filtering. I want to see the list of posts in this series, and I can't really do that without manually scrolling through the recent post list.

I would love to be able to just click "nuget-perf" tag and get all the articles for easy reading.

Ayende Rahien

Alexei, That is a great idea, I'll do so.
