NuGet Perf, Part III–Displaying the Packages page


The first thing that we will do with RavenDB and the NuGet data is to issue the same logical query as the one used to populate the packages page. As a reminder, here is how it looks:

SELECT        TOP (30) 
          -- ton of fields removed for brevity
FROM        (

            SELECT        Filtered.Id
                    ,    Filtered.PackageRegistrationKey
                    ,    Filtered.Version
                    ,    Filtered.DownloadCount
                    ,    row_number() OVER (ORDER BY Filtered.DownloadCount DESC, Filtered.Id ASC) AS [row_number]
            FROM        (
                        SELECT        PackageRegistrations.Id
                                ,    Packages.PackageRegistrationKey
                                ,    Packages.Version
                                ,    PackageRegistrations.DownloadCount
                        FROM        Packages
                        INNER JOIN    PackageRegistrations ON PackageRegistrations.[Key] = Packages.PackageRegistrationKey
                        WHERE        Packages.IsPrerelease <> cast(1 as bit)
                        ) Filtered
            ) Paged
INNER JOIN    PackageRegistrations ON PackageRegistrations.[Key] = Paged.PackageRegistrationKey
INNER JOIN    Packages ON Packages.PackageRegistrationKey = Paged.PackageRegistrationKey AND Packages.Version = Paged.Version
WHERE        Paged.[row_number] > 30
ORDER BY    PackageRegistrations.DownloadCount DESC
        ,    Paged.Id

Despite the apparent complexity, this is a really trivial query. What it does is say:

Give me rows 31 to 60
Where IsPrerelease is false
Order by the download count (descending) and then the id

With Linq, the client side query looks something like this:

var results = Session.Query<Package>()
                         .Where(x => x.IsPrerelease == false)
                         // Most downloaded first, matching ORDER BY DownloadCount DESC in the SQL
                         .OrderByDescending(x => x.DownloadCount).ThenBy(x => x.Id)
                         .Skip(30)
                         .Take(30)
                         .ToList();

Now, I assume that this is what the NuGet code is also doing. It is just that the relational database has forced them to get to the data in a really convoluted way.

With RavenDB, to match the same query, I could just issue the following query, but there are subtle differences between how the query works in SQL and how it works in RavenDB. In particular, the data that we have in RavenDB is the output of this query, but it isn’t the raw output. For example, we don’t have the Id column available, which is used for sorting. Now, I think that the logic means to say, “sort by download count descending and then by age ascending”, so that old and popular packages are more visible than new and fresh packages.

In order to match the same behavior (and because we will need it for the next post), we will define the following index in RavenDB:

And querying it:

The really nice thing about this?

This is the URL for this search:

/indexes/Packages/Listing?query=IsPrerelease:false&start=0&pageSize=128&aggregation=None&sort=-DownloadCount&sort=Created

This is something that RavenDB can do in its sleep, because it is a very cheap operation. Consider the query plan that would be generated for the SQL query above. You have to join 5 times just to get to the data that you want, paging is a real mess, and the database actually has to do a lot of work to answer this fiddly little query.

Just to give you some idea here. We are talking about something that conceptually should be the same as:

select top 30 skip 30 * from Data where IsPrerelease = 0

But it gets really complex really fast with the joins and the tables and all the rest.
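To make the conceptual query above concrete, here is a minimal in-memory sketch using plain LINQ to Objects. It is not the NuGet or RavenDB code: the `Package` class, its property names, and the `PackagePaging.GetPage` helper are all assumptions made for illustration, and sorting by `Created` as the tie-breaker follows the "download count descending, then age ascending" reading discussed earlier.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flattened package document; property names are assumptions.
public class Package
{
    public string Id { get; set; }
    public bool IsPrerelease { get; set; }
    public int DownloadCount { get; set; }
    public DateTime Created { get; set; }
}

public static class PackagePaging
{
    // Filter, sort, then page: the same three steps as the conceptual query.
    // pageIndex is zero-based, so pageIndex = 1 with pageSize = 30 yields rows 31 to 60.
    public static List<Package> GetPage(IEnumerable<Package> packages, int pageIndex, int pageSize)
    {
        return packages
            .Where(p => !p.IsPrerelease)              // WHERE IsPrerelease = 0
            .OrderByDescending(p => p.DownloadCount)  // most downloaded first
            .ThenBy(p => p.Created)                   // older packages win ties
            .Skip(pageIndex * pageSize)
            .Take(pageSize)
            .ToList();
    }
}
```

Calling `PackagePaging.GetPage(allPackages, 1, 30)` would return the second page. When every field needed for filtering and sorting lives on one object, the whole operation is a filter, a sort, and a slice, with no joins involved.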

In comparison, in RavenDB, we actually do have just a property match to do. Because we keep the entire object graph in a single location, we can do very efficient searches on it.

In the next post, I’ll discuss the actual way I modeled the data, and then we get to do exciting searches.

Comments

30 Aug 2012
09:33 AM

Frans Bouma

Your linq query is definitely not the source of the SQL you're seeing. It joins several tables twice, pages over a subset and joins that subset. Your linq query would not result in this.

The cumbersome way the SQL looks is a result of linq though: normally one would move the isprerelease predicate in the where clause inside the ON clause and simply page over the end result of the query. I don't see why they do it this way. Your linq query would result (normally) in a query which looks like the 'filtered' subset, and move the order by inside the query. After all paging in SQL Server might look cumbersome, but it's a wrapper query you apply to the normal query, where you wrap your normal query with the 'paging wrapper' to get paging. As they don't do that here, it's overly complicated.

I don't know the datamodel of NuGet, but from the looks of it it looks like they used more than 1 table for package storing. One truly wonders why. But then again, it's NuGet, some service which web developers think is 'useful' because they find it useful, forgetting that not everyone does webdevelopment

30 Aug 2012
09:46 AM

Simon Skov Boisen

Frans, do you mean to say that NuGet is only useful for people doing web development? In that case I think you don't know about the breadth of different packages available from NuGet. Some of the more popular ones are Ninject, an IoC container, NUnit, a testing framework, and log4net, a logging library - not really specific to web development.

30 Aug 2012
11:08 AM

Frans Bouma

@simon No, I'm saying that nuget is primarily a solution to a problem webdevs had, but non-web devs didn't have. I mean, a lot of devs simply create a folder in their solution, add 3rd party dlls there and reference them in multiple projects from that folder. The 'recent' tab in add-reference is then more handy than nuget to add references to multiple projects.

30 Aug 2012
11:19 AM

jonnii

Frans, I agree with you. I even go one step further, I don't use 3rd party dlls, i just copy and paste the code off github and codeplex for all the libraries I use into my project. Don't have to worry about all these extra dlls anymore, and I only have to include the classes I need! It doesn't matter if the dll versions are compatible with each other because I get new ones every build!

30 Aug 2012
11:23 AM

Matt Warren

Why is the download count exactly the same for all the jQuery versions?

30 Aug 2012
11:51 AM

Ayende Rahien

Frans, We use nuget in pretty much any project we have now, and we don't do web apps much if at all. We like to get away from having to manage the deps and nuget does a good job at it.

30 Aug 2012
11:57 AM

Ayende Rahien

Matt, That is the value we get from NuGet OData, see:

https://nuget.org/api/v2/Packages?$skiptoken='jQuery','0.0.0.0'

As you can see, you have DownloadCount which is the same for all.

What I think I missed is that there is also _VersionDownloadCount_, with the value just for this version, not globally.

30 Aug 2012
12:03 PM

Kat

While your results are clearly good, you're in no way comparing apples with apples.

The reason the SQL Server version is slow is because the schema is a stinking mess of lots of tables, not because SQL Server is bad and RavenDB is good.

A simple denormalised persisted view along with full-text search would definitely give good results.

30 Aug 2012
13:32 PM

Tim Murphy

Any chance of providing performance data for SQL Server on same or similar machine?

30 Aug 2012
13:37 PM

Ayende Rahien

Tim, I don't have the data in SQL format.

30 Aug 2012
13:50 PM

Andreas Kroll

Frans and jonnii,

sometimes I cannot believe what I read. You really think it is easier to copy dlls to a directory or even copy code from GitHub to your project than perform an "install-package <name>"??? NuGet really is getting better and better each day. Most packages integrate themselves into solutions very well, so for instance I have IoC ready with one or two install-package commands depending on which container I use. What about dependencies? NuGet pulls all dependencies automatically for me. You would have to do that by hand. What about updates in your case? I just issue an update command for an updated package and get all the benefits of version checking etc.

What is it you dislike about NuGet? I imagine if you'd work on a linux machine you would also not use a package installer like yast to get tools, but rather install them by hand or even download the code and compile it?

30 Aug 2012
13:51 PM

dotnetchris

This post is get summation of every reason I love RavenDB with modern software development. This post shows everything that is wrong for doing modern software development against relational dbs and how large of an impedance mismatch SQL tables have compared to object graphs.

30 Aug 2012
13:53 PM

dotnetchris

This post is a great summation*** if i could type.

30 Aug 2012
14:01 PM

Beyers

@Frans, @jonnii, I cannot disagree more with you. To me, Nuget is to package and dependency management, as what version control is to source code. Not to mention Nuget private repositories where you can host your own or 3rd party libraries and have a central point to manage and import from.

But hey, feel free to manually copy DLLs around, create ZIP files of project versions, save them to floppy for backups :)

30 Aug 2012
14:10 PM

Simon

Surely jonnii is just kidding??

30 Aug 2012
15:02 PM

Christopher Wright

Holding out for the Linux port of the nuget client.

30 Aug 2012
16:37 PM

Ali Kheyrollahi

@Frans "I mean, a lot of devs simply create a folder in their solution, add 3rd party dlls there and reference them in multiple projects"

World of development is different now, OSS with fast development cycles needs painless upgrades. If you have not seen it yet, you might have missed the train - I am afraid.

30 Aug 2012
16:55 PM

Flavio

I hope @jonnii is just kidding...

30 Aug 2012
19:17 PM

Karep

Frans, jonnii: What nonsense. It's easier to analyze source code to choose the classes you need than to run one command? And how do you update that copied code?

31 Aug 2012
09:14 AM

Frans Bouma

@Ali What are you talking about? So your project simply takes dependencies on the latest dlls from nuget and if something breaks along the way, because an updated version breaks your code, so what? Not every project can use solely OSS dlls (heck, many projects use only non-OSS dlls), and many projects take a dependency on dll vX.Y and stick with that, because they know it works. Upgrade it 'because nuget says so' is stupid. But hey, I'm not your client, so go ahead. But please don't talk to me like I'm a petty child who doesn't know what software dev looks like. I didn't miss a train, why would I? I'm a professional software developer now for over 18 years, do you really think what's hip and 'new' today is actually 'new' ? haha :D

What I find funny is that if you say you like installed versions over some web-based package site, you suddenly do software dev on a dos box with floppy disks. Like I hit your mother in the face with a baseball bat when I talked about NuGet. Get a life.

31 Aug 2012
09:24 AM

jeroenh

ouch, someone seems to be in a bad mood or something.

@Frans nuget doesn't force you to upgrade anything, it's a specific action. Also, I concur with Ayende (and most of the rest of the world) that nuget is useful for just about any project, not just web development.

31 Aug 2012
09:24 AM

Frans Bouma

@Andreas I didn't say one should choose downloading source over a package install. I just don't see the point of nuget over simply referencing a dll you have on disk. Perhaps it's related to ppl who just do OSS work, but many dlls are closed-source. Try to mix two ways of adding references, it gets cumbersome. Add reference's recent tab is much quicker in that regard. Sure it checks dependencies, but as I said, dependencies of a dll you reference are dependencies you have to research up front anyway. At least for professional projects you're shipping to clients: after all your code then depends on these versions as well. If these dlls update, do you then have to update the dll you directly reference? Most likely yes. Can your project do that? that's to be seen. I wouldn't update referenced dlls 'on the spot' just because there's a new version. At least not in professional projects shipped to clients/customers.

But perhaps in 'modern day' development one doesn't give a f*ck about whether stuff breaks.

I'm not saying nuget doesn't serve a purpose, I just don't see the benefit in my day-to-day work and therefore not the hype around it. But apparently it's forbidden to say so, as it's equal to being stupid.

31 Aug 2012
09:27 AM

Ayende Rahien

Frans, We deliver commercial software via nuget. It simplifies the update process, and most importantly, the dependencies process for both us and our clients.

31 Aug 2012
12:18 PM

Ali Kheyrollahi

@Frans I do not have more to say - not sure what I can say. All I can say is that I respect you for what you have done with LLBLGen Pro.

31 Aug 2012
14:39 PM

jonnii

@Frans, where I work all of our internal libraries are packaged. TeamCity has a built in nuget server, and if you don't use team city then you can put them on a share drive for everyone to consume.

31 Aug 2012
18:48 PM

Karep

Something I don't understand here. You are running one query on the database and are proud it takes 17ms. But NuGet's database is not hit by one user, but by thousands of users. Users that also write to that database. So there is locking happening. Clearly I am misunderstanding why you present those 17ms.

01 Sep 2012
10:42 AM

Ayende Rahien

Karep, a) RavenDB doesn't DO locking. Users can write to the DB all day, it doesn't impact read performance. b) RavenDB is actually getting faster the more you use it, because it anticipates and optimizes itself based on real world usage.

01 Sep 2012
19:07 PM

João P. Bragança

@Frans,

Pinning a package at a specific nuget version is not that complicated. Install-Package MyPackage -Version x.x.x.x

Almost every other language out there has package management, dunno why .net should be the exception.

27 Sep 2012
21:03 PM

jimlowe11

very nice


Oren Eini

CEO of RavenDB