Ayende @ Rahien

Refunds available at head office

NuGet Perf, Part III–Displaying the Packages page

The first thing that we will do with RavenDB and the NuGet data is to issue the same logical query as the one used to populate the packages page. As a reminder, here is how it looks:

SELECT        TOP (30) 
          -- ton of fields removed for brevity
FROM        (

            SELECT        Filtered.Id
                    ,    Filtered.PackageRegistrationKey
                    ,    Filtered.Version
                    ,    Filtered.DownloadCount
                    ,    row_number() OVER (ORDER BY Filtered.DownloadCount DESC, Filtered.Id ASC) AS [row_number]
            FROM        (
                        SELECT        PackageRegistrations.Id
                                ,    Packages.PackageRegistrationKey
                                ,    Packages.Version
                                ,    PackageRegistrations.DownloadCount
                        FROM        Packages
                        INNER JOIN    PackageRegistrations ON PackageRegistrations.[Key] = Packages.PackageRegistrationKey
                        WHERE        Packages.IsPrerelease <> cast(1 as bit)
                        ) Filtered
            ) Paged
INNER JOIN    PackageRegistrations ON PackageRegistrations.[Key] = Paged.PackageRegistrationKey
INNER JOIN    Packages ON Packages.PackageRegistrationKey = Paged.PackageRegistrationKey AND Packages.Version = Paged.Version
WHERE        Paged.[row_number] > 30
ORDER BY    PackageRegistrations.DownloadCount DESC
        ,    Paged.Id

Despite the apparent complexity, this is a really trivial query. What it does is say:

  • Give me rows 31 to 60 (the second page of 30)
  • Where IsPrerelease is false
  • Order by the download count and then the id

With Linq, the client side query looks something like this:

var results = Session.Query<Package>()
                     .Where(x => x.IsPrerelease == false)
                     // DownloadCount is sorted descending, matching the ORDER BY in the SQL
                     .OrderByDescending(x => x.DownloadCount).ThenBy(x => x.Id)
                     .Skip(30)
                     .Take(30)
                     .ToList();

Now, I assume that this is what the NuGet code is also doing; it is just that the relational database forces them to get to the data in a really convoluted way.

With RavenDB, to match the same query, I could just issue the query above, but there are subtle differences between how the query works in SQL and how it works in RavenDB. In particular, the data that we have in RavenDB is the output of this query, but it isn’t the raw output. For example, we don’t have the Id column available, which is used for sorting. Now, I think that the logic means to say, “sort by download count descending and then by age ascending”, so that old and popular packages are more visible than new and fresh packages.
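To make that reading of the sort concrete, here is a minimal sketch in Python over in-memory data, not RavenDB. The `Package` shape, the `created` field standing in for "age", and the `listing_page` helper are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Package:
    # Hypothetical shape: only the fields the listing needs.
    id: str
    is_prerelease: bool
    download_count: int
    created: date


def listing_page(packages, start=0, page_size=30):
    """Stable releases, most downloaded first, oldest first on ties."""
    stable = [p for p in packages if not p.is_prerelease]
    # Negating the count sorts it descending; created stays ascending.
    ordered = sorted(stable, key=lambda p: (-p.download_count, p.created))
    return ordered[start:start + page_size]


# Made-up sample data, just to exercise the ordering.
packages = [
    Package("new-popular", False, 500, date(2012, 6, 1)),
    Package("old-popular", False, 500, date(2011, 1, 15)),
    Package("beta-only", True, 900, date(2012, 8, 1)),
    Package("niche", False, 10, date(2010, 3, 9)),
]

first_page = listing_page(packages, start=0, page_size=2)
second_page = listing_page(packages, start=2, page_size=2)
print([p.id for p in first_page])
print([p.id for p in second_page])
```

With equal download counts, the older package comes first, and the prerelease never shows up no matter how many downloads it has.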

In order to match the same behavior (and because we will need it for the next post) we will define the following index in RavenDB:

image

And querying it:

image

The really nice thing about this?

image

This is the URL for this search:

/indexes/Packages/Listing?query=IsPrerelease:false&start=0&pageSize=128&aggregation=None&sort=-DownloadCount&sort=Created
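The URL is easy to take apart, which is a rough way to see how little the server is being asked to do. The sketch below uses only Python's standard library to split it into its parts; treating a leading `-` on a `sort` value as "descending" is an assumption based on the sort order described above, not something this snippet checks against RavenDB.

```python
from urllib.parse import urlsplit, parse_qs

url = ("/indexes/Packages/Listing?query=IsPrerelease:false&start=0"
       "&pageSize=128&aggregation=None&sort=-DownloadCount&sort=Created")

parts = urlsplit(url)
params = parse_qs(parts.query)

# Everything after "/indexes/" names the index being queried.
index_name = parts.path[len("/indexes/"):]
query = params["query"][0]
start = int(params["start"][0])
page_size = int(params["pageSize"][0])

# Repeated "sort" keys keep their order; a leading "-" is read as descending
# (an assumption, see the note above).
sort_fields = [(name.lstrip("-"), name.startswith("-")) for name in params["sort"]]

print(index_name, query, start, page_size, sort_fields)
```

So the whole request boils down to one index name, one property match, a paging window, and two sort fields.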

This is something that RavenDB can do in its sleep, because it is a very cheap operation. Consider the query plan that would be generated for the SQL query above. You have to join 5 times just to get to the data that you want, paging is a real mess, and the database actually has to work a lot to answer this fiddling little query.

Just to give you some idea here. We are talking about something that conceptually should be the same as:

select top 30 skip 30 * from Data where IsPrerelease = 0

But it gets really complex really fast with the joins and the tables and all the rest.
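The "top 30 skip 30" line above is pseudo-SQL. As a rough illustration of the conceptual query against a single flat table, here is a sketch using Python's built-in `sqlite3` module. The `Data` table, its columns, and the synthetic rows are assumptions made up for this example; it uses SQLite's `LIMIT ... OFFSET` syntax and says nothing about NuGet's real schema or SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Data (Id TEXT, Version TEXT, IsPrerelease INTEGER, DownloadCount INTEGER)"
)

# Synthetic rows: 100 packages, every fifth one marked as a prerelease.
rows = [(f"pkg{i:03d}", "1.0", 1 if i % 5 == 0 else 0, 1000 - i) for i in range(100)]
conn.executemany("INSERT INTO Data VALUES (?, ?, ?, ?)", rows)

# Second page of 30 stable packages, most downloaded first.
page = conn.execute(
    """
    SELECT Id, DownloadCount
    FROM Data
    WHERE IsPrerelease = 0
    ORDER BY DownloadCount DESC, Id ASC
    LIMIT 30 OFFSET 30
    """
).fetchall()

print(len(page), page[0], page[-1])
```

No joins and no row_number wrapper: one filter, one sort, one paging window.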

In comparison, in RavenDB, we actually do have just a property match to do. Because we keep the entire object graph in a single location, we can do very efficient searches on it.

In the next post, I’ll discuss the actual way I modeled the data, and then we get to do exciting searches :-)

Comments

08/30/2012 09:33 AM by
Frans Bouma

Your linq query is definitely not the source of the SQL you're seeing. It joins several tables twice, pages over a subset and joins that subset. Your linq query would not result in this.

The cumbersome way the SQL looks is a result of linq though: normally one would move the isprerelease predicate in the where clause inside the ON clause and simply page over the end result of the query. I don't see why they do it this way. Your linq query would result (normally) in a query which looks like the 'filtered' subset, and move the order by inside the query. After all paging in SQL Server might look cumbersome, but it's a wrapper query you apply to the normal query, where you wrap your normal query with the 'paging wrapper' to get paging. As they don't do that here, it's overly complicated.

I don't know the datamodel of NuGet, but from the looks of it it looks like they used more than 1 table for package storing. One truly wonders why. But then again, it's NuGet, some service which web developers think is 'useful' because they find it useful, forgetting that not everyone does webdevelopment

08/30/2012 09:46 AM by
Simon Skov Boisen

Frans do you mean to say that NuGet is only useful for people doing webdevelopment? In that case I think you don't know about the breadth of different packages available from NuGet; some of the more popular ones are Ninject, an IoC container, NUnit, a testing framework, and log4net, a logging library - not really specific to web-development.

08/30/2012 11:08 AM by
Frans Bouma

@simon No I'm saying that nuget is primarily a solution to a problem webdevs had, but non-web devs didn't have. I mean, a lot of devs simply create a folder in their solution, add 3rd party dlls there and reference them in multiple projects from that folder. The 'recent' tab in add-reference is then more handy than nuget to add references to multiple projects.

08/30/2012 11:19 AM by
jonnii

Frans, I agree with you. I even go one step further, I don't use 3rd party dlls, i just copy and paste the code off github and codeplex for all the libraries I use into my project. Don't have to worry about all these extra dlls anymore, and I only have to include the classes I need! It doesn't matter if the dll versions are compatible with each other because I get new ones every build!

08/30/2012 11:23 AM by
Matt Warren

Why is the download count exactly the same for all the jQuery versions?

08/30/2012 11:51 AM by
Ayende Rahien

Frans, We use nuget in pretty much any project we have now, and we don't do web apps much if at all. We like to get away from having to manage the deps and nuget does a good job at it.

08/30/2012 11:57 AM by
Ayende Rahien

Matt, That is the value we get from NuGet OData, see:

https://nuget.org/api/v2/Packages?$skiptoken='jQuery','0.0.0.0'

As you can see, you have DownloadCount which is the same for all.

What I think I missed is that there is also VersionDownloadCount, with the value just for this version, not globally.

08/30/2012 12:03 PM by
Kat

While your results are clearly good, you're in no way comparing apples with apples.

The reason the SQL Server version is slow is because the schema is a stinking mess of lots of tables, not because SQL Server is bad and RavenDB is good.

A simple denormalised persisted view along with full-text search would definitely give good results.

08/30/2012 01:32 PM by
Tim Murphy

Any chance of providing performance data for SQL Server on same or similar machine?

08/30/2012 01:37 PM by
Ayende Rahien

Tim, I don't have the data in SQL format.

08/30/2012 01:50 PM by
Andreas Kroll

Frans and jonnii,

sometimes I cannot believe what I read. You really think it is easier to copy dlls to a directory or even copy code from GitHub to your project than perform an "install-package "??? NuGet really is getting better and better each day. Most packages integrate themselves into solutions very well, so for instance I have IoC ready with one or two install-package commands depending on which container I use. What about dependencies? NuGet pulls all dependencies automatically for me. You would have to do that by hand. What about updates in your case? I just issue an update command for an updated package and get all the benefits of version checking etc.

What is it you dislike about NuGet? I imagine if you'd work on a linux machine you would also not use a package installer like yast to get tools, but rather install them by hand or even download the code and compile it?

08/30/2012 01:51 PM by
dotnetchris

This post is get summation of every reason I love RavenDB with modern software development. This post shows everything that is wrong for doing modern software development against relational dbs and how large of an impedance mismatch SQL tables have compared to object graphs.

08/30/2012 01:53 PM by
dotnetchris

This post is a great summation*** if i could type.

08/30/2012 02:01 PM by
Beyers

@Frans, @jonnii, I cannot disagree more with you. To me, Nuget is to package and dependency management, as what version control is to source code. Not to mention Nuget private repositories where you can host your own or 3rd party libraries and have a central point to manage and import from.

But hey, feel free to manually copy DLLs around, create ZIP files of project versions, save them to floppy for backups :)

08/30/2012 02:10 PM by
Simon

Surely jonnii is just kidding??

08/30/2012 03:02 PM by
Christopher Wright

Holding out for the Linux port of the nuget client.

08/30/2012 04:37 PM by
Ali Kheyrollahi

@Frans "I mean, a lot of devs simply create a folder in their solution, add 3rd party dlls there and reference them in multiple projects"

World of development is different now, OSS with fast development cycles needs painless upgrades. If you have not seen it yet, you might have missed the train - I am afraid.

08/30/2012 04:55 PM by
Flavio

I hope @jonnii is just kidding...

08/30/2012 07:17 PM by
Karep

Frans, jonni: What nonsense. It's easier to analyze source code to choose the classes you need than to run one command? And how do you update that copied code?

08/31/2012 09:14 AM by
Frans Bouma

@Ali What are you talking about? So your project simply takes dependencies on the latest dlls from nuget and if something breaks along the way, because an updated version breaks your code, so what? Not every project can use solely OSS dlls (heck, many projects use only non-OSS dlls), and many projects take a dependency on dll vX.Y and stick with that, because they know it works. Upgrade it 'because nuget says so' is stupid. But hey, I'm not your client, so go ahead. But please don't talk to me like I'm a petty child who doesn't know what software dev looks like. I didn't miss a train, why would I? I'm a professional software developer now for over 18 years, do you really think what's hip and 'new' today is actually 'new' ? haha :D

What I find funny is that if you say you like installed versions over some web-based package site, you suddenly do software dev on a dos box with floppy disks. Like I hit your mother in the face with a baseball bat when I talked about NuGet. Get a life.

08/31/2012 09:24 AM by
jeroenh

ouch, someone seems to be in a bad mood or something.

@Frans nuget doesn't force you to upgrade anything, it's a specific action. Also, I concur with Ayende (and the most of the rest of the world) that nuget is useful for just about any project, not just web development.

08/31/2012 09:24 AM by
Frans Bouma

@Andreas I didn't say one should choose downloading source over a package install. I just don't see the point of nuget over simply referencing a dll you have on disk. Perhaps it's related to ppl who just do OSS work, but many dlls are closed-source. Try to mix two ways of adding references, it gets cumbersome. Add reference's recent tab is much quicker in that regard. Sure it checks dependencies, but as I said, dependencies of a dll you reference are dependencies you have to research up front anyway. At least for professional projects you're shipping to clients: after all your code then depends on these versions as well. If these dlls update, do you then have to update the dll you directly reference? Most likely yes. Can your project do that? that's to be seen. I wouldn't update referenced dlls 'on the spot' just because there's a new version. At least not in professional projects shipped to clients/customers.

But perhaps in 'modern day' development one doesn't give a f*ck about whether stuff breaks.

I'm not saying nuget doesn't serve a purpose, I just don't see the benefit in my day-to-day work and therefore not the hype around it. But apparently it's forbidden to say so, as it's equal to being stupid.

08/31/2012 09:27 AM by
Ayende Rahien

Frans, We deliver commercial software via nuget. It simplifies the update process, and most importantly, the dependencies process for both us and our clients.

08/31/2012 12:18 PM by
Ali Kheyrollahi

@Frans I do not have more to say - not sure what I can say. All I can say is that I respect you for what you have done with LLBLGen Pro.

08/31/2012 02:39 PM by
jonnii

@Frans, where I work all of our internal libraries are packaged. TeamCity has a built in nuget server, and if you don't use team city then you can put them on a share drive for everyone to consume.

08/31/2012 06:48 PM by
Karep

Something I don't understand here. You are running one query on the database and are proud it takes 17ms. But NuGet's database is not hit by one user, but by thousands of users. Users that also write to that database. So there is locking happening. Clearly I am misunderstanding why you present those 17ms.

09/01/2012 10:42 AM by
Ayende Rahien

Karep, a) RavenDB doesn't DO locking. Users can write to the DB all day, it doesn't impact read performance. b) RavenDB is actually getting faster the more you use it, because it anticipates and optimizes itself based on real world usage.

09/01/2012 07:07 PM by
João P. Bragança

@Frans,

Pinning a package at a specific nuget version is not that complicated. Install-Package MyPackage -Version x.x.x.x

Almost every other language out there has package management, dunno why .net should be the exception.
