NuGet Perf, Part IV–Modeling the packages
Before we move on to discussing how to implement package search, I wanted to take a bit of time to discuss how we structured the data. There are a bunch of properties that feel very relational in nature, in particular these two:
- Tags: Ian_Mercer Natural_Language Abodit NLP
- Dependencies: AboditUnits:1.0.4|Autofac.Mef:188.8.131.520|ImpromptuInterface:5.6.2|log4net:1.2.11
In the current version of NuGet, those properties are actually stored as symbol-separated strings. The reason for that? In relational databases, if you want to have a collection, you have to have another table, then join to it, then take care of it, and wake up in the middle of the night to take it for a walk. So people go the obvious route and just concatenate strings and hope for the best. Note that in the dependencies case, we have multi-level concatenation: the entries are separated by '|', and within each entry the package id and the version are separated by ':'.
In RavenDB, we have full-fledged support for storing complex objects, so the tags above will become:
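As a rough sketch of that shape (assuming the tags are split on whitespace, and using Python only to print the resulting JSON), the space-separated string turns into a plain array of strings:

```python
import json

# The tags string exactly as it appears in the current NuGet data.
raw_tags = "Ian_Mercer Natural_Language Abodit NLP"

# Assumption: tags are whitespace-separated, so a plain split() is enough.
document = {"Tags": raw_tags.split()}

# Print the document the way it would look as JSON.
print(json.dumps(document, indent=2))
```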
And what about the dependencies? Those we store in an array of complex objects, like so:
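A minimal sketch of that conversion, using a shortened version of the dependencies string from above. The `Id` and `Version` field names and the `parse_dependencies` helper are my assumptions for illustration; the exact property names in the real documents may differ.

```python
import json

def parse_dependencies(raw):
    """Turn 'Id:Version|Id:Version' into a list of small objects."""
    dependencies = []
    for entry in raw.split("|"):           # first level: one entry per dependency
        if not entry:
            continue                       # tolerate empty segments
        package_id, _, version = entry.partition(":")  # second level: id and version
        dependencies.append({"Id": package_id, "Version": version})
    return dependencies

raw = "AboditUnits:1.0.4|ImpromptuInterface:5.6.2|log4net:1.2.11"
document = {"Dependencies": parse_dependencies(raw)}
print(json.dumps(document, indent=2))
```

Once the data looks like this, nobody has to remember which separator applies at which level.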
RavenDB allows us to store the model in a way that is easy on the eye, natural to work with, and in general makes our lives easier.
Let us say that I wanted to add a feature to NuGet: “show me all the packages that use this package”?
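To make the idea concrete, here is a small in-memory sketch in Python. This is not RavenDB's query API or how its indexes are implemented; it only illustrates why the question becomes a simple lookup once dependencies are stored as structured objects. The package names, the `Id`/`Version` field names and the `build_dependents_index` helper are made up for the example.

```python
from collections import defaultdict

# Hypothetical documents, with dependencies stored as structured objects.
packages = [
    {"Id": "PackageA", "Dependencies": [{"Id": "log4net", "Version": "1.2.11"},
                                        {"Id": "AboditUnits", "Version": "1.0.4"}]},
    {"Id": "PackageB", "Dependencies": [{"Id": "ImpromptuInterface", "Version": "5.6.2"}]},
    {"Id": "PackageC", "Dependencies": [{"Id": "log4net", "Version": "1.2.11"}]},
]

def build_dependents_index(packages):
    """Map each dependency id to the set of packages that use it."""
    index = defaultdict(set)
    for package in packages:
        for dependency in package["Dependencies"]:
            index[dependency["Id"]].add(package["Id"])
    return index

dependents = build_dependents_index(packages)

# "Show me all the packages that use log4net" is now a dictionary lookup.
print(sorted(dependents["log4net"]))
```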
And allow me to brag a little bit?
By the way, just to be sure that everyone has a full grasp of what is going on, I am writing this post at 30,000 feet. The laptop I am using is NOT connected to power, and the data set that I am using is the full NuGet dataset.
Compare the results you get from RavenDB to what you have to do in SQL: Dependencies LIKE '%log4net%'
You can kiss your performance goodbye with this sort of query.
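To see why, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for a relational database. The table, index and rows are made up, and other engines will report their plans differently. Even with an index on the column, a LIKE pattern that starts with a wildcard cannot seek into that index, so the engine ends up looking at every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Packages (Id TEXT PRIMARY KEY, Dependencies TEXT)")
conn.execute("CREATE INDEX IX_Packages_Dependencies ON Packages (Dependencies)")
conn.executemany(
    "INSERT INTO Packages VALUES (?, ?)",
    [
        ("PackageA", "AboditUnits:1.0.4|log4net:1.2.11"),  # hypothetical rows
        ("PackageB", "ImpromptuInterface:5.6.2"),
        ("PackageC", "log4net:1.2.11"),
    ],
)

def plan(sql):
    """Return SQLite's query plan description for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # the last column holds the detail text

like_sql = "SELECT Id FROM Packages WHERE Dependencies LIKE '%log4net%'"
exact_sql = "SELECT Id FROM Packages WHERE Dependencies = 'log4net:1.2.11'"

like_plan = plan(like_sql)    # expected to be a scan over all rows
exact_plan = plan(exact_sql)  # expected to be an index search
matches = sorted(row[0] for row in conn.execute(like_sql))

print("LIKE :", like_plan)
print("Exact:", exact_plan)
print("Packages matched by LIKE:", matches)
```

The exact plan text varies between SQLite versions, but the LIKE query should be reported as a scan, while the equality query can use the index. The equality query, of course, cannot answer the question once several dependencies are concatenated into one string.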