Northwind Starter Kit ReviewIf you won’t respect the database, there will be pain
This is a review of the Northwind Starter Kit project, this review revision 94815 from Dec 18 2011.
The database is usually a pretty important piece in your application, and it likes to remind you that it should be respected. If you don’t take care of that, it will make sure that there will be a lot of pain in your future. Case in point, let us look at this method:
It looks nice, it is certainly something that looks like a business service. So let us dig down and see how it works.
It seems like a nice thing, the code is clear, and beside the bug where you get 100% discount if you buy enough and the dissonance between the comment and the code, fairly clear. And it seems that we have service logic and entity logic, which is always nice.
Except that this piece of code issues the following queries (let us assume a customer with 50 orders).
1 Query to load the customer, line 34 in this code. And now let us look at line 35… what is actually going on here:
Okay, so we have an additional query for loading the customer’s orders. Let us dig deeper.
And for each order, we have another query for loading all of that order’s items. Does it gets worse?
Phew! I was worried here for a second.
But it turns out that we only have a Select N+2 here, where N is the number of orders that a customer has.
What do you want, calculating the discount for the order is complicated, it is supposed to take a lot of time. Of course, the entire thing can be expressed in SQL as:
SELECT SUM((UnitPrice * Quantity) * (1 - Discount) Income FROM OrderItems o WHERE o.OrderID in ( SELECT Id FROM Orders WHERE CustomerId = @CustomerId )
But go ahead and try putting that optimization in. The architecture for the application will actively fight you on that.
More posts in "Northwind Starter Kit Review" series:
- (26 Jan 2012) Conclusion
- (24 Jan 2012) That CQRS thing
- (23 Jan 2012) It is all about the services
- (20 Jan 2012) From start to finishing–tracing a request
- (18 Jan 2012) If you won’t respect the database, there will be pain
- (16 Jan 2012) Refactoring to an actual read model
- (13 Jan 2012) Data Access review thoughts
- (12 Jan 2012) The parents have eaten sour grapes, and the children’s teeth are set on edge
- (11 Jan 2012) Data Access and the essence of needless work, Part II
- (10 Jan 2012) Data Access and the essence of needless work, Part I
Comments
All this makes sense to me, but in an earlier post ("The parents have eaten sour grapes, and the children’s teeth are set on edge"), you criticised the code for having SQL in it. (In fact you literally drew a sad face on the code listing). Would you know recommend that the authors use SQL in their code for this purpose?
John, I think Ayende is arguing against abstracting away the ORM. The SQL query can be easily written as a LINQ query directly against the ORM.
again, that's what that code was aimed for: http://goo.gl/DQUma So, looking at the service's code while it was only intended to support the "unit test demo" is pure waste of time :-)
And... No, the IRepository<T> interface wasn't meant neither to astract NSK from the database nor to make it "O/RM switching enabled", but only to implement a DDD-style repository enforcing business rules by means of code contracts (remember: in DDD, repositories pertain to the domain layer)
Add a calculatediscount method on the customerrepository interface that is able to execute your clean sql using the characteristics of the specific context (ex. executesql with EF) . Could this allay the pain, be more respectuful for the database and for reusing code thanks to repository? Something like that... (could be improved... it's only a proof of concept) (I'm not fighting with this architecture)
public interface ICustomerRepository : IRepository<Customer> { Customer FindById(string id); decimal CalculateTotalIncome(string id); }
// implementation public decimal CalculateTotalIncome(string id) { return ActiveContext.ExecuteStoreQuery<decimal>( "SELECT SUM((UnitPrice * Quantity) * (1 - Discount) Income FROM OrderItems o WHERE o.OrderID in (SELECT Id FROM Orders WHERE CustomerId = @p0)", id).First(); }
@Ricardo - this should not be necessary - that query could be described with any decent O/RM without having to use embedded SQL strings, using LINQ or other strongly-typed method.
Anyway, one conclusion of this post is that somebody should be very careful when using lazy-loading - the code might look nice, but the performance might get very bad in some cases..
@Tudor: I'm not interested in this query or if there is another way to write this query. I'm interested on the debate around architecture and around some suggestive opinions like "The architecture for the application will actively fight you on that" and/or "repositories are the devil" or something like that. I think that filling the UI with all interactions with the model is the devil and this idea come to me first from experience and then from theories.
Sometimes the best way to express queries SQL is with... SQL. I don’t any harm in using hand-written SQL with an ORM if that makes things easier.
This thread was totally derailed by the (irrelevant) use of SQL as an example.
I'm only curious about the reason why the author of this blog is fighting against repositories
@Ayende I'm a fan of BDD and as a side-effect I usually end up with rich domain models whose code looks much like the one you posted. My workflow is usually to just let those things lie until I finish a feature, then start profiling, fix some of the performance critical areas by eagerly loading them (In your example, I would eager load Items and Orders). If eagerly loading them isn't enough, I'd take advantage of the Repository pattern by implementing a Repository that queries through SQL. Now I admit that the framework makes me jump through hoops just to get things optimized but then again, that's only when I have to start optimizing. If you think this isn't the best way frameworks are supposed to use ORMs then how would you suggest frameworks handle this without the logic leaking to the data access layer or losing the way to do BDD/DDD smoothly (meaning deprioritizing the data access layer)
@Riccardo: There is no fight against repositories. The point is, the ORM IS the repository.
Whew am I glad I stuck with ADO.NET instead of NHibernate, LinqToSQL, EF, and all that junk.
My code is simple and clear. I fix bugs in no time flat and move on to more important issues - like adding requested functionality!
+1 for @Josh. I don't see much difference between
ISession
andIRepository<T>
, so why have the needless extra abstraction.@Josh From my point of view the point is: are you happy if the UI apps or everyone else can use the ORM and compose query as they like without control? Or is better if we add an interface, fixed by nature, where we can expose, control and define access to the data?
@Josh IRepository is a generic interface useful (but not required) if you like to expose a basic set of crud operations. Every repository could be extended using an ISpecificRepository interface. In my first comment on this thread I tried to point out this.
My first observation. Generic repository sucks because of this. I'd rather have customerRepository.GetTotalIncomeForCustomer(customerId), where I can do what I need to do in the most efficient way. Although at that point it's likely no longer a repository and is simply a DAL.
Now back to the ORM or repository or whatever... Ok, so we don't have to use SQL. We can use LINQ or some sort of specification abstraction. Now someone please explain to me why I would take this SQL query convert it to a LINQ expression which in turn hopefully converts back into the intended SQL query? And the word hope is specific, maybe nhibernate is better, but with Entity Framework I spend a lot of time hoping.
Now maybe it's because I'm an old fart, and I learned SQL. I know SQL pretty well. SQL is a very expressive language. I can do amazing things with SQL, even tuning the query. With LINQ I feel constrained, I feel like I can only do what that language easily supports, and even then it's not entirely clear and I have to jump through some hoops developing my LINQ query because I can't just paste it into SQL management studio and execute it, profile it, etc. And I can read the SQL when it comes across SQL profiler, it's not a convoluted jumbled mess of crap parameter names nobody can read.
And I question that whole bloody abstraction.
I'm ok with the ORM being used to get an object, update said object and save it.
For queries, however, there are better ways.
The problem is that data access is not easily composable without performance problems. There is no easy solution for this.
I don't understand the comment "the bug where you get 100% discount if you buy enough and the dissonance between the comment and the code." I don't see a bug there, and the comment seems accurate to me. I guess it could be structured a little differently to make more sense, but it doesn't look wrong either.
What am I missing?
@Mufasa
discount = income / 100000
discount = 100000 / 100000 = 100% discount
Sounds like "Visual Studio Achievements" will be a great new source of code for future reviews...
@Riccardo If you are exposing CRUD why do you need a repository? The NSK sample is supposed to be showing DDD, not CRUD.
@Riccardo Why would you ever want to limit yourself like that? If you want to have a common way for people to do things against the database, create extension methods to do common things against the IQueryable.
Now... If you're creating an API for someone else to access your database that they're not allowed to touch, and they shouldn't have complete access, then OK. This doesn't seem to be that.
John, I said that they should use a _query_. There are plenty of ways to construct that query. Having chosen to use an OR/M for the app, they should use the query through the OR/M. In an app where the abstraction is SQL, you are more than welcome to use SQL. My criticism was on the multiple levels of usage being used there, not on the fact that they are using SQL.
@Dylan B -- that can never happen because for 100% discount, the income would have to be 100k, but if that was the case the preceding if statement would ensure the discount is flat 6%.
@Dylan B. if (income > 5000) return 0.06m;
Think for yourself, don't believe in everything Ayende writes.
Well it is very easy to put optimization in. It would take probably 3 minutes. Add method to repository that returns client with orders. Simple Include calls. That's all.
But I think I understand what you mean. After some time you will have 5 methods for loading the customer. One for only customer, other for loading it with addresses, other with children, next with orders and so on and on.
Sorry Oren but writing a harsh review of a big codebase, even criticizing a missing comment or a return true makes sense.
But doing so, while exposing a non-existing bug in a business logic not longer than 10 lines - including openingclosing brackets and comments, just 3 lines of pure code - it's just pure EPIC FAIL.
Cheers.
great post!
when doing the sql thing in above example I would use a join instead of a sub select. Would that not be way faster ? or is that a myth?
I would implement this function as a property of the (partial when generated) customer entity because this functionality is only used there.
pedantic - there's one too many open parenthesis in your SQL query.
@ed: never ever create properties that contain logic and can generate calls to database. Ever.
@Karep: make the word "ever" all uppercase, bold and purple. Than add a couple of exclamation marks. Than bump the font to 120pt. That "ever" is oh-so-important :-)
Glad we agree
@Karep: I would create the property but that would not call the db directly but it wil call a function that does.
@Dylan B
the income like the one you posted, will finish in the "then" part of the if branches, not into the "else"
@Steve Sheldon: about:
"Now back to the ORM or repository or whatever... Ok, so we don't have to use SQL. We can use LINQ or some sort of specification abstraction. Now someone please explain to me why I would take this SQL query convert it to a LINQ expression which in turn hopefully converts back into the intended SQL query? And the word hope is specific, maybe nhibernate is better, but with Entity Framework I spend a lot of time hoping.":
The problem with hardcodding SQL as string in the source code are many: if a developer renames a column or table name, he must go in all the places in the source code where that column is hardcodded and rename it, and no refactoring tool will help it with that. If using an O/RM, he will have to do this in a single place.
Also, if somebody makes an error when typing a column name as a string, the compiler can't detect that, and the error will show up much later (if ever) at runtime..
The query should be described with LINQ - other O/RM used code-generated constants for this long time before LINQ existed (ex.: LLBLGen).
@ed: That doesnt change anything. Properties suggest they don't do anything but just return some fields. Clients can create unoptimal code calling such a property in a loop for example. MS regrets making DateTime.Now a property.
@Karep, a property is also used for display purposes (decorate it with display attributes) and gives you a nice fluent experience. Point is that you don't need another business layer or service for doing discount calculations. But you have too have it in the right context. Like: is this the start of a growing product or is this property one of a few calculation properties which is not used anywhere else?
@Steve : Now maybe it's because I'm an old fart, and I learned SQL
Right on brother!
@Tudor : Oh please, I mean really? Really? The software you work with must be terrible (or the devs terrible) if they are changing tables names. Whatever. But you made an attempt at coming up with a reason, regardless of how ridiculous it is, so you get 1 point for that.
YAGNI if I ever saw it - and I've seen it all over 30+ years. Oh, maybe the once it happened I used my editor's search & replace feature that scanned over several files so I didn't have to look at them - or was it grep that I used? It doesn't matter now, as I only spent about 10 minutes tuning things up. 10 minutes in 30 years.
@Tudor
"The problem with hardcodding SQL as string in the source code are many: if a developer renames a column or table name, he must go in all the places in the source code where that column is hardcodded and rename it, and no refactoring tool will help it with that. If using an O/RM, he will have to do this in a single place."
When do you rename a column or a table? I guess when you discover you have been using a inappropriate name or one with a typing error. Shouldn't you change your name in the domain model to reflect that insight? Or did you do it right in the domain model and forgot about the database?
"Also, if somebody makes an error when typing a column name as a string, the compiler can't detect that, and the error will show up much later (if ever) at runtime.."
Like when you press F5 after developing the code to test it? Or do your collegues never test if their changes are working?
@Frank - it's usually not so simple. In complex projects, a database is used by several applications and each application might have over 20 or 30 projects, managed by several teams, over several years.
Even when manual testing and automated testing is done after each change, it's easy to miss one of the hundred places where a field might be referenced in a large application (in a real world, not in an ideal world where the code is 99% covered by integration tests).
About renaming a field in the database: this might happen very often in a real project: usually the naming conventions are different between domain model and the table fields (ex.: DB: Customers.Country_ID and in C#: Customer.CountryId). Five months after the table is created, one of the 30 developers discovers that a db field was not named according to the naming convention (ex. 'Countryid') and renames it.
What is easier: to go in 5 projects and use the O/RM automated tool to detect the database structure changes and automatically refresh the mapping file, or to do a search and replace in 100 files, taking care not to rename a field with the same name, but from another table.
Of course, there are people who prefer to use only stored procedures, other who prefer to manually build their own DAL and other who use O/RMs - each approach has it's advantages and they will never agree on a "best" approach.
Frank, That assume that you actually have a way to go over every single feature in the application. In many cases, those errors are nasty because they happen in places that aren't showing up first thing when you do F5
Thanks Thudor and Ayende, those are good arguments. ;-)
For me, the problem is that entities are mapped as local classes and not as façades to a remote service, the SQL Server. Latency is not zero, Bandwidth is not infinite, and transport cost is not zero.
In the case of LINQ, entities must be mapped to allow a more expressive use of the IQueryable and Expression<Func<>>.
@Dylan B: Doesn't the income > 5000 block prohibit the possibility of running income / 10000 in the else block?
@Tudor: I'm using ORM as well, but have a very strong background in Sql and was living in a sproc-world until robust ORM was introduced to the masses.
IMHO: One of the reasons ORM was invented was to make objects out of the data and not waste precious time by manually populating DTO's, I don't believe the saying "to prevent changes in some tables/ views". Those changes cannot be tracked anyway just by hitting F5, well not if you're using Microsoft ORM?, don't know about NH?
Before ORM, people was tremendously keen to use sprocs (wonder why now, but we all get more experience and insight over time), exactly as now... Everybody is very focused on using ORM and sometimes overlook the fantastic possibility of just writing plain Sql to query those ORM-generated DTO's :)
Comment preview