Ayende @ Rahien

It's a girl

Northwind Starter Kit Review: If you won’t respect the database, there will be pain

This is a review of the Northwind Starter Kit project, this review revision 94815 from Dec 18 2011.

The database is usually a pretty important piece in your application, and it likes to remind you that it should be respected. If you don’t take care of that, it will make sure that there will be a lot of pain in your future. Case in point, let us look at this method:

image

It looks nice, it is certainly something that looks like a business service. So let us dig down and see how it works.

image

It seems like a nice thing, the code is clear, and beside the bug where you get 100% discount if you buy enough and the dissonance between the comment and the code, fairly clear. And it seems that we have service logic and entity logic, which is always nice.

Except that this piece of code issues the following queries (let us assume a customer with 50 orders).

1 Query to load the customer, line 34 in this code. And now let us look at line 35… what is actually going on here:

image

Okay, so we have an additional query for loading the customer’s orders. Let us dig deeper.

image

And for each order, we have another query for loading all of that order’s items. Does it gets worse?

image

Phew! I was worried here for a second.

But it turns out that we only have a Select N+2 here, where N is the number of orders that a customer has.

What do you want, calculating the discount for the order is complicated, it is supposed to take a lot of time. Of course, the entire thing can be expressed in SQL as:

SELECT 
  SUM((UnitPrice * Quantity) * (1 - Discount) Income
FROM OrderItems o
WHERE o.OrderID in (
  SELECT Id FROM Orders
  WHERE CustomerId = @CustomerId
)

But go ahead and try putting that optimization in. The architecture for the application will actively fight you on that.

Comments

John Jeffery
01/18/2012 10:11 AM by
John Jeffery

All this makes sense to me, but in an earlier post ("The parents have eaten sour grapes, and the children’s teeth are set on edge"), you criticised the code for having SQL in it. (In fact you literally drew a sad face on the code listing). Would you know recommend that the authors use SQL in their code for this purpose?

Bart
01/18/2012 10:22 AM by
Bart

John, I think Ayende is arguing against abstracting away the ORM. The SQL query can be easily written as a LINQ query directly against the ORM.

Andrea Saltarello
01/18/2012 10:24 AM by
Andrea Saltarello

again, that's what that code was aimed for: http://goo.gl/DQUma So, looking at the service's code while it was only intended to support the "unit test demo" is pure waste of time :-)

Andrea Saltarello
01/18/2012 10:28 AM by
Andrea Saltarello

And... No, the IRepository interface wasn't meant neither to astract NSK from the database nor to make it "O/RM switching enabled", but only to implement a DDD-style repository enforcing business rules by means of code contracts (remember: in DDD, repositories pertain to the domain layer)

Riccardo
01/18/2012 10:29 AM by
Riccardo

Add a calculatediscount method on the customerrepository interface that is able to execute your clean sql using the characteristics of the specific context (ex. executesql with EF) . Could this allay the pain, be more respectuful for the database and for reusing code thanks to repository? Something like that... (could be improved... it's only a proof of concept) (I'm not fighting with this architecture)

public interface ICustomerRepository : IRepository { Customer FindById(string id); decimal CalculateTotalIncome(string id); }

// implementation public decimal CalculateTotalIncome(string id) { return ActiveContext.ExecuteStoreQuery( "SELECT SUM((UnitPrice * Quantity) * (1 - Discount) Income FROM OrderItems o WHERE o.OrderID in (SELECT Id FROM Orders WHERE CustomerId = @p0)", id).First(); }

Tudor
01/18/2012 10:42 AM by
Tudor

@Ricardo - this should not be necessary - that query could be described with any decent O/RM without having to use embedded SQL strings, using LINQ or other strongly-typed method.

Anyway, one conclusion of this post is that somebody should be very careful when using lazy-loading - the code might look nice, but the performance might get very bad in some cases..

Riccardo
01/18/2012 11:02 AM by
Riccardo

@Tudor: I'm not interested in this query or if there is another way to write this query. I'm interested on the debate around architecture and around some suggestive opinions like "The architecture for the application will actively fight you on that" and/or "repositories are the devil" or something like that. I think that filling the UI with all interactions with the model is the devil and this idea come to me first from experience and then from theories.

Carlos Mendes
01/18/2012 12:38 PM by
Carlos Mendes

Sometimes the best way to express queries SQL is with... SQL. I don’t any harm in using hand-written SQL with an ORM if that makes things easier.

tobi
01/18/2012 12:45 PM by
tobi

This thread was totally derailed by the (irrelevant) use of SQL as an example.

Riccardo
01/18/2012 01:15 PM by
Riccardo

I'm only curious about the reason why the author of this blog is fighting against repositories

Jonn Louie
01/18/2012 01:24 PM by
Jonn Louie

@Ayende I'm a fan of BDD and as a side-effect I usually end up with rich domain models whose code looks much like the one you posted. My workflow is usually to just let those things lie until I finish a feature, then start profiling, fix some of the performance critical areas by eagerly loading them (In your example, I would eager load Items and Orders). If eagerly loading them isn't enough, I'd take advantage of the Repository pattern by implementing a Repository that queries through SQL. Now I admit that the framework makes me jump through hoops just to get things optimized but then again, that's only when I have to start optimizing. If you think this isn't the best way frameworks are supposed to use ORMs then how would you suggest frameworks handle this without the logic leaking to the data access layer or losing the way to do BDD/DDD smoothly (meaning deprioritizing the data access layer)

Josh
01/18/2012 01:48 PM by
Josh

@Riccardo: There is no fight against repositories. The point is, the ORM IS the repository.

Mr_Simple
01/18/2012 02:14 PM by
Mr_Simple

Whew am I glad I stuck with ADO.NET instead of NHibernate, LinqToSQL, EF, and all that junk.

My code is simple and clear. I fix bugs in no time flat and move on to more important issues - like adding requested functionality!

Harry Steinhilber
01/18/2012 02:14 PM by
Harry Steinhilber

+1 for @Josh. I don't see much difference between ISession and IRepository<T>, so why have the needless extra abstraction.

Riccardo
01/18/2012 02:29 PM by
Riccardo

@Josh From my point of view the point is: are you happy if the UI apps or everyone else can use the ORM and compose query as they like without control? Or is better if we add an interface, fixed by nature, where we can expose, control and define access to the data?

@Josh IRepository is a generic interface useful (but not required) if you like to expose a basic set of crud operations. Every repository could be extended using an ISpecificRepository interface. In my first comment on this thread I tried to point out this.

Steve Sheldon
01/18/2012 03:52 PM by
Steve Sheldon

My first observation. Generic repository sucks because of this. I'd rather have customerRepository.GetTotalIncomeForCustomer(customerId), where I can do what I need to do in the most efficient way. Although at that point it's likely no longer a repository and is simply a DAL.

Now back to the ORM or repository or whatever... Ok, so we don't have to use SQL. We can use LINQ or some sort of specification abstraction. Now someone please explain to me why I would take this SQL query convert it to a LINQ expression which in turn hopefully converts back into the intended SQL query? And the word hope is specific, maybe nhibernate is better, but with Entity Framework I spend a lot of time hoping.

Now maybe it's because I'm an old fart, and I learned SQL. I know SQL pretty well. SQL is a very expressive language. I can do amazing things with SQL, even tuning the query. With LINQ I feel constrained, I feel like I can only do what that language easily supports, and even then it's not entirely clear and I have to jump through some hoops developing my LINQ query because I can't just paste it into SQL management studio and execute it, profile it, etc. And I can read the SQL when it comes across SQL profiler, it's not a convoluted jumbled mess of crap parameter names nobody can read.

And I question that whole bloody abstraction.

I'm ok with the ORM being used to get an object, update said object and save it.

For queries, however, there are better ways.

tobi
01/18/2012 04:16 PM by
tobi

The problem is that data access is not easily composable without performance problems. There is no easy solution for this.

Mufasa
01/18/2012 04:18 PM by
Mufasa

I don't understand the comment "the bug where you get 100% discount if you buy enough and the dissonance between the comment and the code." I don't see a bug there, and the comment seems accurate to me. I guess it could be structured a little differently to make more sense, but it doesn't look wrong either.

What am I missing?

Dylan B
01/18/2012 05:23 PM by
Dylan B

@Mufasa

discount = income / 100000

discount = 100000 / 100000 = 100% discount

J Healy
01/18/2012 05:23 PM by
J Healy

Sounds like "Visual Studio Achievements" will be a great new source of code for future reviews...

Ryan
01/18/2012 05:35 PM by
Ryan

@Riccardo If you are exposing CRUD why do you need a repository? The NSK sample is supposed to be showing DDD, not CRUD.

Josh
01/18/2012 05:41 PM by
Josh

@Riccardo Why would you ever want to limit yourself like that? If you want to have a common way for people to do things against the database, create extension methods to do common things against the IQueryable.

Now... If you're creating an API for someone else to access your database that they're not allowed to touch, and they shouldn't have complete access, then OK. This doesn't seem to be that.

Ayende Rahien
01/18/2012 06:18 PM by
Ayende Rahien

John, I said that they should use a query. There are plenty of ways to construct that query. Having chosen to use an OR/M for the app, they should use the query through the OR/M. In an app where the abstraction is SQL, you are more than welcome to use SQL. My criticism was on the multiple levels of usage being used there, not on the fact that they are using SQL.

Sha
01/18/2012 06:38 PM by
Sha

@Dylan B -- that can never happen because for 100% discount, the income would have to be 100k, but if that was the case the preceding if statement would ensure the discount is flat 6%.

ThinkingOne
01/18/2012 06:38 PM by
ThinkingOne

@Dylan B. if (income > 5000) return 0.06m;

Think for yourself, don't believe in everything Ayende writes.

Karep
01/18/2012 06:42 PM by
Karep

Well it is very easy to put optimization in. It would take probably 3 minutes. Add method to repository that returns client with orders. Simple Include calls. That's all.

But I think I understand what you mean. After some time you will have 5 methods for loading the customer. One for only customer, other for loading it with addresses, other with children, next with orders and so on and on.

njy
01/18/2012 07:20 PM by
njy

Sorry Oren but writing a harsh review of a big codebase, even criticizing a missing comment or a return true makes sense.

But doing so, while exposing a non-existing bug in a business logic not longer than 10 lines - including openingclosing brackets and comments, just 3 lines of pure code - it's just pure EPIC FAIL.

Cheers.

ed
01/18/2012 08:05 PM by
ed

great post!

when doing the sql thing in above example I would use a join instead of a sub select. Would that not be way faster ? or is that a myth?

I would implement this function as a property of the (partial when generated) customer entity because this functionality is only used there.

Adam
01/18/2012 08:12 PM by
Adam

pedantic - there's one too many open parenthesis in your SQL query.

Karep
01/18/2012 08:58 PM by
Karep

@ed: never ever create properties that contain logic and can generate calls to database. Ever.

njy
01/18/2012 09:14 PM by
njy

@Karep: make the word "ever" all uppercase, bold and purple. Than add a couple of exclamation marks. Than bump the font to 120pt. That "ever" is oh-so-important :-)

Karep
01/18/2012 10:06 PM by
Karep

Glad we agree

Ed
01/19/2012 08:43 AM by
Ed

@Karep: I would create the property but that would not call the db directly but it wil call a function that does.

Roberto
01/19/2012 08:51 AM by
Roberto

@Dylan B

the income like the one you posted, will finish in the "then" part of the if branches, not into the "else"

Tudor
01/19/2012 02:35 PM by
Tudor

@Steve Sheldon: about:

"Now back to the ORM or repository or whatever... Ok, so we don't have to use SQL. We can use LINQ or some sort of specification abstraction. Now someone please explain to me why I would take this SQL query convert it to a LINQ expression which in turn hopefully converts back into the intended SQL query? And the word hope is specific, maybe nhibernate is better, but with Entity Framework I spend a lot of time hoping.":

The problem with hardcodding SQL as string in the source code are many: if a developer renames a column or table name, he must go in all the places in the source code where that column is hardcodded and rename it, and no refactoring tool will help it with that. If using an O/RM, he will have to do this in a single place.

Also, if somebody makes an error when typing a column name as a string, the compiler can't detect that, and the error will show up much later (if ever) at runtime..

The query should be described with LINQ - other O/RM used code-generated constants for this long time before LINQ existed (ex.: LLBLGen).

Karep
01/19/2012 05:44 PM by
Karep

@ed: That doesnt change anything. Properties suggest they don't do anything but just return some fields. Clients can create unoptimal code calling such a property in a loop for example. MS regrets making DateTime.Now a property.

Ed
01/19/2012 07:50 PM by
Ed

@Karep, a property is also used for display purposes (decorate it with display attributes) and gives you a nice fluent experience. Point is that you don't need another business layer or service for doing discount calculations. But you have too have it in the right context. Like: is this the start of a growing product or is this property one of a few calculation properties which is not used anywhere else?

Bob
01/20/2012 05:14 AM by
Bob

@Steve : Now maybe it's because I'm an old fart, and I learned SQL

Right on brother!

@Tudor : Oh please, I mean really? Really? The software you work with must be terrible (or the devs terrible) if they are changing tables names. Whatever. But you made an attempt at coming up with a reason, regardless of how ridiculous it is, so you get 1 point for that.

YAGNI if I ever saw it - and I've seen it all over 30+ years. Oh, maybe the once it happened I used my editor's search & replace feature that scanned over several files so I didn't have to look at them - or was it grep that I used? It doesn't matter now, as I only spent about 10 minutes tuning things up. 10 minutes in 30 years.

Frank
01/20/2012 06:47 AM by
Frank

@Tudor

"The problem with hardcodding SQL as string in the source code are many: if a developer renames a column or table name, he must go in all the places in the source code where that column is hardcodded and rename it, and no refactoring tool will help it with that. If using an O/RM, he will have to do this in a single place."

When do you rename a column or a table? I guess when you discover you have been using a inappropriate name or one with a typing error. Shouldn't you change your name in the domain model to reflect that insight? Or did you do it right in the domain model and forgot about the database?

"Also, if somebody makes an error when typing a column name as a string, the compiler can't detect that, and the error will show up much later (if ever) at runtime.."

Like when you press F5 after developing the code to test it? Or do your collegues never test if their changes are working?

Tudor
01/20/2012 09:22 AM by
Tudor

@Frank - it's usually not so simple. In complex projects, a database is used by several applications and each application might have over 20 or 30 projects, managed by several teams, over several years.

Even when manual testing and automated testing is done after each change, it's easy to miss one of the hundred places where a field might be referenced in a large application (in a real world, not in an ideal world where the code is 99% covered by integration tests).

About renaming a field in the database: this might happen very often in a real project: usually the naming conventions are different between domain model and the table fields (ex.: DB: Customers.Country_ID and in C#: Customer.CountryId). Five months after the table is created, one of the 30 developers discovers that a db field was not named according to the naming convention (ex. 'Countryid') and renames it.

What is easier: to go in 5 projects and use the O/RM automated tool to detect the database structure changes and automatically refresh the mapping file, or to do a search and replace in 100 files, taking care not to rename a field with the same name, but from another table.

Of course, there are people who prefer to use only stored procedures, other who prefer to manually build their own DAL and other who use O/RMs - each approach has it's advantages and they will never agree on a "best" approach.

Ayende Rahien
01/20/2012 12:47 PM by
Ayende Rahien

Frank, That assume that you actually have a way to go over every single feature in the application. In many cases, those errors are nasty because they happen in places that aren't showing up first thing when you do F5

Frank
01/20/2012 11:31 PM by
Frank

Thanks Thudor and Ayende, those are good arguments. ;-)

Daniel
01/21/2012 12:26 AM by
Daniel

For me, the problem is that entities are mapped as local classes and not as façades to a remote service, the SQL Server. Latency is not zero, Bandwidth is not infinite, and transport cost is not zero.

In the case of LINQ, entities must be mapped to allow a more expressive use of the IQueryable and Expression<Func<>>.

Mufasa
01/23/2012 04:12 PM by
Mufasa

@Dylan B: Doesn't the income > 5000 block prohibit the possibility of running income / 10000 in the else block?

Janus Knudsen
01/24/2012 11:04 PM by
Janus Knudsen

@Tudor: I'm using ORM as well, but have a very strong background in Sql and was living in a sproc-world until robust ORM was introduced to the masses.

IMHO: One of the reasons ORM was invented was to make objects out of the data and not waste precious time by manually populating DTO's, I don't believe the saying "to prevent changes in some tables/ views". Those changes cannot be tracked anyway just by hitting F5, well not if you're using Microsoft ORM?, don't know about NH?

Before ORM, people was tremendously keen to use sprocs (wonder why now, but we all get more experience and insight over time), exactly as now... Everybody is very focused on using ORM and sometimes overlook the fantastic possibility of just writing plain Sql to query those ORM-generated DTO's :)

Comments have been closed on this topic.