Northwind Starter Kit ReviewData Access and the essence of needless work, Part II
This is a review of the Northwind Starter Kit project, this review revision 94815 from Dec 18 2011.
Update: Andrea, the project’s leader has posted a reply to this series of posts.
Yes, this is another repositories are evil if you are using an OR/M post.
That is probably going to cause some reaction, so I am going to back this up with code from this NSK project. Let us talk about repositories, in particular. Let us see what we have here:
Okaay…
Now here are a few problems that I have with this:
- There is no value gained by introducing this abstraction. You aren’t adding any capability what so ever.
- In fact, since all OR/Ms provide an abstraction that isn’t dependent on type, creating IRepository<T> and things like ICustomerRepository is just making things more complicated.
- There are going to be changes in behavior between different repositories implementations that will break your code.
Let us see what we actually have as a result. This is the Entity Framework POCO implementation:
You can probably guess how the rest of it is actually implemented. Yes, we have a LOT of code that is dedicated solely for this sort of forwarding operations.
And then we have the actual implementation of the delete:
Just to remind you, here is the NHibernate implementation of the same function:
Leaving aside the atrocious error handling code, the EF POCO version will do an immediate delete. The NHibernate version will wait for the transaction to be committed.
And don’t worry, I do remember the error handling. This is simply wrong.
And then we have implementations such as this:
This is for the Entity Framework Code First implementation. There is a message here that is coming to me loud and clear. This code wants to be deleted. It is neglected and abused and doesn’t serve any purpose in life except gobble up pieces of valuable disk space that could be filled with the much more valuable result of reading from/dev/random.
More posts in "Northwind Starter Kit Review" series:
- (26 Jan 2012) Conclusion
- (24 Jan 2012) That CQRS thing
- (23 Jan 2012) It is all about the services
- (20 Jan 2012) From start to finishing–tracing a request
- (18 Jan 2012) If you won’t respect the database, there will be pain
- (16 Jan 2012) Refactoring to an actual read model
- (13 Jan 2012) Data Access review thoughts
- (12 Jan 2012) The parents have eaten sour grapes, and the children’s teeth are set on edge
- (11 Jan 2012) Data Access and the essence of needless work, Part II
- (10 Jan 2012) Data Access and the essence of needless work, Part I
Comments
for what it's worth, here's the author's point of biew: http://goo.gl/DQUma
Why not move on to some interesting and constructive content like Macto, instead of endless repetitions of the same old criticisms?
As long as the codebase contains stuff like this (http://ayende.com/blog/Images/Windows-Live-Writer/Northwind-Starter-Kit-Review-Data-Access_4221/image_thumb_7.png) the project is really an embarrassment.
No excuses for releasing obviously unreviewed code as sample code.
I actually have people in my company who do this. I wonder where they got this advice from? Now I know...
I'd be inclined to agree there -- this is beginning to get a bit repetitive.
Ayende, why don't you do a more general post on how you set about reviewing a codebase? Where do you start? What do you look out for? What makes you twitch? What do you consider to be good signs? And so on...
Ok the poor HTML sanitizer destroyed the image URL. I linked to the "error handling code" marked with a red cross.
My colleagues are doing exactly that and also using the repository pattern just like this.
So much hurt.
+1 Jonas, James
@Ayende: Could you explain more in detail why this kind of using repository is a problem?
Why is no value gained by introducing repositories? Why is it more complicated? Why will changes in behavior between different repositories implementations will break my code (For this you have different repositories (ICustomerRepository)?
You returned to explain why repository pattern is evil. Just intresting to know what are you doing when you need in your model to access another aggregate. Do you reference NH from your model? I prefer to leave my model POCOed and wrap DB calls by repository pattern. Sleeping good with it.
It seems sensible to argue abot the IRepository<T> interface. When do we ever use repositories polymorphically? Never in my case. So why awkwardly constrain all repositories to the same interface.
I like James McKay's suggestion. Although that might be good for another Tekpub screencast. "In the mind of Ayende - the inner critic unleashed" My money is waiting for that if you do it.
having a common contract does allow things like decorators to easily plug into the rest of the system. also, you have a nice place to do this such as mapping db exceptions to app exceptions, etc.
I've had 2 scenario's where repositories were valuable. In my previous project, we had a legacy ActiveRecord-style "ORM". Because cascading didn't work well, we used repositories to abstract the cascading logic out of our ASP.NET views (where they previously were).
The second scenario was when starting up a project and switching ORMs/databases up to 4 times.
Also, I believe some people just prefer having the ORM code in one place. This is entirely personal.
I know some people are reading these posts and thinking Ayende is not getting the point of the application. That this is a sample app, showing how to possibly wire together different technologies. As someone who was once just starting to learn and continues to learn how to be a better programmer, having sample apps like NSB out there can be detrimental to my learning. I know Andrea defends NSB by saying that it was only made to help support his classes. @Andrea, if that is the case, please change the description on the home page for MSB. On it you state that it is "intended to be used as a blueprint when designing and implementing a .NET layered application architecture." Nowhere on it do I see mention of it as demo samples for your talks or to be used in conjunction with a book. Before I knew better I would have come across this app and tried to use it wholeheartedly. Leading me to use some very unnecessary abstractions in my code. Not only making my apps more complex, but also harder to write and maintain. I am still unlearning some of these "best practices." I think it would be much better to write several smaller apps that would showcase the different technologies and talk about when and why you would use them instead of trying to throw everything into one app and then having naive beginning programmers come along and see this as a "blueprint" on how to write a proper program. Just my 2 cents.
Just wondering about your NHibernate implementation of the Remove method, why have the function return a bool and just not have it return anything i.e. void, if the transaction fails an exception is going to be thrown from the Remove method anyways.
Ayende if you use objects from ORM dll in your code how do you test this code without using DB. Or do you don't care or do you use db in your tests?
@Amit that's the problem. The IRepository interface forces it to return a bool even though it's meaningless in the NH implementation. Hence Ayende saying: "There are going to be changes in behavior between different repositories implementations that will break your code."
@Karep,
You'd use Dependency Injection... the ORM context would be injected into your service class.
Mark Seeman's Manning Press book on DI is a good read.
I think the point Oren is making is one to make you ask the question "Do you need that abstraction?"
It's a classic problem I've encountered in my own coding. How many times have I written an abstraction for logging? I can't even count.
If I'm building an internal business app using Oracle, what's the likelihood I'm going to ever be converting it to SQL Server? Wait, doesn't nHibernate or Entity Framework handle either of these for us? I just change some mapping code over here, and...
Now if you're writing a product and you want the backend to be switchable and work with a variety of subcomponents, then maybe you do need to build that abstraction. But how often do we really do that? Especially with business apps.
That being said, I still want my code to be testable. If I was using ADO.NET rather than an ORM I'd still abstract out calls to it in a data access layer. So this isn't advocating going all the way back in time, it's just a point that ORMs are your data access layer.
From looking at his Github code for the blog (RaccoonBlog, I believe), Ayende puts his data access code directly in the Controller of the MVC application, something like:
public ActionResult List() { var posts = Session.Query<Post>().ToList(); // other stuff }
which at first I thought was an anathema (coming from a strong WebForms background I consider it the cardinal sin of development to put data access code in an event handler) but thinking of it overnight, the data IS abstracted already; the Session (RavenDB in the case of RaccoonBlog, could be NHibernate or EF or whatever) is for all intents and purposes the "Repository" and performs the same functions.
Realistically, most of us would create a PostRepository class that simply wraps the above call and returns a list, so what are we giving up by cutting out the middle man? A little abstraction but what are the actual real-life chances that we would decide "Hey let's switch our data access layer to use Entity Framework; I'm sure glad we created repository interfaces for everything!"
I think I see Ayende's point, although the NSK uses EF and NHibernate both more to show how you could abstract things, even if it's not practical. Ayende's original point, I believe, is that the repositories don't need to be there because the NH Session or EF Context is, essentially, the repository and all the various ICustomerRepositories and IOrderRepositories are doing is wrapping the call; it's not really serving any purpose other than providing an extra layer of complexity.
"ORMs are your data access layer" Very true! No need to abstract the abstraction.
@Alex your questions are answered in the blogpost.
@Wayne so if we go ahead and cut out the abstraction and use the session/context from our controllers, how can we handle caching things in a consistent way (lookups, catalogs etc)?
Of course Ayende is perfectly right. Once people understand abstraction, they tend to look for abstractions. If by any chance something can be abstracted, you can bet they do it. And if you ask why, you immediately get the answer: because I can exchange the implementation. Good news: such demo projects, pretending to be enterprise, made by so called MVPs, create problems around the globe that will keep us busy for the next 20 years.
I don't get it. Sure the abstraction of ICustomerRepository, IProductRepository etc. is way to much, but what speaks against using a generic repository, injected via IoC? That allows me to have different ways to access my data (db, fake, wcf, xml and so on) and i don't see why it would complicate my code. Instead of using Session.Query().ToList() i simply have to use rep.Query().ToList(). So where is the problem?
+1 Jonas, James
@Fish: the problem is that you're abstracting away features of your ORM that you may -- correction, will -- need sooner or later. Fine-tuned prefetch queries to avoid select n+1 issues, for example. Or transaction handling. Or any kind of triggers when an entity is saved. You're losing a lot, adding risk and friction to your project, and getting nothing in return.
You should only adopt this pattern if you know you are going to need it for different data sources etc. You can always retro-fit it later if necessary. But for the overwhelming majority of cases where it's baked in at the start, it turns out to be completely unnecessary.
@Fish: the problem is that you're abstracting away features of your ORM that you may -- correction, will -- need sooner or later. Fine-tuned prefetch queries to avoid select n+1 issues, for example. Or transaction handling. Or any kind of triggers when an entity is saved. You're losing a lot, adding risk and friction to your project, and getting nothing in return.
You should only adopt this pattern if you know you are going to need it for different data sources etc. You can always retro-fit it later if necessary. But for the overwhelming majority of cases where it's baked in at the start, it turns out to be completely unnecessary.
its also worth pointing out by exposing session/context to your controllers your breaking encapsulation and allowing your controllers to know and depend upon your class structure. this will lead to "train wreck" code and increase coupling through out your code base.
@Fish: The problem is that you don't need the repository since chances are it's just wrapping Session.Query().ToList() and not doing anything else with it. Having a generic IRepository interface to wrap the Session call isn't giving you any benefit but it's adding an extra layer of complexity to the code (albeit probably a simple layer). If you don't need to use a different data source, there's no logical difference between the two.
Remember, abstractions are a good thing, but abstractions everywhere smells of needless complexity.
Scott, you are talking about DAO objects not repositories.
A repository is a generic collection-like object that acts as a database mediator. A repository typically would only have simple methods like T GetById() and IQueryable<T> GetAll() methods.
Scott, you are talking about DAO objects not repositories.
A repository is a generic collection-like object that acts as a database mediator. A repository typically would only have simple methods like T GetById() and IQueryable<T> GetAll() methods.
@Peter, why did you swap out the ORM 4 times? I suspect the repository pattern is responsible. You tried to stuff your ORM of choice into a container and it didn't fit.
@Marco, I would say the opposite is true. The controller is there to orchestrate all these services into an application. What is the difference between 'I know I need an IRepository[of T] to handle this request' vs 'I know I need an ISession to handle this request?'
@Joao, because the repository can hide the internal structure by returning view models, dtos, or whatever you want to call it. if the underlying db/document structure changes, you are only forced to make that change in one place. yours controllers should have only one reason to change and to be forced to change for this reason smells.
/dev/random has more value...rofl
@Dmitry Check up on the repository pattern as defined by Fowler. He clearly states "Conceptually, a Repository encapsulates the set of objects persisted in a data store and the operations performed over them" and "and the mapping code encapsulated by the Repository will carry out the appropriate operations behind the scenes". My repositories also expose simple Add, Delete and Query methods, usually to the benefit or more specific data operations.
@Marco: Correct me if I'm wrong but isn't the only reason for the controller to exist is to essentially map from the model to the view? If you directly reference an ISession in your controller with a LINQ query, then I would argue it's part of the controller's responsibility to decide what needs to be retrieved for display - if you change a WHERE clause in your LINQ code, this IMO is a valid reason for the controller to change, and adding an extra abstraction layer on top of it isn't buying you anything.
What about testing/faking without the repository approach?
@Wayne depends upon which model you are referring to :) i personally do not expose my domain models that far up the stack. changes to a where clause is different than structural changes to an class. i would bet in most real world applications a specific model class is being references in multiple controllers and this is where you will see the change ripple throughout your project. again, for simple projects or a one man team this may not be that much friction, but for enterprise apps i would not dare take this approach
@Alex, why would you test queries in isolation? Executing a LINQ statement may generate inefficient ugly SQL in one ORM but generate much better SQL in a different ORM. So trying to pretend the database is really just an in memory collection will prevent you from making these important optimizations. IMO this is better left to integration tests, and you can just use nhibernate+sqlite for this.
@Marco, are you really using a single 3NF database for the entire enterprise? If so then this is the source of your friction.
My best argument for repositories has always been this: all database accessing code is in one place. If I have something like the (over simplified) following code:
public ActionResult NewestCustomer() { return Session.Customers.OrderBy( x=> x.DateCreated).Take(1); }
I would rather have it hidden behind a CustomerRepository.GetNewestCustomer(); method. The logic for getting the newest customer from the database could change (but you're right, probably not the actual data retrieval method, whether ORM or whatever). This way, any where I need to get this newest cutomer in the code from the database, I can always rely on the repository method, and not remembering how I wrote the query through the ORM... was it OrderByDescending()... I can't remember.
@Joao of course not. but still, not all applications need a denormalized reporting store etc and this is still a design principle that applies else where.
@Scott: of course you don't want to clone complex pieces of data retrieval logic around your project. It should definitively reside in methods. but instead of using full blown repositories, you could use extension methods on your Session (or Session.Customers).
@Omer Mor
I'm not sure I like that - you could end up with so many extension methods that it ends up rather messy, and not exactly cohesive.
@cocowalla, I agree completely. The current project I came into had exactly this problem. You also run into issues with determining the current active context since extension methods are static, which also prohibits your ability to easily inject dependencies for a current unit of work.
@cocowalla @Scott +1
@Scott you state, "would rather have it hidden behind a CustomerRepository.GetNewestCustomer(); method. The logic for getting the newest customer from the database could change (but you're right, probably not the actual data retrieval method, whether ORM or whatever). This way, any where I need to get this newest cutomer in the code from the database, I can always rely on the repository method, and not remembering how I wrote the query through the ORM... was it OrderByDescending()... I can't remember."
What you are describing here is not best served by a repository pattern. The issue of what constitutes "a newest customer" is fundamentally a business rules issue, not one of data access. What you are probably wanting then is something closer to a query object or data access object that is the place where you can codify this rules (for instance does newest just mean the last row created or does it take into account the customer status where they must already be verified as well as a trite example).
I haven't read too much through Oren's blog code yet but it appears his style is to leverage extension methods for this type of work. Either way there's a single place for wrapping this up, which was your primary concern it appears, but it's simply NOT a data layer responsibility.
@Jimmy Final thought: I disagree. If the business rules are solely determined by the state of the data, as is such in my example, then IMO it fits the repository definition. I still think of that single customer as a logical data collection (just because it happens to only have one item has no bearing in this context). If I use the repository to add a new customer, this method still returns the newest customer from the entire logical customer data store.
If I needed further business logic, eg all customers with pending orders and failed payment attempts or something, that would be a good candidate for an additional layer with dependencies on CustomerRepository, OrderRepository and PaymentRepository allowing me to query and aggregate data from those different data sources.
@Jonas H
Ayende promised us Matco - a soup to nuts NHibernate app - THE WAY HE DOES IT - instead of complaining about other people's code, but like so much software, he failed to deliver once he showed a few screenshot. Heck anyone can do that.
He's too busy complaining about other peoples code to complete his own projects.
Ayende,
You're a big supporter of the "I directly use my ORM in my controller because IT IS the Repository" but in this case it's not a repository is just a Facade. When an external component (the ORM in this case) is really important in your code (like an ORM is), you HAVE TO build a facade around it. If tomorrow there is a breaking change in NH or EF, I don't want to browse my whole code to see if I was using a broken feature, or worse : avoid updating and keep working with a 5 years old version.
@bourgarel That is why you encapsulate anything more complicated than a
ISession.Get
orISession.Load
in an extension method. And if you group your extension methods in a logical way (perhaps by the entity they work with, orthe controller they are used by, or something else entirely depending on your project) then you can include only those extension methods you need. This will avoid the clutter onISession
that others are talking about. Your queries are testable and DRY, but you don't lose all the power of your OR/M to a needless abstraction.I think that a generic IRepository<T> associated to a ISpecificRepository is a good base for creating repositories quickly with transactional support, without complexity and we can use them to decouple the UI from the O/RM. An O/Rm is like a Data Access Layer and there are not more differences between an sql select and a linq query on a context. If you use a context (EF or NH) directly in your UI logic (MVC or something else) you are coupling the UI with something similar at an sql query. Everywhere you need to obtain the same data you have to repeat your query (linq or sql does'n matter). Use directly an ORM inside the UI logic sounds quite easy but in a medium life of a real world application can be an issue. For example, how can you swap UI or add a service layer and reuse easily the same group of queries?
@Riccardo that's the eternal debate. What Ayende is saying is that many time this layer is useless because you won't reuse every piece of query. If there is some part of query that you'll reuse then create extension method for your IEnumerable<ModelClass>.
And linq/EF is different than sql because it handles the mapping and the query construction, when you're directly dealing with sql you have to do a layer for abstracting this.
@Remi we can't reduce the interaction with data only as a query problem. When you ask for a customer your app could need specific requirements. Would you like to use an extension method for that? I think that an approach with extension methods add more problems. Where would you like to compose the query with your DAL component (session or dbcontext)? inside an mvc controller action? really? And when you need the same data for something else? are you going to copy the orrendous eager loaded query or whatever you use? I can't believe that. I like to expose an interface/repository to the consumers of data, like UI or services and control and set behind a repository a generic group of capabilities and/or set specific requirements. The approach to interact directly with a model without mediating with a repository sounds like the orrible spaghetti code that many years ago some people (me too for small apps, I have to admit) used to write. ORM are like a pill of logic but inside this pill there is a plain simple data interaction and I want limit and control in what way consumer use it.
@Riccardo,
Can you explain what you can do with the
IRepository
that you cannot do withISession
? If you actually needed to reuse a query for a customer in its entirety exactly as is, then you could encapsulate the whole Linq query as an extension method off ofISession
. Now my call would look likesession.GetCustomerWithSomeImportantAttributes();
instead ofrepository.GetCustomerWithSomeImportantAttributes();
.Reuse of queries like this is pretty rare, though. Most of the time, you want the queries to have slightly different filtering, eager loading, pagination, or sorting. Each view (yes even the same view on different platforms) has different requirements for being displayed. Doing this through a repository either gives you an explosion of one-use-only methods, or parameter-hell on a god method.
@Harry (I've just posted a reply to a similar question in the other thread) Before answer your question, I'd like to see an example of a real world application withou repository. Could you suggest me one existing example?
Anyway, when I used in a medium/large app an ORM directly, I initially appreciate this approach and after a while I hate myself for this decision. I didn't use extension methods because they didn't appear to me easy to mantain and doesn't offer all benefits of a well written repository.
Another point of view.... If I'm alone and I start to write down my peronal website and I have the complete control of the whole stack of layers then I can use one of the orm like EF (code first) directly and without worry about repository. If I start a new project with complex requirements, different teams for the differents layers and tiers, I can't allow developers to use freely ISession in their code because I want be always on the door where they have to pass to obtain data.
This blog is written without a repository. It uses the RavenDB session directly. https://github.com/ayende/RaccoonBlog
As to your experience with using an OR/M directly causing pain, I've actually run in to the exact opposite issue. The times that I've used a repository, I've regretted it later on. It made some things easy that I never actually had to use (i.e., swapping OR/Ms) and made optimizing difficult. For instance, it hid SELECT N+1 issues on a regular basis.
As for wanting to be "on the door where they have to pass to obtain data," this sounds very much like your trying to make sure other devs don't do something stupid. Unfortunately, the only way to stop someone from running with scissors is to not allow them to use the scissors. Try cutting paper like that.
Interesting. I started to have a quick look on it. The first impression is that all my doubts about the opportunity to fill the UI layer (like mvc is) with queries are confirmed and this project sounds like an old fashioned ado.net+ui approach with some automapping features. Plus, there is an "interesting" IMetaWeblog that more than a service api it sounds to play the role of a repository :) with the drawback that if the project grow, this repository will explode
Riccardo, This blog was written with no repository layer, because I don't believe I need it. We have been running on it for a while, including making plenty of changes along the way, with very few problems. IMetaWeblog is an ugly interface, I'll be the first to admit. This is an externally defined interface that is used to interact with Windows Live Writer, so the interface def is outside our control. It is NOT a repository, it is an external endpoint to the system, just like the controllers are.
Ayende, you can say "It is NOT a repository" also with capital letter but this doesn't change the evidence. This is not a repository... it's an ORRIBLE one repository :) A part from that, you are suggesting a way that in a real world application could produce (not for every project but I can see a big risk for medium large project) unmanageable controller actions, many static classes filled with procedural extension methods and all this stuff mixed with "external endpoint" that behave as ugly hybrid repositories. To be honest, the quick look that I have had on your controller actions was like a back in the past with a mixture and a repetitions of data access logic attached to the UI. Nothing personal but maybe it's time that someone start to analyze with a critic approach the architecture that you are suggesting and the risks that are related.
Riccardo, There is already a separation of concerns in the app. The views handle the UI, the controller actions either handle the BL or provide data to the view, that is it. Feel free to do a review of RaccoonBlog, I look forward to reading it.
Ayende, I'd like to make a review. I've just downloaded your source and after a quick look, time permit, I'd like to create a branch with good repositories in order to allay some of the problems that your version contains and a clean separation between your typed sql query orm and the UI. I have my already well done implementation of repositories but I'll try to use the implementation from the NSK project just for point out that the problem is not a percentage discount but how you can optimize the percentage discont without using a tight coupled app with many code repetitions. For example, how you can justify two quite identical methods AddPost, one in a controller action and one in the IMetaWeblog repository with a quite identical implementation but slightly different?
Riccardo, Go ahead, you don't need my permission. You can fork the code and just do that.
@Scott: Doing the way back machine thing here (away due to death in the family) but I really feel your comment needs a reply. You state:
Sorry but demonstrably not. The Repository Pattern has a very straight forward and distinct definition about being a collection like facade over a data source. PERIOD. I am not saying what you are doing is wrong (actually from what little you state it sounds like a very logical organizational form). I am just stating that's categorically NOT a Repository. Or as my zayde would say, "Putting a feather up your butt does not make you a chicken". =)
Now when we decide to group query operations together like this it is often CALLED a repository in the type name but that's more about cluing in people reading the code base as the tern <SomeEnityName>Repository has almost become synonymous with DAL Facade.
"the EF POCO version will do an immediate delete. The NHibernate version will wait for the transaction to be committed."
Maybe it's just how it's worded, but that's not a very fair argument for EF vs NHibernate. Removing the ActivateContext.SaveChanges() call will result in an identical setup to your NHibernate code, and adding a Session.Flush() call will result in an identical setup to the EF code.
Ross, The idea in this statement is to point out differences in the implementation of the abstraction, to show that they aren't the same.
Comment preview