Analyzing LightSwitch data access behavior
I thought it would be a good idea to see what sort of data access behavior LightSwitch applications have. So I hook it up with the EntityFramework Profiler and took it for a spin.
It is interesting to note that it seems that every operation that is running is running in the context of a distributed transaction:
There is a time & place to use DTC, but in general, you should avoid them until you really need them. I assume that this is something that is actually being triggered by WCF behavior, not intentional.
Now, let us look at what a simple search looks like:
This search results in:
That sound? Yes, the one that you just heard. That is the sound of a DBA somewhere expiring. The presentation about LightSwitch touted how you can search every field. And you certainly can. You can also swim across the English channel, but I found that taking the train seems to be an easier way to go about doing this.
Doing this sort of searching is going to be:
- Very expensive once you have any reasonable amount of data.
- Prevent usage of indexes to optimize performance.
In other words, this is an extremely brute force approach for this, and it is going to be pretty bad from performance perspective.
Interestingly, it seems that LS is using optimistic concurrency by default.
I wonder why they use the slowest method possible for this, instead of using version numbers.
Now, let see how it handles references. I think that I run into something which is a problem, consider:
Which generates:
This make sense only if you can think of the underlying data model. It certainly seems backward to me.
I fixed that, and created four animals, each as the parent of the other:
Which is nice, except that here is the SQL required to generate this screen:
-- statement #1 SELECT [GroupBy1].[A1] AS [C1] FROM (SELECT COUNT(1) AS [A1] FROM [dbo].[AnimalsSet] AS [Extent1]) AS [GroupBy1] -- statement #2 SELECT TOP ( 45 ) [Extent1].[Id] AS [Id], [Extent1].[Name] AS [Name], [Extent1].[DateOfBirth] AS [DateOfBirth], [Extent1].[Species] AS [Species], [Extent1].[Color] AS [Color], [Extent1].[Pic] AS [Pic], [Extent1].[Animals_Animals] AS [Animals_Animals] FROM (SELECT [Extent1].[Id] AS [Id], [Extent1].[Name] AS [Name], [Extent1].[DateOfBirth] AS [DateOfBirth], [Extent1].[Species] AS [Species], [Extent1].[Color] AS [Color], [Extent1].[Pic] AS [Pic], [Extent1].[Animals_Animals] AS [Animals_Animals], row_number() OVER(ORDER BY [Extent1].[Id] ASC) AS [row_number] FROM [dbo].[AnimalsSet] AS [Extent1]) AS [Extent1] WHERE [Extent1].[row_number] > 0 ORDER BY [Extent1].[Id] ASC -- statement #3 SELECT [Extent1].[Id] AS [Id], [Extent1].[Name] AS [Name], [Extent1].[DateOfBirth] AS [DateOfBirth], [Extent1].[Species] AS [Species], [Extent1].[Color] AS [Color], [Extent1].[Pic] AS [Pic], [Extent1].[Animals_Animals] AS [Animals_Animals] FROM [dbo].[AnimalsSet] AS [Extent1] WHERE 1 = [Extent1].[Id] -- statement #4 SELECT [Extent1].[Id] AS [Id], [Extent1].[Name] AS [Name], [Extent1].[DateOfBirth] AS [DateOfBirth], [Extent1].[Species] AS [Species], [Extent1].[Color] AS [Color], [Extent1].[Pic] AS [Pic], [Extent1].[Animals_Animals] AS [Animals_Animals] FROM [dbo].[AnimalsSet] AS [Extent1] WHERE 2 = [Extent1].[Id] -- statement #5 SELECT [Extent1].[Id] AS [Id], [Extent1].[Name] AS [Name], [Extent1].[DateOfBirth] AS [DateOfBirth], [Extent1].[Species] AS [Species], [Extent1].[Color] AS [Color], [Extent1].[Pic] AS [Pic], [Extent1].[Animals_Animals] AS [Animals_Animals] FROM [dbo].[AnimalsSet] AS [Extent1] WHERE 3 = [Extent1].[Id]
I told you that there is a select n+1 builtin into the product, now didn’t I?
Now, to make things just that much worse, it isn’t actually a Select N+1 that you’ll easily recognize. because this doesn’t happen on a single request. Instead, we have a multi tier Select N+1.
What is actually happening is that in this case, we make the first request to get the data, then we make an additional web request per returned result to get the data about the parent.
And I think that you’ll have to admit that a Parent->>Children association isn’t something that is out of the ordinary. In typical system, where you may have many associations, this “feature” alone is going to slow the system to a crawl.
Comments
Regarding how they search for every field. This is seem to be a restriction of the RIA Services. They just workaround lack of normal search functionality by issuing search for every field.
@Mike,
I don't really care what the limitations are, it is a bad behavior and a perf killer
The very least they could is let the developer decide which entity properties are searchable, with an explanation that the more fields enabled the slower searches will be.
Wow pretty sick: Select N+1 web service calls and multiple Like wildcard searches avoiding db indexes?? This looks like a tool to autogenerate 'bad developer code'.
This looks like its taking a lot of the RAD features from MS Access including the fact that it doesn't perform or scale very well.
I've not read anything that says Performance is a goal of LightSwitch.
I look at it as I look at SQL CE - very useful in certain scenarios, but a long way from the best in other scenarios.
RichB,
Aimed at professional developers, remember?
Beside, in the scenarios I covered, even 100 - 1000 rows will give you unacceptable perf issues, so it isn't like I am talking about ton of data or uncommon scenarios
Really?
RichB,
Look at the keynote, they explicitly state there that it is intended for pro dev as well.
Beside, as I mentioned, that problem is horrible even for non pros.
In that case, they need to sort their "messaging" out.
The biggest concern I have is not that 100 rows will give query times on the order of a second, but that a professional developer will be handed a dog-slow LightSwitch app and told to "sort it out" because Microsoft have told them the on-ramp to full Visual Studio is simple. But in reality, it will need to be totally rewritten.
I do think you specified the navigation names wrong ;) On the 'many' side, you should have 'Parent' and on the '1' side you should have 'Children', as those are fields mapped onto the relationship for that side. So the 'many' side, has a property 'parent' as that's the field mapped onto the relationship.
I admit the dialog is pretty weak, even though they try to make it easy (which thus results in the conclusion they failed)
Ayende,
maybe the whole application was designed to learn developers how to recognize the basic anti-patterns? The query listing made me laugh!
"That is the sound of a DBA somewhere expiring"
The correct way would be to re-build full-text indexes on every add/rename/delete column in RAD tool, right?
You all say the performance in unacceptable when you have no understanding of what is acceptable for a particular Light Switch application.
For a given Light Switch application, whilst waiting for data to be fetched, perhaps a delay of a few seconds is acceptable.
James,
It isn't few seconds.
Let us say a typical app is for registration for boy scouts, you have Scout as a table, with the following references:
Mother -> Parents
Father -> Parents
Tribe -> Tribes
LatestAward -> Awards
That means that displaying a tribe of 45 scouts will cost you 181 queries!
Things are going to get awkward when you are making that many calls over the network, which then have to go over to the database.
"That sound? Yes, the one that you just heard. That is the sound of a DBA somewhere expiring."
That quote made my day!
Thanks for your analysis and feedback, Ayende.
I have a couple explanations on some behaviors you listed:
First, on the DTC issue, LightSwitch doesn't use DTC by default. Instead it is calling ObjectContext.Connection.EnlistTransaction, which the EntityFramework Profiler is just interpretting to be a "distributed transaction". The transaction doesn't become distributed until more than 1 provider enlists in it. Since by default only the SqlConnection will be enlisted in the transaction, a distributed transaction isn't created. However, this allows for people to use distributed transactions if they require one.
The reason the optimistic concurrency is implemented that way is because LightSwtich allows you to import your own table, which may not have a version column.
@Frans Bouma has it correct, the Add New Relationship dialog has the navigation property on the side it is defined. On the "child" side, it has a navigation property named "Parent". On the "parent" side, it has a navigation property named "Children". It makes more sense if you use two separate tables, but has confused me on self-references as well.
The LightSwitch team is aware of the Select N+1 problem and is currently working on addressing this problem. There wasn't enough time to get it into Beta1.
re: DTC, okay.
re: Optimistic Concurrency - that is not a good reason not to do it right when you control the schema A-Z.
re: Self referencing - you might want to consider updating the dialog to make it clearer, then.
re: SELECT N+1, forget about it.
The way that the system is designed you can't escape what you have done by doing optimization.
This is built into the building blocks of the product, and fixing this would require re-structuring how you are doing things in a significant manner.
From my experience with MS products in the past, you aren't going to be able to do that.
In essence, your problem isn't that you used the wrong algorithm, your problem is that you have ignored the Fallacies of Distributed Computing in this regard, as well as with the images, as well as plenty of other places.
I was able to recognize the issue from the _keynote_, without any access to the code, and to be honest, I haven't really dug into the product.
Those sort of problems are not coding issues, they are architectual.
Hello Ayende, thanks for your insights to some issues. I dont know NHibernate, but here I found your recommendations to solve N+1 trouble in it using its api features. Its very good ORM, no doubt.
What do you mean about this snippets? Can something as this solve some use cases for one query per table loading of nested 1:N relations? I tested this od Firebird on large data and it performed quite well, sure having indexed foreign keys as in example:
-- level 3 tables
SELECT * FROM LEVEL3 where LEVEL2_id in
(
SELECT id from LEVEL2 where LEVEL1_id in
(
SELECT id from LEVEL1 where (<<LEVEL1 query expression>>)
)
)
-- level 2 tables
SELECT * from LEVEL2 where LEVEL1_id in
(
SELECT id from LEVEL1 where (<<LEVEL1 query expression>>)
)
-- level 1 tables
SELECT * from LEVEL1 where (<<LEVEL1 query expression>>)
How in fact such queries behave internally at query engine?
Thanks in advance for your reply. I am big fan of declarative approach of LightSwitch, I think that things arent so bad as you suppose, because they can enhance internal behaviour, even at MS SQL level at least, if needed. Sure, it is very hard to solve complex things to be easy to use, but this is developers as you are work, or not? :-)
Petr,
The problem with this is that it works well, until your level 1 query becomes complex.
In addition to that, it isn't only a single path that you need to worry about, it is multiple paths, and often in multiple levels.
For example:
Order.OrderLines
Order.Payments
Payment.Resultions
Payment.TransactionLog
OrderLine.Product
OrderLine.Discounts
OrderLine.Changes
Thanks for reply. Umm OK, sure that db engine can handle relational theory well but doesnt know about object relations, obviously. So I had thinked about quite simple queries to db only, leaving partial results composition for ORM / appserver, hopefully as close to db as possible. As in azure there may be such conditions meet at least for "own" databases linkage and hopefully such beast can somehow try to "learn" even something about data in time for optimizations, even for remote databases data caching. But it is clear that current solution in LS is not very sexy, yet ;-). Would not be fine to throw away such troubles asap and concentrate real developers power to something "unpattternable"? Anyway, thanks for now!
Hmmm
Let us say a typical app is for registration for boy scouts, you have Scout as a table, with the following references:
Mother -> Parents
Father -> Parents
Tribe -> Tribes
LatestAward -> Awards
That means that displaying a tribe of 45 scouts will cost you 181 queries!
Things are going to get awkward when you are making that many calls over the network, which then have to go over to the database.
And this will in fact be a typical usage for LS. And there will be 100 scouts in the troop, and it will run on someone's home computer.
No giant data set, no network slowing to a crawl, just someone getting their job done quickly and easily.
What I get is that you miss the entire point of LS.
The real message here is that the IT department has failed in their mission if you are terrified that your users are going around you and using tools like this.
The real message in your objections is your blindness to your own failures in supporting your (would be) clients.
What I would suggest is get off your high horses. Respond to those in your organizations begging for assistance, in a TIMELY MANNER, without "sticker shock" estimates. Then there will be no market for this kind of solution and your panic will go away.
Sadly, I don't see that happening.
As it happens, I appreciate that you are analyzing the weaknesses. This allows me to understand where to go to fix those weaknesses.
jwc,
Except that even in this scenario, You are going to add MINUTES to the local load time the moment that you start adding images to the Scout
Steve,
Here is a rule, every time that you say Dynamics does this, it is a good reason NOT to do this.
For reference:
ayende.com/.../The-CRM-Horror.aspx
ayende.com/.../Developing-on-Microsoft-CRM.aspx
For that matter, you probably want to look at this here:
ayende.com/.../Evaluating-a-Business-Platform.aspx
Probably, but not from my point of view. No offence, but I don't think that spending what little free time I have on this is going to have good ROI.
It ain't going work, sorry.
You are going to trade off SELECT N+1 with Cartesian products and 10 table joins per each form.
Ayende has seen these comments in priviate e-mail but others haven't. I'm referring to Dynamics ERP products (e.g. AX, NAP, GP) and other ERP products, while Ayende's concern was about CRM. I've not worked on CRM, so I can't speak to their behavior.
The default behavior in LightSwitch beta 2 does not use "cartesian products and 10 table joins per each form." It does not blindly include everything. I think it does what people want--take a look at it in beta 2 and give your feedback.
Steve
((Impossible? Software?) impossible!) Take it easy Steve :-D
Comment preview