Ayende @ Rahien

Refunds available at head office

Analyzing LightSwitch data access behavior

I thought it would be a good idea to see what sort of data access behavior LightSwitch applications have. So I hook it up with the EntityFramework Profiler and took it for a spin.

It is interesting to note that it seems that every operation that is running is running in the context of a distributed transaction:

image

There is a time & place to use DTC, but in general, you should avoid them until you really need them. I assume that this is something that is actually being triggered by WCF behavior, not intentional.

Now, let us look at what a simple search looks like:

image

This search results in:

image

That sound? Yes, the one that you just heard. That is the sound of a DBA somewhere expiring. The presentation about LightSwitch touted how you can search every field. And you certainly can. You can also swim across the English channel, but I found that taking the train seems to be an easier way to go about doing this.

Doing this sort of searching is going to be:

  • Very expensive once you have any reasonable amount of data.
  • Prevent usage of indexes to optimize performance.

In other words, this is an extremely brute force approach for this, and it is going to be pretty bad from performance perspective.

Interestingly, it seems that LS is using optimistic concurrency by default.

image

I wonder why they use the slowest method possible for this, instead of using version numbers.

Now, let see how it handles references. I think that I run into something which is a problem, consider:

image

Which generates:

image

This make sense only if you can think of the underlying data model. It certainly seems backward to me.

I fixed that, and created four animals, each as the parent of the other:

image

Which is nice, except that here is the SQL required to generate this screen:

-- statement #1
SELECT [GroupBy1].[A1] AS [C1]
FROM   (SELECT COUNT(1) AS [A1]
        FROM   [dbo].[AnimalsSet] AS [Extent1]) AS [GroupBy1]

-- statement #2
SELECT   TOP ( 45 ) [Extent1].[Id]              AS [Id],
                    [Extent1].[Name]            AS [Name],
                    [Extent1].[DateOfBirth]     AS [DateOfBirth],
                    [Extent1].[Species]         AS [Species],
                    [Extent1].[Color]           AS [Color],
                    [Extent1].[Pic]             AS [Pic],
                    [Extent1].[Animals_Animals] AS [Animals_Animals]
FROM     (SELECT [Extent1].[Id]                      AS [Id],
                 [Extent1].[Name]                    AS [Name],
                 [Extent1].[DateOfBirth]             AS [DateOfBirth],
                 [Extent1].[Species]                 AS [Species],
                 [Extent1].[Color]                   AS [Color],
                 [Extent1].[Pic]                     AS [Pic],
                 [Extent1].[Animals_Animals]         AS [Animals_Animals],
                 row_number()
                   OVER(ORDER BY [Extent1].[Id] ASC) AS [row_number]
          FROM   [dbo].[AnimalsSet] AS [Extent1]) AS [Extent1]
WHERE    [Extent1].[row_number] > 0
ORDER BY [Extent1].[Id] ASC

-- statement #3
SELECT [Extent1].[Id]              AS [Id],
       [Extent1].[Name]            AS [Name],
       [Extent1].[DateOfBirth]     AS [DateOfBirth],
       [Extent1].[Species]         AS [Species],
       [Extent1].[Color]           AS [Color],
       [Extent1].[Pic]             AS [Pic],
       [Extent1].[Animals_Animals] AS [Animals_Animals]
FROM   [dbo].[AnimalsSet] AS [Extent1]
WHERE  1 = [Extent1].[Id]

-- statement #4
SELECT [Extent1].[Id]              AS [Id],
       [Extent1].[Name]            AS [Name],
       [Extent1].[DateOfBirth]     AS [DateOfBirth],
       [Extent1].[Species]         AS [Species],
       [Extent1].[Color]           AS [Color],
       [Extent1].[Pic]             AS [Pic],
       [Extent1].[Animals_Animals] AS [Animals_Animals]
FROM   [dbo].[AnimalsSet] AS [Extent1]
WHERE  2 = [Extent1].[Id]

-- statement #5
SELECT [Extent1].[Id]              AS [Id],
       [Extent1].[Name]            AS [Name],
       [Extent1].[DateOfBirth]     AS [DateOfBirth],
       [Extent1].[Species]         AS [Species],
       [Extent1].[Color]           AS [Color],
       [Extent1].[Pic]             AS [Pic],
       [Extent1].[Animals_Animals] AS [Animals_Animals]
FROM   [dbo].[AnimalsSet] AS [Extent1]
WHERE  3 = [Extent1].[Id]

I told you that there is a select n+1 builtin into the product, now didn’t I?

Now, to make things just that much worse, it isn’t actually a Select N+1 that you’ll easily recognize. because this doesn’t happen on a single request. Instead, we have a multi tier Select N+1.

image

What is actually happening is that in this case, we make the first request to get the data, then we make an additional web request per returned result to get the data about the parent.

And I think that you’ll have to admit that a Parent->>Children association isn’t something that is out of the ordinary. In typical system, where you may have many associations, this “feature” alone is going to slow the system to a crawl.

Comments

Mike Chaliy
08/25/2010 06:50 AM by
Mike Chaliy

Regarding how they search for every field. This is seem to be a restriction of the RIA Services. They just workaround lack of normal search functionality by issuing search for every field.

Ayende Rahien
08/25/2010 06:55 AM by
Ayende Rahien

@Mike,

I don't really care what the limitations are, it is a bad behavior and a perf killer

Simon Bartlett
08/25/2010 07:18 AM by
Simon Bartlett

The very least they could is let the developer decide which entity properties are searchable, with an explanation that the more fields enabled the slower searches will be.

Demis Bellot
08/25/2010 07:30 AM by
Demis Bellot

Wow pretty sick: Select N+1 web service calls and multiple Like wildcard searches avoiding db indexes?? This looks like a tool to autogenerate 'bad developer code'.

This looks like its taking a lot of the RAD features from MS Access including the fact that it doesn't perform or scale very well.

RichB
08/25/2010 07:58 AM by
RichB

I've not read anything that says Performance is a goal of LightSwitch.

I look at it as I look at SQL CE - very useful in certain scenarios, but a long way from the best in other scenarios.

Ayende Rahien
08/25/2010 08:07 AM by
Ayende Rahien

RichB,

Aimed at professional developers, remember?

Beside, in the scenarios I covered, even 100 - 1000 rows will give you unacceptable perf issues, so it isn't like I am talking about ton of data or uncommon scenarios

RichB
08/25/2010 08:11 AM by
RichB

Really?

Aimed at non professional programmers, Visual Studio LightSwitch is

designed to enable inexperienced coders to quickly build business

applications for the cloud and the desktop. According to Microsoft, the

first beta of LightSwitch will be available for download from MSDN on

August 23rd. In a blog post, Microsoft’s senior vice president S

Somasegar noted that professional developers are not the only people

building business applications within an organization today. The

responsibility is being spread, and everyone is playing more than one

role.

Ayende Rahien
08/25/2010 08:14 AM by
Ayende Rahien

RichB,

Look at the keynote, they explicitly state there that it is intended for pro dev as well.

Beside, as I mentioned, that problem is horrible even for non pros.

RichB
08/25/2010 08:17 AM by
RichB

In that case, they need to sort their "messaging" out.

The biggest concern I have is not that 100 rows will give query times on the order of a second, but that a professional developer will be handed a dog-slow LightSwitch app and told to "sort it out" because Microsoft have told them the on-ramp to full Visual Studio is simple. But in reality, it will need to be totally rewritten.

Frans Bouma
08/25/2010 08:58 AM by
Frans Bouma

I do think you specified the navigation names wrong ;) On the 'many' side, you should have 'Parent' and on the '1' side you should have 'Children', as those are fields mapped onto the relationship for that side. So the 'many' side, has a property 'parent' as that's the field mapped onto the relationship.

I admit the dialog is pretty weak, even though they try to make it easy (which thus results in the conclusion they failed)

scooletz
08/25/2010 10:38 AM by
scooletz

Ayende,

maybe the whole application was designed to learn developers how to recognize the basic anti-patterns? The query listing made me laugh!

Alex Simkin
08/25/2010 01:04 PM by
Alex Simkin

"That is the sound of a DBA somewhere expiring"

The correct way would be to re-build full-text indexes on every add/rename/delete column in RAD tool, right?

Ed James
08/25/2010 02:51 PM by
Ed James

You all say the performance in unacceptable when you have no understanding of what is acceptable for a particular Light Switch application.

For a given Light Switch application, whilst waiting for data to be fetched, perhaps a delay of a few seconds is acceptable.

Ayende Rahien
08/25/2010 03:04 PM by
Ayende Rahien

James,

It isn't few seconds.

Let us say a typical app is for registration for boy scouts, you have Scout as a table, with the following references:

Mother -> Parents

Father -> Parents

Tribe -> Tribes

LatestAward -> Awards

That means that displaying a tribe of 45 scouts will cost you 181 queries!

Things are going to get awkward when you are making that many calls over the network, which then have to go over to the database.

Jeff
08/25/2010 04:30 PM by
Jeff

"That sound? Yes, the one that you just heard. That is the sound of a DBA somewhere expiring."

That quote made my day!

Eric Erhardt
08/25/2010 06:15 PM by
Eric Erhardt

Thanks for your analysis and feedback, Ayende.

I have a couple explanations on some behaviors you listed:

First, on the DTC issue, LightSwitch doesn't use DTC by default. Instead it is calling ObjectContext.Connection.EnlistTransaction, which the EntityFramework Profiler is just interpretting to be a "distributed transaction". The transaction doesn't become distributed until more than 1 provider enlists in it. Since by default only the SqlConnection will be enlisted in the transaction, a distributed transaction isn't created. However, this allows for people to use distributed transactions if they require one.

The reason the optimistic concurrency is implemented that way is because LightSwtich allows you to import your own table, which may not have a version column.

@Frans Bouma has it correct, the Add New Relationship dialog has the navigation property on the side it is defined. On the "child" side, it has a navigation property named "Parent". On the "parent" side, it has a navigation property named "Children". It makes more sense if you use two separate tables, but has confused me on self-references as well.

The LightSwitch team is aware of the Select N+1 problem and is currently working on addressing this problem. There wasn't enough time to get it into Beta1.

Ayende Rahien
08/25/2010 07:42 PM by
Ayende Rahien

re: DTC, okay.

re: Optimistic Concurrency - that is not a good reason not to do it right when you control the schema A-Z.

re: Self referencing - you might want to consider updating the dialog to make it clearer, then.

re: SELECT N+1, forget about it.

The way that the system is designed you can't escape what you have done by doing optimization.

This is built into the building blocks of the product, and fixing this would require re-structuring how you are doing things in a significant manner.

From my experience with MS products in the past, you aren't going to be able to do that.

In essence, your problem isn't that you used the wrong algorithm, your problem is that you have ignored the Fallacies of Distributed Computing in this regard, as well as with the images, as well as plenty of other places.

I was able to recognize the issue from the keynote, without any access to the code, and to be honest, I haven't really dug into the product.

Those sort of problems are not coding issues, they are architectual.

Petr Antos
09/09/2010 10:25 AM by
Petr Antos

Hello Ayende, thanks for your insights to some issues. I dont know NHibernate, but here I found your recommendations to solve N+1 trouble in it using its api features. Its very good ORM, no doubt.

What do you mean about this snippets? Can something as this solve some use cases for one query per table loading of nested 1:N relations? I tested this od Firebird on large data and it performed quite well, sure having indexed foreign keys as in example:

-- level 3 tables

SELECT * FROM LEVEL3 where LEVEL2_id in

(

SELECT id from LEVEL2 where LEVEL1_id in

(

SELECT id from LEVEL1 where (<>)

)

)

-- level 2 tables

SELECT * from LEVEL2 where LEVEL1_id in

(

SELECT id from LEVEL1 where (<>)

)

-- level 1 tables

SELECT * from LEVEL1 where (<>)

How in fact such queries behave internally at query engine?

Thanks in advance for your reply. I am big fan of declarative approach of LightSwitch, I think that things arent so bad as you suppose, because they can enhance internal behaviour, even at MS SQL level at least, if needed. Sure, it is very hard to solve complex things to be easy to use, but this is developers as you are work, or not? :-)

Ayende Rahien
09/09/2010 11:00 AM by
Ayende Rahien

Petr,

The problem with this is that it works well, until your level 1 query becomes complex.

In addition to that, it isn't only a single path that you need to worry about, it is multiple paths, and often in multiple levels.

For example:

Order.OrderLines

Order.Payments

Payment.Resultions

Payment.TransactionLog

OrderLine.Product

OrderLine.Discounts

OrderLine.Changes

Petr Antos
09/09/2010 12:15 PM by
Petr Antos

Thanks for reply. Umm OK, sure that db engine can handle relational theory well but doesnt know about object relations, obviously. So I had thinked about quite simple queries to db only, leaving partial results composition for ORM / appserver, hopefully as close to db as possible. As in azure there may be such conditions meet at least for "own" databases linkage and hopefully such beast can somehow try to "learn" even something about data in time for optimizations, even for remote databases data caching. But it is clear that current solution in LS is not very sexy, yet ;-). Would not be fine to throw away such troubles asap and concentrate real developers power to something "unpattternable"? Anyway, thanks for now!

jwc
09/12/2010 05:32 PM by
jwc

Hmmm

Let us say a typical app is for registration for boy scouts, you have Scout as a table, with the following references:

Mother -> Parents

Father -> Parents

Tribe -> Tribes

LatestAward -> Awards

That means that displaying a tribe of 45 scouts will cost you 181 queries!

Things are going to get awkward when you are making that many calls over the network, which then have to go over to the database.

And this will in fact be a typical usage for LS. And there will be 100 scouts in the troop, and it will run on someone's home computer.

No giant data set, no network slowing to a crawl, just someone getting their job done quickly and easily.

What I get is that you miss the entire point of LS.

The real message here is that the IT department has failed in their mission if you are terrified that your users are going around you and using tools like this.

The real message in your objections is your blindness to your own failures in supporting your (would be) clients.

What I would suggest is get off your high horses. Respond to those in your organizations begging for assistance, in a TIMELY MANNER, without "sticker shock" estimates. Then there will be no market for this kind of solution and your panic will go away.

Sadly, I don't see that happening.

As it happens, I appreciate that you are analyzing the weaknesses. This allows me to understand where to go to fix those weaknesses.

Ayende Rahien
09/13/2010 08:21 AM by
Ayende Rahien

jwc,

And this will in fact be a typical usage for LS. And there will be 100 scouts in the troop, and it will run on someone's home computer. No giant data set, no network slowing to a crawl, just someone getting their job done quickly and easily.

Except that even in this scenario, You are going to add MINUTES to the local load time the moment that you start adding images to the Scout

Ayende Rahien
09/19/2010 09:29 PM by
Ayende Rahien

Steve,

some of the Dynamics product lines

Here is a rule, every time that you say Dynamics does this, it is a good reason NOT to do this.

For reference:

ayende.com/.../The-CRM-Horror.aspx
ayende.com/.../Developing-on-Microsoft-CRM.aspx

For that matter, you probably want to look at this here:

ayende.com/.../Evaluating-a-Business-Platform.aspx

Ayende, you'd be a good contributor on the beta forum

Probably, but not from my point of view. No offence, but I don't think that spending what little free time I have on this is going to have good ROI.

Ayende Rahien
09/19/2010 09:30 PM by
Ayende Rahien

Beta 2 will provide spans (aka include, $expand)

It ain't going work, sorry.

You are going to trade off SELECT N+1 with Cartesian products and 10 table joins per each form.

Steve Anonsen
09/20/2010 07:22 PM by
Steve Anonsen

Ayende has seen these comments in priviate e-mail but others haven't. I'm referring to Dynamics ERP products (e.g. AX, NAP, GP) and other ERP products, while Ayende's concern was about CRM. I've not worked on CRM, so I can't speak to their behavior.

The default behavior in LightSwitch beta 2 does not use "cartesian products and 10 table joins per each form." It does not blindly include everything. I think it does what people want--take a look at it in beta 2 and give your feedback.

Steve

Petr Antos
10/08/2010 01:30 AM by
Petr Antos

((Impossible? Software?) impossible!) Take it easy Steve :-D

Comments have been closed on this topic.