You see that database? OFF WITH HIS HEAD!
A while ago I was chatting with a friend that complained about a migration script timing out on his local machine. When I looked at the script, it was fairly obvious what was wrong:
DECLARE @VideoID UNIQUEIDENTIFIER DECLARE @NewID UNIQUEIDENTIFIER DECLARE VideoCursor CURSOR READ_ONLY FOR SELECT ID FROM Video OPEN VideoCursor FETCH NEXT FROM VideoCursor INTO @VideoID WHILE @@FETCH_STATUS = 0 BEGIN SET @NewID = NEWID() INSERT INTO Content (ID, Body, FormatBody, Plain) SELECT @NewID, ContentItem.Body, Video.FormatBody, Video.Plain FROM ContentItem INNER JOIN Video ON Video.Id=ContentItem.ID WHERE Video.Id=@VideoID UPDATE Video SET ContentId=@NewID WHERE Video.Id=@VideoID UPDATE ThumbImage SET ContentId=@NewID WHERE Video_id=@VideoID FETCH NEXT FROM VideoCursor INTO @VideoID END
The script was using a cursor!
Every time you want a use a cursor, you must fast for three days while reading the memoires of Edgar F. Codd.
Cursors are evil!
Let us see how we can make this work using set based logic, shall we?
INSERT INTO #TempContent SELECT newid() as NewId, Video.Id as OldId, ContentItem.Body, Video.FormatBody, Video.Plain FROM ContentItem INNER JOIN Video ON Video.Id=ContentItem.ID WHERE Video.Id=@VideoID INSERT INTO Content(ID, Body, FormatBody, Plain) SELECT NewId, ContentItem.Body, Video.FormatBody, Video.Plain UPDATE Video SET ContentId=NewId FROM #TempContent WHERE Video.Id=OldId UPDATE ThumbImage SET ContentId=NewId FROM #TempContent WHERE Video.Id=OldId DROP TABLE #TempContent
I can assure you that this will work faster, read better, get parallelize by the database and in generally be better behaved than the previous version.
Comments
Just to highlight other evils of cursors...
I was asked a question recently about a bug with a cursor that was updating the source table, causing the cursor to run forever. I could vaguely remember that there are some flags (when you declare the cursor) that control this behaviour - but in all honesty I have no interest in learning the details of those flags. I agree as with the above; the best answer is to ditch the cursor completely. Either in-place, or using a temp-table/table-variable (depending on size).
In some ways, things like ROW_NUMBER() provide answers for some of the other excuses that people use for cursors (of course, you can use an auto-incrementing identity in a temp table for the same purpose).
I haven't seen any cursor-only solutions, except for schema-modifying operations.
When I worked on projects where there was no O/R at all, just a lot of stored procedures, even then there was no excuse to using a cursor and project guidelines outlawed them.
I fully agree with your solution and with the statement that cursors are evil, BUT my experience says also that the usual causes for bad performance are missing/wrong indexes or transaction/locking issues (including lack of read uncommitted on selections) ...
Of course you are right.
But apparently this is not true for Oracle, but I'm no expert.
he also could have enlarged the tempdb, into which sqlserver inserts tempresults for cursors.
Cursors are in general a sign of imperative thinking, so one should always strive for set-based statements. However cursors do have their place, though the # of cases is limited.
I think the original code lacks of a commit fase(e.g. each 1k rows)... with large data the REDO log (or whatever is called in MSSQL) may be causing the timeout..
Your example is very simple, sometimes you need some calculations for each row so you don't have any other option than using cursors..
eledu81,
Making calcs per each row is easy with SET based logic
WHERE Video.Id=@VideoID shouldn't be there in the non-cursor example.
Yes, you can do SET, but if you have a more complex logic there like selecting from more than one table or need conditionals for calculating values, transforming data, etc you will end with a gigantic SELECT/INSERT statement or multiple statements in the same table that I think will not perform better than using cursors.
Also you will lose control, if some row have bad data the whole operation will rollback and you can't know what row was the problem, adding exception handling and logging is easy in the original code.
Are you saying there is absolutely no reason whatsoever to ever even think about using cursors?
I am saying that there are, but they are FAR rarer than the actual usage of cursors.
I've a SP with not one nor two cursors but 5...all 5 are useless...fun...
I lost count of how many times I've had to say "Dont do that with a cursor" :(
But it is HARD sometimes to figure out how to do something in one query, but on the other hand it made whomever I told a lot better at the next queries they made.
deyanp mentions locking issues being a big source of performance pains. I know this is slightly off the original topic here, but I was wondering what some strategies are when performing batch updates on live databases, that may have locks.
I think it depends on your database platform. I basically never use cursors in MSSQL. I use cursors occasionally in Oracle.
@configurator
Cursors are one of those "no-bullet" things. They are never a good solution and its always possible to do something without a cursor.
@John Farrell
I don't agree. They are simply another tool in the toolbox and there are some situations where they will be the right tool to use. The key is knowing when to use them. Though I will say that if you don't know if "now" is the right time to use them, it's not the right time to use them.
I'm a fashion junky, cursors are soooo 1990s ;)
Temp-tables are so 2k0.
When you don't need indexes, you can use table variables.
declare @mytable table (uniqueidentifier id, stuff nvarchar(4000))
insert into @mytable .......
;)
@configurator>Are you saying there is absolutely no reason whatsoever to ever even think about using cursors?
@Ayende>I am saying that there are, but they are FAR rarer than the actual usage of cursors.
Which is ironic since NH's Pit of Success is cursor based operations.
RichB, can you clarify that comment? I don't see how I am supposed to be doing cursor based operations when using NHibernate.
@frank:
foreach(Video in session.CreateQuery("from Video")) {
session.Save(new Content {....});
}
That's not set-based. It's imperative.
If cursors are so bad how am I supposed to know where my mouse pointer is?
How about switching to a FAST_FOWARD cursor in these cases rather than remake one as a temp table. And if you are going to use a temp table, use a table variable. Just replace the # with @, mostly.
Everyone always says how terrible cursors are, but when I write one correctly no one can find a faster way to do the same work. Not yet, at least, and I don't write them when they aren't useful or substantially clearer code. It doesn't look like the one here is necessarily the best way to do it, but it sure looks like the cleanest.
Also, try this pattern for cursor acesss, since it removes the redundant/confusing initial access and consolidates the variable setting (but replaces with a bit odd 1=1 loop)
WHILE (1=1) BEGIN
END
END
couldn't agree more
@RichB:
Maybe that is imperative, but then again, when doing that you are in imperative C# land. Not in the trenches of set-based SQL.
Some observations
Astounded to hear that someone using cursors mainly for logical clarity does not find they are typically slower than set-based operations. I totally agree with Ayende. The rule in our shop is "if you use a cursor, you need to be able to show that nothing else will do." The reason is simple. Overwhelming experience is that cursors (even "FAST_FORWARD" ones) are much slower than set-based equivalents in SQL Server.
Table variables can be preferable to temp tables, but remember one thing: the optimizer does not have any statistics on the contents of a table variable. This means that if the table variable has a lot of rows, some very demented query plans can arise, particularly if it's being used in joins against large tables.
I must disagree with many - sorry to say...
I'm however totally in agreement with the statement "Any set-based logic should be executed in a cursor", in fact.. If you cannot write set-based queries you should look for another job asap or ask somebody willing to educate you.
But but... Cursors are indeed very nice to iterate over a resultset and call stored procedures, start/ stop agent jobs etc. of course such operations rarely iterates over 100K rows.
Did you know that all replication technologies depends 100% at cursors? Don't be shocked :)
Saying cursors are evil, should be said on the right context!
Errata: "Any set-based logic should be executed in a cursor" should of course be "Any set-based logic should NOT be executed in a cursor"
Comment preview