Your queries are going to fail in production
Reading Davy Brion’s post There Is No Excuse For Failing Queries In Production had me twitching slightly. The premise is that since you have tests for your queries, they are not going to fail unless some idiot goes to the database and messes things up, and when that happens, you are going to trip a circuit breaker and fail fast. Davy’s post is a response to a comment thread on another post:
Bertrand Le Roy: Would it make sense for the query to implement IDisposable so that you can put it in a using block? I would be a little concerned about the query failing and never getting disposed of.
Davy Brion: well, if a query fails the underlying SqlCommand object will eventually be garbage collected so i wouldn’t really worry about that. Failed queries shouldn’t happen frequently anyway IMO because they typically are the result of incorrect code.
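For context, the disposal concern in that exchange is straightforward to address at the ADO.NET level: a using block guarantees that Dispose runs even when the query throws. A minimal sketch, with a placeholder connection string and query:

```csharp
using System.Data.SqlClient;

public static class QueryExample
{
    public static int CountUsers(string connectionString)
    {
        // Each using block disposes its resource even if the query throws,
        // so a failed query cannot leak the command or the connection.
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("SELECT COUNT(*) FROM Users", connection))
        {
            connection.Open();
            return (int)command.ExecuteScalar();
        }
    }
}
```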
The problem with Davy’s reasoning is that I can think of several very easy ways to get queries to fail in production, even if you go through testing and put your DB under lock & key. Probably the easiest problems to envision are timeouts and transaction deadlocks. Imagine the following scenario: your application is running in production, and the load is big enough (and the request pattern is such) that transaction deadlocks begin to happen. Deadlocks in the DB are not errors, at least not in the usual sense of the word. They are transient and should be dealt with separately.
Using Davy’s approach, we would either have the circuit breaker trip because of a perfectly normal occurrence, or leak a connection, because we are assuming that all failures are either catastrophic or avoidable.
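To make that concrete, here is a minimal sketch, in raw ADO.NET, of treating a deadlock as the transient condition it is. The retry helper is mine, not Davy’s; the one SQL Server specific is that the deadlock victim receives error number 1205:

```csharp
using System;
using System.Data.SqlClient;

public static class TransientRetry
{
    // SQL Server reports the deadlock victim with error number 1205.
    private const int DeadlockErrorNumber = 1205;

    public static T Execute<T>(Func<T> unitOfWork, int maxAttempts)
    {
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return unitOfWork();
            }
            catch (SqlException e)
            {
                // Anything other than a deadlock, or too many retries,
                // is a real failure and should propagate.
                if (e.Number != DeadlockErrorNumber || attempt >= maxAttempts)
                    throw;
                // A deadlock victim is a transient condition: retry the
                // unit of work instead of tripping the circuit breaker.
            }
        }
    }
}
```

Only errors that survive the retries, or that were never transient to begin with, should count against a circuit breaker.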
Comments
It might be irrelevant to your post, but since you have mentioned deadlocks, I would really like to know your advice (as an NHibernate professional) on preventing them. As a matter of fact, I've been working on a project for 4 years and we are using NHibernate. In rare cases we are getting deadlocks. We have tried to trace them in the SQL Server logs, but we haven't had any success getting rid of them.
Nima,
There aren't really ways of getting rid of deadlocks.
They are what happens when you have two transactions competing for the same resources.
In theory, you can try to restructure your queries so that you always access rows along the same path, thus eliminating deadlocks.
In practice, it is not really possible to do something like that.
You can reduce deadlocks by specifying a lower transaction isolation level (as sketched below), but preventing them isn't likely in real-world scenarios.
Luckily, recovering from them is very simple.
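To illustrate the isolation-level option: ISession.BeginTransaction accepts a System.Data.IsolationLevel. A minimal NHibernate sketch, where the Order entity is a placeholder of mine:

```csharp
using System.Data;
using NHibernate;

public class Order
{
    public virtual int Id { get; set; }
    public virtual string Status { get; set; }
}

public static class IsolationExample
{
    public static void ShipOrder(ISessionFactory sessionFactory, int orderId)
    {
        using (var session = sessionFactory.OpenSession())
        // ReadCommitted (or Snapshot, where the database supports it) holds
        // fewer locks for less time than Serializable, making it less likely
        // that two transactions block each other and deadlock.
        using (var tx = session.BeginTransaction(IsolationLevel.ReadCommitted))
        {
            var order = session.Get<Order>(orderId);
            order.Status = "Shipped";
            tx.Commit();
        }
    }
}
```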
All true. It also makes me think about the common assumption most developers have that transaction commits generally don't fail. There are a number of cases where a commit can fail (e.g. deferrable constraints in Oracle), but this is usually not well handled.
Basically, to develop robust software, you must be paranoid :) (re-re-read M. Nygard's excellent Release It!).
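As a hedged sketch of that point: Commit itself has to be treated as a fallible operation, since checks deferred to commit time (or a dropped connection) only surface there. The table and statement below are placeholders:

```csharp
using System;
using System.Data.SqlClient;

public static class CommitExample
{
    public static void DebitAccount(string connectionString)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();
            using (var transaction = connection.BeginTransaction())
            using (var command = connection.CreateCommand())
            {
                command.Transaction = transaction;
                command.CommandText =
                    "UPDATE Accounts SET Balance = Balance - 10 WHERE Id = @id";
                command.Parameters.AddWithValue("@id", 1);
                command.ExecuteNonQuery();

                try
                {
                    // Commit is not guaranteed to succeed: checks deferred to
                    // commit time, or a dropped connection, surface right here.
                    transaction.Commit();
                }
                catch (SqlException e)
                {
                    // Disposing the transaction rolls back anything still
                    // pending; report the failure rather than assume success.
                    Console.Error.WriteLine("Commit failed: {0}", e.Message);
                    throw;
                }
            }
        }
    }
}
```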
Ayende, can you elaborate on the recovery aspect? Timeouts?
Replaying the transaction.
Fully agree. Failing queries in production are normal. Deadlocks, version conflicts, and timeouts are typical cases. Ignoring them shows a lack of actual understanding of what goes on inside an RDBMS.
Or it could simply be a mistake, of course. I can hardly believe Davy isn't aware of this.
It would be nice to just be able to replay the transaction. But SQL Server has so many bugs related to its transaction handling that you never quite know when your transaction is broken.
And in usual style, they pretty much ignore whatever bug reports they get.
Dennis,
Do you have a repro that shows that transaction handling is broken on SQL Server?
connect.microsoft.com/.../ViewFeedback.aspx
I remember running into one more, but I don't think I reported that to M$, and the circumstances have now eluded me.
Interesting, but while this is a bug, it doesn't relate to what I consider the most important thing about transactions: ACID.
You don't have data corruption here, you have a transaction that mysteriously dies (and rolls back).
As long as you can trust ACID, you can replay transactions.
Yes and no... Since the transaction is stopped without your knowledge and no error handling is done, all subsequent writes are still written.
Meaning that you cannot just assume that none of your work was completed, and thus cannot repeat the transaction.
btw, the "Remember me?" checkbox keeps forgetting to remember itself, thus forgetting me the second time a post is made.
The remember me shouldn't need to be checked a second time; you are already remembered. IOW, it shouldn't forget you.
And I fully agree about the problematic nature of silent aborts.
Have you tried contacting PSS?
No. They are a waste of time also...
Last time, they spent weeks finding out why our failover setup was failing, just for us to find out that they don't really support failover unless it is in a domain setup.
So in usual style, I implemented a workaround, which in this particular case actually ended up much cleaner.
I did check remember me... Not quite sure why I am forgotten every second time I have something useful to say. Perhaps the timeout is too low?
Just checked: yes, indeed, just a 1-month timeout on the cookie.
Ayende, when you mention "replaying the transaction" above, do you mean throw away the session and restart the whole unit of work from the beginning?
Paul,
Yes, you have to. You can't just execute the same statements; there is logic that may be affected.
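A minimal sketch of what that replay looks like with NHibernate, under the assumption that the unit of work is expressed as a delegate (the helper and its names are mine): the failed session is discarded wholesale and the entire unit of work, logic included, runs again against a fresh one.

```csharp
using System;
using NHibernate;

public static class ReplayExample
{
    public static void ExecuteWithReplay(ISessionFactory sessionFactory,
                                         Action<ISession> unitOfWork,
                                         int maxAttempts)
    {
        for (var attempt = 1; ; attempt++)
        {
            // A fresh session per attempt: after a failure, the old session's
            // first-level cache and tracked entities can no longer be trusted.
            using (var session = sessionFactory.OpenSession())
            using (var tx = session.BeginTransaction())
            {
                try
                {
                    unitOfWork(session);
                    tx.Commit();
                    return;
                }
                catch (HibernateException)
                {
                    if (attempt >= maxAttempts)
                        throw;
                    // Fall through: the transaction rolls back on dispose and
                    // the whole unit of work, logic included, runs again.
                }
            }
        }
    }
}
```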
The most common cause of failing queries in production is, IMHO, a database upgrade that changes features. This happens often, especially with MySQL (even for minor version changes), but we've seen it with other DBMSs too, e.g. PostgreSQL.
Something struck me as odd about his assertions there. Even if you have the most stringent testing, some environments (e.g. FDA-regulated ones) still require you to place checks in your code to handle scenarios that are "never supposed to happen".