Ask Ayende: What about the QA env?

Matthew Bonig asks, with regard to a bug in the RavenDB MVC integration (the RavenDB Profiler) that caused a major slowdown on this blog:

I'd be very curious to know how this code got published to a production environment without getting caught. I would have thought this problem would have occurred in any testing environment as well as it did here. Ayende, can you comment on where the process broke down and how such an obvious bug was able to slip through?

Well, the answer to that comes in two parts. The first part is that no process broke down. We use our own assets for the final testing of all our software, which means that whenever there is a stable RavenDB release pending (and sometimes just when we feel like it), we move our infrastructure to the latest and greatest.

Why?

Because no matter how hard you test, you will never be able to catch everything. Production is the final testing ground, and we have an obvious incentive to make sure that everything works. It is dogfooding, basically. Except that if we get a lemon, it is a very public one.

It means that whenever we make a stable release, we can do so with a high degree of confidence that everything is going to work, not just because all the tests are passing, but because our production systems have had days to actually show whether things are right.

The second part of the answer is that this is neither an obvious bug nor one that is easy to catch. Put simply, things worked. There wasn't even an infinite loop that would make it obvious that something was wrong; there was just a lot of extra network traffic, which you would notice only if you had a tracer running or were trying to figure out why the browser was suddenly so busy.

Here is a challenge: try to devise some form of automated test that would catch an error like this, but do so without actually testing for this specific issue. After all, it is unlikely that someone would have written a test for this unless they had run into the error in the first place. So I would be really interested in seeing what sort of automated approaches would have caught it.

Posted By: Ayende Rahien

Comments

01/31/2012 08:23 AM by Shawn

I don't know if it's worth it or not, but you can test for this class of bug in a way that's not too dissimilar to how your profiler warns you about things like N+1 selects or other bad behavior.

From the perspective of a proxy between the browser and the server, there's a "normal" communication pattern and then there is an "abnormal" one. It would not be hard to imagine a scenario where this sort of traffic would be flagged as abnormal, causing a test to fail somewhere.

If I were shipping the Facebook Like button, I'd be doing tests like that. In this specific case, I kind of doubt the payoff is there.
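
To make Shawn's proxy idea a bit more concrete, here is a minimal sketch of a counting proxy sitting between the client and the app under test, plus an assertion that flags traffic that keeps flowing after the page should have settled. The backend address, ports, threshold, and settle window are all invented for illustration, and a real test would drive the page with a browser rather than a single urllib request.

```python
# Sketch of a counting proxy that flags "abnormal" traffic patterns.
# Assumptions: the app under test runs on localhost:8080, and 30 requests
# per page / a 10-second settle window are reasonable limits.
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

BACKEND = "http://localhost:8080"   # the app under test (assumed)
REQUEST_LOG = []                    # timestamps of every request seen


class CountingProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        REQUEST_LOG.append(time.time())
        with urllib.request.urlopen(BACKEND + self.path) as upstream:
            body = upstream.read()
            self.send_response(upstream.status)
            self.send_header("Content-Type",
                             upstream.headers.get("Content-Type", "text/html"))
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):   # keep the test output quiet
        pass


def assert_traffic_looks_normal(max_requests_per_page=30, settle_seconds=10):
    """Fail if the page keeps generating requests long after it has loaded."""
    loaded_at = REQUEST_LOG[0] if REQUEST_LOG else time.time()
    late = [t for t in REQUEST_LOG if t - loaded_at > settle_seconds]
    assert len(REQUEST_LOG) <= max_requests_per_page, (
        f"{len(REQUEST_LOG)} requests for one page looks abnormal")
    assert not late, f"{len(late)} requests arrived after the page settled"


if __name__ == "__main__":
    proxy = ThreadingHTTPServer(("localhost", 8888), CountingProxy)
    threading.Thread(target=proxy.serve_forever, daemon=True).start()

    # Drive the page through the proxy (a real test would use a browser here).
    urllib.request.urlopen("http://localhost:8888/").read()
    time.sleep(15)                  # give any runaway polling time to show up
    assert_traffic_looks_normal()
    proxy.shutdown()
```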

01/31/2012 10:44 AM by Péter Zsoldos

Something we are planning in the near future is to run a crawler bot (potentially using Selenium) on our staging environment and then compare the current run's performance (from the crawler and from monitoring) against historical data, to see if there are any significant changes.

Also, IIRC people doing Continuous Deployment use monitoring to trigger rollbacks to prior versions on performance degradation of the production system.
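
As a rough illustration of the comparison step Péter describes, here is a sketch that checks a crawl run against historical timings. The JSON file layout, the metric (page load time in milliseconds), and the three-sigma rule are assumptions made for the sketch, not a description of his setup.

```python
# Sketch: flag pages whose current crawl timing falls far outside the
# historic range. File names and thresholds are illustrative assumptions.
import json
import statistics


def load_history(path="crawl_history.json"):
    """Assumed history format: {"/some/page": [412.0, 398.5, ...], ...}
    where the values are page load times in milliseconds from past runs."""
    with open(path) as f:
        return json.load(f)


def find_regressions(current_run, history, min_samples=5):
    """Return pages whose current timing is far outside the historic range."""
    regressions = []
    for page, current_ms in current_run.items():
        past = history.get(page, [])
        if len(past) < min_samples:
            continue                      # not enough data to judge yet
        mean = statistics.mean(past)
        stdev = statistics.pstdev(past) or 1.0
        if current_ms > mean + 3 * stdev:
            regressions.append((page, current_ms, mean))
    return regressions


if __name__ == "__main__":
    history = load_history()
    current = json.load(open("latest_crawl.json"))   # e.g. {"/": 2200.0, ...}
    bad = find_regressions(current, history)
    for page, now, baseline in bad:
        print(f"{page}: {now:.0f} ms vs. historic mean {baseline:.0f} ms")
    if bad:
        raise SystemExit("performance regression detected in staging crawl")
```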

01/31/2012 10:47 AM by Neil Mosafi

Hard one really. The only thing I could think of would be some kind of record/replay regression testing software, which records all traffic between client and server whilst simulating some user actions and verifies there are no unexpected requests.

Trouble is, these kinds of tests tend to be really fragile, so they would break every time you made a change, and you might still miss it.

01/31/2012 12:25 PM by Ayende Rahien

Shawn, How would you try capturing something like that? Is this something that would be easy or obvious to capture without actually thinking about this scenario up front?

01/31/2012 12:27 PM by Ayende Rahien

Peter, The problem is that there are many variables that can affect client performance, so you have to allow a big fudge factor, and that means things can easily slip through. And monitoring will generally not tell you that there is a problem on the client side.

01/31/2012 12:28 PM by Ayende Rahien

Neil, Exactly, this sort of scenario is incredibly fragile, and you really don't want to try to do those tests. You would have failing tests left & right.

01/31/2012 12:32 PM by Neil Mosafi

Yes. I've seen teams get into a LOT of trouble having to maintain hundreds of those kinds of tests. Still, maybe having 5 or 10 high-level ones might catch this; not sure.

01/31/2012 12:38 PM by Péter Zsoldos

Ayende,

While I have no experience/data to back this up, this is one of the reasons we are considering using Selenium to do the crawling, so we get closer to the real client experience. And by monitoring I don't just mean occasional GET requests to the server to see whether it's alive, but the whole network infrastructure (network traffic data would be useful for this case). And of course, we are monitoring application metrics and plan to expose metrics gathered from the crawler to the monitoring service, to be able to see trends easily.

I will blog about it once we've got around to implementing it, though that's not in the near future (the pain is not big enough yet).

01/31/2012 03:30 PM by Christopher Wright

If you had set up automated UI tests and they explicitly waited until all ajax calls for loading the page had finished, you'd notice that they were waiting for an unreasonably long time (that is, forever).

Waiting for all the ajax loading to finish is reasonable if your page makes use of it for standard content. This blog seems relatively static, so that's a step I probably would have neglected.

And that's assuming you have automated tests in the first place...

If you're testing manually, it's a question of whether you happen to have Firebug open at the time, taking up enough space for you to notice the fishy ajax requests.
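
A hedged sketch of the wait Christopher describes in his first paragraph might look like the following, assuming the pages use jQuery (so jQuery.active reports in-flight requests) and that Selenium WebDriver is available; the URL and timeout are placeholders. With a runaway request loop, this wait would time out and fail the test instead of letting it pass silently.

```python
# Sketch: an automated UI test that waits for ajax quiescence.
# Assumes jQuery on the page and a local test URL; both are placeholders.
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait


def wait_for_ajax_to_settle(driver, timeout=30):
    """Block until no jQuery ajax calls are in flight, or fail loudly."""
    WebDriverWait(driver, timeout).until(
        lambda d: d.execute_script(
            "return window.jQuery === undefined || jQuery.active === 0"))


if __name__ == "__main__":
    driver = webdriver.Firefox()
    try:
        driver.get("http://localhost:8080/")   # page under test (assumed)
        wait_for_ajax_to_settle(driver)         # times out if ajax never stops
        print("page reached ajax quiescence")
    finally:
        driver.quit()
```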

01/31/2012 03:33 PM by Matthew Bonig

When I posted the original comment, I thought this was something that could have been caught with just one client hitting a test environment. After reading the OP more closely, I can see that this probably only happened because of the load that a production environment imposes.

Assuming my understanding is correct, that leads me to the next question I'd have:

Do you do stress testing?

01/31/2012 03:45 PM by Wyatt Barnett

On the whole testing-with-Selenium angle: check out BrowserMob. It does Selenium testing, but out in the cloud, so you can get a lot closer to real traffic. It has definitely shown us where our apps fall down.

01/31/2012 05:59 PM by Dmitry

Is it related to that annoying debugger/profiler panel that flashes before the pages load?

02/01/2012 11:35 AM by Royston Shufflebotham

Definitely agree with the notion that there's little point adding a specific test case for this specific fault case 'after the horse has bolted'. But there is a way to test for problems like this in general, which can be worthwhile:

The reported issue was that the blog was responding slowly. The cause was the repeated requests.

This implies that response speed is part of your acceptance criteria for the software. If that's the case, it's worth having some automated perf tests which would then catch problems with performance, and specifically response time. (Of course, coming up with the NFRs for that is always 'interesting'...)

So, you could introduce some very basic system load tests that simulate some number of users doing some set of actions and check that performance is as desired.

Then, no matter what performance fault is introduced that would affect overall system page response (be it bad AJAX handling, dodgy DB indexes, or anything else), it would be detected before going into production. Obviously there are always some bits of production you can't roll into testing, but you can mock at least some of those out.

Basically:

If you only do functional or performance tests against bits of a system before production, you're open to system functional or performance problems in production. If you only do functional - and not performance - tests against a full system before production, you'll only spot certain performance problems in production.
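
Here is a bare-bones sketch of the kind of system load test Royston suggests: a handful of simulated users hitting a few pages, with an assertion on response times. The base URL, page list, user count, and latency budget are all invented for the sketch. Note that, as Ayende points out below, this particular bug left server response times fine, so a purely server-focused check like this one would likely have missed it.

```python
# Sketch: a minimal load test with a response-time assertion.
# URL, pages, user count, and the 1-second p95 budget are assumptions.
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

BASE_URL = "http://localhost:8080"       # system under test (assumed)
PAGES = ["/", "/archive", "/rss"]        # representative user actions (assumed)


def visit(path):
    """Fetch one page and return the elapsed time in seconds."""
    start = time.time()
    with urllib.request.urlopen(BASE_URL + path) as resp:
        resp.read()
    return time.time() - start


def run_load_test(users=20, rounds=5):
    """Simulate `users` concurrent users making several rounds of requests."""
    timings = []
    with ThreadPoolExecutor(max_workers=users) as pool:
        for _ in range(rounds):
            timings.extend(pool.map(visit, PAGES * users))
    return sorted(timings)


if __name__ == "__main__":
    timings = run_load_test()
    p95 = timings[int(len(timings) * 0.95)]
    assert p95 < 1.0, f"95th percentile response time {p95:.2f}s exceeds budget"
    print(f"ok: p95={p95:.2f}s over {len(timings)} requests")
```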

02/01/2012 02:06 PM by Ayende Rahien

Royston, It ain't that simple. The response speed was absolutely fine. You would need a full-fledged browser along with several minutes of idle time to actually discover that anything is wrong.

02/01/2012 02:16 PM by Royston Shufflebotham

If "caused major slow down on this blog" didn't impact the response speed, what did it impact that was eventually detectable as a problem? Whatever that is, that's your criteria that needs to be tested before production.

Browsers are part of the overall system (and have certainly been known to have the occasional functional or performance foible!), as is the Javascript code that lives in those pages. If you're not running system tests involving real browsers, I'm not sure I'd call them full system tests.

Of course, the only way to get true real-world fault detection is to run your system in the real world. At some point there's a crossover between the cost of simulating the real-world sufficiently at test time, and the cost of having a bug in production. So yeah, it's all doable. But whether it's worthwhile is entirely down to where that cost crossover lies for you...

02/01/2012 02:20 PM by Ayende Rahien

Royston, It caused a slowdown on the client's machine. On the server, everything was good. Full system tests that include browser code are slow, fragile, hard to work with, and generally a mess. I would much rather have a staging env to do that sort of thing in. Which is why we dogfood stuff on this blog.

02/04/2012 10:19 PM by Andy Monroe

Ayende, I've run across this type of issue several times under positive feedback loops like the one you mentioned (event handler triggers itself) or negative feedback loops from misconfigured resource pools (thundering herd where retry mechanisms don't back off and keep you in a failed state indefinitely).

A generic way to test for this concept is to use a "quiesced assertion" (made up term). You would need to have usage counters at various points in your system (number of http requests, number of times a function is called, whatever is relevant to your system) and also make them externally measurable via some api.

At the end of a regression suite or load test, once you expect the system to be idle, you can verify that specific usage counters are staying constant and that your system is not in a feedback loop.

I've used this approach for testing as described above or in monitoring to alert when usage counters exceed some specified rate.
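
A sketch of the "quiesced assertion" Andy describes might look like this, assuming the application exposes its usage counters as JSON over an HTTP endpoint; the endpoint path, counter semantics, idle window, and tolerance are all made up for illustration.

```python
# Sketch: after a test suite finishes and the system should be idle,
# verify the usage counters stop moving. A feedback loop shows up as
# counters that keep climbing with no traffic being sent.
import json
import time
import urllib.request

METRICS_URL = "http://localhost:8080/debug/counters"   # assumed endpoint


def read_counters():
    """Fetch the counters, e.g. {"http_requests": 18234, "db_queries": 991}."""
    with urllib.request.urlopen(METRICS_URL) as resp:
        return json.load(resp)


def assert_quiesced(idle_seconds=30, tolerance=0):
    """Sample the counters twice across an idle window and fail if any moved."""
    before = read_counters()
    time.sleep(idle_seconds)
    after = read_counters()
    for name, start in before.items():
        delta = after.get(name, start) - start
        assert delta <= tolerance, (
            f"counter {name!r} advanced by {delta} while the system was idle")


if __name__ == "__main__":
    assert_quiesced()
    print("system is quiescent: no feedback loop detected")
```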

02/06/2012 03:03 PM by Fadi

What you're saying is practically identical to Microsoft's strategy. Release now, fix later, regardless of how buggy the system is. Just do some basic testing and then release; let the customers experience the bugs, report them, and then improve based on the feedback.

I don't think it's the best strategy, but sadly, it is the only strategy, especially when the product becomes very large.

02/06/2012 03:06 PM by Ayende Rahien

Fadi, I would strongly disagree. There is a LOT of difference between pushing unreleased software to this blog for testing in a live prod env and sending it to customers.
