How to mis-configure your production to dead crawl
One of the things that I avidly subscribe to is that when you are building server applications, visibility into what the application is doing is key. If you don’t have visibility into the application, you are left in the dark, stumbling and alone.
NH Prof is a good example of how you make things much more visible to developers, but I am doing this in many other places, Rhino Service Bus is another example of how important it is to have visibility into what your servers are doing. Being able to see what is actually going on in your apps is priceless!
However, that does come at a cost. And a pretty steep one at that.
Logging is expensive. It is expensive to gather the data to log, it is expensive to actually log the data, in order to produce coherent logs in multi threaded environment, you need to perform synchronization, etc.
One of the most common performance problems that people run into is when they push their debug logging configuration to production. If we will look into NHibernate as a good example, if you set the log level to debug, you are going to get tens of thousands of messages just for the startup! And you are going to get hundreds and thousands of log messages for routine operations.
That is going to slow things down, considerably.
Production configuration management is a big topic, and it deserves its own post, but the impact of setting the log level too low in production cannot be stressed strongly enough.
Pay attention to this, or it will make you pay attention to it.
In one notable case, several years ago, we re-implemented a whole module in the application (significantly optimizing it) to get exactly zero benefit (well, negative benefit, actually) from the exercise, since the actual problem was slow logging database.
In another, we improved performance by over 60% by simply raising the log level from DEBUG to ERROR.
Comments
I strongly agree, but I have a question: as a person who's never had the pleasure of maintaining a big enough server system while it was in production (the only ones I've written had very low usage) - what do you log on production? I suppose you don't turn off logging entirely, do you?
I am going to touch on that in a separate blog post
A solution that has worked well for me is to aggregate logging messages and periodically flush them to the log. Of course- and you've indicated this above- the messages being batched in production are much less verbose than the development / test environments.
SmartInspect by Gurock Software. Logs all, very - very -very fast, viewer filters by threads, too many features to list. Nuff said.
"visibility into what the application is doing is key"
That is one of the things I found interesting from your NDC talk where you mention SLA's and queues. Not necessarily logging, but still a glimpse inside your applications state.
The Rhino Service Bus link is broken (404).
@Bryan
I don't know if you have used NLog for logging in your application, but NLog has some nice features that can help to keep both log messages and performance. For example, there's an async log target that writes messages to a log in a background thread, without slowing down threads that do the real work. There's also a caching target (I don't really remember the details) that collects thread's debug messages in memory and writes them all to log file if an exception occurs. If there's no exception thrown no debug messages are written - this way you can have debug information only for requests ending with an exception. And it has minimum overhead for non-logged messages.
I recently converted a bunch of logging in an app from Debug.Trace to log4net since we actually needed the logs.
Long story short, turning logging from DEBUG to INFO took a reporting process down from about half an hour to about a minute and a half.
Our stuff automatically sets all loggers to ERROR on deployment after precisely that cock up. It's even worse with the verbose logging in Spring.Net
Comment preview