It is rare to see a live blog from a book, but what the hell. I read this in about 8 hours, and I can't recommend it enough.
I am currently at chapter 5, but I had to stop and think about all the stability anti-patterns that Chapter 4 has. More specifically, I had to think about all the stability anti-patterns that I have put in my own code. The only one that actually happened to me in production was an Unbounded Result Set, which had some nasty effects on performance, but not critically so.
But as I am reading the stability anti-patterns list, I am seeing more and more things that I am doing wrong. At least, they are wrong if I want to scale in a stable manner. It is another argument for the "the simplest way ain't the right way" approach.
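For illustration, here is a minimal sketch of guarding against an Unbounded Result Set. It is not taken from the book; the `orders` table, the `MAX_ROWS` cap and the `fetch_bounded` helper are all made-up names, and SQLite stands in for whatever database you actually use. The idea is simply to never ask for more rows than you are prepared to handle, and to know when you hit the cap.

```python
import sqlite3

MAX_ROWS = 100  # hypothetical cap; pick one that fits the caller


def fetch_bounded(conn, max_rows=MAX_ROWS):
    """Return at most max_rows rows, plus a flag saying whether more exist."""
    # Ask for one extra row so we can tell "exactly max_rows" from "more than max_rows".
    cur = conn.execute(
        "SELECT id, name FROM orders ORDER BY id LIMIT ?", (max_rows + 1,)
    )
    rows = cur.fetchall()
    truncated = len(rows) > max_rows
    return rows[:max_rows], truncated


# Demo against an in-memory database with more rows than the cap.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO orders (name) VALUES (?)",
    [(f"order-{i}",) for i in range(250)],
)
rows, truncated = fetch_bounded(conn)
```

The caller can then decide what to do when `truncated` is true (page, warn, refuse), instead of finding out in production that the table grew.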
There are a few books that I can think of that caused a fundamental change in the way that I am writing and designing software. But even at this point, I think that this is going to be one of them.
I love this quote: Hope is not a design method.
Data purging is a PITA. I ran into issues with it just last week: we needed to remove unrelated data, and we didn't take into account how things are handled in the application. Basically, we removed rows in an ordered collection, which caused it to fail.
I like the idea of a Test Harness, an application that simulates a badly behaved network and badly behaved integration points. Is anyone familiar with an existing one?
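To make the idea concrete, here is a tiny sketch of one kind of bad behavior such a harness could simulate: a server that accepts the connection and then never answers. This is my own toy, not something from the book; `start_silent_server` and `call_with_timeout` are hypothetical names, and a real harness would cover many more failure modes (slow responses, resets, garbage data).

```python
import socket
import threading


def start_silent_server():
    """Listen on a free local port, accept connections, and never reply."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    srv.listen(5)
    held = []  # keep accepted sockets referenced so they are never closed

    def accept_forever():
        while True:
            conn, _ = srv.accept()
            held.append(conn)  # accept, then say nothing at all

    threading.Thread(target=accept_forever, daemon=True).start()
    return srv, srv.getsockname()[1]


def call_with_timeout(port, timeout=0.2):
    """Return 'ok' if the server replies in time, 'timeout' otherwise."""
    with socket.create_connection(("127.0.0.1", port), timeout=timeout) as client:
        client.sendall(b"PING\n")
        try:
            client.recv(1024)
            return "ok"
        except socket.timeout:
            return "timeout"


server, port = start_silent_server()
result = call_with_timeout(port)
```

Pointing the code under test at something like this shows quickly whether a call has a timeout at all, or whether it would hang forever.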
Evaluating SMS for async messaging platform: message brokers are usually implemented with carbon and have extremely high latency
Chapter 6 is interesting, it starts by explaining the "one in a million is next Tuesday" approach to unlikely issues. Astronomically unlikely coincidences happen every day.
Something that Michael keeps repeating is that sessions are the bane of stability & capacity, this is interesting. Even more interesting is real world experience of putting big sites out there, and what kind of environment they have to face. Specifically, the description of the unexpected access and the results from that are enlightening. It looks like you need to do some really nasty things in your QA environment, just to get things slightly like what they are for production. Interesting problem.
Something that I can't really make myself believe is requiring over a thousand database transactions to render the home page.
Another good quote: A million dollars will pay for a lot of profiling and optimization in the application stack...
Section 9.7 talks about the dangers of bypassing the OR/M layer for direct DB access, because it can kill the site capacity. I wholly approve. The SQL that most OR/M tools generate is predictable, so it is very easy to optimize for that. Another important thing is that you need to develop against a realistic data set. This usually means building a data generator to spit some data at the DB.
Connection pooling is another strong theme in the book, which makes me wonder. The CLR has both a thread pool and a connection pool, but they are in the background. I have needed to meddle with the thread pool in the past, but not with the connection pool. And I have heard very little about connection pools in .NET, while on the Java side there seems to be one per project. I wonder why? Is it that the default pool does enough to handle most scenarios, so it is not necessary to write your own?
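As a rough sketch of what such a data generator might look like (the `customers` table, the `generate_customers` name and the choice of distribution are all assumptions of mine, with SQLite standing in for the real database): the point is to get volume and skew that resemble production, not three hand-typed rows.

```python
import random
import sqlite3


def generate_customers(conn, count, seed=42):
    """Fill a hypothetical customers table with `count` skewed, repeatable rows."""
    rng = random.Random(seed)  # fixed seed so every developer gets the same data
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, orders INTEGER)"
    )
    # A Pareto draw gives many small customers and a few very large ones,
    # which is closer to real data than a uniform spread.
    rows = (
        (f"customer-{i}", int(rng.paretovariate(1.5)))
        for i in range(count)
    )
    conn.executemany("INSERT INTO customers (name, orders) VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
generate_customers(conn, 10_000)
total = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
smallest = conn.execute("SELECT MIN(orders) FROM customers").fetchone()[0]
```

Run the same queries the application issues against this, and slow ones show up on the developer machine rather than on the site.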
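For readers who have never looked inside one, here is a deliberately small sketch of what a connection pool does. It is not how the CLR or any particular Java pool is implemented; `BoundedPool` and its `checkout` method are hypothetical names. The part that matters for stability is that a caller waiting for a connection gives up after a timeout instead of blocking forever.

```python
import queue
import sqlite3
from contextlib import contextmanager


class BoundedPool:
    """A fixed-size pool; checkout fails fast when every connection is in use."""

    def __init__(self, size, factory):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def checkout(self, timeout=0.1):
        try:
            conn = self._idle.get(timeout=timeout)
        except queue.Empty:
            # Pool exhausted: tell the caller instead of hanging the thread.
            raise TimeoutError("no connection available") from None
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always hand the connection back


pool = BoundedPool(1, lambda: sqlite3.connect(":memory:"))
with pool.checkout() as conn:
    value = conn.execute("SELECT 1").fetchone()[0]
    # The only connection is checked out, so a second checkout must time out.
    try:
        with pool.checkout(timeout=0.05):
            exhausted = False
    except TimeoutError:
        exhausted = True
```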
I didn't know that Java had object pools, that sounds... weird. Especially if someone used the statement "creating an object is the second most expensive thing you can do" (the first being creating a thread). That is certainly not the case now, and I find it hard to believe that it was true then.
Blamestorming <- I love this term.
Michael is certainly mentioning Akamai quite a bit.
On Chapter 14 now, talking about administration. That and transparency are two things that I am very interested to learn more about. He mentions the use of command-line apps that allow you to write admin scripts against the application, as well as startup verification and the ability to interrogate a funked server. Both are interesting.
I have been thinking about including a Boo interpreter in my project, so I could log in to the application and directly "debug" it. I wonder how this can work for real. This would certainly allow me to access everything in the application, live.
Configuration, and the importance of separating internal configuration (IoC settings) and application settings (database names), was mentioned, including the necessity to make the configuration understandable to the admin. He mentioned the problem of distributing configuration across multiple hosts, I wonder why not use a well known configuration service for that.
There is a really good description about how they were able to get a system working by basically going in and setting properties on live instances. So I guess the idea of an interpreter isn't that weird. There is even a name for it, Recovery Oriented Computing.
Another good quote: Last week is history only in IT and high fashion.
A startling number of business-level issues can be traced back to batch jobs failing invisibly for 33 days straight.
Ouch! Ouch! Ouch! Been there, done that, ouch!
Log file readability: Yes! Yes! Yes! Especially when the only interface into the system is the log file! Some good points on how to make a log file readable, and on good message logging.
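A small sketch of what I take "readable" to mean in practice. The format and the message are my own assumptions, not the book's recommendations: one line per event, fixed-width columns so the eye can scan down the file, and a message that says what happened and what state things were left in. The `orders` logger name and the order id are invented.

```python
import io
import logging


def build_logger(stream):
    """Return a logger that writes one aligned, scannable line per event."""
    logger = logging.getLogger("orders")  # hypothetical subsystem name
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    # Padded level and logger name keep the message column lined up.
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)-8s %(name)-12s %(message)s")
    )
    logger.handlers = [handler]
    return logger


buffer = io.StringIO()  # stands in for the log file
log = build_logger(buffer)
log.error(
    "payment gateway timeout after %d ms; order %s left in PENDING",
    5000,
    "A-1042",
)
line = buffer.getvalue()
```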
I like the discussion on superstitions as a survival trait.
It seems that scriptability of a live application is a big thing in the Java space, with JMX being the standard way to do that. We don't have anything like that for .NET, but I think that it should be simple enough to wire something up with Windsor and IRemoteQuackFu :-)
Another alternative to Boo would be PowerShell, which admins are supposedly familiar with.
On chapter 18, adaptation. I would call it maintainability, but I think it is the same thing.
I like the view on integration databases:
Integration databases - don’t do it! Seriously! Not even with views. Not even with stored procedures. Take it up a level, and wrap a web service around the database. Then make the web service redundant and accessed through a virtual IP. ... That’s an enterprise integration technology. Reaching into another system’s database is just...icky.
I have been saying much the same thing for a long while now. Very icky indeed.
The description of zero downtime deployment reminded me of the Erlang talk at JAOO: you have no consistent state, and you must deal with this.
Final thought: design for production, not for QA.
Go and read this book, you really need it.