Cost oriented programming in the cloud

time to read 4 min | 760 words

imageYou might be familiar with Moore’s law, which states that the number of transistors in a dense integrated circuit doubles about every two years. In effect, that performance doubles every 24 months. For many years, that has certainly held true. But that hasn’t been the case for the past 10 years or so. Even when Moore’s law held true, there was a snag. Wirth’s law is also in effect (as an aside, read his article “A Plea for Lean Software”, 23 years, and it holds true today) and Wirth’s law states that software is slower quicker than hardware is becoming faster. It’s good to be a software developer, because even when the CPU clock speed doesn’t jump all the time, we still get more CPUs to play with. A common approach to handle performance issues today is just to throw more parallelism at the problem until it shuts up.

To a certain extent, this make a perfect economic sense. In the calculus between developers’ salaries and the cost of hardware, you’ll usually find that buying a couple extra servers is drastically cheaper than spending another 6 months improving the performance of the system. Jeff Atwood wrote about the topic a decade ago and I think that he is still very much correct, to a degree. There are other factors to consider, which is the overhead over time and in particular in the cloud. One of the major factors to take into account here is that when you are running in the cloud, you aren’t running on your servers and you are charged on usage. That can change the math by quite a bit.

For example, if I bought a couple of servers, the number of IO operations that I make is pretty much meaningless to me. If I hitting the disk ten times a second or ten thousands times a second, I’m paying the same cost. Oh, sure, I might need to buy a better hard disk to get 10,000 IOPS, but that is a one time cost, and usually not that meaningful in the grand scheme of things. But when you are on the cloud, getting higher IOPS cost more, and it cost more over time. In the same sense, in my data center, the cost of querying a database is zero. In the cloud, you will typically be charged some (miniscule) amount on a per query basis. Nothing to worry about, except that your software still work according to Wirth’s law, so you are making more queries than you should, which means that you are charged for each and every one of them.

I used to make my living by being a database performance consultant. I would go to a customer, look at how they are using their database and optimize the access pattern. It was common to see 90%+ savings in the number of queries for common operations. I was doing that because that directly translated to better responsiveness of the application. Applications that respond faster are more pleasant to use and there are numerous studies about faster applications generating higher revenues. I remember talking to clients and explaining to them why they should invest in the overall performance of the application before they hit the total resource depletion that would take them down.

Today, in the cloud, I would have a much simpler task. Let’s assume a simple application using CosmosDB, as an example. With 200 page views / sec on the site, and each page view generating 80 requests to the database, that gives us a total of 16,000 requests a second, which translates to an end of the month bill of about 10,000$. I’m using the 80 queries / page view as a reasonable low ball estimate, mind. Drupal, for example, does 300 – 400 queries per page view and it is easy to get to these numbers without paying attention. Dropping the number of queries per page to 10 (which is usually pretty easy to do with proper queries, attention to details and some caching) gives you a database bill that is around than 1,000$.

Over the span of a year, that is enough to pay a full time developer that can go and find other places where your software can be improved. And unlike before, you don’t need to justify with studies or any indirect causation. You can point directly to the bottom line in an invoice and show how much money is saved.