The Guts n’ Glory of Database Internals: The enemy of thy database is…
You might have noticed a theme here in the past few posts. In order to achieve a good performance and ensure the stability of the system, most database engines has a pretty good relationship with the low level hardware and operating system internals.
You pretty much have to do that, because you want to be able to squeeze every last bit of performance of the system. And it works, quite well, until people start lying to you. And by people, I mean all sort of stuff.
The most obvious one is the hardware. If I’m asking the hardware “please make sure that this is on persisted medium”, and the hardware tell me “sure”, but does not such thing. There is very little that the database engine can do. There are quite a lot of drives there that flat out lie about this. Typically enterprise grade drives will not do that unless you have them configured to survive power loss (maybe they have a battery backup?), but I have seen some production systems that were hard to deal with because of that.
Another common case is when administrators put the database on a remote server. This can be because they have a shared storage setup, or are used to putting all their eggs in one basket, or maybe they already have backup scripts running in that location. Whatever the actual reason, it means that every I/O operation we do (already quite expensive) is now turned into a network call (which still need to do I/O on the other side), so that mess up completely the cost benefit analysis the database does on when to actually call the hardware.
Sometimes you have an attached storage directly to the server, with high end connection that provides awesome I/O and allow you to stripe among multiple drives easily. Sometimes, that is the shared storage for the entire company, and you have to compete for I/O with everything else.
But by far the most nefarious enemy we have seen is Anti Virus of various kinds. This problem is mostly on Windows, where admins will deploy an Anti Virus almost automatically, and set it to the most intrusive levels possible, but I have seen similar cases on Linux with various kernel extensions that interfere with how the system works. In particular, timing and contracts are sometimes (often) are broken by such products, and because it is done in an extremely low level manner, the database engine typically has no idea that this happened, or how to recover. For fun, trying to figure out if an Anti Virus is installed (so you can warn the admin to set it up correctly) is one of the behaviors that most Anti Virus will look for and flag you as a virus.
Now, we have run into this with Anti Virus a lot, but the same applies for quite a lot of things. Allowing an indexing service to scan the database files, or putting them on something like Dropbox folder or pretty much anything that interfere with how the data gets persisted to disk will cause issues. And when that happens, it can be really tricky to figure out who is at fault.
Finally, and very common today, we have the cloud. The I/O rates in the cloud are typically metered, and in some clouds you get I/O rates that you would from a bad hard disk 10 years ago. What is worst, because the cloud environment is often shared, it means that you are very vulnerable to noisy neighbors. And that means that two identical I/O requests will complete, the first in 25ms, and the second in 5,000 ms (not a mistake, that is 5 seconds!). Same file, same system, same size of request, same everything, spaced two seconds apart, and you hit something, and your entire performance work goes down the drain. You can get reserved IOPS, which helps, but you need to check what you are getting. On some clouds you get concurrent IOPS, which is nice, but the cost of serial IOPS (critical for things like journals) remains roughly the same. This is especially true if you need to make unbuffered I/O, or use fsync() on those systems.
We have actually had to add features to the product to measure I/O rates independently, so we can put the blame where it belongs (your drives gives me 6 MB/sec on our standard load, this is the maximum performance I can give under this situation).