History of storage costs and the software design impact
Edgar F. Codd formulated the relational model in 1969. Ten years later, Oracle 2.0 came to market, and Sybase shipped the first version of its SQL Server in 1987. By the early 90s, it was clear that relational databases had pushed the competition (such as navigational or object oriented databases) to the sidelines. It made sense: you could do a lot more with a relational database, and you could do it more easily, usually faster, and certainly in a more convenient manner.
Let us look at the environment those relational databases were written for. In 1979, you could buy IBM's 3370 direct access storage device. It offered a stunning 571 MB (yes, megabytes) of storage for the mere cost of $35,100. For reference, the yearly salary of a programmer at that time was $17,535. In other words, a single 571 MB hard drive cost as much as two full time developers for an entire year.
In 1980, the first drives with more than 1 GB of storage appeared. The IBM 3380, for example, was able to store a whopping 2.52 GB of information; the low end version cost $97,650 at the time, and it was about as big as a washing machine. By 1986, the situation had improved: you could purchase a good internal hard drive with all of 20 MB for merely $800. For reference, a good car at the time would cost you less than $7,000.
Skipping ahead again, by 1996 you could purchase a 2.83 GB drive for merely $2,900, while a car at that time would cost you $12,371. I could go on, but I'm sure you get the point by now. Storage used to be expensive. So expensive that it dominated pretty much every other concern you can think of.
At the time of this writing, you can get a hard disk with 10 TB of storage for about $400 [1]. And a 1 TB SSD drive will cost you less than $300 [2]. Those numbers give us about a quarter of a dollar (26 cents, to be exact) per GB for SSD drives, and less than 4 cents per GB for the hard disk.
Compare that to a price of $38,750 per gigabyte in 1980. Oh, except that we forgot about inflation; the inflation adjusted price for a single GB was $114,451.63. Now, you would be right to point out that this is a very unfair comparison, since I'm comparing consumer grade hardware to high end systems. Enterprise storage systems, the kind you actually run databases on, tend to be a bit above that price range. We can compare the cost of storing information in the cloud instead, and based on current pricing it looks like storing a GB on Amazon S3 for 5 years (to match the expected lifetime of a hard disk) will cost less than $1.50, with Azure being even cheaper.
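As a quick sanity check, the per-gigabyte figures above can be reproduced from the quoted prices (the inflation multiplier of roughly 2.95x for 1980 to today is my assumption, chosen to match the adjusted figure):

```python
# Reproducing the per-GB costs from the prices quoted in the text.
hdd_1980_per_gb = 97_650 / 2.52       # IBM 3380 low end, dollars per GB
hdd_2017_per_gb = 400 / (10 * 1_000)  # 10 TB hard disk at $400
ssd_2017_per_gb = 260 / 1_000         # 1 TB SSD at $260

print(round(hdd_1980_per_gb))  # 38750
print(hdd_2017_per_gb)         # 0.04
print(ssd_2017_per_gb)         # 0.26
```

That is roughly a millionfold drop in the nominal price per gigabyte, before even accounting for inflation.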
The really interesting aspect of those numbers is the way they shaped the software written in that time period. It made a lot of sense to put a lot more work on the user, not because you were lazy, but because it was the only way to do things. Most document databases, for example, store the document structure alongside the document itself (so property names are stored in each document). It would have been utterly insane to try to do that in a system where hard disk space was so expensive. On the other hand, decisions such as “normalization is critical” were mostly driven by the necessity to reduce storage costs, and only transitioned to the “purity of the data model” reasoning later on, once disk space cost became a non-issue.
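As a toy illustration (the records and field names here are invented), compare the size of self-describing documents, which repeat the property names in every document, to positional rows whose schema is stored once in the table definition:

```python
import json

# Invented sample records. A document database stores the property names
# in every document; a relational row stores only the values, with the
# schema kept once in the catalog.
rows = [("John", "Smith", 42), ("Jane", "Doe", 38), ("Mary", "Major", 51)]

as_documents = [json.dumps({"first_name": f, "last_name": l, "age": a})
                for f, l, a in rows]
as_rows = [json.dumps([f, l, a]) for f, l, a in rows]

doc_bytes = sum(len(d) for d in as_documents)
row_bytes = sum(len(r) for r in as_rows)
print(doc_bytes, row_bytes)  # every document pays for its own schema
```

On 1980s storage prices, that per-document overhead would have translated directly into money, which is why the schema-per-document design only became viable once disk space was cheap.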
[1] ST10000VN0004 - 7200RPM with 256MB Cache
[2] The SDSSDHII-1T00-G25 - with greater than 500 MB/sec read/write speeds and close to 100,000 IOPS
Comments
Saving disk space is not as critical as before; you can make reasonable trade offs nowadays, using more space to get X without significant cost differences... however... I think saving as much space as possible is still a critical factor for any database. It leads to better performance, better scalability, lower maintenance costs, higher reach, more indexes, etc.
Having 1 GB of data stored as 10 GB of uncompressed plain text is a waste even today, even if the cost is a non-issue.
I think that's a mis-characterisation of Codd's thinking; the other styles of database (hierarchic, OO, etc.) all allowed the possibility of update anomalies/incorrect data due to their design. Codd was the first(?) to formulate a coherent theory of how to store data in a non-redundant, consistent manner: the relational calculus. IBM didn't actually like the idea that much, as it threatened the revenue stream from their hierarchic database IMS, and others implemented it first.
I do agree that it's the cost/performance trend that drove the adoption of relational databases, and lately document database. Bear in mind all through the 80s I was told that relational databases were a good theoretical idea but you would never actually implement a database in 3NF as it would be too slow and that for real speed you needed a hierarchic database.
The whole speed/size thing was brought home to me by a recent movie, "Hidden Figures", about the black female mathematicians ("computers") who performed much of the calculations for the early space program. The fun bit for me was the introduction of an IBM mainframe that could do 35K floating point calculations per second - I was the only one who giggled in the audience; I'd worked out that the iPhone in my pocket was approximately 10^6 times faster and had about 10^8 times more storage.
Ah, the good "old" days:
In 1991 my family bought a 286 with a 40 MB drive and 1 MB of RAM. In 1995 we bought a Pentium 90 MHz with an 850 MB drive and 8 MB of RAM. In 1999 we bought a Pentium 2 350 MHz with a 15 GB drive and 64 MB of RAM. In 2001 we bought a Pentium 3 1000 MHz with a 40 GB drive and 128 MB of RAM. ... In 2010 we bought an i5-750 with a 500 GB drive and 4 GB of RAM, and in 2014 an i5-4670 with a 480 GB SSD and 8 GB of RAM.
We're almost in the middle of 2017, almost 3.5 years after the last purchase, and I don't see any reason to upgrade.
CPU performance progress is a joke, and disk storage is not getting bigger because for most home users 500 GB - 1 TB is enough.
Getting 16 GB of RAM is cheap, but for most users even 4 GB is enough.
Bottom line, the progress is getting slower and slower (whatever the reason is: the mobile phone market, cloud computing, or not enough competition).
Don't think so... the main reasons for using an RDBMS are still relevant today, even with storage costs so low. And if you don't care about the size of the data, what happens with the cost of the network, disk I/O, RAM, and CPU needed just to process the extra overhead of a schemaless document? Why is everybody in the NoSQL camp busy developing more efficient storage formats?
Rafal, I/O cost dominates by far. Compress a bit and you have saved most of the costs, period. And have a system that is far more flexible.
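As a quick sketch of that point (with invented field names and data): the property names repeated in every schemaless document are exactly the kind of redundancy that general-purpose compression eliminates almost for free.

```python
import json
import zlib

# One thousand invented documents; the keys repeat in every single one.
docs = "".join(
    json.dumps({"first_name": "user%d" % i, "last_name": "example", "age": i % 90})
    for i in range(1000)
).encode("utf-8")

compressed = zlib.compress(docs, 6)
print(len(compressed) < len(docs) // 4)  # True: the repeated schema mostly vanishes
```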
You're right, but low storage cost still doesn't mean relational databases have lost their merit. It just makes document databases a more practical alternative.
Note that your footnotes link to your local disk. Is this a chapter of your new book? :)
kpvleeuwen, Part that was cut, yes.
@hg - "The CPU performance progress is a joke"
I wouldn't say that. CPU performance has increased where there is demand for it to increase - which is for server CPUs and mobile CPUs, not desktop CPUs. In the server realm, during the past 4ish years we have gone from having 8 core Xeons to 24 core Xeons. Mobile/low-power CPUs have advanced rapidly with the boom of smartphones, tablets, smart TVs/Refrigerators/Toasters/IoT.
Desktop CPUs haven't seen a huge increase in performance or core count because there just isn't the demand there to drive that growth. Even the i7-3770 is more compute power than most users need.
@Kevin, true, they have moved from 8 core Xeons to 24 cores, but performance per core is roughly the same. There is only so much scaling you can do without overcomplicating your design, so as long as you can keep all those cores busy, the issues are probably going to be somewhere else. However, more cores also involve more cache sharing at the L3 level, which dominates the cost in any decently optimized software. We are cache bound in several places.
The biggest Xeon you can buy today is a 28 core with 38.5 MB of total L3 cache. Roughly speaking, when everything is running at full speed (with much luck) you get roughly 1.4 MB of cache per core (which is not dedicated anyway). Therefore, even if you can keep the cores busy on general workloads (aka non-number-crunching), most of the time ends up being spent servicing cache misses (busy-work). With fewer, faster cores, at least the cache is shared among fewer of them.