Looking at Parler specs and their architecture
I ran into the following Twitter thread, which lists some of Parler’s requirements (using the upper limits specified):
- Scylla cluster – 40 nodes with 64 cores, 512 GB RAM and 14 TB NVMe drives per node. For a total of 2,560 cores, 20 TB RAM and 560 TB of disks.
- PostgreSQL cluster – 100 nodes with 96 cores, 768 GB RAM and 4 TB NVMe per node. For a total of 9,600 cores, 75 TB RAM and 400 TB of disks.
- 400 application instances – 16 cores & 64 GB RAM.
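As a quick sanity check on the totals in the list, here is a minimal sketch in Python. The `Cluster` class and its field names are my own illustration, and it assumes 1 TB = 1,024 GB, which is what the rounded RAM totals imply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """One row of the requirements list (hypothetical helper type)."""
    name: str
    nodes: int
    cores_per_node: int
    ram_gb_per_node: int
    disk_tb_per_node: float

    def total_cores(self) -> int:
        return self.nodes * self.cores_per_node

    def total_ram_tb(self) -> float:
        # Assumes 1 TB = 1,024 GB, matching the rounded totals in the list.
        return self.nodes * self.ram_gb_per_node / 1024

    def total_disk_tb(self) -> float:
        return self.nodes * self.disk_tb_per_node


scylla = Cluster("Scylla", 40, 64, 512, 14)
postgres = Cluster("PostgreSQL", 100, 96, 768, 4)
# The list gives no disk figure for application instances, so 0 is a placeholder.
app = Cluster("Application", 400, 16, 64, 0)

for c in (scylla, postgres, app):
    print(f"{c.name}: {c.total_cores():,} cores, "
          f"{c.total_ram_tb():.0f} TB RAM, {c.total_disk_tb():.0f} TB disk")
```

The application tier is not totaled in the list itself; by the same arithmetic it comes to 6,400 cores and 25 TB of RAM.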
Their internal traffic is about 6.6 GB / sec and their external traffic is about 2 GB / sec. There is a lot of interesting discussion in the Twitter thread about these numbers, but I thought that it would be interesting to see how much it would cost to build that.
The 64 cores & 512 GB RAM can be handled by a Gigabyte R282-Z90; the listed specs put a single one at 27,000 USD. That means that the servers for the Scylla cluster alone would cost about a million dollars (40 x 27,000 = 1.08 million USD), and I haven’t even touched on the drives. I couldn’t find a 14 TB NVMe drive in a cursory search, but a 15.36 TB drive (Micron 9300 Pro 15.36TB NVMe) costs 2,500 USD per unit. That puts the cost of the hardware alone for the Scylla cluster at about 1.18 million USD.
I would expect about twice that much for the PostgreSQL cluster, for what it’s worth. The application servers are a lot cheaper, at about 4,000 USD per instance. That comes to another 1.6 million USD.
Total cost is roughly 5 million USD, and that doesn’t cover the other stuff (power, network, racks, etc). I’m not a hardware guy, mind! I’m probably missing a lot. At that size, you can likely get volume discounts, but I’m guessing that the stuff I’m missing would cost quite a lot as well. Call it a minimum of 7.5 million USD to set up a data center with those numbers. I want to add that this does not include labor and licensing costs.
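The back-of-the-envelope hardware estimate can be reproduced with a few lines of Python. This is only a sketch: it assumes one drive per Scylla node and treats "about twice" for PostgreSQL as exactly 2x, and the constant names are mine. The exact products differ slightly from the rounded figures in the text.

```python
# Prices quoted in the post, in USD.
SERVER_USD = 27_000      # Gigabyte R282-Z90
DRIVE_USD = 2_500        # Micron 9300 Pro 15.36TB NVMe
APP_INSTANCE_USD = 4_000

SCYLLA_NODES = 40
APP_INSTANCES = 400

# Assumption: one server plus one drive per Scylla node.
scylla_usd = SCYLLA_NODES * (SERVER_USD + DRIVE_USD)
# Assumption: "about twice that much" taken as exactly 2x.
postgres_usd = 2 * scylla_usd
apps_usd = APP_INSTANCES * APP_INSTANCE_USD

total_usd = scylla_usd + postgres_usd + apps_usd

print(f"Scylla:     {scylla_usd:>10,} USD")
print(f"PostgreSQL: {postgres_usd:>10,} USD")
print(f"App tier:   {apps_usd:>10,} USD")
print(f"Total:      {total_usd:>10,} USD")
```

Power, networking, racks, labor and licensing are deliberately left out, which is why the text pads the estimate up to 7.5 million USD.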
Also, note that this kind of capacity is likely something that you can’t get on a quick turnaround from anyone but the big cloud providers. I’d estimate that it takes multiple months just to order the parts, to be honest.
In other words, they are going to be looking at a major financial commitment and some significant lead time.
Then again… Given their location in Henderson, Nevada, the average developer salary there is 77,000 USD per year. That means that the personnel cost, which is typically significantly higher than any other expense, is actually not that big. As of Nov 2020, they had about 30 people working for Parler. Assuming all of them are developers paid 100,000 USD a year (significantly higher than the average salary in their location), the employment costs of the entire company would likely be under half of the cost of the hardware required.
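To make the payroll comparison concrete, here is a small sketch using the figures from the text: 30 people, 100,000 USD a year each, against the 7.5 million USD hardware estimate. It ignores benefits, taxes and other overhead, which is an assumption on my part.

```python
HEADCOUNT = 30
SALARY_USD = 100_000          # deliberately above the 77,000 USD local average
HARDWARE_USD = 7_500_000      # the minimum data center estimate from the text

payroll_usd = HEADCOUNT * SALARY_USD
payroll_share = payroll_usd / HARDWARE_USD

print(f"Yearly payroll: {payroll_usd:,} USD "
      f"({payroll_share:.0%} of the hardware estimate)")
```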
All of that said… what we can really see here is a display of incompetence. At the time it was closed, Parler had roughly 15 – 20 million users. A lot of them were recent registrations, of course, but Parler had already experienced several spikes in user registrations in the past. In June of 2020 it saw 500,000 users registering for its services within 3 days, for example.
Let’s take the 20 million users as the number of users, and assume that all of them are in the States and have the same hours of activity. We’ll further assume that we have high participation numbers and all of those users are actively viewing. Remember the 1% rule: only a small minority of users are actually generating content on most platforms. The vast majority are silent observers. That would give us roughly 200,000 users that generate content, but even then, not all content is created equal. We have posts and comments, basically, and treating them differently is a basic part of building an efficient system.
On Twitter, Katy Perry has just under 110 million followers. Let’s assume that the Parler ecosystem was highly interconnected and most of the high-profile accounts would be followed by the majority of the users. That means that the top 20,000 users will be followed by all 20 million users. The remaining 180,000 users who actively post will likely do so in reaction, not independently, and have comparatively smaller audiences.
Now, we need to estimate how much these people will post. I looked at Dave Weigel’s account (591.7K followers), who covers politics for the Washington Post. I’m writing this on Jan 20, the day of the Biden inauguration, so I’m assuming that this is a busy time for political correspondents. Looking at his Twitter feed, he posted 3,220 tweets this month, and on Jan 6, which had a lot to report on, he posted 377 tweets. Let’s take 500 as the reasonable upper bound for the number of daily interactions of most of the top users in the system, shall we?
That means that we have:
- 20,000 high-profile users.
- Each posting up to 500 times a day.
- Let’s assume that this all happens in 8 hours, instead of over the entire day.
- That translates to roughly 1,250,000 posts an hour. If we express this in terms of posts per second, that comes to about 347 posts per second.
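The load estimate in the list above is simple enough to express as a function. This is a minimal sketch; the function and parameter names are mine, and the inputs are the deliberately pessimistic assumptions from the list.

```python
def posts_per_second(users: int, posts_per_user_per_day: int,
                     active_hours: float) -> float:
    """Average write rate if all daily posts land inside the active window."""
    posts_per_hour = users * posts_per_user_per_day / active_hours
    return posts_per_hour / 3600  # 3,600 seconds in an hour


HIGH_PROFILE_USERS = 20_000
MAX_POSTS_PER_DAY = 500
ACTIVE_HOURS = 8

rate = posts_per_second(HIGH_PROFILE_USERS, MAX_POSTS_PER_DAY, ACTIVE_HOURS)
print(f"{rate:.1f} posts per second")  # about 347.2
```

Changing any of the three inputs shows how sensitive the result is. Spreading the same posts over a full 24 hours, for example, cuts the rate to a third.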
Go and look at the specs above. Using these metrics, you can dedicate a machine to each one of those posts. Given the number of cores requested for application instances (400 x 16 = 6,400 cores), this is beyond ridiculous.
Just to give you some context, when we run benchmarks of RavenDB, we run them on a Raspberry Pi 3. That is a 25 USD machine, with a shady power supply and heating issues. We were able to reach over 1,000 writes / second on a sustained basis. Now, that is for simple writes, sure, but again, that is a Raspberry Pi doing nearly three times as much as we would need to handle Parler’s expected load (which I think I’m overestimating).
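Putting the two comparisons side by side, here is a rough sketch of how the estimated load relates to the Raspberry Pi benchmark and to the requested application cores. It reuses the 20,000 users x 500 posts over 8 hours assumption, and the variable names are mine.

```python
# Estimated peak write load: 20,000 users x 500 posts spread over 8 hours.
estimated_posts_per_sec = 20_000 * 500 / (8 * 3600)

PI_WRITES_PER_SEC = 1_000      # sustained simple writes on a Raspberry Pi 3
APP_INSTANCES = 400
APP_CORES = APP_INSTANCES * 16

pi_headroom = PI_WRITES_PER_SEC / estimated_posts_per_sec
instances_per_post = APP_INSTANCES / estimated_posts_per_sec
cores_per_post = APP_CORES / estimated_posts_per_sec

print(f"Pi headroom over estimated load: {pi_headroom:.1f}x")
print(f"App instances per post/sec:      {instances_per_post:.2f}")
print(f"App cores per post/sec:          {cores_per_post:.1f}")
```

Simple writes are not the same as a full social media post pipeline, so treat the headroom figure as an order-of-magnitude hint and not as a benchmark.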
This post is getting a bit long, but I want to point out another social network, Stack Exchange (Stack Overflow), with 1.3 billion page views per month (assuming perfect distribution, roughly 485 page views per second, each generating multiple requests).
- Their 9 web servers each handle 450 req/sec at peak (a max of 4,050 req/sec in total) with peak CPU usage of 12%.
- 2 SQL Server clusters with 4 machines in total, handling an aggregate of 23,800 queries / sec with peak CPU usage of 15%.
- Render time across the board of < 20 ms.
The hardware that is used for those servers:
- 9 Web – 48 cores + 64 GB RAM
- 4 DB – 32 cores + 768 GB RAM
There are a few other types of servers there, and I recommend looking into the links, because there are a lot of interesting details there.
The key here is that they are running a top 200 site on significantly less hardware, and are able to serve requests and provide great quality of service.
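To put "significantly less hardware" into numbers, here is a rough core count comparison. It is only a sketch: it assumes the core figures are per server, covers only the web and database servers listed above for Stack Overflow (they run a few other server types as well), and uses Parler’s requested specs as given.

```python
# Stack Overflow, web and DB tiers only: (server count, cores per server).
stack_overflow = {"web": (9, 48), "db": (4, 32)}

# Parler's requested specs: (node count, cores per node).
parler = {"app": (400, 16), "scylla": (40, 64), "postgres": (100, 96)}


def total_cores(tiers: dict[str, tuple[int, int]]) -> int:
    """Sum of count x cores across all tiers."""
    return sum(count * cores for count, cores in tiers.values())


so_cores = total_cores(stack_overflow)
parler_cores = total_cores(parler)

print(f"Stack Overflow (web + DB): {so_cores:,} cores")
print(f"Parler (requested):        {parler_cores:,} cores")
print(f"Ratio: {parler_cores / so_cores:.0f}x")
```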
To be fair, Stack Overflow is a read-heavy site, with under half a million questions and answers in a month. In other words, less than 0.04% of the views generate a write. That said, I’m not certain that the numbers would be meaningfully different on other social media platforms.
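Both Stack Overflow ratios quoted in this post follow from two inputs. Here is a short sketch, assuming a 31-day month (which is what reproduces the 485 figure) and using half a million writes as the upper bound.

```python
PAGE_VIEWS_PER_MONTH = 1_300_000_000
WRITES_PER_MONTH = 500_000     # upper bound: "under half a million"
DAYS_IN_MONTH = 31             # assumption; a 30-day month gives about 502
SECONDS_PER_DAY = 24 * 60 * 60

views_per_sec = PAGE_VIEWS_PER_MONTH / (DAYS_IN_MONTH * SECONDS_PER_DAY)
write_pct = WRITES_PER_MONTH / PAGE_VIEWS_PER_MONTH * 100

print(f"{views_per_sec:.0f} page views per second")
print(f"{write_pct:.3f}% of page views generate a write")
```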
In my next post, I’m going to explore how you can build a social media platform without going bankrupt.
Comments
100 nodes with 96 cores == For a total of 9_600 cores
100% agree. Funnily enough, even before I reached the middle of your post I was re-checking the hardware Stack Overflow was running on. And I took it as a bit ridiculous example guessing SO is way busier than Parler was. I counted 23 app/web/app machines in SO vs 540 in Parler, I didn't even bother comparing RAM and cores at that point. Parler is indeed an example of a huge waste of resources.
Karg,
I'm sorry, you are correct. I guess that I didn't really follow those numbers, too ridiculous
Looks like it didn't need to be taken down, it would just collapse by itself.
Seems very likely that they have overstated what their infrastructure really was.
Yep, everything gets so twisted when politics start meddling in. For sure they never got as much publicity as now when they had been shut down, and without a good way to verify they can now inflate the numbers at will. Maybe it's not in my bag of interests but i seriously never heard a word about Parler before - did you?
Tyler,
Yes, it would make sense (given that they are expecting big growth). However, the stated requirements are orders of magnitude higher than I would expect.
Rafal, "NEED" to be taken down is an interesting way to phrase that. I had heard of Parler before all this latest dust up and I was an active user. Why? Because my twitter account was suspended after I posted a link to an article that questioned the NOAA adjustments to raw temperature data that conveniently resulted in a trend line that matched their previous predictions. Parler didn't hire an army of millennial wokesters to monitor every post in order to censure free speech and free exchange of ideas. A critique of Parler infrastructure is valid and educational. Shutting them down for not censoring posts would make Pravda proud.
Rockster, i think they made a mistake by getting mixed into politics and assosciated with particularly ugly world views. Then they became an easy target in some political power play, and their business got destroyed. I dunno, maybe Twitter is not so friendly place for all, but i think some reason for censorship is that they want to avoid building these mutual admiration societies around crazy, extremist or toxic ideologies as this would destroy their business too.
“ they made a mistake by getting mixed into politics and assosciated with particularly ugly world views”
You mean Twitter?
In Parler specs, does it includes other environments like staging?
Ygal,
I would assume so, I can't imagine that they need even more