Ayende @ Rahien

Oren Eini, aka Ayende Rahien, CEO of Hibernating Rhinos LTD, which develops RavenDB, a NoSQL Open Source Document Database.

time to read 7 min | 1238 words

I ran into the following Twitter thread, which lists some of Parler's requirements (using the upper limits specified):

  • Scylla cluster – 40 nodes with 64 cores, 512 GB RAM and 14 TB NVMe drives for each node. For a total of 2,560 cores, 20 TB RAM and 560 TB of disks.
  • PostgreSQL cluster – 100 nodes with 96 cores, 768 GB RAM and 4 TB NVMe. For a total of 9,600 cores, roughly 77 TB RAM and 400 TB of disks.
  • 400 application instances – 16 cores & 64 GB RAM each. For a total of 6,400 cores and 25.6 TB RAM.

Their internal traffic is about 6.6 GB / sec and their external traffic is about 2 GB / sec. There is a lot of interesting discussion about these numbers in the Twitter thread, but I thought it would be interesting to see how much it would cost to build that.

The 64 cores & 512 GB RAM can be handled by a Gigabyte R282-Z90; the given specs say that a single one would cost 27,000 USD. That means that the Scylla cluster alone would run about a million dollars, and I haven't even touched on the drives. I couldn't find a 14 TB NVMe drive in a cursory search, but a 15.36 TB drive (Micron 9300 Pro 15.36TB NVMe) costs 2,500 USD per unit. That puts the cost of the hardware alone for the Scylla cluster at roughly 1.15 million USD.

I would expect about twice that much for the PostgreSQL cluster, for what it's worth. The application servers are a lot cheaper, at about 4,000 USD per instance, which comes to another 1.6 million USD.

Total cost is roughly 5 million USD, and we aren't even talking about the other stuff (power, network, racks, etc.). I'm not a hardware guy, mind! I'm probably missing a lot of stuff. At that size, you can likely get volume discounts, but I'm guessing that the stuff I'm missing would cost quite a lot as well. Call it a minimum of 7.5 million USD to set up a data center with those numbers. That does not include labor and licensing costs, I want to add.

Also, note that this kind of capacity is likely something you can't get on a quick turnaround basis from anyone but the big cloud providers. I'd estimate multiple months just to order the parts, to be honest.

In other words, they are going to be looking at a major financial commitment and some significant lead time.

Then again… given their location in Henderson, Nevada, where the average developer salary is 77,000 USD per year, the personnel cost, which is typically significantly higher than any other expense, is actually not that big. As of Nov 2020, they had about 30 people working for Parler. Assuming all of them are developers paid 100,000 USD a year (significantly higher than the average salary in their location), the employment costs of the entire company come to about 3 million USD a year, likely under half the cost of the hardware required.

All of that said… what we can really see here is a display of incompetence. At the time it was closed, Parler had roughly 15 – 20 million users. A lot of them were recent registrations, of course, but Parler had already experienced several surges of user registrations in the past. In June of 2020 it saw 500,000 users register for its service within 3 days, for example.

Let's take 20 million as the number of users, and assume that all of them are in the States and have the same hours of activity. We'll further assume high participation numbers, with all of those users actively viewing. Remember the 1% rule: only a small minority of users actually generate content on most platforms. The vast majority are silent observers. That would give us roughly 200,000 users who generate content, but even then, not all content is made equal. We have posts and comments, basically, and treating them differently is a basic part of building an efficient system.

On Twitter, Katy Perry has just under 110 million followers. Let's assume that the Parler ecosystem was highly interconnected and that most of the high profile accounts were followed by the majority of the users. That means that the top 20,000 users would be followed by all the other 20 million. The remaining 180,000 users who actively post will likely do so in reaction, not independently, and have comparatively smaller audiences.

Now, we need to estimate how much these people will post. I looked at Dave Weigel's account (591.7K followers); he covers politics for the Washington Post. I'm writing this on Jan 20, the day of the Biden inauguration, which I assume is a busy time for political correspondents. Looking at his Twitter feed, he posted 3,220 tweets this month, and Jan 6, which had a lot to report on, saw 377 tweets in total. Let's take 500 as a reasonable upper bound for the daily posts of most of the top users in the system, shall we?

That means that we have:

  • 20,000 high profile users.
  • Each posting a maximum of 500 times a day.
  • Let's assume that this all happens in 8 hours, instead of over the entire day.
  • That translates to roughly 1,250,000 posts an hour, or about 348 posts per second (see the quick sketch below).
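
Expressed as a quick C# sketch (the constants are just the assumptions from the list above):

using System;

// Back-of-the-envelope estimate of the peak posting load.
const int highProfileUsers = 20_000;   // top accounts that everyone follows
const int postsPerUserPerDay = 500;    // upper bound from the Dave Weigel comparison
const int activeHours = 8;             // assume all activity happens within 8 hours

long postsPerDay = (long)highProfileUsers * postsPerUserPerDay; // 10,000,000
long postsPerHour = postsPerDay / activeHours;                  // 1,250,000
double postsPerSecond = postsPerHour / 3600.0;                  // ~347

Console.WriteLine($"{postsPerHour:N0} posts/hour, ~{postsPerSecond:N0} posts/second");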

Go and look at the specs above. Using these metrics, you could dedicate a full machine to every single post per second. Given the number of cores requested for application instances (400 x 16 = 6,400 cores), this is beyond ridiculous.

Just to give you some context, when we run benchmarks of RavenDB, we run them on a Raspberry Pi 3. That is a 25 USD machine, with a shady power supply and heating issues. We were able to reach over 1,000 writes / second on a sustained basis. Now, that is for simple writes, sure, but again, that is a Raspberry Pi doing three times as much as we would need to handle Parler's expected load (which I think I was overestimating).

This post is getting a bit long, but I want to point out another site, Stack Exchange (Stack Overflow), with 1.3 billion page views per month (assuming perfect distribution, roughly 485 page views per second, each generating multiple requests).

  • Their web servers handle 450 req/sec at peak across 9 web servers (a max of 4,050 req/sec) with peak CPU usage of 12%.
  • 2 SQL Server clusters with 4 machines in total, handling an aggregate of 23,800 queries / sec with peak CPU usage of 15%.
  • Render time across the board of < 20 ms.

The hardware that is used for those servers:

  • 9 Web – 48 cores + 64 GB RAM
  • 4 DB – 32 cores + 768 GB RAM

There are a few other types of servers there, and I recommend looking into the links, because there are a lot of interesting details there.

The key here is that they are running a top 200 site on significantly less hardware, and are able to serve requests and provide great quality of service.

To be fair, Stack Overflow is a read heavy site, with under half a million questions and answers in a month. In other words, less than 0.04% of the views generate a write. That said, I'm not certain that the numbers would be meaningfully different on other social media platforms.
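
To double check that ratio (a quick sketch, using the numbers above):

using System;

// Stack Overflow: writes as a fraction of page views.
const double pageViewsPerMonth = 1_300_000_000;
const double questionsAndAnswersPerMonth = 500_000;

double writeRatio = questionsAndAnswersPerMonth / pageViewsPerMonth;
Console.WriteLine($"{writeRatio:P3}"); // ~0.038%, i.e. less than 0.04% of views generate a write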

In my next post, I’m going to explore how you can build a social media platform without going bankrupt.

time to read 2 min | 333 words

I am really proud of the level of transparency and visibility that RavenDB gives its users out of the box. The server dashboard gives you all the critical information about what a node is doing and can serve as a great mechanism to tell, at a glance, the health of a node.

A repeated customer request is to take that up a notch: not just showing the status of a single server, but the state of the entire cluster. This isn't a replacement for a full monitoring solution, but it is meant to provide you with a quick overview of exactly what is going on in your cluster.

I want to present some early prototypes of how we are thinking about showing this information, and to get your feedback on them, as well as on any other information that you think should be on the cluster dashboard.

Here is the resource utilization portion of the dashboard. We haven't settled on the graphs yet, but we'll likely have CPU usage and memory (including memory breakdowns).

(mockup: resource utilization section of the cluster dashboard)

Some of this information may be hidden by default, and you can expand it:

(mockup: the expanded resource utilization details)

You can get more details about what the cluster is doing here:

(mockup: cluster activity overview)

And finally, the overall view of task assignment in the cluster:

(mockup: task assignment across the cluster)

You can also drill down to a particular server status:

(mockup: drill-down into a particular server's status)

This is early stages yet; we pretty much have just the mockups, so this is the right time to ask for what you want to see there.

time to read 1 min | 165 words

Among the advantages of a highly distributed system with endless edge points is that you can outsource data collection to a universe of locations, and even include those locations in your workflow, thereby expanding your operations. The challenge comes when you have endpoints that contribute to your organization and systems, but that you don't exactly trust. They can be newcomers that you don't know enough about, or entities with a history of misusing the access that inclusion in your systems gives them. You want the value they create, the information they amass and gather, to be copied from the edge up the levels of your system, but you don't want to give too much for that value or pay for it in the form of greater risk. Filtered replication is the art of enabling nontrusted edge points to participate in your system in a limited manner, replicating the information they produce while still treating it as nontrusted.
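
To make that concrete, here is a conceptual sketch of what such a filter boils down to. This is not RavenDB's actual filtered replication API; the types and names below are purely illustrative:

using System.Collections.Generic;
using System.Linq;

// Illustrative only: the hub accepts documents from a nontrusted edge
// node only if they pass a filter limiting what that edge may push.
record EdgeDocument(string Id, string Collection, string ProducedBy);

static class ReplicationFilter
{
    public static IEnumerable<EdgeDocument> FilterIncoming(
        string edgeId, IEnumerable<EdgeDocument> batch) =>
        batch.Where(doc =>
            doc.Collection == "SensorReadings" // only the data this edge is supposed to produce
            && doc.ProducedBy == edgeId);      // and only documents it created itself
}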


time to read 4 min | 636 words

Yesterday I posted about the Parler ban and its likely impact, both legally and in terms of the technical details. My expectation is that new actors will step in to fill the existing demand created by the current social network account suspensions. I have spent some time thinking about the likely effects of this, and I think that it will lead to some interesting results.

A new social network will very likely rise as a result of these actions. That network would have to be resilient to de-platforming. That means that it cannot assume that it can run on any of the cloud services, at least not as normally understood by today's standards. That means that we are likely to see one of two options:

  • Fully distributed systems – independent nodes collaborating with one another to create a network. Each node may be hosted and operated independently, similar to how torrents and other fully distributed P2P systems work.
  • Distributed infrastructure – a set of servers that run on behalf of a single entity, but are spread over multiple vendors and locations. The idea is that the shutdown of one or even several vendors will have little impact, because the effort is distributed.

The first option is probably something like Mastodon, but I would really like to see a return to blogs & RSS as the preferred social network. That has the advantage of a truly distributed model without a single controlling actor. It is also much lower cost in terms of technology and complexity. Discovery of new blogs can be handled via recommendations, search, etc.

The reason I prefer this option is that I like to blog :-). More seriously, owning your own content and distribution platform has just become quite important. A blog is about as simple a piece of software as you can imagine. Consuming blogs requires no publication of personal information and no single actor that can observe everything you do.

I don't know if this will be the direction, although it is my favorite one. It is possible that we'll end up with a Mastodon empire, with many actors creating networks of servers which may or may not be interconnected. I can see a future where you'll have a network of dog owners vs. cat owners, where the two aren't federated and discussions stay isolated within each.

Given that you could create links from one to the other, I don’t think we have to deal with total echo chambers. Consider a post in the cats social network: The dog owners are talking about the chore of having to go for walks at “dogs://social.media/walks-are-great”, that is so high maintenance, the silly buggers. 

That would create separate communities, with their own rules and moderation. Consider this something like subreddits, but without the single organization that can enforce global rules.

The other alternative is that a social network will rise with a truly distributed backend that is resilient to de-platforming. From an outside perspective, this would present as something similar to the existing social networks. That has the advantage of requiring the least from users, but it is a nontrivial technical challenge.

I prefer the first option, but I believe it is more likely we'll end up with the second. The reason for that is monetization strategies. If you have many different actors cooperating to create a network, there is a question of how you pay for it. The typical revenue model for a social network is advertising, and that doesn't work so well when there isn't a single actor that can sell ads (and track users).

That said, it would be much faster and easier to get started with the first option, and it may be that we'll end up there through the force of inertia.

time to read 5 min | 927 words

I'm writing this post at a time when Donald Trump's social media accounts have been closed by Twitter, Facebook and pretty much all the popular social networks. Parler, an alternative social network, has been kicked off the Apple and Google app stores, and its AWS account was closed. It also appears that many vendors are dropping it, and it will take significant time for it to get back online, if it is able to do so.

I’m not interested in talking about the reasons for this, mind. This is a hot topic political issue, in a country I’m not a citizen of, and I have no interest in getting bogged down with the political details.

I wish I could say that I had no dog in this fight, but I suspect that the current events will have a long term impact on the digital world. Note that all of those actions were taken by private companies, of their own volition. In other words, it isn't a government or the courts demanding this behavior, but the companies' own decision making process.

The thing that I can't help thinking is that the current behavior by those companies is direct, blatant and very short sighted. To start with, all of those companies operate on a global scale, and they have just proven that they are powerful enough to rein in the President of the United States. Sure, he is a lame duck at the moment, but that is still something that upsets the balance of power.

The problem is that while the incoming US administration would appear to be favorable to this course of action, other countries and governments are looking at this with concern. Poland is on track to pass a law prohibiting the removal of social media posts that do not break local laws. Israel's parliament is also considering a similar proposal.

In both cases, mind, these proposed laws got traction last year, before the current escalation. I feel that more governments will consider such laws in the near future, given the threat level that this represents to them. A politician in this day and age who doesn't use social media to its fullest extent is going to be severely hampered. Both the Obama and the Trump campaigns were lauded for their innovative use of social media, for example.

There are also other considerations to ponder. One of the most costly portions of running a social network is the monitoring and filtering of posts. You have to take into account that people will post bile, illegal and obscene stuff. That's expensive, and one of the reasons vendors dropped Parler was its moderation policies. That means that there is a big and expensive barrier in place for future social networks that try to grow.

I’m not sure how this is going to play out in the short term, to be honest. But in the long term, I think that there is going to be a big push, both legally and from a technical perspective to fill those holes. From a legal perspective, I would expect that many lawyers will make a lot of money on the fallout from the current events, just with regards to the banning of Parler.  I expect that there are going to be a whole lot of new precedents, both in the USA and globally.

From a technical perspective, the technology to run a distributed social network exists. Leaving aside currently esoteric choices such as social networks on the blockchain (there appear to be a lot of them; search for that, wow!), people can fall back on good old Blog & RSS to get quite a bit of traction. It wouldn't take much to build something that looks very similar to current social networks.

Consider RSS Bandit or Google Reader vs. Twitter or Facebook. There isn't much that you'd need to do to go from an aggregation of RSS feeds to a proper social network. One advantage of such a platform, by the way, is that it allows (and encourages) thought processes that are longer than 140 characters. I dearly miss the web of the 2000s.

Longer term, however, I would expect a rise of distributed social networks that are composed of independent but cooperating nodes (yes, I'm aware of Mastodon, and I'm familiar with Gab breaking out of it). I don't know if this will be based on existing software or if we'll end up with new networks, but I think that the die has been cast in this regard.

That means that the next social network will have to operate under an assumed hostile environment. That means running on multiple vendors, having no single point of failure, etc.

The biggest issue with getting a social network off the ground is… well, network effects. You need enough people in the network before you start getting more bang for the buck. But right now, there is a huge incentive for such a network, given the migration of many users from the established networks.

Parler’s app has seen hundreds of thousands of downloads a day in the past week, before it was taken down from the app stores. Gab is reporting 10,000+ new users an hour and more users in the past two days than they had seen in the past two years.

There is a hole there that will be filled, I think. Who will be the winner of all those users, I don’t know, but I think that this will have a fundamental impact on the digital world.

time to read 2 min | 221 words

On an otherwise uneventful morning, the life of the operations guy got… interesting.

What was supposed to be a routine morning got hectic because the database refused to operate normally. To be more exact, the database refused to load a file. RavenDB is generally polite when it runs into issues, but this time it wasn't playing around. Here is the error it served:

---> System.IO.IOException: Could not set the size of file  D:\RavenData\Databases\Purple\Raven.voron to 820 GBytes

---> System.ComponentModel.Win32Exception (665): The requested operation could not be completed due to a file system limitation

Good old ERROR_FILE_SYSTEM_LIMITATION; I never knew you, because we had never run into this error in the past.

The underlying reason was simple: we had a large file (820 GB) that was too fragmented. At some point, the number of fragments in the file exceeded the file system's limit.

The KB article about this issue is here. You might be able to move forward more quickly by using the contig.exe tool to defragment a single file.
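
For example, assuming the file from the error above (contig.exe is a Sysinternals tool; verify the flags against your version before running it against production data):

contig.exe -a D:\RavenData\Databases\Purple\Raven.voron

That reports how many fragments the file currently has, and:

contig.exe D:\RavenData\Databases\Purple\Raven.voron

That defragments just that one file, in place.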

The root cause here was probably backing up to the same drive as the database, which forced the file system to break the database file into fragments.

Just a reminder that there are always more layers in the system, and that we need to understand all of them when they break.

time to read 4 min | 705 words

I want to comment on the following tweet:

When I read it, I had an immediate and visceral reaction, because this is one of those things that sound nice but are actually a horrible dystopian trap. It confuses two very important concepts and puts them in the wrong order, resulting in utter chaos.

The two ideas are “writing tests” and “producing high quality code”. And they are usually expressed in something like this:

We write tests in order to produce high quality code.

Proper tests ensure that you can make forward progress without having regressions. They are a tool you use to ensure a certain level of quality as you move forward. If you assume that the target is the tests, however, and that high quality code will follow, you end up in weird places. For example, take a look at the following set of stairs. They aren't going anywhere, and aside from being decorative, serve no purpose.

(photo: a decorative staircase that leads nowhere)

When you start considering tests themselves to be the goal, instead of a means to achieve it, you end up with decorative tests. They add to your budget and make it harder to change things, but don’t benefit the project.

There are a lot of things that you are looking for in code review that shouldn't be in the tests. For example, consider the error handling strategy for the code. We have an invariant that says that exceptions may not escape our code when running in a background thread, because an unhandled exception there will kill the process. How would you express something like that in a test? Otherwise, an error raised from a file write when the disk is full can take down the whole server.
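
To illustrate the invariant itself, here is a sketch of the general pattern (not RavenDB's actual code, just the shape of it):

using System;
using System.Threading;

static class SafeBackground
{
    // Invariant: no exception may escape a background thread,
    // because an unhandled one would take down the entire process.
    public static Thread Run(string name, Action work, Action<Exception> report)
    {
        var thread = new Thread(() =>
        {
            try
            {
                work();
            }
            catch (Exception e)
            {
                report(e); // log / surface the failure instead of dying
            }
        })
        {
            Name = name,
            IsBackground = true
        };
        thread.Start();
        return thread;
    }
}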

Another example is a critical piece of code that needs to safely handle out of memory exceptions. You can test for that, sure, but it is hard and expensive. It also tends to freeze your design and implementation, because you are now testing implementation concerns, and that makes it very hard to change your code.

Reviewing code for performance pitfalls is another major consideration. How do you police allocations using a test? And how do you do that without killing your productivity? For fun, the following code allocates:

Console.WriteLine(1_000_000);
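// the integer has to be formatted into a new string before it can be written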

There are ways to monitor and track these kinds of things, for sure, but they are very badly suited for repeatable tests.
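
For instance, on recent .NET versions you can watch the allocation happen directly (a quick sketch; GC.GetAllocatedBytesForCurrentThread is available on .NET Core 3.0 and later):

using System;

long before = GC.GetAllocatedBytesForCurrentThread();
Console.WriteLine(1_000_000);
long allocated = GC.GetAllocatedBytesForCurrentThread() - before;
Console.WriteLine($"allocated: {allocated} bytes"); // nonzero: the line above allocated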

Then there are other things that you’ll cover in the review, more tangible items. For example, the quality of error messages you raise, or the logging output.

I'm not saying that you can't write tests for those. I'm saying that you shouldn't. Those are things that you want to be able to change and modify quickly, because you may realize that you want to add more information in a certain scenario. Freezing the behavior using tests just means that you have more work to do when you need to make the changes. And reviewing just the test code is an even bigger problem when you consider that you need to look at interactions between features and their impact on one another. Not in terms of correctness, you absolutely need to test that, but in terms of behavior.

The interleaving of internal tasks inside RavenDB was carefully designed to ensure that we are biased in favor of user facing operations, starving background operations if needed. At the same time, it means that we need to ensure that we give the background tasks time to run. I can't really think of how you would write a test for something like that without setting in stone the manner in which we make that determination. That is something that I explicitly don't want to do; it would make changing how and what we do harder. But it is something that I absolutely consider in code reviews.

time to read 2 min | 287 words

We got an interesting question a few times in recent weeks: how can I manually create a document revision with RavenDB? The answer is that you can use the ForceRevisionCreationFor() method to do so. Here is how you'll typically use this API:
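
A minimal sketch of that usage (the User class and its Status property here are illustrative):

using (var session = store.OpenSession())
{
    var user = session.Load<User>("users/1-A");
    user.Status = "Public"; // e.g. moving from draft mode to public
    session.Advanced.Revisions.ForceRevisionCreationFor(user);
    session.SaveChanges();
}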

This is the typical expected usage for the API. We intended this to let users manually trigger revisions, for example, when moving a document from draft mode to public and the like.

It turns out that there is another reason to use this API: when you migrate data to RavenDB and want to create historical revisions. The API we envisioned isn't suitable for this, but the layered API in RavenDB means that we can still get the desired behavior.

Here is how we can achieve this:
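
A sketch of the general shape (the exact command classes, ForceRevisionCommandData in particular, should be verified against the client version you are using):

foreach (var snapshot in historicalVersionsOldestFirst)
{
    using (var session = store.OpenSession())
    {
        // Overwrite the document with its historical state...
        session.Store(snapshot, "users/1-A");
        // ...and defer a raw command asking the server to turn that state into a revision.
        session.Advanced.Defer(new ForceRevisionCommandData("users/1-A"));
        session.SaveChanges();
    }
}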

Basically, we manually create the transaction steps that will run on the server, and we can apply the commands to the same document multiple times.

Note that RavenDB requires a document to create a revision from, so we set the document, create a revision, and overwrite the document again, as many times as we need to.

Another issue that was brought up is that the @last-modified property on the revision is set to the date of the revision's creation. In some cases, during such a migration, users want the revision to carry the time the data was originally modified.

That is not supported by RavenDB, because @last-modified tracks the time that RavenDB modified the document or revision. If you need to track the time a document was modified in the domain, you need to keep that as part of your actual domain model.
