
Unless I get good feedback / questions on the other posts in the series, this is likely to be the last post on the topic. I was trying to show what kind of system and constraints you would have to deal with if you wanted to build a social media platform without breaking the bank.

I talked about the expected numbers that we have for the system, and then set out to explain each part of it independently. Along the way, I was pretty careful not to mention any one particular technological solution. We are going to need:

  • Caching
  • Object storage (S3 compatible API)
  • Content Delivery Network
  • Key/value store
  • Queuing and worker infrastructure

Note that the whole thing is generic and there are very few constraints on the architecture. That is by design, because if your architecture can hit the lowest common denominator, you have a lot more freedom. Instead of tying yourself to a particular provider, you can likely set things up to run on multiple disparate providers without too much of a hassle.

My goal with this system was to be able to accept 2,500 posts per second and to handle reads of 250,000 per second. This sounds like a lot, but most of the load is meant to be handled by the CDN and the infrastructure, not the core servers. Caching in a social network is somewhat problematic, since a lot of the work is obviously personalized. That said, there is still quite a lot that can be cached, especially the more popular posts and threads.

If we assume that only about 10% of the reading load hits our servers, that is 25,000 reads per second. If we have just 25 servers handling this (five in each of five separate data centers), each one needs to accept 1,000 requests per second. On the one hand, that is a lot, but on the other hand… most of the cost is supposed to be about authorization, minor logic, etc. We can also add more application servers at this point and scale linearly.

Just to give some indication of costs, a dedicated server with 8 cores & 32 GB of RAM will cost $100 a month, and there is no charge for traffic. Assuming that I’m running 25 of these, that will cost me 2,500 USD a month. I can safely double or triple that amount without much trouble, I think.

Having to deal with 1,000 requests per second on a single server is something that requires paying attention to what you are doing, but it isn’t really that hard, to be frank. RavenDB can handle more than a million queries a second, for example.

One thing that I didn’t touch on, however, which is quite important, is the notion of whales. In this case, a whale is a user that has a lot of followers. Let’s take Mr. Beat as an example: he has 15 million followers and is a prolific poster. In our current implementation, we’ll need to add to the timelines of all his followers every time he posts something. Mrs. Bold, on the other hand, has 12 million followers. At one time Mr. Beat and Mrs. Bold got into a post fight, which looks like this:

  1. Mr. Beat: I think that Mrs. Bold has a Broccoli bandana.
  2. Mrs. Bold: @mrBeat How dare you, you sniveling knave
  3. Mr. Beat: @boldMr2 I dare, you green teeth monster
  4. Mrs. Bold: @mrBeat You are a yellow belly deer
  5. Mr. Beat: @boldMr2 Your momma is a dear

This incredibly witty exchange happened during a three-minute span. Let’s consider what this will do, given the architecture that we outlined so far:

  • Post #1 – written to 15 million timelines.
  • Posts #2–5 – written (because of the mentions) to the timelines of everyone that follows both of them, let’s call that 10 million per post.

That is 55 million timeline writes to process within the span of a few minutes. If other whales also join in (and they might), the number of writes we’ll have to process will skyrocket.

Instead, we are going to take advantage of the fact that only a small number of accounts are actually followed by many people. We’ll place the limit at 10,000 followers; past that point, we’ll no longer process writes for such accounts. Instead, we’ll place the burden on the client’s side. The code for showing the timeline will then become something like this:
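The original code block did not survive extraction; here is a minimal Python sketch of the idea, with the stores modeled as plain dictionaries (all names are illustrative, not from the original):

```python
FANOUT_LIMIT = 10_000  # accounts above this follower count skip write-time fan-out

# Hypothetical in-memory stand-ins for the real stores.
timelines = {}             # user_id -> list of (timestamp, post_id), fan-out on write
recent_posts_by_user = {}  # user_id -> list of (timestamp, post_id) for that author
high_profile_follows = {}  # user_id -> set of followed accounts over the limit

def read_timeline(user_id, limit=50):
    # Start with the precomputed timeline entries (normal accounts).
    entries = list(timelines.get(user_id, []))
    # Merge in the high profile accounts at read time. Every one of their
    # millions of followers asks for the same few keys, so these reads
    # have very high cache utilization.
    for account in high_profile_follows.get(user_id, ()):
        entries.extend(recent_posts_by_user.get(account, []))
    entries.sort(reverse=True)  # newest first by timestamp
    return entries[:limit]
```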

In other words, we record the high profile users in the system, and instead of doing the work for them on write, we’ll do it on read. The benefit of doing it in this manner is that reads of the high profile users’ timelines will have very high cache utilization.

Given that the number of high profile people you’ll follow is naturally limited, that can save quite a lot of work.

The code above can be improved, of course. Timelines tend to differ quite a bit, so we may have a high profile user that has been quiet for a day or two; they shouldn’t show up in the current timeline and can be skipped entirely. You need to do a bit more work around the time frames as well, which means that the timeline should also allow us to query it by most recent post id, but that is also not too hard to implement.

And with that, we are at the end. I think that I covered quite a few edge cases and interesting details, and hopefully that was interesting for you to read.

As usual, I really appreciate any and all feedback.


A social media platform has to deal with the concept of now and its history. For the most part, users are interacting with the current state of the system: looking at their timeline, watching current posts, etc. At the same time, there is a wealth of information that you can get from looking at the past.

It isn’t out of the question that you’ll have users diving into the history of posts of another user, going as far back as possible. That can be a parent whose kids just left the house, looking at baby pictures, or it can be a new friend, trying to learn some interesting tidbits before a party (when we still had those).

It can also be automated processes, such as: “5 years ago you posted…”

The architecture that I presented in these posts is relatively agnostic to such a scenario. Given the timeline feature, going back in time means that you can fairly easily discriminate based on age. Older sections in the timeline can be moved to a lower storage tier (basically, moved to HDD instead of NVMe, for example). They are still accessible, still available, but far cheaper to store.
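Since the object storage exposes an S3-compatible API, a lifecycle rule is the natural way to express this policy. A minimal sketch with boto3, assuming timeline sections live under a sections/ prefix in a timelines bucket (both names are made up for the example; a non-AWS S3-compatible store may support a different subset of storage classes):

```python
import boto3

s3 = boto3.client("s3")

# Move timeline section objects to a cheaper storage class 90 days after
# creation. The objects stay directly readable, unlike an archive tier.
s3.put_bucket_lifecycle_configuration(
    Bucket="timelines",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "age-out-timeline-sections",
            "Status": "Enabled",
            "Filter": {"Prefix": "sections/"},
            "Transitions": [{"Days": 90, "StorageClass": "STANDARD_IA"}],
        }]
    },
)
```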

I don’t believe that you can usually go with an archive tier for the timelines, not unless you are willing to be effectively unable to serve them when a user requests it, but a policy of moving old and rarely used timeline sections and posts to HDD is absolutely doable. Note that things like intelligent tiering are not a good solution for our needs. That would move items based on age and access, but while we want to move items by age, older items are still accessed, just far more rarely, so we don’t want them moved back into hot storage just because of those rare accesses.

That said, certain posts are likely to generate activity for a long time, so we can’t send data to cold storage based on age alone; we also need to take recent access patterns into account. Consider a post from a few years ago that talks about Broccoli, when people still did that. Mr. Beat discovers that Mrs. Bold has such a post and blasts it all over social media. Very quickly that old post becomes very active. That means that we should have a way to move data back to hot storage if there is enough access.

Ideally, we can rely on the underlying storage to do that for us, but we have to know how it actually works behind the scenes and understand what is really going on there. The nice thing is that, unlike most of the details we discussed so far, this is something we can punt down the road. We already have the architecture in place to introduce this cost saving measure later; we don’t have to have it figured out from day one. Given the fact that we have multi level caches, we can probably just age out old information to cold storage and not usually have to think about it too much.

When we have enough data that this is a serious concern, on the other hand… we will have the time and resources to also handle it.


Quite a few of the features that we consider native to social media actually came about as a result of users’ behavior, not pre-planned actions. For example, #tagging and @mentions were both created by users and then adopted as official features by the social media giants.

I already touched on how mentions are handled: as part of writing the document itself, we insert the post id into the timeline of the mentioned user. For tagging, the process is very similar. Each tag has a timeline, and we can insert posts into the tag’s timeline. From there on, the process is basically identical to what we already described.

I want to stop for a second and emphasize the coolness factor of a significant feature being handled (in the backend) via simply:
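The snippet was embedded in the original post; a minimal Python sketch of the same idea, with the timeline store stood in by a dictionary (names are illustrative):

```python
import re
from collections import defaultdict

tag_timelines = defaultdict(list)  # hypothetical stand-in for the timeline store

def handle_tags(post_id: str, text: str) -> None:
    # Each tag is just another timeline: posting with #broccoli appends
    # the post id to the "broccoli" timeline, and showing a tag page is
    # the same timeline read we already have.
    for tag in re.findall(r"#(\w+)", text):
        tag_timelines[tag.lower()].append(post_id)
```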

There is probably UI work to do here, but that is roughly all you’ll need to manage tags. Presumably you’ll want some better policies, but that is the core behavior you’ll need to support this sort of feature.

What about searching? Full text search indexes are nothing new, and you can get an off-the-shelf solution to manage your searches easily enough. That is likely to be one of the annoying pieces of the system. Luckily, we can usually handle things by offering two tiers of searching. The first tier covers posts in the recent past (a month or two) and must answer queries very quickly; then we have full data search, for which we have a much longer SLA. By far most queries are going to hit the recent data set, which makes the task itself easier. The actual choice of indexing solution and its usage is fairly irrelevant at this point. You’ll need something that is distributed, but there is enough variety there that you can get away with selecting pretty much anything.

We aren’t going to need to provide sophisticated full text search features; we just want users to be able to find results by text queries.

You’ll note that throughout this series of posts, I’m not trying to find novel ways to get the best solution. I’m using practical options for the actual use case presented, and in many cases I can get away with a lot by changing the requirements just slightly. For that matter, a lot of the limitations that I accept are real limitations that you’ll find in other social media networks as well.

Finally, I just wanted to show how we can enable a basic search capability using a minimal amount of code, given the infrastructure we have so far:
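Again, the original code block didn’t survive; here is a minimal Python sketch of search-via-term-timelines (names are illustrative):

```python
import re
from collections import defaultdict

term_timelines = defaultdict(list)  # hypothetical stand-in for the timeline store

def index_post(post_id: str, text: str) -> None:
    # Every distinct term in the post gets the post id appended to its
    # timeline, so the "broccoli" timeline lists every post mentioning it.
    for term in set(re.findall(r"\w+", text.lower())):
        term_timelines[term].append(post_id)

def search(term: str) -> list[str]:
    # Querying is just reading that term's timeline.
    return term_timelines.get(term.lower(), [])
```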

As you can see, we built a simple full text search here. To query it, you get the timeline for a particular term and get the list of posts that it holds.

For tags and searches, as you can imagine, this can be a huge list, which is partly why timelines are built on the concept of sections that can be so easily distributed.

The solution above isn’t actually a good one for full text search. I can’t easily turn it into a search by phrase, and there are many other features that I’ll likely want to have, but it is a good example of how the infrastructure that we built for one part of the system can be utilized for a completely different purpose.


I touched briefly on the issue of post statistics in a previous post, but it deserves its own post. There are all sorts of metrics that we want to track on a post. Here are just a few of them:

[Image: the metrics tracked per post, such as likes, replies, shares, and views]

Unlike most of the items that we discussed so far, these details are going to be very relevant for both reads and writes. In particular, it is very common for these numbers to be updated concurrently, especially for popular posts. At the simplest level, they can be represented as a map<key, int64>. That gives us the maximum flexibility for our needs and can also be utilized in the future for additional use cases.

Given that this is effectively a distributed counter problem, there are all sorts of ways that we can handle it. At the client level, we send the increment operation to the server and optimistically update the displayed value. That gets us 90% there in terms of the UX factors, but there is a lot to handle behind the scenes.

A good algorithm to use for this is the PN Counters model from the CRDT playbook. RavenDB implements these for you, for example. In essence, that means that we have the following data model:
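The data model itself didn’t survive extraction; here is a sketch of what such a document might look like (the node names, values, and etags are all illustrative):

```python
# Each counter keeps one slot per node that accepts increments; the slot
# holds that node's local value and the etag of its latest change.
post_stats = {
    "id": "posts/1234/stats",
    "likes": {
        "node-A": {"value": 9804, "etag": 771},
        "node-B": {"value": 4512, "etag": 708},
    },
    "replies": {
        "node-A": {"value": 2113, "etag": 770},
    },
}

def merge_counter(ours: dict, theirs: dict) -> dict:
    # Merging two versions is trivial: per node, the entry with the
    # higher etag is the latest, so concurrent updates converge.
    merged = dict(ours)
    for node, entry in theirs.items():
        if node not in merged or entry["etag"] > merged[node]["etag"]:
            merged[node] = entry
    return merged
```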

The likes and replies objects have a property for each node that increments the value. The property contains the value that we have for that node as well as the etag of the change. It is easy to merge such a model between different versions, because we can always take the value with the higher etag to get the latest state. In this way, we can allow concurrent and distributed updates across the entire system, and it will resolve itself in the end to the right value. Another option may be to push the commands all the way to the owning data center, where we’ll apply the operations, but that may add a high load for hot posts in the system. Better to distribute this globally and not really concern ourselves with the matter.

Looking at Twitter, there are about 200 billion tweets a year. That means that we have to be ready for quite a few of those values. Having them in a dedicated system is a good idea, since their read & write skew is far different from other parts of our system. As part of reading posts, however, we’ll likely want to build some mechanism for pushing those counters to the post itself, so we can remove them from the rest of the system. An easy way to handle that is to do so on an hourly basis. So instead of the format above, we’ll have:
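Again, the original snippet is missing; presumably the pending updates are bucketed by hour, along these lines (timestamps and values are illustrative):

```python
# Pending counter updates, bucketed by hour and by node. Once an hour,
# the bucket from two hours ago is folded into the post document itself
# and removed from here.
post_stats_pending = {
    "id": "posts/1234/stats",
    "2020-12-01T14:00": {
        "likes": {"node-A": 312, "node-B": 187},
        "replies": {"node-A": 41},
    },
    "2020-12-01T15:00": {
        "likes": {"node-A": 95},
    },
}
```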

Here we have the last two hours of operations on the post. Once every hour, we’ll consolidate all the updates from two hours ago and write them to the post itself. When we get to the point where there are no more pending updates for a post, we can safely delete the value.

The reason you want to add this complexity is that there is a big difference between all the posts in a social media network and the active working set. The working set tends to be far smaller, which can dramatically reduce the amount of data we need to keep and manage. Assuming that the working set is 25 million posts or so across the network seems reasonable, and that amount of data can be easily handled by any server instance you care to use. Managing 200 billion per year, on the other hand, puts us in a different class of problem, and we’ll need more and more resources down the line.


In the series so far, we talked about reading and writing posts, updating the timeline and distributing it, etc. I talked briefly about the challenges of caching data when we have to deal with updates in the background, but I haven’t really touched on that. Edits and updates are a pain to handle, because they invalidate the cache, which is one of the primary ways we can scale cheaply.

We need to consider a few different aspects of the problem. What sort of updates do we have in the context of a social media platform? We can easily disable editing of posts, of course, but we have to be able to support deletes. A user may post about Broccoli, which is verboten, and we have to be able to remove that. And of course, users will want to be able to delete their own posts, lest their salad tendencies come back to haunt them in the future. Another reason we need to handle updates is this:

[Image: the interaction counts (likes, replies, shares) shown on a post]

We keep track of the interactions of the post and we need to update them as they change. In fact, in many cases we want to update them “live”. How are we going to handle this? I discussed the caching aspects of this earlier, but the general idea is that we have two caching layers in place.


[Diagram: a client request flowing through the CDN and the API servers to the backend]

A user getting data will first hit the CDN, which may cache the data, and then the API, which will get the data from the backend. The API endpoint will query its own local state, and other pieces of the puzzle are responsible for the data distribution.

When changes happen, we need to deal with them, like so:

[Diagram: a change being pushed out to the data centers and through the caching layers]

Each change means that we have to make a policy decision. For example, deleting a post means that we need to push an update to all the data centers. The same applies to updates to the post itself. In general, updating the content of the post and updating its view counts aren’t really that different. We’ll usually want to avoid editing the post content for non-technical reasons, not for lack of ability to do so.

Another important aspect to take into account is latency. Depending on the interaction model with the CDN, we likely have it set up to cache data based on duration, so API requests are cached for a period of a few seconds to a minute or two. That is usually good enough to reduce the load on our servers while still retaining a good enough level of freshness.

Another advantage that we can use is the fact that when we get to high numbers, we can reduce the update rate. Consider:

[Image: a post’s counters displayed as rounded values (e.g. millions of likes)]

We now need to update the post only once every 100,000 likes or shares, or once every 10,000 replies. Depending on the rate of change, we can even skip an update if one happened recently enough. That is the kind of thing that can reduce the load curve significantly.
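A minimal sketch of that policy in Python (the thresholds come from the example above; the function and its name are mine):

```python
# Push a cache update only when a counter crosses a display threshold,
# so a post with millions of likes updates once per 100,000 likes.
THRESHOLDS = {"likes": 100_000, "shares": 100_000, "replies": 10_000}

def should_push_update(metric: str, old_value: int, new_value: int) -> bool:
    step = THRESHOLDS.get(metric, 1)
    return old_value // step != new_value // step
```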

There is also the need to consider live updates. Typically, that means that we’ll have the client connected via web socket to a server and we need to be able to tell it that a post has been updated. We can do that using the same cache update mechanism. The update cache command is placed on a queue and the web socket servers process messages from there. A client will indicate what post ids it is interested in and the web socket server will notify it about such changes.
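A rough Python sketch of that fan-out (the queue, the client objects, and their send method are all assumptions for the example):

```python
import asyncio
from collections import defaultdict

# post_id -> set of connected clients that asked to watch that post.
subscriptions = defaultdict(set)

async def drain_update_queue(queue: asyncio.Queue) -> None:
    # The same "update cache" command that invalidates caches also lands
    # here; we just forward it to whoever subscribed to that post id.
    while True:
        post_id = await queue.get()
        for client in subscriptions.get(post_id, set()):
            await client.send({"type": "post-updated", "id": post_id})
```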

The idea is that we can completely separate the different pieces in the system. We have the posts storage and the timeline as one system and the live updates as a separate system. There is some complexity here about cache usage, but it is actually better to assume that stuff will not work than to try for cache coherency.

For example, a client may get an update that a particular post was changed, but when it queries for the new post details, it gets a notice that the post wasn’t modified. This is a classic race condition, the kind that can cause a lot of trouble for the backend people to eradicate. If we don’t try, we can simply state that on the client side, getting a not modified response after an update notice is not an error. Instead, the client should schedule another query for the post after the cache period elapses.
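In client-side terms, that rule might look like this (the fetch helper and the 60 second TTL are assumptions for the example):

```python
import asyncio

CACHE_TTL_SECONDS = 60  # assumed CDN cache duration

async def on_post_updated(post_id: str, fetch):
    # fetch returns (http_status, post). A 304 right after an update
    # notification just means the cache won the race; retry later
    # instead of treating it as an error.
    status, post = await fetch(post_id)
    if status == 304:
        await asyncio.sleep(CACHE_TTL_SECONDS)
        status, post = await fetch(post_id)
    return post
```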

A core design tenet in the system is to assume failure and timing issues, and to avoid forcing a unified view of the system, because that is hard. Punting the problem even just a little bit allows for a much better architecture.
