Building a social media platform without going bankruptPart VIII–Tagging and searching

time to read 3 min | 555 words

Quite a few of the features that we consider native to social media actually came about as a result of users’ behavior, not pre-planned actions. For example, the #tagging and @mentions were both created by users and then adopted as an official action by the social media giants.

I already touched on how mentioned are handled, as part of writing the document itself, we insert the post id to the timeline of the mentioned user. For tagging, the process is very similar. Each tag has a timeline, and we can insert posts into the tag’s timeline. From there on, the process is basically identical for what we already describe.

I want to stop for a second and emphasis the coolness factor of a significant feature being handle (in the backend) via simply:

There is probably UI work to do here, but that is roughly all you’ll need to manage tags. Presumably you’ll want some better policies, but that is the core behavior you’ll need to support this sort of feature.

What about searching? Full search indexes are nothing new and you can get an off the shelve solution to manage your searches easily enough. That is likely to be one of the annoying pieces of the system. Luckily, we can usually handle things by offering two tiers of searching. We have the first tier, which cover posts in the recent past (a month or two) which must have very high speed queries and then we have full data search, for which we have a lot longer SLA. By far most queries are going to be hitting the recent data set, which makes the task itself easier. The actual choice of indexing solution and its usage is fairly irrelevant at this point. You’ll need something that is distributed, but there is enough variety there that you can get away with selecting pretty much anything.

We aren’t going to need to provide sophisticated full text search features, we just want users to be able to find results by text queries.

You’ll note that throughout this series of posts, I’m not trying to find novel ways to get the best solution. I’m using practical options for the actual use case presented and in many cases, I can get away with a lot by changing the requirements just slightly. For that matter, a lot of the limitations that I accept are real limitations that you’ll find with other social media networks as well.

Finally, I just wanted to show how we can enable basic search capability using minimal amount of code, given the infrastructure we have so far:

As you can see, we built a simple full text search here. To query it, you get the timeline for a particular term and get the list of post that it has.

For tags and searches, as you can imagine, this can be a huge list, which is partly why timelines are built on the concept of sections that can be so easily distributed.

The solution above isn’t actually a good one for full text search. I can’t easily turn that into a search by a phrase, and there are many other features that I’ll likely want to have, but that is a good example of how the infrastructure that we built for one part of the system can be utilized for completely different purpose.

More posts in "Building a social media platform without going bankrupt" series:

  1. (05 Feb 2021) Part X–Optimizing for whales
  2. (04 Feb 2021) Part IX–Dealing with the past
  3. (03 Feb 2021) Part VIII–Tagging and searching
  4. (02 Feb 2021) Part VII–Counting views, replies and likes
  5. (01 Feb 2021) Part VI–Dealing with edits and deletions
  6. (29 Jan 2021) Part V–Handling the timeline
  7. (28 Jan 2021) Part IV–Caching and distribution
  8. (27 Jan 2021) Part III–Reading posts
  9. (26 Jan 2021) Part II–Accepting posts
  10. (25 Jan 2021) Part I–Laying the numbers