Building a social media platform without going bankrupt: Part VIII–Tagging and searching

Feb 03 2021

Building a social media platform without going bankruptPart VIII–Tagging and searching

time to read 3 min | 555 words

Quite a few of the features that we consider native to social media actually came about as a result of users’ behavior, not pre-planned actions. For example, the #tagging and @mentions were both created by users and then adopted as an official action by the social media giants.

I already touched on how mentioned are handled, as part of writing the document itself, we insert the post id to the timeline of the mentioned user. For tagging, the process is very similar. Each tag has a timeline, and we can insert posts into the tag’s timeline. From there on, the process is basically identical for what we already describe.

I want to stop for a second and emphasis the coolness factor of a significant feature being handle (in the backend) via simply:

	async function handle_tags_async(post){
	for(const tag in post.tags){
	var timeline = await timelines.for_tag(tag);
	await timelines.append(timeline, post.id);
	}
	}

view raw tags.js hosted with ❤ by GitHub

There is probably UI work to do here, but that is roughly all you’ll need to manage tags. Presumably you’ll want some better policies, but that is the core behavior you’ll need to support this sort of feature.

What about searching? Full search indexes are nothing new and you can get an off the shelve solution to manage your searches easily enough. That is likely to be one of the annoying pieces of the system. Luckily, we can usually handle things by offering two tiers of searching. We have the first tier, which cover posts in the recent past (a month or two) which must have very high speed queries and then we have full data search, for which we have a lot longer SLA. By far most queries are going to be hitting the recent data set, which makes the task itself easier. The actual choice of indexing solution and its usage is fairly irrelevant at this point. You’ll need something that is distributed, but there is enough variety there that you can get away with selecting pretty much anything.

We aren’t going to need to provide sophisticated full text search features, we just want users to be able to find results by text queries.

You’ll note that throughout this series of posts, I’m not trying to find novel ways to get the best solution. I’m using practical options for the actual use case presented and in many cases, I can get away with a lot by changing the requirements just slightly. For that matter, a lot of the limitations that I accept are real limitations that you’ll find with other social media networks as well.

Finally, I just wanted to show how we can enable basic search capability using minimal amount of code, given the infrastructure we have so far:

	var stopwords = ["a", "the", "has", "have"];
	async function handle_search(post){
	var terms = post.text.split(' ');
	for (const stopword of stopwords) {
	terms.remove(stopword);
	}
	for (const term of terms) {
	let term_timeline = await timelines.search_for(term);
	await timelines.append(term_timeline, post.id);
	}
	}

view raw search.js hosted with ❤ by GitHub

As you can see, we built a simple full text search here. To query it, you get the timeline for a particular term and get the list of post that it has.

For tags and searches, as you can imagine, this can be a huge list, which is partly why timelines are built on the concept of sections that can be so easily distributed.

The solution above isn’t actually a good one for full text search. I can’t easily turn that into a search by a phrase, and there are many other features that I’ll likely want to have, but that is a good example of how the infrastructure that we built for one part of the system can be utilized for completely different purpose.

Tweet Share Share 0 comments

Tags:

Oren Eini

Oren Eini

CEO of RavenDB