High performance .NET: Building a Redis Clone – naively

I ran into this project, which aims to be a Redis clone with better performance and ease of use. I found it interesting because one of its main selling points is that it can run in a multi-threaded mode (instead of Redis’ single thread per process model). They use memtier_benchmark (part of Redis) to test their performance. I got curious about how much performance I could get out of the system if I built my own Redis clone in C#.

The first version I built was done pretty naively. The idea is to write it in a high level manner, and see where that puts us. To make things interesting, here are the test scenarios:

  • The memtier_benchmark is going to run on c6g.2xlarge instance, using 8 cores and 32 GB of memory.
  • The tested instance is going to run on c6g.4xlarge, using 16 cores and 64 GB of memory.

Both of those instances are running in the same availability zone.

The command I’m going to run is:

memtier_benchmark -s $SERVER_IP -t 8 -c 32 --test-time=30 --distinct-client-seed -d 256 --pipeline=30

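What this says is that we’ll use 8 threads (the number of cores on the client instance) with 32 connections per thread, each connection pipelining 30 requests at a time. The values are 256 bytes each. The command doesn't override the write/read mix, so we get memtier_benchmark's default of 1 write for every 10 reads, which is also the ratio you can see between Sets and Gets in the results. In total, we’ll have 256 clients and our tests are going to continuously push more data into the system.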

The server is being run using:

dotnet run -c Release

Here is an example of the server while under this test:

[Image: the server while under this test]

I chose 30 seconds for the test duration to balance doing enough work to see what is going on (multiple GC cycles, etc.) while keeping the test short enough that I won’t get bored.

Here are the results for the naïve version (latencies are in milliseconds):

Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
Sets        86300.19          ---          ---         8.14044         0.92700        99.83900       196.60700     25610.97
Gets       862870.15     36255.57    826614.58         8.10119         0.91900        99.32700       196.60700     42782.42
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     949170.34     36255.57    826614.58         8.10476         0.91900        99.32700       196.60700     68393.39

So the naïve version, using C#, doing almost nothing, is almost touching 1 million queries / sec (949,170 ops/sec in total). The latency, on the other hand, isn’t that good, with the average at about 8ms and the p99 at almost 100ms.
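As a quick sanity check on those numbers, here is a small sketch that applies Little's law (requests in flight = throughput × average latency) to the Totals row. It assumes that memtier_benchmark's reported latency is the per-request time from send to reply, and the variable names are mine. If that assumption holds, the result should land close to 256 clients × 30 pipelined requests, which would suggest that much of the average latency is simply requests waiting their turn in a deep pipeline.

```csharp
using System;

// Figures copied from the "Totals" row of the results table.
double opsPerSecond = 949170.34;
double avgLatencyMs = 8.10476;

// Benchmark settings as described in the post: 256 clients, 30 pipelined requests each.
int clients = 256;
int pipelineDepth = 30;

// Little's law: requests in flight = throughput * average latency (in seconds).
double inFlightFromResults = opsPerSecond * (avgLatencyMs / 1000.0);
int inFlightFromSettings = clients * pipelineDepth;

Console.WriteLine($"In flight, from the results:  {inFlightFromResults:F0}"); // 7693
Console.WriteLine($"In flight, from the settings: {inFlightFromSettings}");   // 7680
```

The two figures are within about 0.2% of each other, so the measured average latency is roughly what you would expect from keeping that many requests outstanding at this throughput. It says nothing about the p99, which is a separate problem.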

Now that I got your attention with the numbers and pretty graphs, let me show you the actual code that I'm running. This is a “Redis Clone” in under 100 lines of code.

Just a few notes on the implementation. I’m not actually doing much. Most of the code is there to parse the Redis protocol, and it is full of allocations. Each command is parsed using multiple string splits and concats. Replies to the client require even more concats. The “store” for the system is actually just a simple ConcurrentDictionary, without anything to avoid contention or high costs.
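To make that description concrete, here is a minimal sketch of what this style of command handling can look like. It is not the code from this post: it leaves out the networking entirely, the `Execute` function and `store` variable are names I made up for illustration, and it only understands GET and SET. It also assumes a whole command arrives in a single string and that keys and values are plain ASCII, so string length equals byte length.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical in-memory store, shared by every connection.
var store = new ConcurrentDictionary<string, string>();

// Parses one Redis protocol command (an array of bulk strings) and returns the reply.
// Naive on purpose: Split and string concatenation allocate on every call.
string Execute(string request)
{
    // "*3\r\n$3\r\nSET\r\n$3\r\nfoo\r\n$3\r\nbar\r\n" splits into
    // ["*3", "$3", "SET", "$3", "foo", "$3", "bar", ""].
    string[] parts = request.Split("\r\n");
    if (parts.Length < 3 || !parts[0].StartsWith("*") ||
        !int.TryParse(parts[0].Substring(1), out int argCount))
    {
        return "-ERR protocol error\r\n";
    }

    // Arguments sit at indexes 2, 4, 6...; the "$<length>" lines are ignored here.
    switch (parts[2].ToUpperInvariant())
    {
        case "SET" when argCount == 3 && parts.Length >= 7:
            store[parts[4]] = parts[6];
            return "+OK\r\n";
        case "GET" when argCount == 2 && parts.Length >= 5:
            return store.TryGetValue(parts[4], out var value)
                ? "$" + value.Length + "\r\n" + value + "\r\n" // bulk string reply
                : "$-1\r\n";                                   // null reply for a miss
        default:
            return "-ERR unknown command\r\n";
    }
}

Console.Write(Execute("*3\r\n$3\r\nSET\r\n$3\r\nfoo\r\n$3\r\nbar\r\n")); // +OK
Console.Write(Execute("*2\r\n$3\r\nGET\r\n$3\r\nfoo\r\n"));             // $3, then bar
```

Every call here allocates an array, several substrings and a new reply string, which is exactly the kind of garbage the notes above are talking about.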

The manner in which we handle I/O is pretty horrible, and… I think you get where I’m going here, right? My goal is to see how I can use this (pretty simple) example to get more performance without having to deal with a lot of extra fluff.

Given my initial attempt is already at nearly 1M QPS, that is a pretty good start, even if I say so myself.

The next step that I want to take is to handle the allocations that are going on here. We can probably do better, and I aim to try. But I’ll do that in the next post.

More posts in "High performance .NET" series:

  1. (27 Jun 2022) Building a Redis Clone – skipping strings
  2. (20 Jun 2022) Building a Redis Clone – the wrong optimization path
  3. (13 Jun 2022) Building a Redis Clone – separation of computation & I/O
  4. (10 Jun 2022) Building a Redis Clone – Architecture
  5. (09 Jun 2022) Building a Redis Clone – Analysis
  6. (08 Jun 2022) Building a Redis Clone – naively