High performance .NET: Building a Redis Clone–naively

Jun 08 2022



I ran into this project, which aims to be a Redis clone with better performance and ease of use. I found it interesting because one of its main selling points is that it can run in a multi-threaded mode (instead of Redis’ single-thread-per-process model). They use memtier_benchmark (a load-generation tool from Redis Labs) to test their performance. I got curious about how much performance I could get out of the system if I built my own Redis clone in C#.

The first version I built is pretty naive. The idea is to write it in a high-level manner and see where that puts us. To make things interesting, here is the test setup:

The memtier_benchmark client is going to run on a c6g.2xlarge instance, with 8 cores and 16 GB of memory.
The tested server is going to run on a c6g.4xlarge instance, with 16 cores and 32 GB of memory.

Both of those instances are running in the same availability zone.

The command I’m going to run is:

memtier_benchmark -s $SERVER_IP -t 8 -c 16 --test-time=30 --distinct-client-seed -d 256 --pipeline=30

What this says is that we’ll use 8 threads (the number of cores on the client instance) with 32 connections per thread, 256-byte values, and a 1:10 ratio of writes to reads (memtier’s default, since the command doesn’t pass --ratio; roughly 9% writes and 91% reads, matching the Sets/Gets split in the results). In total, we’ll have 256 clients, and our tests are going to continuously push more data into the system.

The server is being run using:

dotnet run -c Release

Here is an example of the server while under this test:

I chose 30 seconds for the test duration to balance doing enough work to get a feel for what is going on (multiple GC cycles, etc.) against keeping the test short enough that I won’t get bored.

Here are the naïve version results:

============================================================================================================================
Type         Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets        86300.19          ---          ---         8.14044         0.92700        99.83900       196.60700     25610.97
Gets       862870.15     36255.57    826614.58         8.10119         0.91900        99.32700       196.60700     42782.42
Waits           0.00          ---          ---             ---             ---             ---             ---          ---
Totals     949170.34     36255.57    826614.58         8.10476         0.91900        99.32700       196.60700     68393.39

So the naïve version, written in C# and doing almost nothing, is already approaching 1 million queries/sec. The latency, on the other hand, isn’t that good, with the p99 at almost 100ms.

Now that I got your attention with the numbers and pretty graphs, let me show you the actual code that I'm running. This is a “Redis Clone” in under 100 lines of code.

Just a few notes on the implementation: I’m not actually doing much. Most of the code is there to parse the Redis protocol, and the code is full of allocations. Each command is parsed using multiple string splits and concatenations, and replies to the client require even more concatenations. The “store” for the system is just a simple ConcurrentDictionary, with nothing to avoid contention or high costs.
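To make that description concrete, here is a hypothetical C# sketch of the same style of command handling. It is not the original listing: it assumes top-level statements (.NET 6 or later), supports only `GET` and `SET`, and treats each call as one complete RESP command whose ASCII keys and values never contain `\r\n`. Nearly every step allocates: the `Split` result, each line string, the argument array, and the reply built by concatenation.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

// Hypothetical sketch, not the original code: parse one RESP command with
// string.Split, build the reply with string concatenation, and keep the data
// in a plain ConcurrentDictionary. Assumes one complete command per call and
// ASCII keys/values that never contain "\r\n".
var store = new ConcurrentDictionary<string, string>();

Console.Write(Execute("*3\r\n$3\r\nSET\r\n$3\r\nkey\r\n$5\r\nhello\r\n")); // writes "+OK\r\n"
Console.Write(Execute("*2\r\n$3\r\nGET\r\n$3\r\nkey\r\n"));                 // writes "$5\r\nhello\r\n"

string Execute(string rawCommand)
{
    // "*2\r\n$3\r\nGET\r\n$3\r\nkey\r\n" splits into ["*2", "$3", "GET", "$3", "key", ""]
    string[] lines = rawCommand.Split("\r\n");            // an array plus one string per line
    int argc = int.Parse(lines[0].Substring(1));           // "*2" -> 2 (another string)
    // Each argument follows its "$<length>" line, so arguments sit at indices 2, 4, 6, ...
    string[] argv = Enumerable.Range(0, argc).Select(i => lines[2 + 2 * i]).ToArray();

    switch (argv[0].ToUpperInvariant())
    {
        case "SET" when argc == 3:
            store[argv[1]] = argv[2];
            return "+OK\r\n";
        case "GET" when argc == 2:
            return store.TryGetValue(argv[1], out var value)
                ? "$" + value.Length + "\r\n" + value + "\r\n" // bulk string; Length equals byte count only for ASCII
                : "$-1\r\n";                                   // null bulk string: key not found
        default:
            return "-ERR unknown command '" + argv[0] + "'\r\n";
    }
}
```

Splitting on `\r\n` ignores the `$<length>` prefixes entirely, which is exactly why it breaks on binary values; a real parser would honor those lengths.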

The way we handle I/O is pretty horrible, and… I think you see where I’m going here, right? My goal is to see how I can use this (pretty simple) example to get more performance without having to deal with a lot of extra fluff.
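The I/O side can be sketched in the same naive spirit. This hypothetical version (again, not the original code) gives each connection its own `Task`, reads RESP line by line through a `StreamReader`, and writes and flushes every reply individually, so a pipeline of 30 requests turns into 30 separate network writes. The port, the `serve` argument, the ASCII-only assumption, and the helper names are illustrative choices.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Net;
using System.Net.Sockets;
using System.Text;
using System.Threading.Tasks;

// Hypothetical sketch, not the original code: one Task per connection,
// line-based reads through StreamReader, and a write + flush per reply.
var store = new ConcurrentDictionary<string, string>();

if (args.Length > 0 && args[0] == "serve")
    await ServeAsync(6379); // 6379 is the default Redis port

async Task ServeAsync(int port)
{
    var listener = new TcpListener(IPAddress.Any, port);
    listener.Start(); // runs before the first await, so the port is bound once ServeAsync is called
    while (true)
    {
        TcpClient client = await listener.AcceptTcpClientAsync();
        _ = Task.Run(() => HandleClientAsync(client)); // fire-and-forget, one Task per connection
    }
}

async Task HandleClientAsync(TcpClient client)
{
    using (client)
    {
        NetworkStream stream = client.GetStream();
        var reader = new StreamReader(stream, Encoding.ASCII); // assumes ASCII keys and values
        var writer = new StreamWriter(stream, Encoding.ASCII); // ASCII writes no byte-order mark
        try
        {
            while (true)
            {
                string? header = await reader.ReadLineAsync(); // "*<argc>"
                if (header == null) break;                      // client closed the connection
                int argc = int.Parse(header.Substring(1));
                var parts = new string[argc];
                for (int i = 0; i < argc; i++)
                {
                    await reader.ReadLineAsync();               // skip the "$<length>" line
                    parts[i] = await reader.ReadLineAsync() ?? "";
                }
                await writer.WriteAsync(Execute(parts));
                await writer.FlushAsync();                      // one network write per reply, even when pipelined
            }
        }
        catch (IOException)
        {
            // connection reset by the client; just drop it
        }
    }
}

string Execute(string[] parts)
{
    string command = parts[0].ToUpperInvariant();
    if (command == "SET" && parts.Length == 3)
    {
        store[parts[1]] = parts[2];
        return "+OK\r\n";
    }
    if (command == "GET" && parts.Length == 2)
        return store.TryGetValue(parts[1], out var value) ? "$" + value.Length + "\r\n" + value + "\r\n" : "$-1\r\n";
    return "-ERR unknown command\r\n";
}
```

Started with `dotnet run -c Release -- serve`, the sketch listens on port 6379 and answers only `GET` and `SET`; every other command gets an error reply.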

Given that my initial attempt is already at nearly 1M QPS, that is a pretty good start, even if I say so myself.

The next step I want to take is to handle the allocations that are going on here. We can probably do better, and I aim to try. But I’ll do that in the next post.
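As a rough illustration of where that work could start (a sketch under its own assumptions, not necessarily what the follow-up posts do), RESP framing can be read directly from the received bytes with `ReadOnlySpan<byte>` and `System.Buffers.Text.Utf8Parser`, handing back slices of the receive buffer instead of new strings. `TryReadLength` and `TryReadBulk` are hypothetical helper names.

```csharp
using System;
using System.Buffers.Text;
using System.Text;

// Hypothetical sketch: parse RESP framing straight from the received bytes,
// returning slices of the buffer instead of allocating strings.
byte[] request = Encoding.ASCII.GetBytes("*2\r\n$3\r\nGET\r\n$3\r\nkey\r\n");
if (TryReadLength(request, (byte)'*', out int count, out int offset))
{
    for (int i = 0; i < count; i++)
    {
        if (!TryReadBulk(request.AsSpan(offset), out ReadOnlySpan<byte> arg, out int used))
            break; // incomplete command: a server would wait for more bytes
        Console.WriteLine(Encoding.ASCII.GetString(arg)); // decoded only for display; prints GET, then key
        offset += used;
    }
}

// Reads "<prefix><number>\r\n" (e.g. "*2\r\n" or "$3\r\n") without creating a string.
static bool TryReadLength(ReadOnlySpan<byte> buffer, byte prefix, out int length, out int consumed)
{
    length = 0;
    consumed = 0;
    ReadOnlySpan<byte> crlf = new byte[] { (byte)'\r', (byte)'\n' }; // compiled to static data, no heap allocation
    int eol = buffer.IndexOf(crlf);
    if (eol < 2 || buffer[0] != prefix)
        return false; // need at least "<prefix><digit>\r\n"
    if (!Utf8Parser.TryParse(buffer.Slice(1, eol - 1), out length, out int digits) || digits != eol - 1)
        return false; // not a well-formed number
    consumed = eol + 2;
    return true;
}

// Reads "$<length>\r\n<payload>\r\n" and returns the payload as a view into the buffer.
static bool TryReadBulk(ReadOnlySpan<byte> buffer, out ReadOnlySpan<byte> value, out int consumed)
{
    value = default;
    consumed = 0;
    if (!TryReadLength(buffer, (byte)'$', out int length, out int header)
        || length < 0
        || buffer.Length < header + length + 2)
        return false; // header or payload not fully received yet
    value = buffer.Slice(header, length); // no copy
    consumed = header + length + 2;       // header + payload + trailing "\r\n"
    return true;
}
```

Unlike the split-based approach, this shape also handles a partially received command: the helpers return `false` and the caller can wait for more bytes instead of failing.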



More posts in "High performance .NET" series:

(19 Jul 2022) Building a Redis Clone–Analysis II
(27 Jun 2022) Building a Redis Clone – skipping strings
(20 Jun 2022) Building a Redis Clone– the wrong optimization path
(13 Jun 2022) Building a Redis Clone–separation of computation & I/O
(10 Jun 2022) Building a Redis Clone–Architecture
(09 Jun 2022) Building a Redis Clone–Analysis
(08 Jun 2022) Building a Redis Clone–naively

Comments

08 Jun 2022
15:25

Pavel

Very interesting! But how does it compare to Dragonfly and Redis - can you please run this benchmark on the same instance type as in Dragonfly README (c6gn.16xlarge)?

08 Jun 2022
16:55

Oren Eini

Pavel,

Give me some time to go through everything. Right now, we are significantly short of that, so there is no point in matching the same server type. I'll look into that when I'm actually done here.

09 Jun 2022
09:47 AM

Tobias Zürcher

Your post gives me room for speculation: RavenDB will soon support the Redis protocol and can thus replace a Redis server.

I.e., a component in my system that needs Redis, I can run via RavenDB.

This would simplify my ops life massively.

09 Jun 2022
09:49 AM

Oren Eini

Tobias,

That is... not something that we actually planned to do. A key issue with this is the different security behaviors, and the 200+ commands that Redis implements.

12 Jun 2022
07:29 AM

Simon

I'm curious why you don't use var client = await listener.AcceptTcpClientAsync(); ? I'm not 100% sure, but I'm seeing slightly better performance with this instead of the non-async accept.

12 Jun 2022
07:37 AM

Oren Eini

Simon,

Connection acceptance isn't something that I'm worried about. There should be a relatively stable number of connections. To be honest, it never even occurred to me to do so and I'm surprised it has an impact on performance.

04 Jul 2022
20:38

Roman Stoffel

While reproducing the blog post, I've found two small inconsistencies.

In the blog post it says "8 threads, with 32 connections each", but the command line arguments "-t 8 -c 16" are for "8 threads with 16 connections". I assumed the text is correct and the benchmark ran with 32 connections.

Another small thing: the memory sizes of the AWS instances seem smaller: c6g.2xlarge has 16 GB, c6g.4xlarge has 32 GB. Unless there are other flavors I've missed. Anyway, it seems there is plenty of memory either way.

06 Jul 2022
16:27

Oren Eini

Roman,

Locally I'm testing with:

docker run --rm redislabs/memtier_benchmark:latest --ratio 1:5 -t 8 -c 64 --test-time=60 --distinct-client-seed --data-size=256  -s host.docker.internal  --hide-histogram --pipeline=32

Remotely (on AWS) with the command line in the post.

There is plenty of RAM for this scenario, yes.



Oren Eini

CEO of RavenDB