High performance .NET: Building a Redis Clone–naively
I ran into this project, which aims to be a Redis clone with better performance and ease of use. I found it interesting because one of the main selling points there was that it is able to run in a multi threaded mode (instead of Redis’ single thread per process model). They use memtier_benchmark (part of Redis) to test their performance. I got curious about how much performance I could get out of the system if I built my own Redis clone in C#.
The first version I built was done pretty naively. The idea is to write it in a high level manner, and see where that puts us. To make things interesting, here are the test scenarios:
- The memtier_benchmark is going to run on a c6g.2xlarge instance, using 8 cores and 32 GB of memory.
- The tested instance is going to run on a c6g.4xlarge instance, using 16 cores and 64 GB of memory.
Both of those instances are running in the same availability zone.
The command I’m going to run is:
memtier_benchmark -s $SERVER_IP -t 8 -c 16 --test-time=30 --distinct-client-seed -d 256 --pipeline=30
What this says is that we’ll use 8 threads (the number of cores on the client instance) with 32 connections per thread, and we’ll use 20% writes & 80% reads with values that are 256 bytes in size. In total, we’ll have 256 clients and our tests are going to continuously push more data into the system.
The server is being run using:
dotnet run -c Release
Here is an example of the server while under this test:
I chose 30 seconds for the test duration to balance doing enough work to feel what is going on (multiple GC cycles, etc) while keeping the test duration short enough that I won’t get bored.
Here are the naïve version results:
===========================================================================================================
Type       Ops/sec   Hits/sec  Misses/sec  Avg. Latency  p50 Latency  p99 Latency  p99.9 Latency     KB/sec
-----------------------------------------------------------------------------------------------------------
Sets      86300.19        ---         ---       8.14044      0.92700     99.83900      196.60700   25610.97
Gets     862870.15   36255.57   826614.58       8.10119      0.91900     99.32700      196.60700   42782.42
Waits         0.00        ---         ---           ---          ---          ---            ---        ---
Totals   949170.34   36255.57   826614.58       8.10476      0.91900     99.32700      196.60700   68393.39
So the naïve version, using C#, doing almost nothing, is almost touching 1 million queries / sec. The latency, on the other hand, isn’t that good, with the p99 at almost 100ms.
Now that I got your attention with the numbers and pretty graphs, let me show you the actual code that I'm running. This is a “Redis Clone” in under 100 lines of code.
Just a few notes on the implementation. I’m not actually doing much. Most of the code is there to parse the Redis protocol. And the code is full of allocations. Each command parsing is done using multiple string splits and concats. Replies to the client require even more concats. The “store” for the system is actually just a simple ConcurrentDictionary, without anything to avoid contention or high costs.
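To make the split-and-concat style described above concrete, here is a minimal sketch of what such a command handler could look like. It is not the listing from this post: the NaiveRedis class and its Execute method are hypothetical names, it handles only GET and SET, and it assumes each call receives exactly one complete command whose values are ASCII and contain no line breaks.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical sketch, not the listing from the post: handles one complete
// RESP command per call and supports only GET and SET.
public static class NaiveRedis
{
    // The whole "store" is a plain ConcurrentDictionary.
    private static readonly ConcurrentDictionary<string, string> Store =
        new ConcurrentDictionary<string, string>();

    public static string Execute(string request)
    {
        // "*2\r\n$3\r\nGET\r\n$1\r\nk\r\n" splits into
        // ["*2", "$3", "GET", "$1", "k", ""]. Every Split allocates an array
        // plus one string per element.
        string[] parts = request.Split("\r\n");

        if (!parts[0].StartsWith("*", StringComparison.Ordinal) ||
            !int.TryParse(parts[0].Substring(1), out int argc) ||
            argc < 1 || parts.Length < 1 + argc * 2)
        {
            return "-ERR protocol error\r\n";
        }

        // Arguments sit on every second line, after their "$<length>" header.
        // Assumption: values never contain "\r\n" themselves.
        var args = new string[argc];
        for (int i = 0; i < argc; i++)
            args[i] = parts[2 + i * 2];

        switch (args[0].ToUpperInvariant())
        {
            case "SET" when argc == 3:
                Store[args[1]] = args[2];
                return "+OK\r\n";

            case "GET" when argc == 2:
                // Replies are built by concatenation, which allocates again.
                // Assumption: ASCII values, so Length equals the byte count.
                return Store.TryGetValue(args[1], out string value)
                    ? "$" + value.Length + "\r\n" + value + "\r\n"
                    : "$-1\r\n";

            default:
                return "-ERR unknown command\r\n";
        }
    }
}
```

Sending a SET for key k with value hello returns "+OK\r\n", and a following GET for k returns "$5\r\nhello\r\n". Even in this tiny version, a single GET allocates the split array, one string per protocol line, an upper-cased command name, and several intermediate strings for the reply, which is the kind of cost the notes above are pointing at.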
The manner in which we handle I/O is pretty horrible, and… I think you get where I’m going here, right? My goal is to see how I can use this (pretty simple) example to get more performance without having to deal with a lot of extra fluff.
Given my initial attempt is already at nearly 1M QPS, that is a pretty good start, even if I say so myself.
The next step that I want to take is to handle the allocations that are going on here. We can probably do better here, and I aim to try. But I’ll do that in the next post.
More posts in "High performance .NET" series:
- (19 Jul 2022) Building a Redis Clone–Analysis II
- (27 Jun 2022) Building a Redis Clone – skipping strings
- (20 Jun 2022) Building a Redis Clone– the wrong optimization path
- (13 Jun 2022) Building a Redis Clone–separation of computation & I/O
- (10 Jun 2022) Building a Redis Clone–Architecture
- (09 Jun 2022) Building a Redis Clone–Analysis
- (08 Jun 2022) Building a Redis Clone–naively
Comments
Very interesting! But how does it compare to Dragonfly and Redis - can you please run this benchmark on the same instance type as in Dragonfly README (c6gn.16xlarge)?
Pavel,
Give me some time to go through everything. Right now, we are significantly shorter, no point in matching the same server type. I'll look into that when I'm actually done here.
your post gives me room for speculation: ravendb will soon support the redis protocol and can thus replace a redis server.
i.e. a component in my system that needs redis, i can run via ravendb.
this would simplify my ops life massively.
Tobias,
That is... not something that we actually planned to do. A key issue with this is the different security behaviors, and the 200+ commands that Redis implements.
I'm curious why you don't use
var client = await listener.AcceptTcpClientAsync();
? Not 100% sure, but I'm seeing slightly better performance with this instead of the non-async accept.
Simon,
Connection acceptance isn't something that I'm worried about. There should be a relatively stable number of connections. To be honest, it never even occurred to me to do so and I'm surprised it has an impact on performance.
Hi
While reproducing the blog post, I've found two small inconsistencies.
In the blog post it says "8 threads, with 32 connections each", but the command line arguments "-t 8 -c 16" are for "8 threads with 16 connections". I assumed the text is correct and the benchmark ran with 32 connections.
Another small thing: the memory sizes of the AWS instances seem smaller. c6g.2xlarge has 16 GByte, c6g.4xlarge has 32 GByte. Unless there are other flavors I've missed. Anyway, it seems there is plenty of memory.
Roman,
Locally I'm testing with:
Remotely (on AWS) with the command line in the post.
There is plenty of RAM for this scenario, yes.