re: Writing a very fast cache service with millions of entries


I ran into this article that talks about building a cache service in Go to handle millions of entries. Go ahead and read the article; there is also an associated project on GitHub.

I don’t get it. Rather, I don’t get the need here.

The authors seem to want a way to store a lot of data (for a given value of lots) that is accessible over REST. They need to be able to run 5,000 – 10,000 requests per second against it, and also to be able to expire things.

I decided to take a look at what it would take to run this in RavenDB. It is pretty late here, so I was lazy. I ran the following command against our live-test instance:

image

This says to create 1,024 connections and keep getting the same document. On the right you can see the live-test machine stats while this was running. It peaked at about 80% CPU. I should note that the live-test instance is pretty much the cheapest one that we could get away with, and it is far from me.

Ping time from my laptop to the live-test is around 230 – 250 ms. Right around the numbers that wrk is reporting. I’m using 1,024 connections here to compensate for the distance. What happens when I’m running this locally, without the huge distance?

image

So I can do more than 22,000 requests per second (on a 2016 era laptop, mind) with a max latency of 5.5 ms (which is what the original article called for as the average time). Granted, I’m simplifying things here, because I’m checking a single document and not including writes. But 5,000 – 10,000 requests per second are small numbers for RavenDB. Very easily achievable.
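To see why the remote test needed so many connections, here is a back-of-the-envelope sketch. It assumes each connection has only one request in flight at a time, so throughput is capped at roughly connections divided by round-trip time. The function names are made up for illustration, and the model ignores server processing time entirely.

```python
import math


def max_requests_per_second(connections: int, round_trip_seconds: float) -> float:
    """Upper bound on throughput when every connection waits for its
    response before sending the next request (assumed behavior)."""
    if round_trip_seconds <= 0:
        raise ValueError("round_trip_seconds must be positive")
    return connections / round_trip_seconds


def connections_needed(target_rps: float, round_trip_seconds: float) -> int:
    """Smallest number of such connections that could reach target_rps."""
    if round_trip_seconds <= 0:
        raise ValueError("round_trip_seconds must be positive")
    return math.ceil(target_rps * round_trip_seconds)


if __name__ == "__main__":
    # Ping times taken from the post: roughly 230 to 250 ms to live-test.
    for rtt in (0.23, 0.25):
        bound = max_requests_per_second(1024, rtt)
        print(f"rtt={rtt * 1000:.0f} ms -> at most {bound:.0f} req/s with 1,024 connections")
```

Under this simplified model, the distance caps what 1,024 connections can push, whatever the server could do. Locally, where the round trip is a few milliseconds, the same arithmetic leaves far more headroom.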

RavenDB even has the @expires feature, which allows you to specify a time at which a document will automatically be removed.
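As a rough illustration of what a cache entry with an expiration could look like, the sketch below builds a document body with an `@expires` value under `@metadata`. It only constructs the JSON and does not talk to RavenDB. The helper name, the `Value` field and the exact timestamp format are assumptions made for the example, and the expiration feature may also need to be enabled on the database for the value to have any effect.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Optional


def with_expiry(document: dict, ttl: timedelta, now: Optional[datetime] = None) -> dict:
    """Return a copy of `document` whose metadata carries an @expires time.

    `now` is injectable for testing; it defaults to the current UTC time.
    """
    current = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    expires_at = current + ttl

    # Copy the metadata so the caller's document is left untouched.
    metadata = dict(document.get("@metadata", {}))
    # Assumed format: a UTC ISO 8601 timestamp.
    metadata["@expires"] = expires_at.strftime("%Y-%m-%dT%H:%M:%SZ")

    return {**document, "@metadata": metadata}


if __name__ == "__main__":
    entry = with_expiry({"Value": "cached payload"}, ttl=timedelta(minutes=5))
    print(json.dumps(entry, indent=2))
```

Storing the expiration next to the data means the cache client never has to run its own eviction loop; the cleanup is left to the database.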

The nice thing about using RavenDB for this sort of feature is that millions of objects and gigabytes of data are not of particular concern for us. Raise that by an order of magnitude, and that is our standard benchmark. You’ll need to raise it by a few more orders of magnitude before we start taking things seriously.
