Reducing allocations and resource usage when using Task.Delay
In a previous post, I talked about tri-state waiting, which included the following line:
And then I did a deep dive into how timers on the CLR are implemented. Taken together, this presents something of a problem: what is the cost of calling Task.Delay? Luckily, the code is there, so we can check.
The relevant costs are here. We allocate a new DelayPromise and a Timer instance. As we previously saw, creating a Timer instance takes a global lock, and the cost of firing timers is proportional to the number of timers registered.
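As a rough sketch (a simplification of my own, not the actual BCL source), each call to Task.Delay does something like this:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class DelaySketch
{
    // Simplified sketch of what a Task.Delay call costs (an approximation,
    // not the real BCL implementation): two allocations, plus a timer
    // registration that goes through the process-wide timer queue.
    public static Task MyDelay(int milliseconds)
    {
        var promise = new TaskCompletionSource<object>(); // allocation #1: the promise ("DelayPromise" in the BCL)
        Timer timer = null;
        timer = new Timer(_ =>                            // allocation #2: the Timer, registered under the global lock
        {
            promise.TrySetResult(null);
            timer?.Dispose();                             // one-shot: clean up after firing
        }, null, milliseconds, Timeout.Infinite);
        return promise.Task;
    }
}
```

Two allocations and one global-lock acquisition per call doesn't sound like much, until you multiply it by the number of callers.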
Let’s consider the scenario for the code above. We want to support a very large number of clients, and we typically expect them to be idle, so every second the delay fires, we send them a heartbeat, and go back to waiting. What will happen in such a case for the code above?
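The per-connection loop in question isn't reproduced in this copy of the post; a hypothetical reconstruction (IHeartbeatConnection and its members are illustrative names of mine, not from the post) would be roughly:

```csharp
using System;
using System.Threading.Tasks;

// IHeartbeatConnection is an illustrative stand-in for whatever connection
// abstraction the post uses; the shape of the loop is what matters here.
public interface IHeartbeatConnection
{
    bool Connected { get; }
    Task SendHeartbeatAsync();
}

public static class Heartbeats
{
    public static async Task RunAsync(IHeartbeatConnection conn)
    {
        while (conn.Connected)
        {
            // Each iteration allocates a DelayPromise and a Timer, and
            // contends on the global timer lock -- per connection, per second.
            await Task.Delay(1000);
            await conn.SendHeartbeatAsync();
        }
    }
}
```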
All of those async connections are going to be allocating several objects per run, and each of them is going to contend on the same lock. All of them are also going to run a lot more slowly than they should.
In order to resolve this issue, given what we know now about the timer system on the CLR, we can write the following code:
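The code block itself is missing from this copy of the post; based on the identifiers discussed in the comments below (_nextTimeout, OneSecond, TimerCallback), and folding in the two fixes agreed on there (using Interlocked.Exchange, and initializing _nextTimeout so the first access doesn't throw), a reconstruction might look like this:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// A reconstruction of the approach described, not the post's verbatim code.
public static class TimeoutManager
{
    // Initialized up front so OneSecond never observes null
    // (the fix Hangy points out in the comments).
    private static TaskCompletionSource<object> _nextTimeout =
        new TaskCompletionSource<object>();

    // One static timer for the whole process: no per-call Timer
    // allocation, no per-call contention on the global timer lock.
    private static readonly Timer _timer =
        new Timer(TimerCallback, null, 1000, 1000);

    private static void TimerCallback(object state)
    {
        // Swap in a fresh promise, then release everyone who was
        // awaiting the old one (Daniel's Interlocked.Exchange fix).
        var old = Interlocked.Exchange(ref _nextTimeout,
            new TaskCompletionSource<object>());
        old.TrySetResult(null);
    }

    // Every caller in the current second shares the same Task instance.
    public static Task OneSecond => _nextTimeout.Task;
}
```

The heartbeat loop would then `await TimeoutManager.OneSecond;` instead of `await Task.Delay(1000);`.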
In this case, we have a single static timer (so there is no lock contention per call), and a single task allocation per second. All of the connections will use the same instance. In other words, if there are 10,000 connections, we just saved 20K allocations per second, as well as 10K lock contentions. And instead of waking every single connection one at a time, a single task instance is completed once, waking all of the timed-out waiters together.
This code does have the disadvantage of allocating a task every second, even if no one is listening. That is a pretty small price to pay, and fixing it can be left as an exercise for the reader.
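For completeness, one direction that fix could take (a sketch of my own, not the post's answer): only allocate the promise when somebody is actually waiting, and let the timer callback clear it.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Sketch: lazily allocate the shared promise only on demand.
public static class LazyTimeoutManager
{
    private static TaskCompletionSource<object> _nextTimeout; // null when no one is waiting

    private static readonly Timer _timer = new Timer(state =>
    {
        // Take the current promise (if any) and release its waiters.
        // If no one asked for OneSecond this tick, nothing was allocated.
        var old = Interlocked.Exchange(ref _nextTimeout, null);
        old?.TrySetResult(null);
    }, null, 1000, 1000);

    public static Task OneSecond
    {
        get
        {
            while (true)
            {
                var current = _nextTimeout;
                if (current != null)
                    return current.Task; // this tick's promise already exists
                var created = new TaskCompletionSource<object>();
                if (Interlocked.CompareExchange(ref _nextTimeout, created, null) == null)
                    return created.Task;
                // lost the race to another caller; loop and share theirs
            }
        }
    }
}
```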
Comments
Line 16,17 shouldn't it be: var old = Interlocked.Exchange(ref _nextTimeout, next); ?
How do you make sure that it will wait at least 1 second?
The global timer is firing every second. Because of this a call for delay could last much less than a second. Or isn't that a problem?
// Ryan
I ran into this before and looked into the implementation of Timers. Even if we implement a better managed Timer to support 500, 1000 or more concurrent timers, TPL does not make it easy to replace its implementation of Task.Delay(). Not sure why they haven't considered replacing the linked list implementation with a heap, given that folks run into it much sooner when using TPL.
Daniel, Yes, that would be better. I'm assuming here that I'm the only one running this, and using Interlocked to make sure that callers to OneSecond will see it immediately.
Daniel, I'm not. I don't actually care about it happening in 1 second; I'm assuming that this is called in a loop, so it will be raised roughly every second. Note that it is fairly trivial to implement "at least one second" by having two of them, and registering with the next one while invoking the current one. Still not very accurate, but pretty good.
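The "two of them" idea from the reply above could be sketched like this (my own illustration, not code from the post): two shared promises fire on alternating ticks, and a caller registers with the one that fires on the tick after next, so it always waits at least one full second.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Sketch of the "two timers" variant: waits are between one and two
// seconds. There is a benign race on _current, which fits the
// "still not very accurate, but pretty good" caveat above.
public static class AtLeastOneSecond
{
    private static readonly TaskCompletionSource<object>[] _slots =
    {
        new TaskCompletionSource<object>(),
        new TaskCompletionSource<object>()
    };

    private static int _current; // index of the slot that fires on the next tick

    private static readonly Timer _timer = new Timer(_ =>
    {
        int firing = _current;
        _current = 1 - firing; // the other slot fires on the following tick
        var done = Interlocked.Exchange(ref _slots[firing],
            new TaskCompletionSource<object>());
        done.TrySetResult(null);
    }, null, 1000, 1000);

    // Register with the slot that fires on the tick *after* the next one,
    // guaranteeing a wait of at least one full tick.
    public static Task OneSecond => _slots[1 - _current].Task;
}
```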
Ryan, Yes, you might get a lot less than 1 second the first time you call, but the idea is that this is called in a loop, while waiting for other stuff to be done. So the next time, it will be fine
I get the concept (nice idea!), but the example seems to be lacking some detail - or I am lacking understanding. :) The _nextTimeout variable is never initialized before accessing OneSecond, because the get accessor takes place directly after the static initializer is run, so returning _nextTimeout.Task should throw an NRE (TimerCallback has not been called yet). And even if it did, the first run of TimerCallback should throw an NRE, too, because old will be null..?
Hangy, You are correct, we need to set up a default value there, fixed.
As all the clients are synchronised on the same timer, what are your thoughts on the bursty nature of this?
Andy, The clients don't care about the time, only that it is roughly every sec. There isn't that much burstiness. It goes to the thread pool, and it will be executed there based on how many threads are available.
indeed. I was thinking of the server. All the heartbeats will be triggered/sent at the same time. Admittedly they are small. But there will be a burst of cpu and network each second (approx). Not a worry for small numbers but if there are 1000's of clients connected?
Andy, Think about what will happen: once a second we'll throw a lot of work at the thread pool, which will handle the operations a few at a time. Let us say that we have 10K clients and 25 threads. We'll process 25 at a time, and because of how TPL works, we'll run the code to set the task, then we'll run the code for the waiting client, which will just send an async heartbeat and move on to the next client. This kind of staggering will alleviate the burst.
yeah, understood. Relying on the implicit behavior of the framework/runtime. I do like how simple this is. Thought it was worth having stated for the record (esp for those that mess with the concurrency bits). Understanding that the TaskScheduler/threadpool may be "swamped" with these occasionally may be of interest in some scenarios.