The async/await model and its perils
This blog post about the color of a function is a really good explanation of the major issues with sync and async code in modern programming.
In C#, we have this lovely async/await model, which allows us to have the compiler handle all the hard work of yielding a thread while some expensive I/O-bound operation is going on. Having worked with that for quite a while, I can tell you that I really agree with Bob’s frustrations on the whole concept.
But from my perspective, this comes at several significant costs. The async machinery isn’t free, and in some cases (discussed below), the performance overhead of using async is actually significantly higher than that of the standard blocking model. There is also the issue of the debuggability of the solution: if you have a lot of async work going on, it is very hard to see what the state of the overall system is.
In practice, I think that we’ll settle on the following rules:
For requests that are common and short, where most of the work is either getting the data from the client or sending the data to the client, with only a short (mostly CPU-bound) piece of work in between, we can use async operations, because they free a thread to do useful work (processing the next request) while we are spending most of our time doing I/O with the remote machine.
For high performance stuff, where we have a single request that does quite a lot of work or is long-lived, we typically want to go the other way. We want to have a dedicated thread for this operation, and we want to do blocking I/O. The logic is that this operation isn’t going to be doing much while we are waiting for the I/O, so we might as well block the thread and just wait for it in place. We can rely on buffering to speed things up, but there is no point in giving up this thread for other work, because this is a rare operation that we want to be able to explicitly track all the way through.
In practice, with RavenDB, this means that a request such as processing a query is going to be handled mostly async, because we have a short compute bound operation (actually running the query), then we send the data to the client, which should take most of the time. In that time frame, we can give up the request processing thread to do another query. On the other hand, an operation like bulk insert shouldn’t want to give up its thread, because another request coming in and interrupting us means that we will slow down the bulk insert operation.
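A minimal C# sketch of these two rules, assuming plain Stream objects stand in for the client connection and the storage. The names RequestHandling, HandleQueryAsync and StartBulkInsert are made up for illustration and are not RavenDB APIs.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

public static class RequestHandling
{
    // Common, short request: a brief CPU-bound step, then mostly I/O to the client.
    // Awaiting the write lets the thread go serve another request in the meantime.
    public static async Task HandleQueryAsync(Stream client, string query)
    {
        // Stand-in for actually running the query (hypothetical).
        byte[] result = Encoding.UTF8.GetBytes("result for " + query);

        await client.WriteAsync(result, 0, result.Length);
        await client.FlushAsync();
    }

    // Rare, long-running request: a dedicated thread doing blocking I/O.
    // The thread is never handed to other work, so the whole operation
    // can be followed on a single call stack.
    public static Thread StartBulkInsert(Stream client, Stream storage)
    {
        var thread = new Thread(() =>
        {
            var buffer = new byte[64 * 1024]; // buffering to speed things up
            int read;
            while ((read = client.Read(buffer, 0, buffer.Length)) > 0)
            {
                storage.Write(buffer, 0, read); // blocks in place
            }
            storage.Flush();
        })
        {
            IsBackground = true,
            Name = "bulk-insert" // makes the thread easy to find in a debugger
        };
        thread.Start();
        return thread;
    }
}
```

With this shape, pausing in the debugger shows the bulk insert as one named thread with an ordinary call stack, while the query path holds a thread only for the short compute step.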
Comments
Are there any numbers out there about performance costs of async/await?
Dalibor, Highly context sensitive. For example, Kestrel, using async I/O, can do over 1 million requests / second. But those are small operations, and mostly shelling out to the I/O system. If you are working on a single request doing a lot of work, the situation is much different.
Strange, but intuitively it seems it should be the other way around: for short operations use a single thread with blocking IO, for longer operations use async with non-blocking IO.
Thinking about this:
- Async has a cost, which you don't want to pay unless it pays for itself (releases a thread for sufficient time to do significant work someplace else).
- Threads are limited, therefore doing long operations while holding on to the current thread can significantly reduce the number of processed requests if there's a lot of IO involved.
Maybe I'm missing something ...
Most of the time threads are not a scarce resource. That makes saving them a non-goal. I feel this is not well understood in the community.
It's also interesting that SQL Server uses sync IO in its normal query pipeline. It only uses async when doing scatter gather writes at high DOP (10-1000). Seems like a really good tradeoff. Normal queries are very unlikely to exhaust the thread pool because if that was the case they would have 10x overloaded the server anyway and it's not an interesting scenario.
Pop Catalin, Your logic is flawless. The issue is whether you have other tasks that can be scheduled on the same thread while you are waiting for the I/O to complete. If you have those, then async is a net plus, because you don't waste the thread resource.
However, if there is just one (or very few) such tasks, and you have more threads to spare, then using blocking I/O is probably better, since you don't pay for the async cost, and you don't benefit from the increased concurrency.
Mark, A lot of that probably has to do with the time it was written; good async I/O is relatively recent (as a practical, usable thing). You could do async I/O on Windows forever, but it was a much more complex thing. There is also another issue here, in that SQL Server in particular has done a LOT with threads, fibers and their friends in the past to get the best results it can.
Finally, there is the communication model itself, where we have a conversation (request / reply over time) of SQL commands and responses. That also has an effect on the programming model.
not every async method will run asynchronously, sometimes the JIT might run it sync
Uri, The JIT has nothing to do with that. If the method awaits something that uses the async pattern, it will register to be invoked again when that is done, and then give up the thread.
You can have an async method that doesn't do that, for example, an async method that has a cache for certain values, but it still pays for the async machinery.
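A rough C# sketch of that case, for illustration only. CachedLookup and its members are hypothetical names, and Task.Delay stands in for real I/O.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

public class CachedLookup
{
    private readonly ConcurrentDictionary<string, string> _cache =
        new ConcurrentDictionary<string, string>();

    public async Task<string> GetAsync(string key)
    {
        string cached;
        if (_cache.TryGetValue(key, out cached))
        {
            // Cache hit: the method never yields, so it runs to completion
            // synchronously on the caller's thread. It still goes through the
            // compiler-generated state machine and typically allocates a
            // Task<string> to carry the result.
            return cached;
        }

        // Cache miss: a real asynchronous wait. The method registers a
        // continuation and gives up the thread until the wait completes.
        await Task.Delay(10); // stand-in for an I/O call (assumption)

        string value = "value for " + key;
        _cache[key] = value;
        return value;
    }
}
```

On a cache hit the returned task is already completed when the caller receives it, so awaiting it continues inline without a thread switch, but the wrapper cost is still paid on every call.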
I've been reading a few sources along these lines recently — suggesting one may not wish to 'async all the things' due to the fact the computational overhead of doing so may outweigh the concurrency benefits.
https://github.com/aspnet/EntityFramework/issues/5816#issuecomment-228199098
Do you have any advice for developers when starting to program a modern ASP.NET web application — would you implement async action methods and database queries everywhere by default or leave that as an optimisation exercise later once an issue has been identified either through testing or production usage?
If you have a lot of requests that would be spending a lot of time in I/O, that is a good place for async, since this can reduce the number of threads and the amount of CPU work that is done. However, if you are doing only trivial computation in async, that is likely to be detrimental.
The key issue is that you can't easily switch between the two because doing so is a very hard breaking change.
Thanks — I wouldn't advocate async for CPU-bound work but it is because you cannot easily switch between the two that I am asking really.
For instance, as a developer is it sensible, in a general sense, to say "I'm going to perform database, file system or network IO during this request, therefore I'll use async" or is it more sensible to assume not and add async where it makes a measurable difference?
I appreciate the right answer may well be "it depends" but given you have to choose one or the other with a lot of rework later it feels like a rule of thumb is appropriate.
Matt, My first instinct would be something like seeing how much time I'm going to be spending waiting for the I/O, and how many concurrent operations I'm going to have.
If this is a web server, and I'm reading a lot of requests, async makes a lot of sense. If this is a single / low thread application that does heavy processing, it probably does not make sense.
Thanks for the responses, Oren. They've been useful.
@Oren Have you looked at the actor system as implemented by Akka.net? If so, what are your thoughts?
I would also like your opinion on Reactive Extensions too. These allow control over the scheduler, are debuggable and allow the return of multiple values.
Chanan, That has pretty much zero impact on this. See: http://getakka.net/docs/ActorSystem#blocking-needs-careful-management
They have roughly the same things to deal with, I think. But they are more sensitive to blocking calls, from what it seems there.
Antão, How are those debuggable? In particular, if I'm subscribing to notifications, I have no way of knowing where I am in the process.
This is also not a good idea if I want to have a conversation (consider using RX to do something like POP3, it would be a very awkward model).
I meant testable: http://www.introtorx.com/Content/v1.0.10621.0/16_TestingRx.html#TestingRx
One of the biggest advantages of RX is that it's composable. RX is functional and I'm sure conversations are possible in functional languages.
I've never done POP3 but I have developed RX-based apps with multiple states and animated transitions. I always knew where I was in the process.
Antão, I chose POP3 specifically, because it is a very simple request / reply protocol. You can't really do something like that with RX, at least nothing that I can envision.
Testing code and debugging it are very different things, and I don't see any debug advantages to RX in this case. If you are waiting for a timeout to complete in RX, you won't be seeing that in the debugger any better than waiting for an async operation to complete.
I've already implemented a custom request/reply protocol on RX with no issues. I first implemented it using async/await and, once I learned RX, I converted everything to RX. I actually find it now to be more robust, easier to read and to maintain.
Antão, Can you share a simple POP3 over RX (or any req/reply protocol) so I can see that? I have a hard time seeing how this would work.
I have to say I don't like it. I don't like how the async/await programming paradigm demands that code changes be made to the entire call stack just to utilise asynchrony in one spot.
Mick, That is my major problem as well. As soon as you have async in one place, it pervades everything.
C#/VB's async/await was never about performance of any single operation. In fact, if there's more code being executed, how could it ever be faster?
It's all about availability. Availability of client UI threads or availability of server processing threads.
If you're running out of threads (meaning the thread pool is always growing) and that is hurting your server or your clients, you should go async (whatever the pattern).
If you need raw performance, go sync, even if it blocks that thread.
Antão, RX is for handling streams of events. Stuff that happens regardless of what you do. Like Oren, I can't imagine it being more suitable to request+response than async/await.