Mixing Sync & Async calls
Take a look at the following code:
int count = 0;
var sp = Stopwatch.StartNew();
var tasks = new List<Task>();
for (int i = 0; i < 500; i++)
{
    var t = Task.Run(() =>
    {
        var webRequest = WebRequest.Create(new Uri("http://google.com"));
        webRequest.GetResponse().Close();
        Interlocked.Increment(ref count);
    });
    tasks.Add(t);
}
var whenAll = Task.WhenAll(tasks.ToArray());
while (whenAll.IsCompleted == false && whenAll.IsFaulted == false)
{
    Thread.Sleep(1000);
    Console.WriteLine("{0} - {1}, {2}", sp.Elapsed, count, tasks.Count(x => x.IsCompleted == false));
}
Console.WriteLine(sp.Elapsed);
As you can see, it is a pretty silly example of making 500 queries to Google. There is some stuff here about reporting & managing, but the key point is that we start 500 tasks to do some I/O. How long do you think this is going to run?
On my machine, this code completes in roughly 22 seconds. Now, WebRequest is old school, and we want to use HttpClient, because it has a much better interface, so we wrote:
int count = 0;
var sp = Stopwatch.StartNew();
var tasks = new List<Task>();
for (int i = 0; i < 500; i++)
{
    var t = Task.Run(() =>
    {
        var client = new HttpClient();
        client.SendAsync(new HttpRequestMessage(HttpMethod.Get, new Uri("http://google.com"))).Wait();
        Interlocked.Increment(ref count);
    });
    tasks.Add(t);
}
var whenAll = Task.WhenAll(tasks.ToArray());
while (whenAll.IsCompleted == false && whenAll.IsFaulted == false)
{
    Thread.Sleep(1000);
    Console.WriteLine("{0} - {1}, {2}", sp.Elapsed, count, tasks.Count(x => x.IsCompleted == false));
}
Console.WriteLine(sp.Elapsed);
I’ve been running this code on my machine for the past 7 minutes, and it hasn’t yet issued a single request.
It took a while to figure it out, but the problem is in how this is structured. We are creating a lot of tasks, more than the available threads in the thread pool. Then we make an async call and block on it, which blocks the thread pool thread. That means that to process the result of this call, we'll need another thread pool thread. However, we have scheduled more tasks than we have threads for, so there is no thread available to handle the reply, and all the threads are stuck waiting for replies that no thread is free to handle and unstick them.
The thread pool notices this and decides to allocate more threads, but those are also taken up by the already queued tasks, which immediately block as well.
Now, surprisingly, the thread pool will eventually allocate enough threads (although it will probably take it several hours to do so) to start handling the requests, and the issue resolves itself. Except that this ends up basically crippling the application while it is happening.
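If you want to watch this happen, a small variation on the monitoring loop from the code above (my addition, not part of the original sample) can also report how many thread pool worker threads are currently in use. You will see the busy count climb slowly over time while the completed count stays at zero:

while (whenAll.IsCompleted == false && whenAll.IsFaulted == false)
{
    Thread.Sleep(1000);

    // How many worker threads the pool could still hand out right now.
    int availableWorkers, availableIo;
    ThreadPool.GetAvailableThreads(out availableWorkers, out availableIo);

    int maxWorkers, maxIo;
    ThreadPool.GetMaxThreads(out maxWorkers, out maxIo);

    Console.WriteLine("{0} - {1} done, {2} pool threads busy",
        sp.Elapsed, count, maxWorkers - availableWorkers);
}

Every one of those busy threads is sitting in a Wait() call, and the continuation that would release it has nowhere to run.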
Obviously, the solution is to not block on async calls inside a task like that. Indeed, we can use the following code quite easily:
var t = Task.Run(async () =>
{
    var client = new HttpClient();
    await client.SendAsync(new HttpRequestMessage(HttpMethod.Get, new Uri("http://google.com")));
    Interlocked.Increment(ref count);
});
And this shows performance comparable to the WebRequest method.
However, that isn’t very helpful when we need to expose a synchronous API. In our case, in RavenDB we have the IDocumentSession, which is a simple synchronous API, and we want to use a common codebase for the sync and async operations. But we need to keep exposing the same interface, even if we change things internally. To make things worse, we have no control over how HttpClient schedules its async operations, and it has no sync operations at all.
That means we are left with the choice of writing everything twice, once for synchronous operations and once for async ones, or just living with this issue.
A workaround we managed to find is to run the root tasks (which we do control) on our own task scheduler, so they won't compete with the I/O operations. But that is hardly something I am thrilled about.
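To make that workaround concrete, here is a minimal sketch of what such a scheduler could look like. This is not the scheduler we actually use, just an illustration; the DedicatedThreadScheduler name and the thread count are made up. Root tasks queued to it run on its own dedicated threads, so when they block on an async call they don't consume the thread pool threads that the HttpClient continuations need.

// Requires: System, System.Collections.Concurrent, System.Collections.Generic,
// System.Linq, System.Threading, System.Threading.Tasks
public class DedicatedThreadScheduler : TaskScheduler, IDisposable
{
    private readonly BlockingCollection<Task> _queue = new BlockingCollection<Task>();
    private readonly Thread[] _threads;

    public DedicatedThreadScheduler(int threadCount)
    {
        _threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++)
        {
            _threads[i] = new Thread(() =>
            {
                // Each dedicated thread just pulls tasks off the queue and runs them.
                foreach (var task in _queue.GetConsumingEnumerable())
                    TryExecuteTask(task);
            }) { IsBackground = true };
            _threads[i].Start();
        }
    }

    protected override void QueueTask(Task task)
    {
        _queue.Add(task);
    }

    protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued)
    {
        // Only allow inlining on one of our own threads, never on a thread pool thread.
        return _threads.Contains(Thread.CurrentThread) && TryExecuteTask(task);
    }

    protected override IEnumerable<Task> GetScheduledTasks()
    {
        return _queue.ToArray();
    }

    public void Dispose()
    {
        _queue.CompleteAdding();
    }
}

Using it for the root tasks from the example (reusing the count variable from the code above) would look something like this:

var scheduler = new DedicatedThreadScheduler(16);
var factory = new TaskFactory(scheduler);
var t = factory.StartNew(() =>
{
    var client = new HttpClient();
    // The blocking Wait() now ties up one of our dedicated threads,
    // leaving the thread pool free to run the I/O continuation.
    client.SendAsync(new HttpRequestMessage(HttpMethod.Get, new Uri("http://google.com"))).Wait();
    Interlocked.Increment(ref count);
});

The cost, of course, is one dedicated thread per concurrently blocking operation, which is exactly the kind of resource usage async was supposed to avoid.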
Comments
Actually, in the last example you don't need Task.Run at all; it just creates unnecessary new threads. You can do that instead: https://gist.github.com/thomaslevesque/01cbe75656b76fe2c744
On my machine it runs about twice as fast as the WebRequest method.
Synchronous and asynchronous code mix pretty poorly. The general advice seems to be to expose your asynchronous calls all the way up - essentially don't expose a synchronous API if to do so would require you to internally wait on an async call, the reason being highlighted by the scenario you have walked through: as a library designer you don't have control over the threading context your code will be called in.
This is particularly sensitive when your code is running inside an event-pump architecture, where essentially you cannot run two events on the message pump at the same time, and any blocking call will deadlock the application.
You might get some mileage out of Task.WhenAll as a way of consolidating several parallel activities into a single awaitable task. You might also need to use Task.ConfigureAwait.
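To make the ConfigureAwait part of that advice concrete, here is a one-line sketch using the HttpClient call from the post (the client and request variables are assumed from the earlier code). Inside library code, ConfigureAwait(false) tells the awaiter not to resume on the captured context, which is what prevents the event-pump deadlock described above when a caller blocks on the returned task:

// Library-side code: don't capture the caller's synchronization context,
// so a caller that blocks on the returned task won't deadlock a UI or ASP.NET context.
var response = await client.SendAsync(request).ConfigureAwait(false);

Note that this addresses the single-threaded-context deadlock, not the thread pool exhaustion in the post's example, which has no synchronization context to begin with.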
Using unnecessary Task.Run means fake async. http://blog.stephencleary.com/2013/11/taskrun-etiquette-examples-dont-use.html
The new async mentality is really harmful to .NET development at large. It seems like everyone is doing it without even considering the alternative. Some libraries are now forcing async on their callers.
RavenDB is a good use case for async but HttpClient is a really bad case for async-only. What if I want to make a single HTTP request at a time? No need for async at all.
Most fads pass but this one creates lasting damage. Sad.
What you should be doing instead is calling client.SendAsync() without doing any kind of awaiting or Task.Run() within the FOR loop.
Asynchronous I/O methods use I/O Completion Ports within Windows. So your app will spawn off 500 of these IOCPs (they're not real threads, so 500 is fine). Afterwards you can do an await Task.WhenAll() to wait for all 500 items to finish.
Your current way will spawn 500 real threads and your computer/server will soon grind to a halt. IOCPs are there to prevent thread usage but still allow for asynchronous I/O communication with minimal context switching. IOCP is what provides for scalable servers in Windows/.NET programming.
There's not a clean way to do a "sync" only version of it this way, other than to rewrite it without client.SendAsync and use client.Send() instead. Or you could provide a blocking "await" on top of the call to this method. However, in general for async methods, it should provide scalability benefits when doing I/O operations.
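A minimal sketch of the approach described in the comment above, reusing the scenario from the post (MakeRequestsAsync is an illustrative name, and error handling is omitted): start all the SendAsync calls up front, keep the tasks, and await them together, so no thread is ever blocked.

// Requires: System, System.Collections.Generic, System.Diagnostics,
// System.Net.Http, System.Threading, System.Threading.Tasks
static async Task MakeRequestsAsync()
{
    int count = 0;
    var sp = Stopwatch.StartNew();
    var client = new HttpClient(); // a single HttpClient can be shared across requests
    var tasks = new List<Task>();

    for (int i = 0; i < 500; i++)
    {
        // No Task.Run and no Wait(): just start the request and keep the task.
        var t = client.SendAsync(new HttpRequestMessage(HttpMethod.Get, new Uri("http://google.com")))
                      .ContinueWith(_ => Interlocked.Increment(ref count));
        tasks.Add(t);
    }

    // All 500 requests are in flight on I/O completion; nothing is blocked here.
    await Task.WhenAll(tasks);
    Console.WriteLine("{0} - {1}", sp.Elapsed, count);
}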
@JDice You're advertising async as if it's something new to .NET. But it isn't; asynchronous operations have been there since v1.0, and async I/O was always possible.
This is definitely a misuse of async operations. If this is what you're doing in your codebase, I'd suggest a refactoring, since spinning up a ThreadPool thread to do async I/O is a pure waste of resources, as everyone mentioned above. There isn't a need for a custom TaskScheduler in this case at all.
@tobi: An API should be async if and only if it is naturally asynchronous. HTTP requests are asynchronous, and should have an async API, even if only one request is done at a time.
The really difficult part of API design is when the operation is abstract enough that it could be synchronous or asynchronous, depending on the implementation. In that case, you really have to guess. This is a design issue that is remarkably similar to IDisposable: you must know whether any implementation may possibly be asynchronous/disposable in order to design the API appropriately.
The thing with async is that it has to be infectious, such that to call it you need to do so from async code. In the end it's just better to never block on async calls from sync code.
@StephenCleary Async APIs have a set of pros and cons. It is not at all clear that the right app architecture is always async for inherently async operations. Sometimes it clearly is, sometimes I believe it clearly isn't.
The simplicity of sync is appealing and async (on the server) only helps in certain cases. If my app never has to deal with more than 100 concurrent requests (which already is a top 1000 website) I can just rely on the thread-pool and have a simple website in fully sync mode. Async would give me nothing relevant.
I might selectively async a few actions that are especially prone to waiting. Those would be good cases to async them.
I call for differentiation. I refuse the notion that async is a good default model.
"The new async mentality is really harmful to .NET development at large. It seems like everyone is doing it without even considering the alternative. Some libraries are now forcing async on their callers."
No, it is not. Just as developers had to learn proper OOP (and stumbled badly at first), they WILL have to learn async programming in the new era. There's no way around it; sequential programming works less and less in the era of multicore, multi-machine clouds, and high latencies.
So let's all just do it, learn from the mistakes at first and in the end master it.
JDice, I'm well aware of that. The problem is when I want to do sync stuff; for example, I have an existing API that is using sync operations. I can either write my network code twice, or I have to run it like this.
Yuval, That isn't how the HttpClient works, however. The code here is to show an example of problematic usage.
Case in point is when you have a task (for example, in ASP.NET Web API) that is using sync operations (for example, because you are calling an existing part of your codebase that wasn't written for async) and you have to wait on that.
BTW, has anyone done any comparison of the cost of a thread switch at the OS level versus the cost of a 'soft' thread switch in .NET's async constructs?
In regards to your last example, there is no point in creating a new task if you already have a task returned from SendAsync():
var t = client.SendAsync(new HttpRequestMessage(HttpMethod.Get, new Uri("http://google.com")));
t.ContinueWith(x => Interlocked.Increment(ref count));
tasks.Add(t);
Boris, Assume that I actually have to do additional work there. As I mentioned, this is very simplified code.