Fun async tricks for getting better performance

time to read 8 min | 1552 words

I got into a discussion with Mark about the usefulness of async. In particular, Mark said:

Sync will always have less CPU usage because of less kernel transitions, less allocation, less synchronization.

This effect only comes into play when after issuing the IO the thread pool can immediately process a queued task. This is only the case when the CPU is highly saturated. At 90% saturation you have a good chance that this is happening. But nobody runs their production systems that way.

And while this is sort of correct, in the sense that a major benefit of async is that you free the working thread for someone else to work on, and that this is typically mostly useful under very high load, async is most certainly not useful just for high throughput situations.

The fun part about having async I/O is the notion of interleaving both I/O and computation together. Mark assumes that this is only relevant if you have high rate of work, because if you are starting async I/O, you have to wait for it to complete before you can do something interesting, and if there isn't any other task waiting, you are effectively blocked.

But that doesn't have to be the case. Let us take the following simple code. It isn't really doing something amazing, it is just filtering a text file:

public void FilterBadWords(string intputFile, string outputFile)
{
    var badWords = new[] { "opps", "blah", "dang" };

    using (var reader = File.OpenText(intputFile))
    using (var writer = File.AppendText(outputFile))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            bool hasBadWord = false;
            foreach (var word in badWords)
            {
                if (line.IndexOf(word, StringComparison.OrdinalIgnoreCase) != -1)
                {
                    hasBadWord = true;
                    break;
                }
            }

            if(hasBadWord == false)
                writer.WriteLine(line);
        }
    }
}

Here is the async version of the same code:

public async Task FilterBadWords(string intputFile, string outputFile)
{
    var badWords = new[] { "opps", "blah", "dang" };

    using (var reader = File.OpenText(intputFile))
    using (var writer = File.AppendText(outputFile))
    {
        string line;
        while ((line = await reader.ReadLineAsync()) != null)
        {
            bool hasBadWord = false;
            foreach (var word in badWords)
            {
                if (line.IndexOf(word, StringComparison.OrdinalIgnoreCase) != -1)
                {
                    hasBadWord = true;
                    break;
                }
            }

            if(hasBadWord == false)
                await writer.WriteLineAsync(line);
        }
    }
}

If we'll assume that we are running on a slow I/O system (maybe large remote file), in both version of the code, we'll see execution pattern like so:

image

In the sync case, the I/O is done in a blocking fashion, in the async case, we aren't holding up a thread ,but the async version need to do a more complex setup, so it is likely to be somewhat slower.

But the key is, we don't have to write the async version in this manner. Consider the following code:

 public async Task FilterBadWords(string intputFile, string outputFile)
 {
     var badWords = new[] { "opps", "blah", "dang" };

     using (var reader = File.OpenText(intputFile))
     using (var writer = File.AppendText(outputFile))
     {
         var lineTask = reader.ReadLineAsync();
         Task writeTask = Task.CompletedTask;
         while (true)
         {
             var currentLine = await lineTask;
             await writeTask;
             if (currentLine == null)
                 break;
             lineTask = reader.ReadLineAsync();

             bool hasBadWord = false;
             foreach (var word in badWords)
             {
                 if (currentLine.IndexOf(word, StringComparison.OrdinalIgnoreCase) != -1)
                 {
                     hasBadWord = true;
                     break;
                 }
             }

             if(hasBadWord == false)
                 writeTask = writer.WriteLineAsync(currentLine);
         }
     }
 }

The execution pattern of this code is going to be:

image

The key point is that we start async I/O, but we aren't going to await of it immediately. Instead, we are going to do some other work first (processing the current line while we fetch & write the next line).

In other words, when we schedule the next bit of I/O to be done, we aren't going to ask the system to find us some other piece of work to execute, we are the next piece of work to execute.

Nitpicker corner: This code isn't actually likely to have this usage pattern, this code is meant to illustrate a point.