Making code faster: Going down the I/O chute
But we can do better still. So far, we have relied heavily on the File.ReadLines method, which handles quite a lot of the parsing complexity for us. However, it still allocates a string per line, and our parsing then splits those strings again, meaning even more allocations.
We can take advantage of our knowledge of the file format to do better. The code size blows up, but it is mostly very simple. We create a dedicated record reader class, which reads each record from the file with a minimum of allocations.
There is a non-trivial amount of stuff going on here. We start by noting that the size in characters of the data is fixed, so we can compute the size of a record very easily: each record is exactly 50 bytes long.
The key part here is that we allocate a single buffer, which holds the characters of the current record. We then wrote our own date and integer parsing routines; they are trivial, specific to our case and, most importantly, don't require us to allocate additional strings.
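As an illustration, allocation-free parsing over a fixed-layout character buffer can look like the sketch below. The names (FixedParser, ParseInt, ParseDateTime) and the assumed "yyyy-MM-dd HH:mm:ss" timestamp layout are mine for illustration, not necessarily the exact code from this series.

```csharp
using System;

static class FixedParser
{
    // Parse 'len' ASCII digits starting at 'pos'; no strings, no allocations.
    public static int ParseInt(char[] buf, int pos, int len)
    {
        int value = 0;
        for (int i = 0; i < len; i++)
            value = value * 10 + (buf[pos + i] - '0');
        return value;
    }

    // Parse a fixed-layout "yyyy-MM-dd HH:mm:ss" timestamp starting at 'pos'.
    public static DateTime ParseDateTime(char[] buf, int pos)
    {
        return new DateTime(
            ParseInt(buf, pos, 4),       // year
            ParseInt(buf, pos + 5, 2),   // month
            ParseInt(buf, pos + 8, 2),   // day
            ParseInt(buf, pos + 11, 2),  // hour
            ParseInt(buf, pos + 14, 2),  // minute
            ParseInt(buf, pos + 17, 2)); // second
    }
}
```

Because every field sits at a fixed offset, there is no searching for separators and no Substring or Split calls; the only allocation is the buffer itself, created once.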
Using this code is done with:
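A reconstruction of what driving such a reader might look like follows; the class name RecordReader, its MoveNext method, and the assumed record layout (two 19-character timestamps, a numeric id, CRLF terminated, 50 bytes total) are hypothetical, and the real post's code and aggregation logic may differ.

```csharp
using System;
using System.IO;

// Hypothetical reader over fixed-size 50-byte records: one reusable buffer, no per-line strings.
class RecordReader : IDisposable
{
    const int RecordSize = 50;
    readonly TextReader _input;
    readonly char[] _buffer = new char[RecordSize];

    public RecordReader(TextReader input) => _input = input;

    // Read one record into the buffer and extract the id; returns false at end of input.
    public bool MoveNext(out long id)
    {
        id = 0;
        if (_input.ReadBlock(_buffer, 0, RecordSize) < RecordSize)
            return false;
        // Assumed layout: chars [0..18] start time, [20..38] end time, [40..] id, then CRLF.
        for (int i = 40; i < RecordSize; i++)
        {
            char c = _buffer[i];
            if (c < '0' || c > '9') break; // stop at the record terminator
            id = id * 10 + (c - '0');
        }
        return true;
    }

    public void Dispose() => _input.Dispose();
}

class Program
{
    static void Main()
    {
        // Sample input: one 50-byte record. In the real code this would
        // wrap a StreamReader over the data file instead of a StringReader.
        var data = "2015-01-01 05:00:00 2015-01-01 05:33:00 12345678\r\n";
        using (var reader = new RecordReader(new StringReader(data)))
        {
            while (reader.MoveNext(out long id))
                Console.WriteLine(id); // prints 12345678
        }
    }
}
```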
So we are back to single-threaded mode. Running this code gives us a runtime of 1.7 seconds, 126 MB allocated and a peak working set of 35 MB.
We are now about 2.5 times faster than the previous parallel version, and over 17 times faster than the original version.
Making this code parallel is fairly trivial now: divide the file into sections and run a record reader over each section. But is there really much point at this stage?
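For completeness, splitting a fixed-record file into per-thread sections is simple arithmetic. This sketch (the helper name is mine) computes record-aligned byte offsets that each worker's own reader would seek to:

```csharp
using System;

static class Sectioner
{
    const int RecordSize = 50; // every record is exactly 50 bytes

    // Byte offset where section i of 'sections' starts, aligned to record boundaries.
    public static long[] SectionOffsets(long fileLength, int sections)
    {
        long records = fileLength / RecordSize;
        var offsets = new long[sections];
        for (int i = 0; i < sections; i++)
            offsets[i] = (records * i / sections) * RecordSize;
        return offsets;
    }
}
```

Each worker would open its own stream, seek to its offset, and run a record reader until it reaches the next section's offset; because every record is exactly 50 bytes, no worker ever starts reading in the middle of a record.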