Making code faster: Specialization make it faster still


Okay, at this point we are really pushing it, but I do wonder if we can get it faster still?

[image: profiler output]

So we spend a lot of time in the ParseTime call, parsing two dates and then subtracting them. I wonder if we really need to do that?

I wrote two optimizations: one to compare only the time part if the dates are the same, and a second to do the date comparison in seconds instead of ticks. Here is what this looks like:
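The listing itself is not reproduced here, so what follows is only a minimal sketch of the two ideas, written in Python rather than the unsafe code the series uses. It assumes a line layout of two fixed-width timestamps separated by a space (`yyyy-MM-ddTHH:mm:ss`, the second one starting at byte 20); that layout, and every name in the block (`duration_seconds`, `stamp_seconds`, `time_seconds`, and so on), is hypothetical and chosen for illustration.

```python
import struct

# Assumed line layout (not shown in this post): two fixed-width timestamps
# separated by a space, e.g. b"2015-01-01T16:44:31 2015-01-01T20:14:42 00043064".
SECOND_STAMP = 20   # byte offset of the second timestamp
TIME_PART = 11      # byte offset of "HH:mm:ss" inside a timestamp


def two_digits(buf, pos):
    """Parse two ASCII digits at pos without building a string."""
    return (buf[pos] - 48) * 10 + (buf[pos + 1] - 48)


def time_seconds(buf, pos):
    """Seconds since midnight for the "HH:mm:ss" text starting at pos."""
    return (two_digits(buf, pos) * 3600
            + two_digits(buf, pos + 3) * 60
            + two_digits(buf, pos + 6))


def days_from_civil(year, month, day):
    """Days since 1970-01-01 in the proleptic Gregorian calendar."""
    if month <= 2:
        year -= 1
    era = year // 400
    year_of_era = year - era * 400
    day_of_year = (153 * (month + (-3 if month > 2 else 9)) + 2) // 5 + day - 1
    day_of_era = (year_of_era * 365 + year_of_era // 4
                  - year_of_era // 100 + day_of_year)
    return era * 146097 + day_of_era - 719468


def stamp_seconds(buf, pos):
    """Whole seconds since 1970-01-01 for the timestamp starting at pos."""
    year = two_digits(buf, pos) * 100 + two_digits(buf, pos + 2)
    days = days_from_civil(year, two_digits(buf, pos + 5), two_digits(buf, pos + 8))
    return days * 86400 + time_seconds(buf, pos + TIME_PART)


def duration_seconds(line):
    """Difference between the two timestamps on a line, in seconds."""
    # Read 12 bytes of each timestamp as one 8-byte and one 4-byte integer.
    # We never interpret the values, we only check that they are equal.
    if struct.unpack_from("<qi", line, 0) == struct.unpack_from("<qi", line, SECOND_STAMP):
        # Same date: skip the calendar math and subtract the time parts only.
        return (time_seconds(line, SECOND_STAMP + TIME_PART)
                - time_seconds(line, TIME_PART))
    # Different prefix: fall back to full timestamps, in seconds rather than ticks.
    return stamp_seconds(line, SECOND_STAMP) - stamp_seconds(line, 0)
```

In this assumed layout the first 12 bytes cover the date, the `T` and the first hour digit, so a match is sufficient for skipping the date math, though stricter than strictly necessary. Working in whole seconds also keeps the numbers small: a .NET tick is 100 nanoseconds, which is far more resolution than second-granularity timestamps need.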

Note that we compare the first 12 bytes using just 2 instructions (by comparing long & int values), since we don’t care what they are, only that they are equal. The result:

283 ms and allocated 1,296 kb with peak working set of 295,200 kb

So we are now 135 times faster than the original version.

Here is the profiler output:

[image: profiler output]

And at this point, I think that we are pretty much completely done. We can parse a line in under 75 nanoseconds, and we can process about 1 GB a second on this machine (my laptop, which is over a year old).

We can see that skipping the date compare in favor of a time-only compare, when we can, pays off in about 65% of the cases, so that is probably a nice boost right there. But I really can’t think of anything else that we can do here that would improve matters in any meaningful way.

For comparison purposes:

Original version:

  • 38,478 ms
  • 7,612,741 kb allocated
  • 874,660 kb peak working set
  • 50 lines of code
  • Extremely readable
  • Easy to change

Final version:

  • 283 ms
  • 1,296 kb allocated
  • 295,200 kb peak working set
  • 180 lines of code
  • Highly specific and requires specialized knowledge
  • Hard to change
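As a quick sanity check, the ratios between the two versions can be computed directly from the figures in the two lists above. This is just arithmetic on those numbers; the dictionary keys are made up for the example.

```python
# Figures copied from the two lists above.
original = {"ms": 38_478, "allocated_kb": 7_612_741, "peak_kb": 874_660, "loc": 50}
final = {"ms": 283, "allocated_kb": 1_296, "peak_kb": 295_200, "loc": 180}

# Ratio of original to final for every metric (above 1 means the final version is lower).
ratios = {key: original[key] / final[key] for key in original}

for key, value in ratios.items():
    print(f"{key}: {value:.2f}x")
```

The run time ratio comes out just under 136 (135 when truncated), allocations drop by a factor of several thousand, and peak working set by roughly three. The lines-of-code ratio is below 1, since the final version is 3.6 times longer.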

So yes, that is 135 times faster, but the first version took about 10 minutes to write, then another half an hour of fiddling to make it not obviously inefficient. The final version took several days of careful thought, analysis of the data and careful optimizations.

More posts in "Making code faster" series:

  1. (24 Nov 2016) Micro optimizations and parallel work
  2. (23 Nov 2016) Specialization make it faster still
  3. (22 Nov 2016) That pesky dictionary
  4. (21 Nov 2016) Streamlining the output
  5. (18 Nov 2016) Pulling out the profiler
  6. (17 Nov 2016) I like my performance unsafely
  7. (16 Nov 2016) Going down the I/O chute
  8. (15 Nov 2016) Starting from scratch
  9. (14 Nov 2016) The obvious costs
  10. (11 Nov 2016) The interview question