Excerpts from the RavenDB Performance team reportDates take a lot of time
RavenDB uses a lot of dates, from the last modified metadata on a document to the timestamp of an index or when a query was started or… you get the point, lots and lots of dates.
Dates in RavenDB are usually formatted in the following manner:
2015-01-15T00:41:16.6616631
This is done using the following date time format:
yyyy'-'MM'-'dd'T'HH':'mm':'ss.fffffff
This is pretty awesome. It generate readable dates that are lexicographically sorted. There is just one problem with that, this is really expensive to do. How expensive? Well, outputting 10 million dates using the following manner:
dateTime.ToString(Default.DateTimeFormatsToWrite, CultureInfo.InvariantCulture)
This takes 13.3 seconds, or just about 750 dates per millisecond. The costs here are partly the allocations, but mostly it is about the fact that the format provider needs to first parse the format specifier, then do quite a bit of work to get it working. And DateTime itself isn’t very cheap. The solution presented is ugly, but it works, and it is fast.
public unsafe static string GetDefaultRavenFormat(this DateTime dt, bool isUtc = false) { string result = new string('Z', 27 + (isUtc ? 1 : 0)); var ticks = dt.Ticks; // n = number of days since 1/1/0001 int n = (int)(ticks / TicksPerDay); // y400 = number of whole 400-year periods since 1/1/0001 int y400 = n / DaysPer400Years; // n = day number within 400-year period n -= y400 * DaysPer400Years; // y100 = number of whole 100-year periods within 400-year period int y100 = n / DaysPer100Years; // Last 100-year period has an extra day, so decrement result if 4 if (y100 == 4) y100 = 3; // n = day number within 100-year period n -= y100 * DaysPer100Years; // y4 = number of whole 4-year periods within 100-year period int y4 = n / DaysPer4Years; // n = day number within 4-year period n -= y4 * DaysPer4Years; // y1 = number of whole years within 4-year period int y1 = n / DaysPerYear; // Last year has an extra day, so decrement result if 4 if (y1 == 4) y1 = 3; // If year was requested, compute and return it var year = y400 * 400 + y100 * 100 + y4 * 4 + y1 + 1; // n = day number within year n -= y1 * DaysPerYear; // Leap year calculation looks different from IsLeapYear since y1, y4, // and y100 are relative to year 1, not year 0 bool leapYear = y1 == 3 && (y4 != 24 || y100 == 3); int[] days = leapYear ? DaysToMonth366 : DaysToMonth365; // All months have less than 32 days, so n >> 5 is a good conservative // estimate for the month int month = n >> 5 + 1; // m = 1-based month number while (n >= days[month]) month++; // If month was requested, return it // Return 1-based day-of-month var day = n - days[month - 1] + 1; fixed (char* chars = result) { var v = _fourDigits[year]; chars[0] = v[0]; chars[1] = v[1]; chars[2] = v[2]; chars[3] = v[3]; chars[4] = '-'; v = _fourDigits[month]; chars[5] = v[2]; chars[5 + 1] = v[3]; chars[7] = '-'; v = _fourDigits[day]; chars[8] = v[2]; chars[8 + 1] = v[3]; chars[10] = 'T'; v = _fourDigits[(ticks / TicksPerHour) % 24]; chars[11] = v[2]; chars[11 + 1] = v[3]; chars[13] = ':'; v = _fourDigits[(ticks / TicksPerMinute) % 60]; chars[14] = v[2]; chars[14 + 1] = v[3]; chars[16] = ':'; v = _fourDigits[(ticks / TicksPerSecond) % 60]; chars[17] = v[2]; chars[17 + 1] = v[3]; chars[19] = '.'; long fraction = (ticks % 10000000); v = _fourDigits[fraction / 10000]; chars[20] = v[1]; chars[21] = v[2]; chars[22] = v[3]; fraction = fraction % 10000; v = _fourDigits[fraction]; chars[23] = v[0]; chars[24] = v[1]; chars[25] = v[2]; chars[26] = v[3]; } return result; }
We use the same general pattern that we used with etags as well, although here we are also doing a lot of work to figure out the right parts of the date. Note that we don’t have any allocations, and we again use the notion of a lookup table to all the pre-computed 4 digits number. That allows us to process 10,000,000 dates in just over 2 seconds (2,061 ms, to be exact). Or roughly 4,850 dates per millisecond. In other words, we are about 15% of the speed of the original implementation.
This code is ugly, in fact, the last few posts has contained pretty much ugly code, that is hard to understand. But it is significantly faster than the alternative, and what is even more important, those pieces of code are actually being used in RavenDB’s hot path. In other words, that means that we have actually seen significant performance improvement when introducing them to the codebase.
More posts in "Excerpts from the RavenDB Performance team report" series:
- (20 Feb 2015) Optimizing Compare – The circle of life (a post-mortem)
- (18 Feb 2015) JSON & Structs in Voron
- (13 Feb 2015) Facets of information, Part II
- (12 Feb 2015) Facets of information, Part I
- (06 Feb 2015) Do you copy that?
- (05 Feb 2015) Optimizing Compare – Conclusions
- (04 Feb 2015) Comparing Branch Tables
- (03 Feb 2015) Optimizers, Assemble!
- (30 Jan 2015) Optimizing Compare, Don’t you shake that branch at me!
- (29 Jan 2015) Optimizing Memory Comparisons, size does matter
- (28 Jan 2015) Optimizing Memory Comparisons, Digging into the IL
- (27 Jan 2015) Optimizing Memory Comparisons
- (26 Jan 2015) Optimizing Memory Compare/Copy Costs
- (23 Jan 2015) Expensive headers, and cache effects
- (22 Jan 2015) The long tale of a lambda
- (21 Jan 2015) Dates take a lot of time
- (20 Jan 2015) Etags and evil code, part II
- (19 Jan 2015) Etags and evil code, Part I
- (16 Jan 2015) Voron vs. Esent
- (15 Jan 2015) Routing
Comments
Have you taken into account leap seconds? TicksPerDay is not a constant value. Some days have a few more ticks.
I'd be interested to see how Noda Time does in comparison (particularly Noda Time 2.0, which stores everything rather differently).
If you have a reasonably-standalone benchmark that I could adapt to compare DateTime with Noda Time (either using Noda Time's own formatting or some equivalent of this code) feel free to ping me by email and I'll see what I can whip up.
@Jesús López .Net DateTime doesn't support leap seconds so that won't effect the number of ticks per day.
Most time libraries do not support leap seconds since there's no pattern to them, you'd need a list of them that's kept up to date on the machine even to resolve a UTC date.
This is why google and others went for a leap smear last time there was a leap second, so they could just continue as if leap seconds do not exist.
Jon, I mere tested that on a bunch of dates with the specified format, in a loop, nothing more exciting than that.
Okay, I couldn't find the code being quoted here, which stopped me from reproducing the version with the BCL, but I managed to adapt it to Noda Time, so I was able to test:
Results for formatting 10 million random values, in ms
x64: Standard Noda Time: ~3090 "Unsafe" Noda Time: ~660 BCL: ~10500
x86: Standard Noda Time: ~4900 "Unsafe" Noda Time: ~1450 BCL: ~15000
By changing the Noda Time code to always allocate a StringBuilder with 27 characters, the standard Noda Time value comes down a bit; that's slightly hard to generalize though, as keeping track of whether a format string is fixed-length or not is a pain. There's a simple option of "format a default value and use that" - I may give that a try.
This is using Noda Time 2.0, mind you - I'd expect Noda Time 1.x to be significantly slower.
Overall, I'm pretty pleased that Noda Time came that close to a very tailored formatter. Do you have any plans to do something similar for parsing, by the way?
Try doing direct digit computation instead of the table. It'll likely hit main memory less.
Why are there random places you do 5+1 instead of... 6?
Also why you do use the v temporary variable? I actually think that is far more unreadable since you change what v references multiple times. Why not just directly use:
chars[0] = _fourDigits[year][0]; chars[5] = _fourDigits[month][2];
Sure indexing into a multidimensional array sucks like this but if you're so concerned about unnecessary allocations that you won't do
var year = _fourDigits[year] var month = _fourDigits[month]
Why bother with 1 unnecessary variable allocation? Does this have to do something with the usage of fixed that you need to copy the variable address into the fixed block?
I'm curious as to what the impact of the unsafe pointer is. I understand there's a need to minimize allocations, but seeing as you already allocate a single string called 'result', what's the benefit of creating the pointer to it?
You said ... "the format provider needs to first parse the format specifier" ...
IMHO, that's one of the best reasons to use Noda Time's pattern-based parsing API. You declare the parser once, then you can re-use it as many times as you want.
A while back we chatted about deeper integration of Noda Time into the RavenDB internals. If you want to revisit that idea, I could send you a PR.
Chris, The 5+1 is intended to show that this is the same value. Note that this will be statically folded by the compiler, so the value would be a constant of 6. The use of the temp variable is to prevent multiple access to the array, which is something that we want to avoid.
Dan, Strings in C# are immutable. That means that we first has to build one, then generate it using string concat or StringBuilder. I know what is the size of my string upfront, and it is much cheaper for me to get the string, manually manipulate its memory content, then to use either of the other options
So, from a strictly technical perspective, the unsafe code could have been avoided by using an array of char instead of a string? But you didn't do that because you needed a string as the output?
Dan, No, it couldn't. When you do a new string(char[]), the value is _copied_, so that is an extra allocation we avoided.
Isn't there a risk that the result string is interned and the code in the fixed block will change the string memory for all consumers of that string memory?
Geert, The result string cannot be interned. We have just created it manually.
I've just sent Ayenda a zip file with a slightly more "extreme" optimization of the digit overwriting, using a long[] and an int[] instead of the original char[][] values... and then using pointer arithmetic to blat those into the string at the right place. I don't know whether the code will format properly here, so here's just a sample line of it:
((int)(chars + 8)) = _twoDigitsAsInt32s[dt.Day];
With that in place, and using Noda Time 2.0, on x64 the optimized code is about 20 times faster than the BCL format code (i.e. it takes about 5% of the BCL time).
Gah - even on its own, that line didn't copy properly. Let's try again using Markdown:
*((int*)(chars + 5)) = _twoDigitsAsInt32s[dt.Month];
In my experience mucking around with time is fraught with peril.
There was an excellent computerphile video all about what a nightmare they can be http://youtu.be/-5wpm-gesOY In particular Leap seconds might cause you an issue, in fact there is supposed to be one in july of this year http://hpiers.obspm.fr/iers/bul/bulc/bulletinc.dat
Comment preview