Excerpts from the RavenDB Performance team report: Dates take a lot of time


Jan 21 2015

RavenDB uses a lot of dates, from the last modified metadata on a document to the timestamp of an index or when a query was started or… you get the point, lots and lots of dates.

Dates in RavenDB are usually formatted in the following manner:

2015-01-15T00:41:16.6616631

This is done using the following date time format:

yyyy'-'MM'-'dd'T'HH':'mm':'ss.fffffff

This is pretty awesome. It generates readable dates that sort lexicographically. There is just one problem: it is really expensive to do. How expensive? Well, consider outputting 10 million dates in the following manner:

dateTime.ToString(Default.DateTimeFormatsToWrite, CultureInfo.InvariantCulture)

This takes 13.3 seconds, or just about 750 dates per millisecond. The cost is partly the allocations, but mostly it comes from the format provider having to first parse the format specifier and then do quite a bit of work to apply it. And DateTime itself isn’t very cheap. The solution presented is ugly, but it works, and it is fast.
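The benchmark code itself is not included in the post, so the following is only a minimal sketch of how such a baseline could be measured. The class and method names are hypothetical, and `RavenFormat` is assumed to hold the same format string that `Default.DateTimeFormatsToWrite` refers to.

```csharp
using System;
using System.Diagnostics;
using System.Globalization;

public static class DateFormatBaseline
{
    // Assumption: the same format string shown earlier in the post.
    public const string RavenFormat = "yyyy'-'MM'-'dd'T'HH':'mm':'ss.fffffff";

    // Formats `count` dates with DateTime.ToString and returns the elapsed milliseconds.
    public static long Measure(int count)
    {
        // 2015-01-15T00:41:16.6616631, the sample date from the post.
        var start = new DateTime(2015, 1, 15, 0, 41, 16).AddTicks(6616631);
        long totalLength = 0;

        var sw = Stopwatch.StartNew();
        for (var i = 0; i < count; i++)
        {
            // Shift by one tick per iteration so each call formats a different value.
            var text = start.AddTicks(i).ToString(RavenFormat, CultureInfo.InvariantCulture);
            totalLength += text.Length;
        }
        sw.Stop();

        // Every formatted date is 27 characters long; using the total keeps the
        // loop body from being discarded as dead code.
        if (totalLength != 27L * count)
            throw new InvalidOperationException("Unexpected formatted length.");

        return sw.ElapsedMilliseconds;
    }
}
```

Calling `DateFormatBaseline.Measure(10000000)` in a release build gives a number comparable in spirit to the one quoted above, though the absolute timing depends on the machine and the runtime version.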

public unsafe static string GetDefaultRavenFormat(this DateTime dt, bool isUtc = false)
{
    string result = new string('Z', 27 + (isUtc ? 1 : 0));

    var ticks = dt.Ticks;

    // n = number of days since 1/1/0001
    int n = (int)(ticks / TicksPerDay);
    // y400 = number of whole 400-year periods since 1/1/0001
    int y400 = n / DaysPer400Years;
    // n = day number within 400-year period
    n -= y400 * DaysPer400Years;
    // y100 = number of whole 100-year periods within 400-year period
    int y100 = n / DaysPer100Years;
    // Last 100-year period has an extra day, so decrement result if 4
    if (y100 == 4) y100 = 3;
    // n = day number within 100-year period
    n -= y100 * DaysPer100Years;
    // y4 = number of whole 4-year periods within 100-year period
    int y4 = n / DaysPer4Years;
    // n = day number within 4-year period
    n -= y4 * DaysPer4Years;
    // y1 = number of whole years within 4-year period
    int y1 = n / DaysPerYear;
    // Last year has an extra day, so decrement result if 4
    if (y1 == 4) y1 = 3;
    // Compute the 1-based year from the period counts
    var year = y400 * 400 + y100 * 100 + y4 * 4 + y1 + 1;

    // n = day number within year
    n -= y1 * DaysPerYear;
    // Leap year calculation looks different from IsLeapYear since y1, y4,
    // and y100 are relative to year 1, not year 0
    bool leapYear = y1 == 3 && (y4 != 24 || y100 == 3);
    int[] days = leapYear ? DaysToMonth366 : DaysToMonth365;
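    // All months have less than 32 days, so (n >> 5) + 1 is a good conservative
    // estimate for the month. The parentheses matter: in C#, + binds tighter
    // than >>, so n >> 5 + 1 would be evaluated as n >> 6.
    int month = (n >> 5) + 1;
    // Advance until month is the 1-based month number containing day n
    while (n >= days[month]) month++;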

    // Return 1-based day-of-month
    var day = n - days[month - 1] + 1;

    fixed (char* chars = result)
    {
        var v = _fourDigits[year];
        chars[0] = v[0];
        chars[1] = v[1];
        chars[2] = v[2];
        chars[3] = v[3];
        chars[4] = '-';
        v = _fourDigits[month];
        chars[5] = v[2];
        chars[5 + 1] = v[3];
        chars[7] = '-';
        v = _fourDigits[day];
        chars[8] = v[2];
        chars[8 + 1] = v[3];
        chars[10] = 'T';
        v = _fourDigits[(ticks / TicksPerHour) % 24];
        chars[11] = v[2];
        chars[11 + 1] = v[3];
        chars[13] = ':';
        v = _fourDigits[(ticks / TicksPerMinute) % 60];
        chars[14] = v[2];
        chars[14 + 1] = v[3];
        chars[16] = ':';
        v = _fourDigits[(ticks / TicksPerSecond) % 60];
        chars[17] = v[2];
        chars[17 + 1] = v[3];
        chars[19] = '.';

        long fraction = (ticks % 10000000);
        v = _fourDigits[fraction / 10000];
        chars[20] = v[1];
        chars[21] = v[2];
        chars[22] = v[3];

        fraction = fraction % 10000;

        v = _fourDigits[fraction];
        chars[23] = v[0];
        chars[24] = v[1];
        chars[25] = v[2];
        chars[26] = v[3];
    }

    return result;
}

We use the same general pattern that we used with etags, although here we also do a lot of work to figure out the right parts of the date. Note that we don’t have any allocations beyond the result string itself, and we again use a lookup table holding all the pre-computed 4-digit numbers. That allows us to process 10,000,000 dates in just over 2 seconds (2,061 ms, to be exact), or roughly 4,850 dates per millisecond. In other words, we take about 15% of the time of the original implementation.
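The method relies on several members that the post does not show: the tick and day constants, the cumulative days-per-month arrays, and the `_fourDigits` lookup table. The sketch below is one plausible way to define them, not the actual RavenDB code. The constants follow the usual Gregorian calendar arithmetic; the class name `RavenDateSupport` and the public `FourDigits` field (standing in for `_fourDigits`) are hypothetical.

```csharp
using System;
using System.Globalization;

public static class RavenDateSupport
{
    // A tick is 100 nanoseconds, so there are 10,000,000 ticks per second.
    public const long TicksPerSecond = 10000000L;
    public const long TicksPerMinute = TicksPerSecond * 60;
    public const long TicksPerHour = TicksPerMinute * 60;
    public const long TicksPerDay = TicksPerHour * 24;

    public const int DaysPerYear = 365;
    public const int DaysPer4Years = DaysPerYear * 4 + 1;       // 1461
    public const int DaysPer100Years = DaysPer4Years * 25 - 1;  // 36524
    public const int DaysPer400Years = DaysPer100Years * 4 + 1; // 146097

    // Number of days that precede each month, for regular and leap years.
    public static readonly int[] DaysToMonth365 =
        { 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334, 365 };
    public static readonly int[] DaysToMonth366 =
        { 0, 31, 60, 91, 121, 152, 182, 213, 244, 274, 305, 335, 366 };

    // Hypothetical stand-in for _fourDigits: entry i holds the four
    // zero-padded decimal digits of i, for i in 0..9999.
    public static readonly char[][] FourDigits = BuildFourDigits();

    private static char[][] BuildFourDigits()
    {
        var table = new char[10000][];
        for (var i = 0; i < table.Length; i++)
        {
            table[i] = i.ToString("D4", CultureInfo.InvariantCulture).ToCharArray();
        }
        return table;
    }
}
```

The table is built once and costs 10,000 small arrays, after which every two-, three-, or four-digit field in the output becomes an array lookup followed by a few character copies, with no division by ten at formatting time.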

This code is ugly. In fact, the last few posts have contained pretty much nothing but ugly code that is hard to understand. But it is significantly faster than the alternative and, what is even more important, those pieces of code are actually being used in RavenDB’s hot path. In other words, we have actually seen significant performance improvements when introducing them to the codebase.

Comments

21 Jan 2015
11:02 AM

Jesús López

Have you taken into account leap seconds? TicksPerDay is not a constant value. Some days have a few more ticks.

21 Jan 2015
11:13 AM

Jon Skeet

I'd be interested to see how Noda Time does in comparison (particularly Noda Time 2.0, which stores everything rather differently).

If you have a reasonably-standalone benchmark that I could adapt to compare DateTime with Noda Time (either using Noda Time's own formatting or some equivalent of this code) feel free to ping me by email and I'll see what I can whip up.

21 Jan 2015
12:37 PM

Chris Chilvers

@Jesús López .Net DateTime doesn't support leap seconds so that won't affect the number of ticks per day.

Most time libraries do not support leap seconds since there's no pattern to them, you'd need a list of them that's kept up to date on the machine even to resolve a UTC date.

This is why google and others went for a leap smear last time there was a leap second, so they could just continue as if leap seconds do not exist.

21 Jan 2015
16:02 PM

Ayende Rahien

Jon, I merely tested that on a bunch of dates with the specified format, in a loop, nothing more exciting than that.

21 Jan 2015
20:06 PM

Jon Skeet

Okay, I couldn't find the code being quoted here, which stopped me from reproducing the version with the BCL, but I managed to adapt it to Noda Time, so I was able to test:

Standard Noda Time pattern formatting
"Unsafe" formatting as above, but using Noda Time (the code is simpler btw :)
BCL formatting

Results for formatting 10 million random values, in ms

x64: Standard Noda Time: ~3090 "Unsafe" Noda Time: ~660 BCL: ~10500

x86: Standard Noda Time: ~4900 "Unsafe" Noda Time: ~1450 BCL: ~15000

By changing the Noda Time code to always allocate a StringBuilder with 27 characters, the standard Noda Time value comes down a bit; that's slightly hard to generalize though, as keeping track of whether a format string is fixed-length or not is a pain. There's a simple option of "format a default value and use that" - I may give that a try.

This is using Noda Time 2.0, mind you - I'd expect Noda Time 1.x to be significantly slower.

Overall, I'm pretty pleased that Noda Time came that close to a very tailored formatter. Do you have any plans to do something similar for parsing, by the way?

22 Jan 2015
04:33 AM

Old Programmer

Try doing direct digit computation instead of the table. It'll likely hit main memory less.

22 Jan 2015
17:46 PM

Chris Marisic

Why are there random places you do 5+1 instead of... 6?

Also why do you use the v temporary variable? I actually think that is far more unreadable since you change what v references multiple times. Why not just directly use:

chars[0] = _fourDigits[year][0]; chars[5] = _fourDigits[month][2];

Sure indexing into a multidimensional array sucks like this but if you're so concerned about unnecessary allocations that you won't do

var year = _fourDigits[year] var month = _fourDigits[month]

Why bother with 1 unnecessary variable allocation? Does this have to do something with the usage of fixed that you need to copy the variable address into the fixed block?

22 Jan 2015
21:48 PM

Dan

I'm curious as to what the impact of the unsafe pointer is. I understand there's a need to minimize allocations, but seeing as you already allocate a single string called 'result', what's the benefit of creating the pointer to it?

23 Jan 2015
00:39 AM

Matt Johnson

You said ... "the format provider needs to first parse the format specifier" ...

IMHO, that's one of the best reasons to use Noda Time's pattern-based parsing API. You declare the parser once, then you can re-use it as many times as you want.

A while back we chatted about deeper integration of Noda Time into the RavenDB internals. If you want to revisit that idea, I could send you a PR.

23 Jan 2015
06:15 AM

Ayende Rahien

Chris, The 5+1 is intended to show that this is the same value. Note that this will be statically folded by the compiler, so the value would be a constant of 6. The use of the temp variable is to prevent multiple access to the array, which is something that we want to avoid.

23 Jan 2015
06:16 AM

Ayende Rahien

Dan, Strings in C# are immutable. That means that we would first have to build one, then generate it using string concat or StringBuilder. I know the size of my string upfront, and it is much cheaper for me to get the string and manually manipulate its memory content than to use either of the other options.

23 Jan 2015
13:28 PM

Dan

So, from a strictly technical perspective, the unsafe code could have been avoided by using an array of char instead of a string? But you didn't do that because you needed a string as the output?

23 Jan 2015
13:32 PM

Ayende Rahien

Dan, No, it couldn't. When you do a new string(char[]), the value is _copied_, so that is an extra allocation we avoided.

23 Jan 2015
13:43 PM

Geert

Isn't there a risk that the result string is interned and the code in the fixed block will change the string memory for all consumers of that string memory?

23 Jan 2015
13:46 PM

Ayende Rahien

Geert, The result string cannot be interned. We have just created it manually.

26 Jan 2015
20:05 PM

Jon Skeet

I've just sent Ayende a zip file with a slightly more "extreme" optimization of the digit overwriting, using a long[] and an int[] instead of the original char[][] values... and then using pointer arithmetic to blat those into the string at the right place. I don't know whether the code will format properly here, so here's just a sample line of it:

((int)(chars + 8)) = _twoDigitsAsInt32s[dt.Day];

With that in place, and using Noda Time 2.0, on x64 the optimized code is about 20 times faster than the BCL format code (i.e. it takes about 5% of the BCL time).

26 Jan 2015
20:05 PM

Jon Skeet

Gah - even on its own, that line didn't copy properly. Let's try again using Markdown:

*((int*)(chars + 5)) = _twoDigitsAsInt32s[dt.Month];

27 Jan 2015
13:43 PM

tim

In my experience mucking around with time is fraught with peril.

There was an excellent Computerphile video all about what a nightmare they can be: http://youtu.be/-5wpm-gesOY In particular, leap seconds might cause you an issue. In fact, there is supposed to be one in July of this year: http://hpiers.obspm.fr/iers/bul/bulc/bulletinc.dat

Oren Eini

CEO of RavenDB
