Making code faster: Specialization makes it faster still
Okay, at this point we are really pushing it, but I do wonder if we can get it faster still?
So we spend a lot of time in the ParseTime call, parsing two dates and then subtracting them. I wonder if we really need to do that?
I wrote two optimizations: one to compare only the time part if the dates are the same, and the second to do the date computation in seconds instead of ticks. Here is what this looks like:
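The original code isn't reproduced here, but the following is a minimal sketch of the idea, assuming two "yyyy-MM-dd HH:mm:ss" timestamps per line; the method names and layout are illustrative, not the post's actual code:

    private static unsafe int GetDurationInSeconds(byte* start, byte* end)
    {
        // If the 12-byte prefixes match ("yyyy-MM-dd H"), the dates are equal,
        // so only the time-of-day parts need to be parsed and subtracted.
        if (*(long*)start == *(long*)end &&
            *(int*)(start + 8) == *(int*)(end + 8))
        {
            return ParseTimeToSeconds(end + 11) - ParseTimeToSeconds(start + 11);
        }
        // Different dates: fall back to full parsing, still working in seconds.
        return (int)(ParseDateTime(end) - ParseDateTime(start)).TotalSeconds;
    }

    private static unsafe int ParseTimeToSeconds(byte* p)
    {
        // "HH:mm:ss" as ASCII digits -> seconds since midnight.
        int h = (p[0] - '0') * 10 + (p[1] - '0');
        int m = (p[3] - '0') * 10 + (p[4] - '0');
        int s = (p[6] - '0') * 10 + (p[7] - '0');
        return h * 3600 + m * 60 + s;
    }

    private static unsafe DateTime ParseDateTime(byte* p)
    {
        // "yyyy-MM-dd HH:mm:ss" as ASCII digits -> DateTime.
        int year  = (p[0] - '0') * 1000 + (p[1] - '0') * 100
                  + (p[2] - '0') * 10 + (p[3] - '0');
        int month = (p[5] - '0') * 10 + (p[6] - '0');
        int day   = (p[8] - '0') * 10 + (p[9] - '0');
        int hour  = (p[11] - '0') * 10 + (p[12] - '0');
        int min   = (p[14] - '0') * 10 + (p[15] - '0');
        int sec   = (p[17] - '0') * 10 + (p[18] - '0');
        return new DateTime(year, month, day, hour, min, sec);
    }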
Note that we compare the first 12 bytes using just 2 instructions (by comparing long & int values), since we don’t care what they are, only that they are equal. The result:
283 ms and allocated 1,296 kb with peak working set of 295,200 kb
So we are now 135 times faster than the original version.
Here is the profiler output:
And at this point, I think that we are pretty much completely done. We can parse a line in under 75 nanoseconds, and we can process about 1 GB a second on this machine (my year-plus-old laptop).
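(As a rough sanity check: the 276 MB input file in 283 ms works out to about 975 MB/s, in line with that figure.)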
We can see that skipping the date compare in favor of the time-only compare pays off in about 65% of the cases, so that is probably a nice boost right there. But I really can't think of anything else that we can do here that would improve matters in any meaningful way.
For comparison purposes:

The original version:

- 38,478 ms
- 7,612,741 kb allocated
- 874,660 kb peak working set
- 50 lines of code
- Extremely readable
- Easy to change

The final version:

- 283 ms
- 1,296 kb allocated
- 295,200 kb peak working set
- 180 lines of code
- Highly specific and requires specialized knowledge
- Hard to change
So yes, that is 135 times faster, but the first version took about 10 minutes to write, and then another half an hour of fiddling to make it not obviously inefficient. The final version took several days of careful thought, analysis of the data, and careful optimization.
More posts in "Making code faster" series:
- (24 Nov 2016) Micro optimizations and parallel work
- (23 Nov 2016) Specialization make it faster still
- (22 Nov 2016) That pesky dictionary
- (21 Nov 2016) Streamlining the output
- (18 Nov 2016) Pulling out the profiler
- (17 Nov 2016) I like my performance unsafely
- (16 Nov 2016) Going down the I/O chute
- (15 Nov 2016) Starting from scratch
- (14 Nov 2016) The obvious costs
- (11 Nov 2016) The interview question
Comments
Several days to save one minute. It reminds me of the Terence Parr motto: "Why program by hand in five days what you can spend five years of your life automating?"

See https://news.ycombinator.com/item?id=8293390 and http://parrt.cs.usfca.edu/
I look forward to the automated optimizer :-)
Carsten, a) This didn't actually take several days to write, you understand. b) And for hot paths, this is more than worth it.
Since at this point making the code faster by a few instructions could make a difference, have you considered using & instead of && in the condition? It would save one conditional jump, which, depending on the distribution of the data, could be hard for the CPU to predict.
I am glad you mentioned the cost of maintainability and readability. For me personally those two factors are incredibly important. Even more so when you work in a company with a lot of developers. Personally, I felt the solution that was done in around 3 seconds was the happy midpoint. It was readable and maintainable both to a very high degree but saw a massive reduction in time taken to execute the code.
Probably more importantly, why are you comparing the first 12 bytes? As far as I can tell, that includes the first digit of the time, so if the date is the same but the first digit of the time isn't, you're unnecessarily using the slow path. This means that casting to short* instead of int* could help by increasing the fraction of cases that use the fast path.

Is it correct that most of the speedup was obtained with the MemoryMappedFile.CreateFromFile thing? Can anyone explain what this is and how it works? I've read a little bit but still don't get why it is that fast; after all, it has to read 276 MB from disk, and it does it in under a second. How?
Another question: what would happen if the file were larger than the RAM size?
Memory-mapping a file takes advantage of the OS's virtual memory capabilities, so the OS handles the business of reading pages into memory and flushing/evicting those which are no longer needed. The size of physical RAM is largely irrelevant, since only part of the file needs to be resident in physical memory at a time; but if RAM is too tight then the OS will have to page more often, which will put more load on the IO subsystem.
The IO speed in this case is down to the use of a solid-state disk. 276MB would take rather longer to read from a conventional spinning disk.
To clarify, the main benefit of memory-mapping in this case (I think) is that it drastically reduces the number of syscalls and memory copies required to access the data.
Reading a typical stream involves:

1. A syscall that reads the data from disk into a kernel buffer (the page cache).
2. A copy of the data from the kernel buffer into the application's own buffer.
3. A further copy (or parse) of the bytes from that buffer into the application's data structures.
And that's assuming that the application is using a buffered stream. Otherwise, pretty much every tiny read could result in a syscall.
Reading a memmapped file effectively bypasses the second step, mapping the page read from disk directly into the application's memory space. By using unsafe code and dealing with the bytes in-place, we can skip the third step too.
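A minimal sketch of that pattern (the file name and the line-counting loop are placeholders, not the post's actual code):

    using System;
    using System.IO;
    using System.IO.MemoryMappedFiles;

    class MmapExample
    {
        static unsafe void Main()
        {
            using (var mmf = MemoryMappedFile.CreateFromFile("data.log", FileMode.Open))
            using (var accessor = mmf.CreateViewAccessor(0, 0, MemoryMappedFileAccess.Read))
            {
                byte* ptr = null;
                accessor.SafeMemoryMappedViewHandle.AcquirePointer(ref ptr);
                try
                {
                    long len = new FileInfo("data.log").Length;
                    long newLines = 0;
                    // The bytes are read in place; the OS pages them in on
                    // demand, with no copy into a managed buffer.
                    for (long i = 0; i < len; i++)
                        if (ptr[i] == (byte)'\n')
                            newLines++;
                    Console.WriteLine(newLines);
                }
                finally
                {
                    accessor.SafeMemoryMappedViewHandle.ReleasePointer();
                }
            }
        }
    }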
Ok, thanks, that makes more sense now. Also, @Ayende, just a check: I've found that the super fast speed might be because of the Windows cache: http://stackoverflow.com/questions/26456195/why-is-reading-from-a-memory-mapped-file-so-fast#comment41553531_26456195 Is this taken into account?
Even for an SSD this looks too good to be true:
http://unix.stackexchange.com/questions/88796/are-these-throughput-numbers-typical-for-an-ssd
Oops, ignore the last comment; it seems that 600 MB/s for an SSD on Windows is possible...
Svick, & is supposed to be more expensive. It forces us to evaluate both sides, while && will short-circuit.

However, the cost of doing both comparisons vs. the conditional jump is probably worth testing.
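For illustration, a sketch of the two forms (hypothetical helper names and layout, not the post's actual code):

    // && version: the JIT emits a conditional jump after the first compare,
    // which the CPU must predict; a misprediction flushes the pipeline.
    static unsafe bool SameDatePrefixShortCircuit(byte* a, byte* b)
    {
        return *(long*)a == *(long*)b && *(int*)(a + 8) == *(int*)(b + 8);
    }

    // & version: both comparisons always execute, and their boolean results
    // are combined without an intermediate branch.
    static unsafe bool SameDatePrefixBranchless(byte* a, byte* b)
    {
        return (*(long*)a == *(long*)b) & (*(int*)(a + 8) == *(int*)(b + 8));
    }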
Svick, You are correct, I should have used a short here, not an int
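A sketch of what that change looks like (names assumed): a long plus a short covers exactly the 10 bytes of "yyyy-MM-dd", so a differing first hour digit no longer forces the slow path:

    static unsafe bool SameDate(byte* a, byte* b)
    {
        // 8 bytes ("yyyy-MM-") + 2 bytes ("dd") == the whole date.
        return *(long*)a == *(long*)b && *(short*)(a + 8) == *(short*)(b + 8);
    }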
ren, Yes, we don't really take I/O into account here, since we are mostly reading from the system cache.
It looks like your code is fast but incorrect.

The original article said:

But line 22 of the Final Version code seems to swap start and end.
I see that you're using the Trace mode of dotTrace to do your performance profiling. This adds overhead to each called method to track the number of calls, and for "free" methods, like property getters and such, it adds a not-so-insignificant overhead. I wonder what your performance profile would look like if you switched to sampling; would you see the same results?
Lasse, It is possible, yes, but I found that this is much clearer for me to track down costs and optimize.