Making code faster: Streamlining the output

Nov 21 2016

After looking at the profiler results, I realized that we are actually spending a considerable amount of time just writing the output to a file. That didn’t really matter when our code ran in 30+ seconds; spending another 100 to 200 ms to write the results was just noise. But when our code is doing that in under a second, that is a considerable cost.

I’m running this code on a different machine, so we can’t directly compare. The performance of the initial version is:

38,478 ms and allocated 7,612,741 kb with peak working set of 874,660 kb

And the speed of the latest version is:

842 ms and allocated 208,435 kb with peak working set of 375,452 kb

So we are 45 times faster than the initial version.

The problem is that running this in parallel costs quite a lot and masks some inefficiencies, so I decided to change it back to a single-threaded approach, which gives:

1,498 ms and allocated 123,787 kb with peak working set of 319,436 kb

Merely 25 times faster than the original version.

And now let us focus on the output.

This is pretty simple code, but it hides a lot of inefficiencies. In particular, it does a lot of allocations as it formats the string. We can do much better.

Merely changing the WriteLine to:

output.WriteLine($"{entry.Value.Id} {entry.Value.DurationInTicks}");

Saved us close to 200 ms (!), so there is a lot of room to improve here. Again, this is mostly a matter of creating highly specific code to solve this exact scenario. Here is what I did:

I wrote a simple function to format the number into a buffer, then changed the summary code to write a single line into a prepared buffer (skipping all the static stuff) and write it to the file in one shot.
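As a rough illustration of that idea (not the code from the post), here is a minimal sketch. It assumes a hypothetical line layout of a 10-digit zero-padded id, a space, and an `hh:mm:ss` duration, and it assumes non-negative values that fit their fields (durations under 100 hours). `WriteFormattedInt` is the name used in the post, but its body here is my guess; `SummaryWriter` and `FormatLine` are made-up names.

```csharp
using System;
using System.IO;
using System.Text;

public static class SummaryWriter
{
    // Writes `value` as exactly `digits` zero-padded ASCII digits into
    // buffer[pos .. pos + digits - 1]. No strings are allocated.
    // Assumption: value is non-negative and fits in `digits` digits.
    public static void WriteFormattedInt(long value, byte[] buffer, int pos, int digits)
    {
        for (int i = pos + digits - 1; i >= pos; i--)
        {
            buffer[i] = (byte)('0' + value % 10);
            value /= 10;
        }
    }

    // Formats one "id hh:mm:ss\n" line into the reusable buffer and
    // returns the number of bytes written (always 20 in this layout).
    public static int FormatLine(long id, TimeSpan duration, byte[] buffer)
    {
        WriteFormattedInt(id, buffer, 0, 10);
        buffer[10] = (byte)' ';
        // TotalHours (not Hours) so that durations over a day are not wrapped.
        WriteFormattedInt((long)duration.TotalHours, buffer, 11, 2);
        buffer[13] = (byte)':';
        WriteFormattedInt(duration.Minutes, buffer, 14, 2);
        buffer[16] = (byte)':';
        WriteFormattedInt(duration.Seconds, buffer, 17, 2);
        buffer[19] = (byte)'\n';
        return 20;
    }
}
```

A caller would allocate one `byte[20]` up front, call `FormatLine` for each entry, and pass the buffer to `Stream.Write(buffer, 0, length)` on a stream opened with `File.Create`, so each line reaches the file in a single call with no intermediate strings.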

And the results are:

1,191 ms and allocated 16,942 kb with peak working set of 311,432 kb

You might have noticed that I have two copies of WriteFormattedInt. This is to skip the implicit cast to long, and yes, it matters, by about 50 ms in my tests. But this version also reduces the amount of memory we allocate by over 100 MB! So this is great.
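To make the "two copies" point concrete, here is a small sketch of what such a pair of overloads could look like. The bodies are my own guess, and the class name `FormattedIntOverloads` is made up. The only thing it demonstrates is that C# overload resolution binds an `int` argument to the `int` overload, so the value is never widened to `long`; the timing difference is the measurement reported above, not something this sketch proves.

```csharp
public static class FormattedIntOverloads
{
    // Chosen when the argument is an int (e.g. TimeSpan.Minutes):
    // the division and remainder are done in 32-bit arithmetic.
    public static void WriteFormattedInt(int value, byte[] buffer, int pos, int digits)
    {
        for (int i = pos + digits - 1; i >= pos; i--)
        {
            buffer[i] = (byte)('0' + value % 10);
            value /= 10;
        }
    }

    // Chosen when the argument is a long (e.g. an id or a tick count).
    // Without the int overload above, int arguments would be implicitly
    // converted to long and take this path instead.
    public static void WriteFormattedInt(long value, byte[] buffer, int pos, int digits)
    {
        for (int i = pos + digits - 1; i >= pos; i--)
        {
            buffer[i] = (byte)('0' + value % 10);
            value /= 10;
        }
    }
}
```

Both overloads produce identical bytes for the same non-negative value; the duplication exists only so that each call site uses the narrowest arithmetic its argument type allows.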

And here are the profiler results on analyzing this method:

This function is now almost 7 times faster! That is pretty awesome, and even looking only at single-threaded performance, we are 32 times faster than the original version.

Trying the parallel version gives me:

731 ms and allocated 101,565 kb with peak working set of 381,224 kb

And a total improvement of 52 times! But we can do even more… I’ll talk about it in the next post.

Comments

21 Nov 2016
10:33 AM

OmariO

You should have written it in C from the very beginning :)

21 Nov 2016
11:39 AM

JustPassingBy

Ah, yes. This is the know-how I was lacking. I don't know enough about working with bytes to really start writing my own implementation. I will give this output implementation a go later.

Just out of curiosity, are you looking to squeeze out every last drop of performance? Are time, allocation, and peak working set of equal importance? Or would you accept an increase in allocations for a reduction in time?

Really enjoying this series. Looking forward to the next post.

21 Nov 2016
11:52 AM

Oren Eini

JustPassingBy, I'm mostly interested in time, to be honest. As a side effect, allocations & peak working set tend to drop as well

21 Nov 2016
11:57 AM

JustPassingBy

@Oren: I see. Also, the same snippet of code (optimized-output-summary.cs) is repeated twice in your post.

21 Nov 2016
12:35 PM

Mircea Chirea

I'd like to add that this "Making code faster" series is pretty useless for the average developer working on the usual application. In a database, you want it to be as fast as possible regardless of how much work it is to implement and how hard it is to maintain, but in your usual application you should never write unsafe code.

This series would be much more useful if you showed us how to squeeze out that extra bit of performance without dropping to unsafe/unmanaged code. For example, when you said that a lot of time is spent formatting each line, the optimization was to write bytes directly. Instead, you could have shown how to optimize that without unsafe code, then with unsafe but fragile code.

21 Nov 2016
12:51 PM

Thomas Levesque

I think there's a bug in your implementation: TimeSpan.Hours isn't the total number of hours, it's modulo 24. If the TimeSpan is, e.g., 27 hours, the Hours property will return 3, not 27. You should take the Days property into account too. In a scenario like this, it's very likely that the total number of hours for a given car will exceed 24.

21 Nov 2016
14:43 PM

JustPassingBy

Mircea Chirea wrote: "I'd like to add that this "Making code faster" series is pretty useless for the average developer working on the usual application."

I disagree wholeheartedly, Mircea. I already know where I am going to use the knowledge I have gained during this series. While I am not going to write unsafe code and get into the nitty-gritty with pointers, I can certainly appreciate the focus on speed and performant code.

And the approach you outline is the exact approach I am taking. I do not want to write unsafe code, but I do want it blazing fast. Both approaches are possible.

21 Nov 2016
17:44 PM

dhasenan

For the next step, I'd calculate the necessary length of the file, memory map it, and then write to it directly. Then I'm not incurring context switches every 21 bytes.

Another option is to preallocate the entire byte array in advance, but I wouldn't expect that to be as fast as memory mapping. (In a similar vein, I thought about using a giant StringBuilder, but the output is ASCII, so that would incur a large encoding cost.)

21 Nov 2016
18:28 PM

Urs

Oren, how do you actually measure allocated kb & peak working set? by looking at ProcessMonitor output?

21 Nov 2016
19:00 PM

JustPassingBy

Down to 347 ms now, thanks to this post. Slightly different implementation of WriteOutput. I did play around with BitConverter.GetBytes() but I think I am missing something in my knowledge because it did not return what I had expected.

Targeting different platforms changes the time taken by ~50ms too:-

// Platform Target: x64
Took: 347 ms
Allocated: 1,105,100 kb
Peak Working Set: 355,052 kb

// Platform Target: x86
Took: 397 ms
Allocated: 835,780 kb
Peak Working Set: 353,056 kb

21 Nov 2016
19:02 PM

JustPassingBy

@Urs: It was in the first post in the series. Bottom of this gist.

21 Nov 2016
21:12 PM

peter

@Mircea >>pretty useless for the average developer working on the usual application. I agree - that is why they remain "average" doing "usual" stuff.

22 Nov 2016
13:03 PM

Oren Eini

Mircea, In most applications, you have a few hot spots that are critical for performance. Having the tools in place to handle that is quite important. In one particular example, I saw a painful hotspot in a tree view generation that had a lot of elements, and fixing that made the entire application MUCH faster as far as the user was concerned, and that was worth whatever we did to make it happen.

Also, remember that just understanding what is going on is useful when you need to optimize. Admittedly, not many people care about micro performance the way we do, but when you have a hot spot, knowing what it is actually doing and having the tools to resolve it is important.

22 Nov 2016
13:04 PM

Oren Eini

Urs, Look at the actual full code sample, it shows how to do that via the API

22 Nov 2016
13:05 PM

Oren Eini

Peter, Very good point regarding avg. dev.

22 Nov 2016
13:29 PM

Tuschinski

Mircea Chirea: if you want to make the point that "unsafe" code is fragile, you are assuming that a whole world of code written out there (banking solutions, for example) is just safe code... Sorry to say that this type of code splits the boys from the men in real-world applications and development.

23 Nov 2016
07:25 AM

Mircea Chirea

JustPassingBy, peter,

I am not saying that this series or this post in particular are bad. On the contrary, they are quite helpful and very interesting: it is quite eye-opening to learn how you could squeeze every last drop of performance out of a simple task (a very common one I might add, parsing some text in a strict format). I am saying that this level of optimization is not applicable in most cases, except the few hot spots in an application, just like Oren said in his reply to me. However, there are many more places in a typical application that could benefit from optimizations without dropping all the way down to unsafe code.

The next post, about replacing a dictionary with an array because the format of the keys allows it, is exactly the kind of post I wish this series had more of :)

Tuschinski,

I don't suggest that just because unsafe code is fragile it is also bad or broken. I mean it should be the last resort in some specific situations when writing in C#. Banking solutions written in C++ (for example) aren't necessarily broken because they aren't written in C# or heck even JavaScript; there are other considerations there. For most applications written in C# there will be a few parts with unsafe code (the hot spots), where it will absolutely be worth the effort even though the code will be harder to maintain. However, other parts could still benefit from optimizations; see what I said above and Oren's next post.

Oren Eini

CEO of RavenDB