Making code faster: The obvious costs
In my previous post, I presented a small code sample and asked how we can improve its performance. Note that this code sample has been quite maliciously designed to be:
- Very small.
- Clear in what it is doing.
- The most obvious way to do it.
- Highly inefficient.
- Misleading, nudging people toward non-optimal optimization paths.
In other words, if you don’t understand what is going on, you won’t be able to get the best out of it. And even if you do, it is likely that you’ll go for a “minimal change to the code” approach that isn’t going to do as much for performance.
Let us look at the code again:
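(A reconstruction of the general shape of the code; the exact file layout, date format, and output details here are illustrative assumptions.)

```csharp
using System;
using System.IO;
using System.Linq;

public class Record
{
    private readonly string _line;

    public Record(string line)
    {
        _line = line;
    }

    // Note: every property access re-splits the whole line.
    public long Id => long.Parse(_line.Split(' ')[2]);
    public DateTime Start => DateTime.Parse(_line.Split(' ')[0]);
    public DateTime End => DateTime.Parse(_line.Split(' ')[1]);
}

public static class Program
{
    public static void Main(string[] args)
    {
        var lines = File.ReadAllLines(args[0]);

        var durations = lines
            .Select(line => new Record(line))
            .GroupBy(record => record.Id)
            .Select(g => new
            {
                Id = g.Key,
                Duration = TimeSpan.FromTicks(g.Sum(r => (r.End - r.Start).Ticks))
            });

        using (var output = File.CreateText("summary.txt"))
        {
            foreach (var entry in durations)
                output.WriteLine($"{entry.Id} {entry.Duration.TotalSeconds}");
        }
    }
}
```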
The most obvious optimization is that we are calling _line.Split() multiple times inside the Record class. Let us fix that:
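Something along these lines (a sketch of the change; the field order and parsing format are carried over from the reconstruction above):

```csharp
// Split and parse the line once, in the constructor, and keep only
// the three parsed values instead of the raw line.
public class Record
{
    public long Id { get; }
    public DateTime Start { get; }
    public DateTime End { get; }

    public Record(string line)
    {
        var parts = line.Split(' ');
        Start = DateTime.Parse(parts[0]);
        End = DateTime.Parse(parts[1]);
        Id = long.Parse(parts[2]);
    }
}
```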
This trivial change reduced the runtime by about 5 seconds and saved us 4.2 GB of allocations. The peak working set increased by about 100 MB, which I assume is because the Record class moved from having a single 8-byte field to having three 8-byte fields.
The next change is also pretty trivial: let us drop File.ReadAllLines() in favor of File.ReadLines(), as shown below. This, surprisingly enough, had very little impact on the runtime.
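The change is just the call that produces the lines (sketch):

```csharp
// Stream the lines lazily instead of materializing the whole file
// as a string[] up front.
var lines = File.ReadLines(args[0]);   // was: File.ReadAllLines(args[0])
```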
However, the allocations dropped by 100 MB, and the working set dropped to 280 MB, very close to the size of the file itself.
This is because we no longer read the entire file into an array and hold on to that array for the duration of the program. Instead, the lines can be garbage collected very efficiently as we go.
That concludes the obvious stuff, and we managed to gain a whole 5 seconds of performance improvement here. However, we can do better, and it is sort of obvious, so I’ll put it in this post.
As written, this code is single threaded. And while we are reading from a file, we are still pretty much CPU bound, so why not use all the cores we have?
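A sketch of what that looks like, assuming the same pipeline as above:

```csharp
// The per-line parsing and grouping now run through PLINQ.
var durations = File.ReadLines(args[0])
    .AsParallel()
    .Select(line => new Record(line))
    .GroupBy(record => record.Id)
    .Select(g => new
    {
        Id = g.Key,
        Duration = TimeSpan.FromTicks(g.Sum(r => (r.End - r.Start).Ticks))
    });
```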
As you can see, all we had to do was add AsParallel(), and PLINQ (built on top of the TPL) takes care of the rest for us.
This gives us a runtime of 9 seconds. Allocations are a bit higher (3.45 GB, up from 3.3 GB), but the peak working set exceeded 1.1 GB, which makes a lot of sense.
We are now standing at roughly a third of the initial runtime, which is excellent. But can we do more? We’ll cover that in the next post.
More posts in "Making code faster" series:
- (24 Nov 2016) Micro optimizations and parallel work
- (23 Nov 2016) Specialization make it faster still
- (22 Nov 2016) That pesky dictionary
- (21 Nov 2016) Streamlining the output
- (18 Nov 2016) Pulling out the profiler
- (17 Nov 2016) I like my performance unsafely
- (16 Nov 2016) Going down the I/O chute
- (15 Nov 2016) Starting from scratch
- (14 Nov 2016) The obvious costs
- (11 Nov 2016) The interview question
Comments
@Arseny Kapoulkine, from the previous post, got 135 ms with a C++ implementation, so when you need micro-optimizations and performance, there is no way around going deep and close to the hardware.
The main lesson I learned from it is that there are no free meals. Yes, .NET is awesome and my favorite; it's very easy and fast to write code, but you pay for it with performance and allocations.
I would expect the .NET framework to be more efficient and much closer to C++ performance, but that is for another post.
Again, thanks for the great learning and sharing.
Fun little challenge; this is what I came up with based on your previous post:
12 seconds with 74 MB.
I'm not too familiar with the insides of the TPL stuff. On the face of it, this is reading lines from the file, then each line is being processed by a different CPU core (slight simplification, I know). Given that you need to group the data by ID, how does .AsParallel() help here? If you were calling AsParallel() on the grouped output, so the calc of duration and sum of durations is parallelised, I'd understand the value of it.
@Neil, the grouping is done by using the (concurrent)dictionary.
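Roughly along these lines (an illustrative sketch only, reusing the Record class from the post; not actual code from any of the solutions here):

```csharp
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

// Sketch: 'path' and the Record class are assumed from the code in the post.
var totals = new ConcurrentDictionary<long, long>();

Parallel.ForEach(File.ReadLines(path), line =>
{
    var record = new Record(line);
    var ticks = (record.End - record.Start).Ticks;
    // Fold each line's duration into the shared per-id total.
    totals.AddOrUpdate(record.Id, ticks, (_, existing) => existing + ticks);
});
```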
// Ryan
@Uri that's why super intense optimizations in C# look unnatural - C# is for relaxed programmers who don't care if it's 10 sec or 1 sec :)
@uri Managing allocations is key for high-performance code in whatever language you are working in. The bad practice of not controlling allocations, which is prevalent in .NET developer culture, is the culprit, not the platform. You can control the machine code that gets emitted by the JIT in more or less the same fashion you do with the C++ compiler, as long as you are willing to deal with nasty code (a normal thing in C/C++). Achieving more than decent performance out of the 1% of the code that matters, and fast development for the rest, is not a tradeoff I would dismiss lightly.
Using the producer/consumer pattern: Took: 6.792 ms and allocated 3.128.383 kb with peak working set of 99.124 kb. Of course we can't really compare performance directly, as it depends on hardware.
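For reference, the general shape of such a producer/consumer split might look like this (an illustrative sketch only, not the actual implementation; the real code is linked further down in the thread):

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

// Sketch: 'path' and the Record class are assumed from the code in the post.
// A bounded queue decouples reading the file from parsing/aggregating it.
var queue = new BlockingCollection<string>(boundedCapacity: 10000);

var producer = Task.Run(() =>
{
    foreach (var line in File.ReadLines(path))
        queue.Add(line);
    queue.CompleteAdding();
});

var totals = new Dictionary<long, long>();
var consumer = Task.Run(() =>
{
    foreach (var line in queue.GetConsumingEnumerable())
    {
        var record = new Record(line);
        totals.TryGetValue(record.Id, out var existing);
        totals[record.Id] = existing + (record.End - record.Start).Ticks;
    }
});

Task.WaitAll(producer, consumer);
```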
@ren But that is not really the case. Well-optimised C++ code might always be a bit faster than well-optimised C# code, but that well-optimised C# code will completely blow away not-so-optimised C++ code. And the C# version will be quite a bit smaller, meaning it is easier to experiment with. That's the beauty of C# compared with C++.
So you guys think C# can't be fast? OK, then. Yesterday I got about 440 ms. After studying @Arseny's and @alex's solutions from yesterday's blog entry I made another attempt.
I'm using 8 threads now. The main thread loads the data chunkwise, and then the 7 remaining threads crunch the data while the main thread loads the next chunk. At the end, all the data from the different threads is added up and written to the output. I also switched from storing ticks to seconds (like in @alex's solution), and I set the process priority to realtime in order to get more consistent results.
And here is the result: Took: 157 ms and allocated 67.549 kb with peak working set of 80.660 kb
Who says C# can't be as fast as C++? ;-) Although, to be fair, @Arseny's solution would surely still run faster on my computer, since I have a 4C/8T machine and he used a 2C/4T.
And by the way: Absolutely no unsafe code.....
Source code: http://pastebin.com/PpJkB9gT
@Michael This is really impressive! But how much time did it take you to write? I stuck to the 1h interview time... This is beautiful as a demo of how performance can be achieved (it took ~600 ms on my laptop), but I think maintaining this code would be nearly impossible: only a few devs (and I don't count myself among them) could understand every bit of it, and by reimplementing everything (even the DateTime subtraction) I think you open the door to some subtle bugs (for example, I know it was not requested, but what if we had to handle a daylight saving time change between the in and out hours?).
The original problem stated "how much time a car spent in the lot based on this file." If you know that car id ahead of time you could skip the DateTime parsing for all the other lines.
Fabrice, Can you show the sample code?
@Oren: original code : http://pastebin.com/cv9XuJ13
I then worked a bit less than an hour more on this, using some code from @Michael (for parsing the date and id) but keeping it quite simple, still with the producer/consumer pattern and changing how it reads the file:
Took: 4.043 ms and allocated 594.374 kb with peak working set of 170.656 kb --> http://pastebin.com/CaVzpBXB
@Fabrice: The version above was my 20th iteration, and it took me three evenings to get there. After one hour (my 5th iteration) I was at 1.300 ms and about 70 MB peak memory. Take a look at the comments section of the first blog post; there is a short explanation of what I did. The (first) version I posted there uses DateTime and is still reasonably maintainable.
After seeing that Arseny got sub-200 ms with C++, I wondered how fast I could get, and I had some spare time, so I started experimenting. I tried out a lot of things that I hadn't done before or hadn't done in some time (like System.Numerics.Vector or struct unions). For me it was a fun challenge and a good opportunity to revisit some things and try out new ones.
Regarding maintainability: Most of the time you need maintainable and expressive. But sometimes you just need AFAP (as fast as possible). :-)
@Michael - It is inspiring to see the level of effort you put into your solution for the simple pleasure of a mental exercise and sharing knowledge! And thanks to @Oren for providing the platform and the fodder!
Given that, I hesitate to report that sDaysPerMonthLeapYear has the same data as sDaysPerMonth.
Ed
@Ed: Thanks for pointing that out! The second number in sDaysPerMonthLeapYear should of course read 29. My Bad!
I have tried out the above code on my laptop with a 2.25 GB file of data with the same schema, but I am getting "Exception of type 'System.OutOfMemoryException' was thrown".
Vinod, yes, that would do it. It can handle up to a bit less than 2 GB as currently written.