Making code faster: I like my performance unsafely
After introducing the problem and doing some very obvious things, then doing some pretty non-obvious things and even writing our own I/O routines, we ended up with an implementation that is 17 times faster than the original one.
And yet we can still do better. But at this point, we need to go native and use a bit of unsafe code. We’ll start by implementing a naïve native record parser, like so:
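A minimal sketch of such a parser might look like the following. It assumes, purely for illustration, that each line is `<id>,<start>,<end>` with fixed-width `yyyy-MM-dd HH:mm:ss` timestamps; the `FastRecord` and `NativeParser` names here are placeholders rather than the post's actual types, and the real line layout is the one defined in the first post.

```csharp
using System;

public struct FastRecord
{
    public long Id;
    public long DurationInTicks;
}

public static unsafe class NativeParser
{
    // Reads 'len' ASCII digits starting at 'pos'.
    private static long ParseInt(byte* pos, int len)
    {
        long value = 0;
        for (int i = 0; i < len; i++)
            value = value * 10 + (pos[i] - (byte)'0');
        return value;
    }

    // Fixed-width "yyyy-MM-dd HH:mm:ss" -> ticks, without allocating a string.
    private static long ParseTimeTicks(byte* pos)
    {
        var dt = new DateTime(
            (int)ParseInt(pos, 4),       // year
            (int)ParseInt(pos + 5, 2),   // month
            (int)ParseInt(pos + 8, 2),   // day
            (int)ParseInt(pos + 11, 2),  // hour
            (int)ParseInt(pos + 14, 2),  // minute
            (int)ParseInt(pos + 17, 2)); // second
        return dt.Ticks;
    }

    // Parses a single "<id>,<start>,<end>" line starting at 'line'.
    public static FastRecord Parse(byte* line)
    {
        long id = 0;
        byte* pos = line;
        while (*pos != (byte)',')        // the id is a variable number of digits
        {
            id = id * 10 + (*pos - (byte)'0');
            pos++;
        }
        pos++;                           // skip the comma after the id
        long start = ParseTimeTicks(pos);
        long end = ParseTimeTicks(pos + 19 + 1); // skip the first timestamp and its comma
        return new FastRecord { Id = id, DurationInTicks = end - start };
    }
}
```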
This is pretty much the same as before, but now we are dealing with pointers. How do we use this?
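Here is a sketch of the processing loop, reusing the hypothetical `NativeParser` above and a plain `Dictionary<long, long>` from id to total ticks:

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.MemoryMappedFiles;

public static unsafe class FileProcessor
{
    public static Dictionary<long, long> Process(string path)
    {
        var stats = new Dictionary<long, long>(); // id -> total duration in ticks
        long length = new FileInfo(path).Length;

        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        using (var accessor = mmf.CreateViewAccessor())
        {
            byte* buffer = null;
            accessor.SafeMemoryMappedViewHandle.AcquirePointer(ref buffer);
            try
            {
                byte* end = buffer + length;
                byte* current = buffer;
                while (current < end)
                {
                    var record = NativeParser.Parse(current);

                    long total;
                    stats.TryGetValue(record.Id, out total);
                    stats[record.Id] = total + record.DurationInTicks;

                    // move to the start of the next line
                    while (current < end && *current != (byte)'\n')
                        current++;
                    current++;
                }
            }
            finally
            {
                accessor.SafeMemoryMappedViewHandle.ReleasePointer();
            }
        }
        return stats;
    }
}
```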
We memory map the file, and then we go over it, doing no allocations at all throughout.
This gives us 1 second to process the file, 126 MB allocated (probably in the dictionary) and a peak working set of 320 MB.
We are now 30 times faster than the initial implementation, and I wonder if I can do more…? We can do that by going parallel, which gives us the following code:
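Roughly along these lines (again a sketch rather than the exact code, reusing the hypothetical `NativeParser` from above):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Threading.Tasks;

public static unsafe class ParallelFileProcessor
{
    public static Dictionary<long, long> Process(string path, int threads = 4)
    {
        long length = new FileInfo(path).Length;
        var partials = new Dictionary<long, long>[threads];

        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        using (var accessor = mmf.CreateViewAccessor())
        {
            byte* buffer = null;
            accessor.SafeMemoryMappedViewHandle.AcquirePointer(ref buffer);
            try
            {
                // Pass the base address as an IntPtr so the lambdas do not need
                // to capture a pointer-typed local.
                IntPtr baseAddress = (IntPtr)buffer;
                long chunk = length / threads;
                var tasks = new Task[threads];

                for (int i = 0; i < threads; i++)
                {
                    int index = i;
                    tasks[i] = Task.Run(() =>
                    {
                        byte* fileStart = (byte*)baseAddress;
                        byte* fileEnd = fileStart + length;
                        byte* current = fileStart + chunk * index;
                        byte* limit = index == threads - 1 ? fileEnd : current + chunk;

                        // Every thread but the first starts mid-line; skip to the next full
                        // line, since the previous thread runs past its limit to finish it.
                        if (index != 0)
                        {
                            while (current < fileEnd && *current != (byte)'\n') current++;
                            current++;
                        }

                        // Each thread accumulates into its own dictionary, so no locking.
                        var stats = new Dictionary<long, long>();
                        while (current < limit)
                        {
                            var record = NativeParser.Parse(current);
                            long total;
                            stats.TryGetValue(record.Id, out total);
                            stats[record.Id] = total + record.DurationInTicks;

                            while (current < fileEnd && *current != (byte)'\n') current++;
                            current++;
                        }
                        partials[index] = stats;
                    });
                }
                Task.WaitAll(tasks);
            }
            finally
            {
                accessor.SafeMemoryMappedViewHandle.ReleasePointer();
            }
        }

        // Merge the per-thread dictionaries into a single one.
        var merged = new Dictionary<long, long>();
        foreach (var partial in partials)
        {
            foreach (var kvp in partial)
            {
                long total;
                merged.TryGetValue(kvp.Key, out total);
                merged[kvp.Key] = total + kvp.Value;
            }
        }
        return merged;
    }
}
```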
This is pretty ugly, but basically we are using 4 threads to run it, and we are giving each of them a range of the file, as well as its own dedicated records dictionary. After we are done, we need to merge the records into a single dictionary, and that is it.
Using this approach, we can get down to 663 ms run time, 184 MB of allocations and 364 MB peak working set.
So we are now about 45(!) times faster than the original version. We are almost done, but in my next post, I’m going to go ahead and pull out the profiler and see if we can squeeze anything else out of it.
More posts in "Making code faster" series:
- (24 Nov 2016) Micro optimizations and parallel work
- (23 Nov 2016) Specialization make it faster still
- (22 Nov 2016) That pesky dictionary
- (21 Nov 2016) Streamlining the output
- (18 Nov 2016) Pulling out the profiler
- (17 Nov 2016) I like my performance unsafely
- (16 Nov 2016) Going down the I/O chute
- (15 Nov 2016) Starting from scratch
- (14 Nov 2016) The obvious costs
- (11 Nov 2016) The interview question
Comments
The first post in the series states that this is used as an interview question. What sort of solution is reasonable in the time frame given? I think there comes a point where you sacrifice readability and maintainability for speed and performance.
JustPassingBy, Just changing the LINQ statement to a for loop with a dictionary would give more than a 2x performance boost (roughly the shape sketched below). Note that we are now in the < 1 sec range.
In an interview, being able to do the changes in the obvious section would be good enough most of the time, along with being able to discuss the options for doing the other things.
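As a sketch of that change (not the actual interview code; it assumes the same hypothetical `<id>,<start>,<end>` line layout as above):

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class Summary
{
    // Instead of a GroupBy/Sum LINQ pipeline over all the lines,
    // accumulate into a dictionary while streaming through the file.
    public static Dictionary<long, TimeSpan> Totals(string path)
    {
        var totals = new Dictionary<long, TimeSpan>();
        foreach (var line in File.ReadLines(path))
        {
            var parts = line.Split(',');
            var id = long.Parse(parts[0]);
            var duration = DateTime.Parse(parts[2]) - DateTime.Parse(parts[1]);

            TimeSpan total;
            totals.TryGetValue(id, out total);
            totals[id] = total + duration;
        }
        return totals;
    }
}
```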
Hello Oren. First, I want to note that your solution in the first post took 44 seconds on my computer, with 4.6 GB of allocations and a peak working set of 1.3 GB. I played with it and came up with a solution which is very similar to the one in your previous post. I just did not use an array of chars, but an array of bytes; I use just ints, not longs, and a Dictionary<int, int> instead of FastRecord. https://gist.github.com/satano/f390189ee4c88dfeadac3fce98eab1c7 It now runs in 2 seconds, with 65 MB of allocations and a peak working set of 39 MB.
I tried your unsafe solution. The allocations and peak working set are similar to yours (which is higher than mine). But I was surprised that, on my computer, it is not faster. Instead it is about 100-200 ms slower than my solution. Do you have any idea why this could be?
Stano, Are you memory starved? It might be related to paging / swapping or something like that
@Oren: I get that it is super performant now, but for me personally that comes at the sacrifice of code readability and maintainability. Nonetheless, this series has taught me a lot - not sure I would have thought about working down to the byte level in your previous post. But taking your ideas and writing a slightly different implementation I managed:-
Code: https://gist.github.com/anonymous/7ec170ae4ee70125b391a674faa03c28
Managed to shave it down to 500ms:-
Code: https://gist.github.com/anonymous/e5607eecf2ff37b8874951827fb57186
Looking forward to the next post.
JustPassingBy,
You have this:
https://gist.github.com/anonymous/e5607eecf2ff37b8874951827fb57186#file-byteitharder-cs-L32
Which is a single array that is concurrently accessed by multiple threads in a thread-unsafe manner.
You are also assuming that the number of records is the max id size, which is false.
@Oren:
Yeah, that's what I thought. But I switched to that implementation 3-4 iterations ago and it has yet to break on me. I must have used that same logic at least a thousand times now and it hasn't misbehaved. Maybe I am just getting lucky?
Not sure where you see this?
JustPassingBy, That really depends on what you mean by fail on you. This shouldn't generate an error, just wrong results.
As for the error in the ids, open the file, copy the first 10 lines and run your code on just those. You'll see it.
@Oren:
Using your solution from the first post in the series as the correct output and comparing the last 20 or so results from my current solution they all match.
Yep. I completely missed that. Thanks for highlighting it.
If, in the context of a practical application, the given input file is processed only once, the run time would be dominated by I/O and running things in parallel would not make much difference. But because we are running the solutions multiple times and the file contents are fully in the page cache, parallel processing would seem to be a good choice to obtain the "best benchmark numbers".
I created a 'no input validation' variant of the solution I posted earlier, under the post introducing the problem, which can be run in a single thread or in parallel. It either uses a 2-level (jagged array) "trie" or a 3-level one for id/duration storage. The 2-level one is a tiny bit faster; the 3-level one uses a bit less memory. I ran the different implementations ("Reference" being the linqy code in the input ZIP archive) against each other.
For the wall-time best case the improvement is a factor of 319, and for memory allocation the best-case improvement is a factor of 5211. Performing a significant amount of validation on the input only adds around 50 ms, or 20%, to the run time, so that would be worth it in my opinion.
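A made-up illustration of the 2-level jagged-array idea (not alex's actual code; the bit split and sizes are placeholders): the high bits of an id select an outer bucket, and the low bits index a lazily allocated inner array of duration totals.

```csharp
// Hypothetical 2-level jagged-array "trie": the high bits of an id choose an outer
// bucket, the low bits index a lazily allocated inner array of duration totals.
public class TwoLevelDurationMap
{
    private const int InnerBits = 16;
    private const int InnerSize = 1 << InnerBits;
    private readonly long[][] _buckets;

    public TwoLevelDurationMap(int maxId)
    {
        _buckets = new long[(maxId >> InnerBits) + 1][];
    }

    public void Add(int id, long durationTicks)
    {
        var bucket = _buckets[id >> InnerBits]
                     ?? (_buckets[id >> InnerBits] = new long[InnerSize]);
        bucket[id & (InnerSize - 1)] += durationTicks;
    }

    public long Get(int id)
    {
        var bucket = _buckets[id >> InnerBits];
        return bucket == null ? 0 : bucket[id & (InnerSize - 1)];
    }
}
```

Only the buckets that actually see ids get allocated, which keeps memory use low when the ids are clustered.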
Updated code for the new options (no validation, 2-level & 3-level tries, parallel). Requires code editing to switch between the two map options.
Well it would be nice to have a comment edit option :D The second "2-Level Trie without Validation" should read "3-Level Trie without Validation".
@alex, I like your approach. It is the kind of code I would write (nasty one :D)
@Federico, thanks. I also tried a few other trie designs for storage, but given the distribution of keys for this specific dataset, 2-3 levels seemed to be optimal.
If you liked this approach, you might want to check out the solution with some awesome parallel-processing bit magic that I proposed for an earlier Etag parsing blog post challenge here and here.