Making code faster: Pulling out the profiler
After doing all I could without reaching for the profiler, and managing to get a 45x performance gain, let us see what the profiler actually tells us. We'll use the single-threaded version, since that is easier to analyze.
Here it is:
We can see that dictionary operations take a lot of time, which is to be expected. But what is very surprising is that the DateTime calls are extremely expensive in this case.
The relevant code for those is here. You can see that it is pretty nice, but there are a bunch of things in it that are likely costing us. The exception inside the method prevents inlining, there is error handling that we don't need (since we can safely assume in this exercise that the data is valid), etc.
So I changed ParseTime to do the parsing directly, like so:
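Roughly along these lines — a minimal sketch, not the exact code from the post, assuming a fixed-width yyyy-MM-ddTHH:mm:ss timestamp and valid input:

```csharp
// A sketch only: assumes a fixed-width "yyyy-MM-ddTHH:mm:ss" timestamp and
// valid input, so there is no error handling and nothing to block inlining.
// The field offsets here are assumptions, not the exact ones from the post.
static DateTime ParseTime(string line, int pos)
{
    int year   = (line[pos]      - '0') * 1000 + (line[pos + 1]  - '0') * 100 +
                 (line[pos + 2]  - '0') * 10   + (line[pos + 3]  - '0');
    int month  = (line[pos + 5]  - '0') * 10 + (line[pos + 6]  - '0');
    int day    = (line[pos + 8]  - '0') * 10 + (line[pos + 9]  - '0');
    int hour   = (line[pos + 11] - '0') * 10 + (line[pos + 12] - '0');
    int minute = (line[pos + 14] - '0') * 10 + (line[pos + 15] - '0');
    int second = (line[pos + 17] - '0') * 10 + (line[pos + 18] - '0');

    return new DateTime(year, month, day, hour, minute, second);
}
```

The savings come from skipping DateTime.Parse and its culture and format handling machinery entirely.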
And that saved us 11%, just this tiny change.
Here are our current costs:
Note that we reduced the cost of parsing significantly (at the cost of error handling, though), but there is still a lot of work being done here. It turns out that we were actually measuring the time to write to the summary file as well (that is what all those FormatHelpers calls are), which dirties the results somewhat, but never mind.
The next place we need to look at is the Dictionary. It is expensive, and even though the use of FastRecord means that we only need a single lookup per line, that still isn't much fun. Note that it is using the GenericEqualityComparer. Can we do better?
Trying to create my own equality comparer for longs doesn't really help.
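For reference, here is a sketch of the kind of comparer that was tried, assuming the dictionary is keyed by the numeric id as a long (FastRecord is the value type from the earlier posts):

```csharp
using System.Collections.Generic;

// A struct comparer skips the shared GenericEqualityComparer, but the hash is
// the same high/low fold that long.GetHashCode() already performs, so there is
// little to gain here.
struct LongEqualityComparer : IEqualityComparer<long>
{
    public bool Equals(long x, long y) => x == y;

    public int GetHashCode(long value) => (int)value ^ (int)(value >> 32);
}

// Usage:
// var stats = new Dictionary<long, FastRecord>(new LongEqualityComparer());
```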
So we'll go back to the parallel version with the ParseTime optimization, and we are now running at 628 ms. At this rate, I don't think there is a lot more room for improvement, so unless someone suggests something, we are done.
More posts in "Making code faster" series:
- (24 Nov 2016) Micro optimizations and parallel work
- (23 Nov 2016) Specialization make it faster still
- (22 Nov 2016) That pesky dictionary
- (21 Nov 2016) Streamlining the output
- (18 Nov 2016) Pulling out the profiler
- (17 Nov 2016) I like my performance unsafely
- (16 Nov 2016) Going down the I/O chute
- (15 Nov 2016) Starting from scratch
- (14 Nov 2016) The obvious costs
- (11 Nov 2016) The interview question
Comments
People have already suggested using a simple array with IDs indexing into it, but that'd be ~400MB for ints and ~800MB for longs. Could we reduce the memory usage of the array, perhaps by sacrificing accuracy? Reducing the resolution from ticks to seconds lets us stuff 18 hours into an unsigned short, at the expense of rounding errors at the seconds level.
@Alex that would depend on what the data is being used for. On some systems, dealing with rounding errors of a second is nothing. But if what you are reading is logs, those ticks make a huge difference. Having said that, you can still use a trick like that: take the lower 16 bits as-is and encode the other 48 bits in a variable-size format. There is a good chance that you can encode those remaining bits in far fewer bits than 48 (16, maybe?)
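A sketch of that layout (the exact split and the helper name are assumptions): keep the low 16 bits at fixed size and append the remaining 48 bits as a 7-bits-per-byte varint, which stays small when the high bits are small or mostly shared:

```csharp
// Hypothetical helper illustrating the "16 bits fixed + 48 bits variable" idea.
static int EncodeTicks(long ticks, byte[] buffer, int offset)
{
    var start = offset;

    // low 16 bits, fixed size
    buffer[offset++] = (byte)ticks;
    buffer[offset++] = (byte)(ticks >> 8);

    // remaining 48 bits, 7 bits per byte, high bit set while more bytes follow
    var rest = (ulong)ticks >> 16;
    do
    {
        var b = (byte)(rest & 0x7F);
        rest >>= 7;
        if (rest != 0)
            b |= 0x80;
        buffer[offset++] = b;
    } while (rest != 0);

    return offset - start; // bytes written
}
```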
Don't you need to consider leap seconds also?
@Alex, as already remarked before - refer to the comments on the first post of this topic, The interview question - the data to be used consists of an id key that has 8 digits in the alphabet '0' .. '9', which can be encoded as a 27-bit uint, and a duration value with a seconds resolution that can be encoded in a uint.
As the solution I posted shows (refer to the comments under the I like my performance unsafely blog article), it is actually possible for the given dataset of 199,819 unique id entries to store this in a map that requires less than 199,819 x (4 + 4) bytes = 1,598,552 bytes without compromising the seconds resolution. The "3-level trie" manages to store this data with only 864 KB of total allocations, i.e. a bit more than half of that size. It does so while being much faster than either a dictionary or a flat 390 MB array indexed by id, due to favorable memory cache effects.
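For illustration only, a sketch of what a 3-level trie over a 27-bit id could look like; the 9/9/9 bit split and the names below are assumptions, not the actual layout of the linked solution:

```csharp
// Splits a 27-bit id into three 9-bit chunks; each level is a 512-entry array
// that is only allocated when a key actually lands in it, so total memory
// tracks the ids that really occur.
sealed class DurationTrie
{
    const int Bits = 9;
    const int Fan = 1 << Bits;
    const int Mask = Fan - 1;

    readonly uint[][][] _root = new uint[Fan][][];

    public void Add(uint id, uint seconds)
    {
        int a = (int)(id >> (2 * Bits)) & Mask;
        int b = (int)(id >> Bits) & Mask;
        int c = (int)id & Mask;

        var mid  = _root[a] ??= new uint[Fan][];
        var leaf = mid[b]   ??= new uint[Fan];
        leaf[c] += seconds;                      // accumulate duration per id
    }

    public uint Get(uint id)
    {
        var mid  = _root[(id >> (2 * Bits)) & Mask];
        var leaf = mid?[(id >> Bits) & Mask];
        return leaf?[id & Mask] ?? 0;
    }
}
```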
@Bruno, theoretically yes. Note however that the reference .NET DateTime class also does not have any provisions for dealing with leap seconds, refer e.g. to this stackoverflow post.
I have coded my own solution, based upon yours, but replacing the dictionary with a home-grown object based on a long[][]. My solution takes only about 50% of the time of yours, and only uses about 25% of the memory.
see here: https://github.com/ursmeili/ConsoleApplication2/blob/master/ConsoleApplication2/Program.cs
The class in question is called "FastRecordCollection"
sorry there are no comments, but I had to finish it quickly before my wife came back from choir rehearsal :-)
do I pass the job interview?
There's much to absorb here. I'd like to apply similar stuff to PowerShell, too, but hopefully without having to write any C#, which sort of defeats the point. Why? Well, like in Unix-land, using things like Awk, Perl, etc. for this kind of work is usually Just Good Enough, and some of us want to use PowerShell for the same purpose...
Thanks for the good reading, Ayende!
Can't resist trying again. Checked it against the ten-line test this time round and made sure to eliminate the unsafe array access.
Code: https://gist.github.com/anonymous/d959662b4df4efd7ffea801fbec31879
@Urs: Nice one. Initially I thought you would produce hash collisions with var i = (id & (SLOTSIZE - 1)); but your slot design via var slot = id >> RAISE_BY; nicely circumvents this. Is this trick documented somewhere?
@Alois, no, it is not documented anywhere as far as I know, but I use this design in my own programs now and then, because it allows for lightning-fast collision detection when you have a vast array of datetime intervals to process.
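For readers following along, a sketch of the slot design being discussed; the constants and the 27-bit id assumption below are illustrative, not the values used in FastRecordCollection:

```csharp
// The high bits of the id choose a slot array, the low bits index into it.
// Distinct ids therefore always land in distinct cells; there is no hashing,
// so there is nothing to collide.
sealed class SlottedDurations
{
    const int RAISE_BY = 16;                    // low bits used inside a slot
    const int SLOTSIZE = 1 << RAISE_BY;         // entries per slot

    // assuming ids fit in 27 bits, as in the dataset discussed above
    readonly long[][] _slots = new long[1 << (27 - RAISE_BY)][];

    public void Add(long id, long ticks)
    {
        var slot = id >> RAISE_BY;              // which slot array
        var i = id & (SLOTSIZE - 1);            // position inside that slot
        var arr = _slots[slot] ??= new long[SLOTSIZE];
        arr[i] += ticks;
    }
}
```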
Hi Ayende,
Have you tried to create an Int64 equality comparer, since GetHashCode doesn't support longs? What's the motivation? Maybe two nested dictionaries can help.
Just for comparison, I'm curious how long @Arseny Kapoulkine's C++ solution takes on your machine: https://gist.github.com/zeux/90a49b85c8cfdf04ffa5489ec8916271
@Uri The C++ solution by @Arseny Kapoulkine (with some corrections to get a more accurate estimate of actual memory allocations) gives the following numbers on my system, vs. the fastest option of my C#-based solution. Notes: "Parallel" on my system means 8 threads, "Validation" means whether or not input validity checking is performed, and "Reference" is the initial "linqy" blog post solution provided in the zip archive together with the input data file.
Thank you @alex, this is quite impressive; the fastest C# option is only about 20% slower than the most optimized C++, and in single-threaded mode the difference is even smaller. There is always the option to call into an external C DLL to do the job, but generally, C# has good performance.
Writing to disk is now my second most expensive operation (it costs me about 100ms). I can't figure out how to make that any quicker, but I did manage to shave off another 100ms in other places:
Code: https://gist.github.com/anonymous/7d19e14c8f223c08142f4bab808deda7
JustPassingBy, Try writing a dedicated function for it; you are wasting a lot of time there doing allocations, string processing, etc.
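For example, something along these lines, a sketch with assumed names and output format rather than the code from the series: format the numbers straight into a reused byte buffer instead of going through strings:

```csharp
using System;
using System.IO;

static class SummaryWriter
{
    // Writes "<id> <durationInSeconds>\n" without allocating any strings.
    // Assumes non-negative values and a buffer large enough for one line.
    public static void WriteLine(Stream output, byte[] buffer, long id, long durationInSeconds)
    {
        var pos = WriteNumber(buffer, 0, id);
        buffer[pos++] = (byte)' ';
        pos = WriteNumber(buffer, pos, durationInSeconds);
        buffer[pos++] = (byte)'\n';
        output.Write(buffer, 0, pos);
    }

    static int WriteNumber(byte[] buffer, int pos, long value)
    {
        // render the digits backwards into a small scratch span, then copy forward
        Span<byte> scratch = stackalloc byte[20];
        var i = scratch.Length;
        do
        {
            scratch[--i] = (byte)('0' + value % 10);
            value /= 10;
        } while (value != 0);

        scratch.Slice(i).CopyTo(buffer.AsSpan(pos));
        return pos + (scratch.Length - i);
    }
}
```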
"[...] unless someone suggests something, we are done"
Why not use what we know about the file? Entry and exit logs from a parking lot should mean the date parts are often equal. And since we only care about the duration, in the majority of cases we could get away with just parsing the time parts.
With a simple sequence equality check and a simple time parser, the execution time could be cut in half.
Joakin, Yes, that is what we ended up doing. Wait for the next few posts.
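For illustration, a minimal sketch of that shortcut, assuming fixed-width yyyy-MM-ddTHH:mm:ss timestamps; the version actually used in the series is in the later posts:

```csharp
// When entry and exit share the same date (the common case for a parking lot),
// comparing the first ten characters avoids parsing the date at all.
static int DurationInSeconds(string entry, string exit)
{
    if (string.CompareOrdinal(entry, 0, exit, 0, 10) == 0)
        return SecondsOfDay(exit) - SecondsOfDay(entry);

    // rare cross-midnight case: fall back to full parsing
    return (int)(DateTime.Parse(exit) - DateTime.Parse(entry)).TotalSeconds;
}

static int SecondsOfDay(string timestamp)
{
    int hour   = (timestamp[11] - '0') * 10 + (timestamp[12] - '0');
    int minute = (timestamp[14] - '0') * 10 + (timestamp[15] - '0');
    int second = (timestamp[17] - '0') * 10 + (timestamp[18] - '0');
    return hour * 3600 + minute * 60 + second;
}
```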