Challenge: The code review bug that gives me nightmares–The fix

Nov 03 2021

After presenting the problem of how to return items to the array pool without creating a use-after-free bug, I asked how you would fix it. There are several ways to try: you could use a reference counting scheme, or try locking, etc. All of those options are expensive and, what is worse, they are expensive on a routine basis, not just on the code path that frees the buffer.

Instead, I changed the way we are handling returning the buffer. Take a look at the following code:
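The listing itself is not part of this text, so what follows is only a minimal sketch of the technique, pieced together from the description in this post and from the names mentioned in the comments (ReturnBuffer, _joinLifetimes). The PooledBufferLifetime class and its OnEvicted and IsPendingReturn methods are hypothetical names chosen for illustration, and the sketch leaves out the cache itself.

```csharp
using System;
using System.Buffers;
using System.Runtime.CompilerServices;

public static class PooledBufferLifetime
{
    // Maps an evicted buffer (key) to its ReturnBuffer (value). The table does not
    // keep the key alive, and the value does not keep the key alive either, even
    // though it holds a reference to it.
    private static readonly ConditionalWeakTable<byte[], ReturnBuffer> _joinLifetimes =
        new ConditionalWeakTable<byte[], ReturnBuffer>();

    private sealed class ReturnBuffer
    {
        public readonly byte[] Buffer;

        public ReturnBuffer(byte[] buffer)
        {
            Buffer = buffer;
        }

        // The finalizer can only run once the GC has found no other live
        // references to the buffer, so returning it here cannot race with a reader.
        ~ReturnBuffer()
        {
            ArrayPool<byte>.Shared.Return(Buffer);
        }
    }

    // Hypothetical hook: call this from the cache's eviction callback instead of
    // returning the buffer to the pool directly.
    public static void OnEvicted(byte[] buffer)
    {
        _joinLifetimes.Add(buffer, new ReturnBuffer(buffer));
    }

    // Hypothetical helper, for illustration only: true while the buffer is still
    // referenced somewhere and is waiting for the GC to hand it back to the pool.
    public static bool IsPendingReturn(byte[] buffer)
    {
        return _joinLifetimes.TryGetValue(buffer, out _);
    }
}
```

In this sketch, the eviction callback would call PooledBufferLifetime.OnEvicted(buffer) where it previously called ArrayPool<byte>.Shared.Return(buffer). The actual return then happens on the finalizer thread at some later, unspecified time.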

This may require some explanation. I’m using a ConditionalWeakTable here, which was added to the runtime to enable dynamic properties on objects. Basically, it creates a table in which you can look up a value using an object as the key. The most important feature is that the runtime ensures that the lifetime of the associated value matches the lifetime of the key object. In other words, when we add the buffer in the eviction callback, we ensure that the ReturnBuffer we register will live at least as long as the buffer.

That means that we can let the GC do the verification job. We’ll now return the buffer to the pool only after the GC has ensured that there are no outstanding references to it. Not a lot of code, and an elegant solution. This also ensures that we only pay the cost on eviction (likely rare), and not all the time.


Comments

03 Nov 2021
14:02 PM

Dalibor Čarapić

Hmm, I'm a bit baffled by this last post in this post series.
Although not explicitly stated I believe the original question was how to achieve caching of expensive calculation results with low memory impact (MemoryCache) and low allocation count (ArrayPool) without having additional bugs.
Maybe I was wrong in my expectation?
This post brings up a wrapper object (ReturnBuffer) which needs to be allocated in memory (so we go up in allocation) and this object will also get placed into finalizer queue by GC to ensure release of memory.
As the entire object that we are holding is actually just 32 bytes we could have just returned the copy of the original array and I assume that the performance (speed/memory) would be roughly equivalent or even in favor of the simple copy.
Frankly at this point I believe someone else wrote the post and not Oren:
- There seems to be no purpose to _joinLifetimes ('...that was added to the runtime to enable dynamic properties on objects. Basically, it creates a table that you can lookup by an object to get a key' --> I can not see Oren writing this)
- Caller which gets back the ReturnBuffer can change the content of the array
- Caller can also take direct reference to the byte[] from the ReturnBuffer.Buffer and keep it around longer than it is 'alive'
- Caller can swap the ReturnBuffer.Buffer with some other byte[] array

Maybe I missed something completely here :(

03 Nov 2021
19:33 PM

Beniamin

I don't think that this class is exposed to the caller. It looks like it is only used internally to avoid "use after free" bug. But to be honest I also think this looks like an overkill unless you know (from usage data) that this will be called a lot of times, so a lot of copies are avoided. But what about ArrayPool exhaustion that was mentioned on previous post?

05 Nov 2021
08:46 AM

Enzi

Have you considered not using ArrayPool at all in this scenario? I don't know how long the cache will keep the array, or how large they'll be, but it seems to me that renting a pooled array for a long, maybe unknown time could have negative effects on the ArrayPool. Won't it have to re-allocate new arrays if it runs out of rented ones? Wouldn't it be pointless then to return a rented array that has been held so long that the pool "forgot" about it?

Would it perhaps be acceptable to just allocate an array in this case, since it will be cached and re-used anyway? This code is definitely not simple anymore and there are many ways someone might introduce subtle bugs accidentally. If it is performance critical, then of course. If I remember correctly from the original article, this is "new" code though, so who knows if it's actually needed, or perhaps premature optimization.

06 Nov 2021
07:52 AM

Oren Eini

Enzi,

The pool doesn't remember the arrays that it rented out. There is no tracking of that at all. And the key use case here is to avoid overflowing the cache, which will cause us to hold the memory just long enough to get to Gen2, and then we basically significantly increase the memory costs with no benefits.

06 Nov 2021
09:43 AM

Oren Eini

Dalibor,

The ReturnBuffer object is allocated only on eviction, which would hopefully be rare. We also expect that object to be collected quite quickly, since we tie that to the lifetime of the buffer. We rely on the GC to tell us that there are no more references to the buffer by calling the ReturnBuffer finalizer.

This is basically a technique to add a finalizer to an object that you don't control. The caller never gets a ReturnBuffer, mind. We create and immediately discard it, letting the GC resolve the lifetime issue.

06 Nov 2021
15:52 PM

Cao

> This is basically a technique to add a finalizer to an object that you don't control.

This is a very nice trick, so the solution is clearly interesting. However, there is now one allocation for every cached buffer, so it might be much simpler to just allocate the initial buffer. Also, if you know that the number of cached files will be very small, you can use a dedicated pool for the buffers, and never return them.

> We also expect that object to be collected quite quickly

Well, if you write "no-allocation" code, I suspect that your objects are not going to be collected :)

06 Nov 2021
23:21 PM

Enzi

> The pool doesn't remember the arrays that it rented out. There is no tracking of that at all.

I meant it conceptually "forgot", which is why I put quotation marks around it, but that obviously wasn't clear. What I mean is, if we rent all arrays that a pool has to offer, it will have to allocate new ones when further requests are made. Assume a pool can hold n buffers and we rent n, then we rent another n which have to be allocated anew. Then we return the first batch of n buffers, then the second batch. Now the pool has 2*n buffers it could use or free, but the point is that the second batch had to be allocated and essentially the pool might as well not have existed.

If the problem as I have outlined it exists, isn't there a risk then that your cache could cause this scenario? By holding buffers obtained from a pool for such a long time (and assuming that you make heavy use of the pool, so that it runs the risk of running out of buffers), then you could cause the very thing you try to avoid: allocating large arrays, in random places (whatever requests a buffer from the pool once it is empty).

07 Nov 2021
12:00 PM

Oren Eini

Cao,

There is now an allocation for every evicted buffer, and that is a very short term allocation, usually.

That plays to the GC's strengths. In my real scenario, I cannot predict the number of buffers or their sizes. Note that there is a big difference between a long-term allocation that has reached Gen2 and a short-term one that is unlikely to survive the next collection.

07 Nov 2021
12:02 PM

Oren Eini

Enzi,

In practice, I don't expect this to be a problem. The system will settle down quickly to have the total number of allocated buffers.

Even if technically we need less than the total allocated, we are more concerned with allocation churn than actual allocation size. What usually happens is that we either keep the buffer for a very long time or recycle it quickly, so it gets to go back to the pool.

08 Nov 2021
07:48 AM

Daniel

This solution depends on implementation details of the ArrayPool. If it holds on to a reference for the buffer this solution will leak memory as the ReturnBuffer is never collected. I also assume you did measure that sparing the allocation of 32 bytes is worth it, as Finalizers & WeakReferences have their own overhead for the GC.

08 Nov 2021
09:04 AM

Oren Eini

Daniel,

The documentation clearly states that not returning the buffer to the pool is fine, see: https://docs.microsoft.com/en-us/dotnet/api/system.buffers.arraypool-1.rent?view=net-5.0#System_Buffers_ArrayPool_1_Rent_System_Int32_

In other words, they are documented to not hold a reference, so this isn't an implementation issue. As for the actual scenario, the key here is that I'm only paying the price here iff the buffer is evicted. In most normal uses, I have no cost to pay.

That said, the scenario I had in mind for the actual usage makes use of much bigger buffers, where it is certainly worth it. Note that even for a 32-byte buffer, enough churn can cause issues, so reducing allocations in general is almost always a good thing.

Oren Eini

CEO of RavenDB