Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by email or phone:

ayende@ayende.com

+972 52-548-6969


Invisible race conditions: The cache has poisoned us


We got a memory corruption error the other day that was quite interesting. It was in a place where we had previously fixed a memory corruption error, and it was, at a glance, quite impossible.

The code would check out an item from the cache and increment its ref count, which keeps it alive for as long as we are using it. But something made it fail, and quite horribly, too. We finally tracked the problem down to this piece of code, which runs when we update the cache:

When the ref count drops to zero, we release the memory. Note that _items is a ConcurrentDictionary.
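To make the ref-counting scheme concrete, here is a minimal Python model of it (the post's own code is C#). RefCountedItem, checkout() and release() are hypothetical names, and setting a freed flag stands in for actually returning the memory.

```python
import threading

class RefCountedItem:
    """Hypothetical cached item: its memory is released when the
    reference count drops to zero."""

    def __init__(self, name):
        self.name = name
        self.ref_count = 1              # the cache's own reference
        self.freed = False
        self._lock = threading.Lock()

    def checkout(self):
        # A reader takes a reference, keeping the item alive while in use.
        with self._lock:
            if self.freed:
                raise RuntimeError(f"use after free: {self.name}")
            self.ref_count += 1
            return self

    def release(self):
        # Drop one reference; the last one "frees" the memory.
        with self._lock:
            if self.freed:
                raise RuntimeError(f"double release: {self.name}")
            self.ref_count -= 1
            if self.ref_count == 0:
                self.freed = True       # real code would free native memory here


item = RefCountedItem("page-1")
reader = item.checkout()   # ref_count: 2
item.release()             # cache replaces it; ref_count: 1, reader still safe
reader.release()           # reader is done; ref_count: 0, memory released
```

The invariant is that every reference is released exactly once. If the cache calls release() twice for the same item, the count reaches zero while a reader still holds it; in this model the reader's own release() then raises.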

Do you see the error?

The AddOrUpdate method calls the updateValueFactory when it needs to update a value, but it makes no promises with regard to atomicity. In other words, if two threads call this method at the same time, the update lambda may be called twice with the same item, resulting in an early release of the value and hence memory corruption.

This can be seen here:

As you can see, this is a loop that may execute several times, so the updateValueFactory may also be called several times. The only guarantee we have is that, after the method returns, the last value the factory was called with is the value that was in the cache and that we replaced.
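The retry loop can be modeled in Python as a compare-and-swap around an update factory that runs outside the lock. OptimisticDict, add_or_update and the barrier used to force the race are illustrative assumptions, not the .NET source. With two concurrent writers, the factory runs three times, twice with the same old value.

```python
import threading

class OptimisticDict:
    """Toy model of an AddOrUpdate-style method built on a
    compare-and-swap retry loop (not the actual .NET source)."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()   # guards the read and the swap only

    def add_or_update(self, key, add_value, update_factory):
        while True:
            with self._lock:
                if key not in self._data:
                    self._data[key] = add_value
                    return add_value
                current = self._data[key]
            # The factory runs outside the lock, so another writer can win.
            new_value = update_factory(key, current)
            with self._lock:
                if self._data[key] is current:   # compare-and-swap
                    self._data[key] = new_value
                    return new_value
            # Lost the race: loop and call update_factory again.

    def get(self, key):
        with self._lock:
            return self._data[key]


cache = OptimisticDict()
cache.add_or_update("k", "v0", lambda key, old: old)
calls = []                      # (writer, old value) for every factory call
calls_lock = threading.Lock()
barrier = threading.Barrier(2)

def writer(name):
    def factory(key, old):
        with calls_lock:
            calls.append((name, old))
        if old == "v0":
            barrier.wait()      # ensure both writers see "v0" before swapping
        return name
    cache.add_or_update("k", name, factory)

threads = [threading.Thread(target=writer, args=(n,)) for n in ("A", "B")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(calls))   # 3 calls, two of them with old value "v0"
```

If the factory released the old value, "v0" would be released twice, once by each writer, even though only one of them actually replaced it.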

Here is the fix:

That was quite hard to figure out, because at a glance, this looks just fine.

Invisible race conditions: The sometimes failing script


You probably know that Chrome is a memory hog. I came up with the following extremely brute-force way to deal with it:

This works quite nicely in 99% of the cases, but sometimes, this fails. Can you see why?

I’ll give you a hint: EmptyWorkingSet() fails with an invalid handle error. Why is that?

Well, the problem is a bit tricky. We first execute the Get-Process cmdlet and extract the handles from the results. That is great, but we don’t keep a reference to the Process instances that Get-Process returned, which means that they are now garbage.

That means the GC may collect them, and because they require finalization, at some point the finalizer will run and close their handles. As a result, EmptyWorkingSet fails sporadically, in a very non-obvious way. The “fix”, by the way, is to iterate over the processes directly, not over their handles, because that keeps each process instance alive for the duration of the call (and thus keeps its handle open).
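Here is a Python model of the failure (the original script is PowerShell). FakeProcess, get_processes and empty_working_set are hypothetical stand-ins for the Process objects, Get-Process and the Win32 EmptyWorkingSet call, and gc.collect() plays the role of the GC running at an unlucky moment. CPython reclaims unreferenced objects immediately through reference counting, so in this model the buggy path fails every time rather than sporadically.

```python
import gc

class FakeProcess:
    """Hypothetical stand-in for a Process object that owns an OS handle."""
    open_handles = set()
    _next_handle = 100

    def __init__(self):
        FakeProcess._next_handle += 1
        self.handle = FakeProcess._next_handle
        FakeProcess.open_handles.add(self.handle)

    def __del__(self):
        # Finalizer: closes the handle once the object is collected.
        type(self).open_handles.discard(self.handle)

def get_processes(count=3):
    return [FakeProcess() for _ in range(count)]

def empty_working_set(handle):
    # Stand-in for the Win32 call: fails on a handle that was closed.
    return handle in FakeProcess.open_handles

# Buggy: keep only the handles, so the process objects become garbage.
handles = [p.handle for p in get_processes()]
gc.collect()                     # the GC and finalizer get to run first
buggy_results = [empty_working_set(h) for h in handles]

# Fixed: iterate over the processes, keeping each one alive during the call.
processes = get_processes()
gc.collect()
fixed_results = [empty_working_set(p.handle) for p in processes]

print(buggy_results)   # [False, False, False]
print(fixed_results)   # [True, True, True]
```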

Invisible race conditions: The async query


This issue was reported to the mailing list with a really scary error: UseAfterFree detected! Attempt to return memory from previous generation, Reset has already been called and the memory reused!

I initially read it as an error raised by the server, which raised all sorts of red flags and caused us to immediately try to track down what was going on.

Here is the code that would reproduce this:


A key detail is that this is not happening on the server, but on the client. You now have all the information required to see what the error is.

Can you figure it out?

The problem is that this method returns a Task, but it isn’t an async method. In other words, we return the still-running task from ToListAsync, but because we aren’t awaiting it, the session’s Dispose runs right away. By the time the server request completes and is ready to do something with the data it got, the session is already disposed and we get this error.

The solution? Turn this into an async method and await ToListAsync() before disposing of the session.
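A Python asyncio model of the same mistake and its fix (the original code is C# using the RavenDB client). Session and to_list_async are hypothetical stand-ins: leaving the with block plays the role of disposing the session, and the exception message is borrowed from the error above.

```python
import asyncio

class Session:
    """Hypothetical client session whose buffers are invalid after dispose."""
    def __init__(self):
        self.disposed = False

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.disposed = True        # the real client releases its memory here

    async def to_list_async(self):
        await asyncio.sleep(0.01)   # the server request is still in flight
        if self.disposed:
            raise RuntimeError("UseAfterFree detected!")
        return ["doc/1", "doc/2"]

def query_buggy():
    # Not async: returns a still-running task, and leaving the `with`
    # block disposes the session before the response arrives.
    with Session() as session:
        return asyncio.create_task(session.to_list_async())

async def query_fixed():
    # async + await: the session stays open until the results are in.
    with Session() as session:
        return await session.to_list_async()

async def main():
    try:
        await query_buggy()
        buggy = "no error"
    except RuntimeError as err:
        buggy = str(err)
    fixed = await query_fixed()
    return buggy, fixed

buggy, fixed = asyncio.run(main())
print(buggy)   # UseAfterFree detected!
print(fixed)   # ['doc/1', 'doc/2']
```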
