Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

Get in touch with me:

oren@ravendb.net +972 52-548-6969

Posts: 7,546
|
Comments: 51,161
Privacy Policy · Terms
filter by tags archive
time to read 2 min | 262 words

imageWe got a memory corruption error one of those days that was quite interesting. It was in a place where we previous fixed a memory corruption error and was, at a glance, quite impossible.

The code would checkout an item from the cache and increment its ref count, which will keep it alive for as long as we were using it. But something made it fail, and quite horribly, too. We finally tracked the code down to this piece of code, which is run when we update the cache:

When the ref count goes to zero, we’ll release the memory, and _items is a Concurrent Dictionary.

Do you see the error?

The AddOrUpdate method will call the updateValueFactory when it needs to update a value, but it makes no promises with regards to its atomicity. In other words, if you have two threads calling this method, the update lambda will be called twice with the same item, resulting in early release of the value and hence memory corruption.

This can be seen here:

As you can see, we are looking at a loop that may be executed several times, as such the updateValueFactory can be called several times, and the only guarantee we have is that after the method has returned, the last value we were called with was the value that was in the cache and we replaced.

Here is the fix:

That was quite hard to figure out, because at a glance, this looks just fine.

time to read 1 min | 167 words

You probably know that Chrome is a memory hog, I came up with the following extremely brute force manner to deal with it:

This works quite nicely in 99% of the cases, but sometimes, this fails. Can you see why?

I’ll give you a hint, the EmptyWorkingSet() returns a failure and an invalid handle error . Why is that?

Well, the problem is a bit tricky. We first execute the Get-Process cmdlet, and extract the handles from the results. That is great, but we don’t keep track of the Process instances that we get from Get-Process, which means that they are garbage.

That means that the GC might clean them, but they require finalization, so at some point, the finalizer will claim them, closing their handles, which means that the EmptyWorkingSet will fail sporadically in a very non obvious way. The “fix”, by the way, is to iterate of the processes directly, not on their handles, because that keep the process instance live for the duration (and thus its handle).

time to read 1 min | 195 words

This issue was reported to the mailing list with a really scary error: UseAfterFree detected! Attempt to return memory from previous generation, Reset has already been called and the memory reused!

I initially read it as an error that is raised from the server, which raised up all sort of flags and caused us to immediately try to track down what is going on.

Here is the code that would reproduce this:


And a key part of that is that this is not happening on the server, but on the client. You now have all the information required to see what the error is.

Can you figure it out?

The problem is that this method returns a Task, but it isn’t an async method. In other words, we return a task that is still running from ToListAsync, but because we aren’t awaiting on it, the session’s dispose is going to run, and by the time the server request completes and is ready to actually do something with the data that it go, we are already disposed and we get this error.

The solution? Turn this into an async method and await on the ToListAsync() before disposing the session.

FUTURE POSTS

  1. Partial writes, IO_Uring and safety - about one day from now
  2. Configuration values & Escape hatches - 5 days from now
  3. What happens when a sparse file allocation fails? - 7 days from now
  4. NTFS has an emergency stash of disk space - 9 days from now
  5. Challenge: Giving file system developer ulcer - 12 days from now

And 4 more posts are pending...

There are posts all the way to Feb 17, 2025

RECENT SERIES

  1. Challenge (77):
    20 Jan 2025 - What does this code do?
  2. Answer (13):
    22 Jan 2025 - What does this code do?
  3. Production post-mortem (2):
    17 Jan 2025 - Inspecting ourselves to death
  4. Performance discovery (2):
    10 Jan 2025 - IOPS vs. IOPS
View all series

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats
}