Ayende @ Rahien

Oren Eini aka Ayende Rahien CEO of Hibernating Rhinos LTD, which develops RavenDB, a NoSQL Open Source Document Database.

Get in touch with me:

oren@ravendb.net

+972 52-548-6969

Posts: 7,274 | Comments: 50,512

Privacy Policy Terms
filter by tags archive
time to read 1 min | 144 words

I asked about a slow piece of code and why it is so slow. The answer is pretty simple, I messed up, take a look at this piece of code:

image

When the Value is an int, I’m creating a span from the address of a long variable (initialized to zero). That means that I have a lot of hash collisions, which means that adding to the dictionary turned into a sequential operation, which means that the actual cost here is O(N**2).

Interestingly enough, you can’t write this code without the use of unsafe. I actually didn’t know that the scope of the variable was even valid here to have its address taken. That was very hard to debug, because the problem was hidden away somewhere very far from where we looked at.

time to read 1 min | 81 words

The following code takes a long time to run. In fact, I’m writing this blog post while this is running, and I’m not sure how long that will take.

Update: This took over a minute to complete on my machine (which is pretty nice).

The killer is that this code is correct, it does the right thing, but it can be slow. I stripped a much larger scenario to ~50 lines of code, can you see what is going on? And why?

time to read 2 min | 239 words

After presenting the issue of how to return items to the array pool without creating a use after free bug, I asked you how you would fix that. There are several ways to try to do that, you can use reference counting scheme, or try to use locking, etc. All of those options are expensive, what is worse, they are expensive on a routine basis, not just for the free the buffer code path.

Instead, I changed the way we are handling returning the buffer. Take a look at the following code:

This may require some explanation. I’m using a ConditionaWeakTable here, that was added to the runtime to enable dynamic properties on objects. Basically, it creates a table that you can lookup by an object to get a key. The most important feature is that the runtime ensures that the associated reference lifetime match the key object lifetime. In other words, when we add the buffer in the eviction callback, we ensure that the ReturnBuffer we register will live at least as long as the buffer.

That means that we can let the GC do the verification job. We’ll now return the buffer back to the pool only after the GC has ensured that there are no outstanding references to it. Not a lot of code, and an elegant solution. This also ensures that we are only paying the code on eviction (likely rare), and not all the time.

time to read 2 min | 301 words

In my previous post, I discussed a bug that brought up in code review, that bug made me go into near panic mode. Here is the issue:

In order to understand this bug, you have to take into account multiple concurrent threads at the same time. Look at the ComputeHashAndPutInCache() method, where we register an eviction callback for the item in the cache. When we evict the item, we return the buffer to the buffer pool.

We want to avoid allocating memory, so that is certainly something desirable, no? However, consider what happens if I have a thread in ComputeHash(), getting a value from the cache. Before I have a chance to look at the value, however, the cache will decide to evict it. At which point the eviction callback will run.

We’ll return the buffer back to the buffer pool, where it may be used again by something else. I am also using this buffer to do other things from the caller of the ComputeHash() call. This is a somewhat convoluted use after free issue, basically.

And I find is super scary bug because of its affects:

  • Randomly, and rarely, some buffer will contain the wrong data, leading to wrong results (but hard to track it down).
  • Trying to find such a bug after the fact, however, is nearly impossible.
  • Most of the debugging techniques (repeating the operation for a particular value) will make it go away (the cache will keep the value and not evict it).

In short, this is a recipe for a really nasty debug session and an impossible to resolve bug. All from code that looks very much like an innocent bystander.

Now, I can obviously fix it by not using the array pool, but that may cause me to allocate more memory than I should. How would you approach fixing this issue?

time to read 1 min | 148 words

I was writing a fairly low level piece of code recently, this is deep in the guts of how RavenDB handles query. I’m adding a cache for some processing inside of RavenDB and the performance benefits are amazing (three orders of magnitude better). As you can imagine, this is the sort of things that I would really like to get into the main codebase. So I did all the usual taxes, created a pull request and moved to other things. Part of our process is to review all pull requests by another pair of eyes. And while there was some minor comments about the code, there was one comment asking about a particular line that had be break out in cold sweat.

I created a simpler reproduction for discussion purposes, here is the code:

Take a look, and let’s me know if you see the bug and its consequences.

time to read 2 min | 286 words

I’m teaching a course at university about cloud computing. That can be a lot of fun, but quite frustrating at time. The key issue for me is that I occasionally need to provide students with some way to do something that I know how to do properly, but I can’t.

Case in point, assuming that I have a distributed cluster of nodes, and we need to detect what nodes are up or down, how do you do that?

With RavenDB, we assign an observer to the cluster whose job is to do health monitoring. I can explain that to the students, but I can’t expect them to utilize this technique in their exercises, there is too much detail there. The focus of the lesson or exercise is not to build a distributed system but to make use of one, after all.

As a rule, I try to ensure that all projects that we are working on can be done in under 200 lines of Python code. That puts a hard limit to the amount of behavior I can express. Because of that, I find myself looking for ways to rely on existing infrastructure to deal with the situation. 

Each node is running the same code, and they are setup so they can talk to one another, if needed. It is important that all the live nodes will converge to agree on the active nodes in relatively short order.

The task is to find the list of active nodes in a cluster, where nodes may go up or down dynamically. We are running in AWS cloud so you can use its resources, how would you do that?

The situation should be as simple as possible and easy to explain to students.

time to read 2 min | 371 words

I posed a potential problem for a job interview. Given the following function, generate another key that has the same shard:

In other words, give “users/71” and a prefix of “orders/654”, generate a key that would be placed on the same shard as “users/71”. The answer in this case can be: “orders/654-vaueaa”.

In order to answer the question, we need to understand what is going on here. The function above is a fancy way to extract 16 bits of information from the key using a cryptographically hash function. MD5 is no longer considered secured, but given the fact that I’m needing just 16 bits, that is not an issue. The code above is slightly more complex than needed, I could simplify it to this and have the same effect (but not the same result, mind):

The need to generate a matching shard id is another way to say that we need a hash collision. Given that the key space is 2^16, and that we can assume that any mutation to the key will result effectively random changes to the result, we can simply generate different keys and try to see if they match. Here is a simple way to do so:

We are effectively throwing a dice and seeing if this match. So it is a probability game to wait until we have a collision. The actual implementation isn’t that important, what is interesting is to talk about the implications here:

  • Are there better ways to go about doing something like this? Not really, given that MD5 isn’t that broken.
  • How much time will it take to generate a shard id match? The answer, usually around 64K tries. But why is interesting. The birthday attack issues don’t play here, because we don’t need to match to multiple items, just one. So we role the dice and see if we match on the value.
  • Can we speed this up? Using a different hash function would probably help, yes.
  • What other ways do we have to handle this? Different shard id generation would allow much better alternative.

The last question is where we get into more interesting details about system design, ergonomics of the choices we make and get to see how the candidate actually thinks.

time to read 1 min | 118 words

Here is something that came up that I liked as task to handle in a job interview. We have the following function, generating a shard id from a document key:

There is a need to create a new document that must reside on the same shard as another document. The task is to write a function that would generate that id.

For example:

  • Key to match: users/71
  • Document id to modify: orders/654

The result can be: orders/654-22261 or orders/654-vaueaa

In other words, both orders/654-vaueaa and users/71 will have the same shard.

I like this because it is very easy to explain and has simple coding requirements. But it allows to go deep into several different fields of knowledge to understand the candidate’s solution.

time to read 2 min | 328 words

In my previous post, I asked about the following code and what its output will be:

As it turns out, this code will output two different numbers:

  • On Debug – 134,284,904
  • On Release – 66,896

The behavior is consistent between these two modes.

I was pretty sure that I knew what was going on, but I asked to verify. You can read the GitHub issue if you want the spoiler.

I attached to the running program in WinDBG and issued the following command:

We care about the last line. In particular, we can see that all the memory is indeed in the byte array, as expected.

Next, let’s dump the actual instances that take so much space:

There is one large instance here that we care about, let’s see what is holding on to this fellow, shall we?

It looks like we have a reference from a local variable. Let’s see if we can verify that, shall we? We will use the clrstack command and ask it to give us the parameters and local variables, like so:

The interesting line is 16, which shows:

image

In other words, here is the local variable, and it is set to null. What is going on? And why don’t we see the same behavior on release mode?

As mentioned in the issue, the problem is that the JIT introduce a temporary local variable here, which the GC is obviously aware of, but WinDBG is not. This cause the program to hold on to the value for a longer period of time than expected.

In general, this should only be a problem if you have a long running loop. In fact, we do in some case, and in debug mode, that actually caused our memory utilization to go through the roof and led to this investigation.

In release mode, these temporary variables are rarer (but can still happen, it seems).

FUTURE POSTS

No future posts left, oh my!

RECENT SERIES

  1. Implementing a file pager in Zig (14):
    24 Jan 2022 - Pages, buffers and metadata, oh my!
  2. re (30):
    14 Jan 2022 - Are You Sure You Want to Use MMAP in Your Database Management System?
  3. Production postmortem (33):
    03 Jan 2022 - An error on the first act will lead to data corruption on the second act…
  4. Negative feature response (2):
    20 Dec 2021 - Protect the user from accidental collection deletion
  5. Challenge (63):
    16 Dec 2021 - Find the slow down–answer
View all series

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats