Ayende @ Rahien

Oren Eini, aka Ayende Rahien, is the CEO of Hibernating Rhinos LTD, which develops RavenDB, a NoSQL Open Source Document Database.

time to read 3 min | 479 words

At some point in any performance optimization sprint, you are going to run into a super annoying problem: The dictionary.

The reasoning is quite simple. One of the most powerful optimization techniques is to use a cache, which is usually implemented as a dictionary. Today’s tale is about a dictionary, but surprisingly enough, not about a cache.

Let’s set up the background. I’m looking at optimizing a big indexing batch deep inside RavenDB, and here is my current focus:


You can see that RecordTermsForEntries takes 4% of the overall indexing time. That is… a lot, as you can imagine.

What is more interesting here is why. The simplified version of the code looks like this:

Basically, we are registering, for each entry, all the terms that belong to it. This is complicated by the fact that we are doing the process in stages:

  1. Create the entries
  2. Process the terms for the entries
  3. Write the terms to persistent storage (giving them the recorded term id)
  4. Update the entries to record the term ids that they belong to

The part of the code that we are looking at now is the last one, where we already wrote the terms to persistent storage and we need to update the entries. This is needed so when we read them, we’ll be able to find the relevant terms.

At any rate, you can see that this method’s cost is absolutely dominated by the dictionary call. In fact, we are already using an optimized method here to avoid doing a TryGetValue() and then Add() in case the value is not already in the dictionary.

If we look at the metrics, this is actually kind of awesome. We are calling the dictionary almost 400 million times and it is able to do the work in under 200 nanoseconds per call.

That is pretty awesome, but that still means that we have over 2% of our total indexing time spent doing lookups. Can we do better?

In this case, absolutely. Here is how this works: instead of doing a dictionary lookup, we are going to store a list, and the entry will record the index of its item in the list. Here is what this looks like:

There isn’t much to this process, I admit. I was lucky that in this case, we were able to reorder things in such a way that skipping the dictionary lookup is a viable method.

In other cases, we would need to record the index at the creation of the entry (effectively reserving the position) and then use that later.
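To make the idea concrete, here is a minimal sketch in Python rather than RavenDB’s actual C# code. Every name in it (Entry, slot, create_entries, and so on) is hypothetical. It only illustrates swapping a per-call dictionary lookup for an index that the entry already carries, reserved at creation time.

```python
def record_terms_with_dict(pairs):
    """Baseline: one hash lookup per (entry_id, term_id) pair."""
    terms_by_entry = {}
    for entry_id, term_id in pairs:
        terms_by_entry.setdefault(entry_id, []).append(term_id)
    return terms_by_entry


class Entry:
    """Hypothetical entry that remembers its position in a shared list."""

    def __init__(self, entry_id, slot):
        self.entry_id = entry_id
        self.slot = slot


def create_entries(entry_ids):
    """Reserve one list slot per entry at creation time."""
    entries = [Entry(entry_id, slot) for slot, entry_id in enumerate(entry_ids)]
    terms_per_slot = [[] for _ in entries]
    return entries, terms_per_slot


def record_terms_with_slots(entry_term_pairs, terms_per_slot):
    """Replacement: plain list indexing, no hashing or key comparison."""
    for entry, term_id in entry_term_pairs:
        terms_per_slot[entry.slot].append(term_id)
```

The second version does the same bookkeeping, but the only work per call is an array index, which is why reordering the stages to make the index available up front pays off.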

And the result is…


That is pretty good, even if I say so myself. The cost went down from 3.6 microseconds per call to 1.3 microseconds. That is almost a threefold improvement.

time to read 8 min | 1514 words

Today I ran into this Reddit post, detailing how Moq is now using SponsorLink to encourage users to sponsor the project.

The idea is that if you are using the project, you’ll sponsor it for some amount, which funds the project. You’ll also get something like this:

lots of thanks from ThisAssembly

This has been rolled out for some projects for quite some time, it seems. But Moq is a far more popular project and it got quite a bit of attention.

It is an interesting scenario, and I gave some thought to what this means.

I’m not a user of Moq, just to note.

I absolutely understand the desire to be paid for Open Source work. It takes a lot of time and effort and looking at the amount of usage people get out of your code compared to the compensation is sometimes ridiculous.

For myself, I can tell you that I made 800 USD out of Rhino.Mocks directly when it was one of the most popular mocking frameworks in the .NET world. That isn’t a sale, that is the total amount of compensation that I got for it directly.

I literally cannot total the number of hours that I spent on it. But OpenHub estimates it as 245 man-years. I… disagree with that estimate, but I certainly put a lot of time there.

From a commercial perspective, I think that this direction is a mistake, primarily because of the economics of software purchases. You can read about the implementation of SponsorLink here. The model basically says that it will check whether the individual user has sponsored the project.

That is… not really how it works. Let’s say that a new developer is starting to work on an existing project. It is using a SponsorLink project. What happens then? That new developer is being asked to sponsor the project?

If this is a commercial project, I certainly support the notion that there should be some payment. But it should not be on the individual developer, it should be on the company that pays for the project.

That leaves aside all the scenarios where this is being used for an open source project, etc. Let’s ignore those for now.

The problem is that this isn’t how you actually get paid for software. If you are targeting commercial usage, you should be targeting companies, not individual users. More to the point, let’s say that a developer wants to pay, and their company will compensate them for that.

The process for actually doing that is atrocious beyond belief. There are tax implications (if they sponsor with $5 / month and their employer gives them a $5 raise, that would be taxed, for example), so you need to submit a receipt for expenses, etc.

A far better model would be to have a way to get the company to pay for that, maybe on a per project basis. Then you can detect if the project is sponsored, for example, by looking at the repository URL (and accounting for forks).

Note that at this point, we are talking about the actual process of getting money, nothing else about this issue.

Now, let’s get to the reason that this caused much angst for people. The way SponsorLink works is that it fetches your email from the git configuration file and checks whether:

  • You are registered as a SponsorLink sponsor
  • You are sponsoring this particular project

It does both checks using what appears to be: base62(sha256(email));

If you are already a SponsorLink sponsor, you have explicitly agreed to sharing your email, so not a problem there. So the second request is perfectly fine.

The real problem is the first check, when you check if you are a SponsorLink sponsor in the first place. Let’s assume that you aren’t. What happens then?

Well, there is a request made that looks something like this:

HEAD /azure-blob-storage/path/app/3uVutV7zDlwv2rwBwfOmm2RXngIwJLPeTO0qHPZQuxyS

The server will return a 404 if you are not a sponsor at this point.

The email hash above is my own, by the way. As I mentioned, I’m not a sponsor, so I assume this will return 404. The question is what sort of information is being provided to the server in this case?

Well, there is the hashed email, right? Is that a privacy concern?

It is indeed. While reversing SHA256 in general is not possible, for something like emails, that is pretty trivial. It took me a few minutes to find an online tool that does just that.

The cost is around 0.00045 USD / email, just to give some context. So the end result is that using SponsorLink will provide the email of the user (without their express or implied consent) to the server. It takes a little bit of extra work, but it most certainly does.
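To see why an unsalted hash of an email offers so little protection, consider this minimal Python sketch. The candidate list is made up for illustration (using the reserved example.com domain). A real attacker would draw guesses from leaked address lists or from combinations of names and domains.

```python
import hashlib


def sha256_hex(email):
    return hashlib.sha256(email.encode("utf-8")).hexdigest()


def recover_email(target_hash, candidates):
    """Hash each guess and compare; returns the matching email or None."""
    for email in candidates:
        if sha256_hex(email) == target_hash:
            return email
    return None
```

Since hashing a guess is extremely cheap and the space of plausible emails is small compared to the space of all strings, the hash by itself hides very little.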

Note that practically speaking, this looks like it hits Azure Blob Storage, not a dedicated endpoint. That means that you can probably enable logging to check for the requests and then gather the information from there. Not sure what you would do with this information, but it certainly looks like this falls under the definition of PII in the GDPR.

There are a few ways to resolve this problem. The first would be to not use email at all, but instead the project repository URL. That may require a bit more work to resolve forks, but would alleviate most of the concerns regarding privacy. A better option would be to just check for an included file in the repository, to be honest. Something like: .sponsored.projects file.

That would include the ids of the projects that were sponsored by this project, and then you can run a check to see that they are actually sponsored. There is no issue with consent here, since adding that file to the repository will explicitly consent for the process.

Assuming that you want / need to use the emails still, the problem is much more complex. You cannot use the same k-anonymity process that you can use for passwords. The problem is that a SHA256 of an email is as good as the email itself.

I think that something like this would work, however. Given the SHA256 of the email, you send to the server the following details:

  • prefix = SHA256(email)[0 .. 6]
  • key = read(“/dev/urandom”, 32bytes)
  • hash = HMAC-SHA256(key, SHA256(email))

The prefix is the first 6 characters of the SHA256 hash. The key is 32 bytes of cryptographically strong random data.

The hash is taking the SHA256 and hashing it again using HMAC with the provided key.

The idea is that on the server side, you can load all the hashes that you stored that match the provided prefix. Then you compute the keyed HMAC for all of those values and attempt to check if there is a match.

We are trying to protect against a malicious server here, remember. So the idea is that if there is a match, we pinged the server with an email that it knows about. If we ping the server with an email that it does not know about, on the other hand, it cannot tell you anything about the value.

The first 6 characters of the SHA256 will tell you nothing about the value, after all. And the fact that we use a random key when sending the actual hash to the server means that there is no point trying to figure it out. Unlike trying to guess an email, guessing a hash of an email is likely far harder, to the point that it is not feasible.
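The exchange described above can be sketched in a few lines of Python. This is only an illustration of the idea, not a reviewed protocol. The function names are hypothetical, and the hex encoding of the digest and the lack of any email normalization are assumptions on my part.

```python
import hashlib
import hmac
import secrets


def email_digest(email):
    # Assumption: hex encoding, and no normalization of the address.
    return hashlib.sha256(email.encode("utf-8")).hexdigest()


def build_request(email):
    """Client side: reveal a 6-character prefix, a fresh key, and a keyed hash."""
    digest = email_digest(email)
    key = secrets.token_bytes(32)  # stands in for reading 32 bytes of /dev/urandom
    mac = hmac.new(key, digest.encode("ascii"), hashlib.sha256).hexdigest()
    return digest[:6], key, mac


def is_known_sponsor(prefix, key, mac, stored_digests):
    """Server side: re-key every stored digest that shares the prefix."""
    for digest in stored_digests:
        if not digest.startswith(prefix):
            continue
        candidate = hmac.new(key, digest.encode("ascii"), hashlib.sha256).hexdigest()
        if hmac.compare_digest(candidate, mac):
            return True
    return False
```

A sponsor’s request matches one of the stored digests, while for anyone else the server is left holding a prefix and a keyed hash that it cannot compare against anything it knows.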

Note, I’m not a cryptography expert, and I wouldn’t actually implement such a thing without consulting with one. I’m just writing a blog post with my ideas.

That would at least alleviate the privacy concern. But there are other issues.

SponsorLink is provided as a closed-source, obfuscated library. People have taken the time to de-obfuscate it, and so far it appears to match the documented behavior. But the mere fact that an obfuscated, closed-source library is included in an open-source project raises a lot of red flags.

Finally, there is the actual behavior when it detects that you are not sponsoring this project. Here is what the blog post states will happen:

A diagnostics warning in VS suggesting you install SponsorLink

It will delay the build (locally, on your machine, not on CI).

That… is really bad. I assume that this happens on every build (not sure, though). If that is the case, that means that the feedback cycle of "write a test, run it, write code, run a test", is going to hit significant slowdowns.

I would consider this to be a breaking point even excluding everything else.

As I previously stated, I’m all for paying for Open Source software. But this is not the way to do that. There is a whole bunch of friction and not much that can indicate a positive outcome for the project.

Monetization strategies for Open Source projects are complex. Open core, for example, likely would not work for this scenario. Nor would you be likely to get support contracts. The critical aspect is that beyond just the technical details, any such strategy requires a whole bunch of infrastructure around it. Marketing, sales, contract negotiation, etc. There is no easy solution here, I’m afraid.

time to read 1 min | 83 words

I’m going to QCon San Francisco and will be teaching a full day workshop where we’ll start from a C compiler and an empty file and end up with a functional storage engine, indexing and more.

Included in the minimum requirements are implementing transactions, MVCC, persistent data structures, and indexes.

The workshop is going to be loosely based on the book, but I’m going to condense things so we can cover this topic in a single day.

Looking forward to seeing you there.

time to read 9 min | 1711 words

In my previous post I discussed how we could store the exact same information in several ways, leading to space savings of 66%! That leads to interesting questions with regard to actually making use of this technique in the real world.

The reason I posted about this topic is that we just gained a very significant reduction in memory (and we obviously care about reducing resource usage). The question is whether this is something that you want to do in general.

Let’s look at that in detail. For this technique to be useful, you should be using structs in the first place. That is… not quite true, actually. Let’s take a look at the following declarations:

We define the same shape twice. Once as a class and once as a structure. How does this look in memory?


Here you can find some really interesting differences. The struct is smaller than the class, but the amount of wasted space is much higher in the struct. What is the reason for that?

The class needs to carry 16 bytes of metadata. That is the object header and the pointer to the method table. You can read more about the topic here. So the memory overhead for a class is 16 bytes at a minimum. But look at the rest of it.

You can see that the layout in memory of the fields is different in the class versus the structure. C# is free to re-order the fields to reduce the padding and get better memory utilization for classes, but I would need [StructLayout(LayoutKind.Auto)] to do the same for structures.
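The effect of field order on padding is easy to reproduce outside of .NET. Here is a small sketch using Python’s ctypes module, which follows the platform’s C layout rules and keeps fields in declaration order. The field names are made up, and the exact sizes depend on the platform ABI, so the code prints nothing and only exposes the numbers.

```python
import ctypes


class DeclaredOrder(ctypes.Structure):
    # Small and large fields interleaved, laid out in declaration order.
    _fields_ = [
        ("flag_a", ctypes.c_bool),
        ("price", ctypes.c_double),
        ("flag_b", ctypes.c_bool),
        ("count", ctypes.c_int32),
    ]


class LargestFirst(ctypes.Structure):
    # Same fields, sorted from largest to smallest.
    _fields_ = [
        ("price", ctypes.c_double),
        ("count", ctypes.c_int32),
        ("flag_a", ctypes.c_bool),
        ("flag_b", ctypes.c_bool),
    ]


def padding_bytes(struct_type):
    """Total size minus the sum of the field sizes."""
    used = sum(ctypes.sizeof(field_type) for _, field_type in struct_type._fields_)
    return ctypes.sizeof(struct_type) - used
```

Comparing ctypes.sizeof() and padding_bytes() for the two types shows the same data taking less room once the small fields are grouped together, which is the kind of reordering an automatic layout is free to do for you.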

The difference between the two options can be quite high, as you can imagine. Note that automatically laying out the fields in this manner means that you’re effectively declaring that the memory layout is an implementation detail. This means that you cannot persist it, send it to native code, etc. Basically, the internal layout may change at any time. Classes in C# are obviously not meant for you to poke into their internals, and LayoutKind.Auto comes with an explicit warning about its behavior.

Interestingly enough, [StructLayout] will work on classes, and you can use it to force LayoutKind.Sequential on a class. That is by design, because you may need to pass a part of your class to unmanaged code, so you have the ability to control memory explicitly. (Did I mention that I love C#?)

Going back to the original question, why would you want to go into this trouble? As we just saw, if you are using classes (which you are likely to default to), you already benefit from the automatic layout of fields in memory. If you are using structs, you can enable LayoutKind.Auto to get the same behavior.

This technique is for the 1% of the cases where that is not sufficient, when you can see that your memory usage is high and you can benefit greatly from manually doing something about it.

That leads to the follow-up question: if we go about implementing this, what is the overhead over time? If I want to add a new field to an optimized struct, I need to be able to understand how it is laid out in memory, etc.

Like any optimization, you need to maintain that. Here is a recent example from RavenDB.


In this case, we used to have an optimization that had a meaningful impact. The .NET code changed, and the optimization now no longer makes sense, so we reverted that to get even better perf.

At those levels, you don’t get to rest on your laurels. You have to keep checking your assumptions.

If you got to the point where you are manually optimizing memory layouts for better performance, there are two options:

  • You are doing that for fun, no meaningful impact on your system over time if this degrades.
  • There is an actual need for this, so you’ll need to invest the effort in regular maintenance.

You can make that easier by adding tests to verify those assumptions. For example, verifying that the amount of padding in structs matches expectations. A simple test that would verify the size of a struct would mean that any changes to that are explicit. You’ll need to modify the test as well, and presumably that is easier to catch / review / figure out than just adding a field and not noticing the impact.
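Such a guard can be very small. Here is a sketch of one, written in Python with ctypes as a stand-in for a C# struct. IndexEntry and assert_layout are hypothetical names, and the expected numbers are simply what this particular field list adds up to.

```python
import ctypes


class IndexEntry(ctypes.Structure):
    # Hypothetical struct; fields ordered largest first so nothing is padded.
    _fields_ = [
        ("term_id", ctypes.c_uint64),
        ("frequency", ctypes.c_uint32),
        ("field_id", ctypes.c_uint16),
        ("flags", ctypes.c_uint8),
        ("kind", ctypes.c_uint8),
    ]


def assert_layout(struct_type, expected_size, expected_padding):
    """Fail loudly when a field change alters the size or the padding."""
    size = ctypes.sizeof(struct_type)
    used = sum(ctypes.sizeof(t) for _, t in struct_type._fields_)
    assert size == expected_size, f"{struct_type.__name__}: size is {size}"
    assert size - used == expected_padding, f"{struct_type.__name__}: padding is {size - used}"


assert_layout(IndexEntry, expected_size=16, expected_padding=0)
```

Anyone who adds a field now has to update the expected numbers in the same change, which puts the size increase in front of the reviewer.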

In short, this isn’t a generally applicable pattern. This is a technique that is meant to be applied in case of true need, where you’ll happily accept the additional maintenance overhead for better performance and reduced resource usage.

time to read 35 min | 6841 words

Consider a warehouse that needs to keep track of items. For the purpose of discussion, we have quite a few fields that we need to keep track of. Here is what this looks like in code:

And the actual Warehouse class looks like this:

The idea is that this is simply a wrapper to the list of items. We use a struct to make sure that we have good locality, etc.

The question is, what is the cost of this? Let’s say that we have a million items in the warehouse. That would be over 137MB of memory. In fact, a single struct instance is going to consume a total of 144 bytes.

That is… a big struct, I have to admit. Using ObjectLayoutInspector I was able to get the details on what exactly is going on:

Type layout for 'WarehouseItem'
    Size: 144 bytes. Paddings: 62 bytes (%43 of empty space)

As you can see, there is a huge amount of wasted space here. Most of which is because of the nullability. That injects an additional byte, and padding and layout issues really explode the size of the struct.

Here is an alternative layout, which conveys the same information much more compactly. The idea is that instead of having a full byte for each nullable field (with the impact on padding, etc), we’ll have a single bitmap for all nullable fields. Here is what this looks like:
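As a rough illustration of the bitmap idea (not the actual WarehouseItem code), here is a Python sketch using ctypes. The three fields and the helper methods are hypothetical; only the _nullability name is borrowed from the post.

```python
import ctypes


class FlagPerField(ctypes.Structure):
    # One "has value" byte next to every optional field.
    _fields_ = [
        ("has_weight", ctypes.c_bool), ("weight", ctypes.c_double),
        ("has_volume", ctypes.c_bool), ("volume", ctypes.c_double),
        ("has_quantity", ctypes.c_bool), ("quantity", ctypes.c_int64),
    ]


WEIGHT, VOLUME, QUANTITY = 1 << 0, 1 << 1, 1 << 2


class SharedBitmap(ctypes.Structure):
    # One bit per optional field, all stored in a single byte.
    _fields_ = [
        ("weight", ctypes.c_double),
        ("volume", ctypes.c_double),
        ("quantity", ctypes.c_int64),
        ("_nullability", ctypes.c_uint8),
    ]

    def has_value(self, bit):
        return bool(self._nullability & bit)

    def set_value(self, name, bit, value):
        setattr(self, name, value)
        self._nullability |= bit

    def clear_value(self, name, bit):
        setattr(self, name, 0)
        self._nullability &= ~bit & 0xFF
```

Each flag in the first struct drags alignment padding along with it, while the second pays for a single byte that covers up to eight optional fields.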

If we look deeper into this, we’ll see that this saved a lot: the struct is now 96 bytes in size. It’s a massive space saving, but…

Type layout for 'WarehouseItem'
Size: 96 bytes. Paddings: 24 bytes (%25 of empty space)

We still have a lot of wasted space. This is because we haven’t organized the struct to eliminate padding. Let’s reorganize the struct’s fields to see what we can achieve. The only change I made was to rearrange the fields, and we have:

And the struct layout is now:


We have no wasted space, and we are 50% of the previous size.

We can actually do better. Note that Fragile and IsHazarous are Booleans, and we have some free bits on _nullability that we can repurpose.

For that matter, RgbColor only needs 24 bits, not 32. Do we need alcohol content to be a float, or can we use a byte? If that is the case, can we shove both of them together into the same 4 bytes?

For dates, can we use DateOnly instead of DateTime? What about ShelfLife, can we measure that in hours and use a ushort for that (65,535 hours gives us a maximum of about 7 years)?
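Two of these ideas are simple enough to check with a few lines of Python. This is a sketch with hypothetical function names. How the alcohol content would be encoded into a single byte is left open here, and the code only assumes that it fits in 0..255.

```python
def pack_color_and_alcohol(rgb, alcohol):
    """Low 24 bits hold the color, the top 8 bits hold the alcohol value."""
    if not 0 <= rgb <= 0xFFFFFF:
        raise ValueError("rgb must fit in 24 bits")
    if not 0 <= alcohol <= 0xFF:
        raise ValueError("alcohol must fit in one byte")
    return (alcohol << 24) | rgb


def unpack_color_and_alcohol(packed):
    """Inverse of pack_color_and_alcohol: returns (rgb, alcohol)."""
    return packed & 0xFFFFFF, packed >> 24


def max_shelf_life_years(bits=16, signed=False):
    """Longest shelf life, in years, when hours are stored in an integer."""
    max_hours = 2 ** (bits - 1) - 1 if signed else 2 ** bits - 1
    return max_hours / (24 * 365)
```

An unsigned 16-bit count of hours tops out at 65,535 hours, roughly 7.5 years, while a signed one would cap at about half of that.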

After all of that, we end up with the following structure:

And with the following layout:


In other words, we are now packing everything into 48 bytes, which means that we are one-third of the initial cost, while still representing the same data. Our previous Warehouse class? It used to take 137MB for a million items, it would now take 45.7 MB only.

In RavenDB’s case, we had the following:

That is the backing store of the dictionary, and as you can see, it isn’t a nice one. Using similar techniques we are able to massively reduce the amount of storage that is required to process indexing.

Here is what this same scenario looks like now:

But we aren’t done yet, there is still more that we can do.

time to read 3 min | 417 words

A customer called us, quite upset, because their RavenDB cluster was failing every few minutes. That was weird, because they were running on our cloud offering, so we had full access to the metrics, and we saw absolutely no problem on our end.

During the call, it turned out that every now and then, but almost always immediately after a new deployment, RavenDB would fail some requests. On a fairly consistent basis, we could see two failures and a retry that was finally successful.

Okay, so at least there is no user visible impact, but this was still super strange to see. On the backend, we couldn’t see any reason why we would get that sort of error.

Looking at the failure stack, we narrowed things down to an async operation that was invoked via DataDog. Our suspicions were focused on this being an error in the async machinery customization that DataDog uses for adding non-invasive monitoring.

We created a custom build for the user that they could test and waited to get the results from their environment. Trying to reproduce this locally using DataDog integration didn’t raise any flags.

The good thing was that we did find a smoking gun, a violation of the natural order and invariant breaking behavior.

The not so good news was that it was in our own code. At least that meant that we could fix this.

Let’s see if I can explain what is going on. The customer was using a custom configuration: FastestNode. This is used to find the nearest / least loaded node in the cluster and operate from it.

How does RavenDB know which is the fastest node? That is kind of hard to answer, after all. It checks.

Every now and then, RavenDB replicates a read request to all nodes in the cluster. Something like this:

The idea is that we send the request to all the nodes, and wait for the first one to arrive. Since this is the same request, all servers will do the same amount of work, and we’ll find the fastest node from our perspective.

Did you notice the cancellation token in there? When we return from this function, we cancel the existing requests. Here is what this looks like from the monitoring perspective:


From the monitoring side, this looks exactly as if we have a couple of failures (and a failover) in the system every few minutes, which was quite confusing until we figured out exactly what was going on.
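The pattern is easy to reproduce in miniature. The following Python sketch uses asyncio with simulated delays instead of real requests. The node names and helper functions are made up, and it is not the RavenDB client code. It races the same read against every node, keeps the first answer, and cancels the rest.

```python
import asyncio


async def simulated_read(node, delay):
    """Stand-in for sending the same read request to one node."""
    await asyncio.sleep(delay)
    return node


async def find_fastest_node(delays_by_node):
    tasks = {
        asyncio.create_task(simulated_read(node, delay)): node
        for node, delay in delays_by_node.items()
    }
    done, pending = await asyncio.wait(tasks.keys(), return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # the losers are cancelled, which monitoring may log as failures
    await asyncio.gather(*pending, return_exceptions=True)
    fastest = next(iter(done)).result()
    cancelled = sorted(tasks[task] for task in pending if task.cancelled())
    return fastest, cancelled
```

With three nodes, every probe produces one success and two cancelled requests, which matches the "two failures and then a success" picture that a monitoring tool would draw.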

time to read 3 min | 541 words

We got a support call from a client in the early hours of the morning. They were getting out-of-memory errors from their database and were understandably perturbed by that. They are running on a cloud system, so the first inclination of the admin when seeing the problem was to deploy the server on a bigger instance, to at least get things running while they investigated. Doubling and then quadrupling the amount of memory that the system has had no impact. A few minutes after the system booted, it would raise an error about running out of memory.

Except that it wasn’t actually running out of memory. A scenario like that, when we give more memory to the system and still have out-of-memory errors can indicate a leak or unbounded process of some kind. That wasn’t the case here. In all system configurations (including the original one), there was plenty of additional memory in the system. Something else was going on.

When our support engineer looked at the actual details of the problem, it was quite puzzling. It looked something like this:

System.OutOfMemoryException: ENOMEM on Failed to munmap at Sparrow.Server.Platform.Posix.Syscall.munmap(IntPtr start, UIntPtr length);

That error made absolutely no sense, as you can imagine. We are trying to release memory, not allocate it. Common sense says that you can’t really fail when you are freeing memory. After all, how can you run out of memory? I’m trying to give you some, damn it!

It turns out that this model is too simplistic. You can actually run out of memory when trying to release it. The issue is that it isn’t you that is running out of memory, but the kernel. Here we are talking specifically about the Linux kernel, and how it works.

Obviously a very important aspect of the job of the kernel is managing the system memory, and to do that, the kernel itself needs memory. For managing the system memory, the kernel uses something called VMA (virtual memory area). Each VMA has its own permissions and attributes. In general, you never need to be aware of this detail.

However, there are certain pathological cases, where you need to set up different permissions and behaviors on a lot of memory areas. In the case we ran into, RavenDB was using an encrypted database. When running on an encrypted database, RavenDB ensures that all plain text data is written to memory that is locked (cannot be stored on disk / swapped out).

A side effect of that is that for every piece of memory that we lock, the kernel needs to create its own VMA, since each of them is operated on independently of the others. The kernel uses VMAs to manage its own map of the memory, and eventually the number of items in the map exceeds the configured value.

In this case, the munmap call released a portion of the memory back, which means that the kernel needs to split the VMA into separate pieces. But the number of items is limited, and the limit is controlled by the vm.max_map_count value.

The default is typically 65530, but database systems often require a lot more of those. The default value is conservative, mind.

Adjusting the configuration would alleviate this problem, since that will give us sufficient space to operate normally.
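On Linux, you can compare the limit with what a process currently uses by reading two files under /proc. Here is a minimal Python sketch. The function names are hypothetical, and the functions return None on systems where those files are not available.

```python
from pathlib import Path


def read_max_map_count(path="/proc/sys/vm/max_map_count"):
    """Return the kernel limit on mappings per process, or None if unavailable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def count_current_mappings(path="/proc/self/maps"):
    """Each line of /proc/<pid>/maps describes one mapping (VMA)."""
    try:
        with open(path) as maps:
            return sum(1 for _ in maps)
    except OSError:
        return None


def mapping_headroom():
    """How many more mappings this process can create before hitting the limit."""
    limit, used = read_max_map_count(), count_current_mappings()
    if limit is None or used is None:
        return None
    return limit - used
```

Pointing count_current_mappings() at /proc/<pid>/maps of the database process shows how close it is to the limit. The limit itself can be raised by an administrator through sysctl (the vm.max_map_count key).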

time to read 4 min | 631 words

On its face, we have a simple requirement:

  • Generate sequential numbers
  • Ensure that there can be no gaps
  • Do that in a distributed manner

Generating the next number in the sequence is literally as simple as ++, so surely that is a trivial task, right?

The problem is with the second requirement. The need to ensure that there are no gaps comes often when dealing with things like invoices. The tax authorities are really keen on “show me all your invoices”, and if there are gaps in the numbers, you may have to provide Good Answers.

You may think that the third one, running in a distributed environment, is the tough challenge, but that isn’t actually the case. If we are running in a single location, that is fairly easy. Run the invoice id generation as a transaction, and you are done. But the normal methods of doing that are usually wrong in edge cases.

Let’s assume that we use an Oracle database, which uses the following mechanism to generate the new invoice id:


Or SQL Server with an identity column:

CREATE TABLE invoices ( invoice_id INT IDENTITY(1,1) PRIMARY KEY, ... );

In both cases, we may insert a new value to the invoices table, consuming an invoice id. At some later point in time, we may roll back the transaction. Care to guess what happens then?

You have INVOICE #1000 and INVOICE #1002, but nothing in between. In fact, no way to even tell what happened, usually.

In other words, identity, sequence, serial, or autonumber – regardless of what database platform you use, are not suitable for generating gapless numbers.

The reasoning is quite simple. Assume that you have two concurrent transactions, which generate two new invoices at roughly the same time. You commit the later one before the first one, and roll back the first. You now have:

  • Invoice #1
  • Invoice #2
  • Invoice #1000
  • Invoice #1002

However, you don’t have Invoice #1001, and you cannot roll back the sequence value there, because if you do so, it will re-generate the #1002 on the next call.
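The behavior can be modeled in a few lines of Python. This is a toy simulation, not real database code. The Sequence class mimics the one property that matters here, which is that a value handed out by a sequence is never taken back when the transaction rolls back.

```python
class Sequence:
    """Models a database sequence: values handed out are never taken back."""

    def __init__(self, start):
        self._next = start

    def next_value(self):
        value = self._next
        self._next += 1
        return value


def run_transactions(sequence, outcomes):
    """Each transaction takes an id first, then commits (True) or rolls back (False)."""
    committed = []
    for commits in outcomes:
        invoice_id = sequence.next_value()  # consumed even if we roll back
        if commits:
            committed.append(invoice_id)
    return committed
```

Running three transactions where the middle one rolls back leaves you with the first and third ids and nothing in between.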

Instead, for gapless numbers, we need to create this as a dedicated part of the transaction. So there would be a record in our system that contains the NextInvoiceId, which will be incremented as part of the new invoice creation.

In order to ensure that there are no gaps, you need to ensure that the NextInvoiceId record increment is handled as a user operation, not a database operation. In other words, in SQL Server, that is a row in a table, that you manually increment as part of adding a new invoice. Here is what this will look like:

As you can see, we increment the row directly. So it will be rolled back as well.

The downside here is that we can no longer create two invoices concurrently. The second transaction would have to wait on the lock on the row in the next_gapless_ids table.

All of that happens inside a single database server. What happens when we are running in a distributed environment?

The answer in this case is the exact same thing. You need a transaction, a distributed one, using a consensus algorithm. Here is how you can achieve this using RavenDB’s cluster wide transactions, which use the Raft protocol behind the scenes:

The idea is simple, we have a transaction that modifies the gapless ids document and creates a new invoice at the same time. We have to handle a concurrency exception if two transactions try to create a new invoice at the same time (because they both want to use the same invoice id value), but in essence this is pretty much exactly the same behavior as when you are running on a single node.
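The retry logic is the interesting part, and it can be shown without a cluster. What follows is a toy in-memory model in Python, not the RavenDB API. TinyStore, ConcurrencyError, and the document ids are all invented for the example, and the before_commit hook exists only to simulate a competing writer.

```python
class ConcurrencyError(Exception):
    pass


class TinyStore:
    """In-memory stand-in for a store with all-or-nothing, versioned commits."""

    def __init__(self):
        self.docs = {"gapless/invoices": {"next_id": 1}}
        self.version = 0

    def snapshot(self):
        return self.version, dict(self.docs["gapless/invoices"])

    def commit(self, expected_version, changes):
        if expected_version != self.version:
            raise ConcurrencyError("another transaction committed first")
        self.docs.update(changes)  # counter and invoice become visible together
        self.version += 1


def create_invoice(store, customer, max_attempts=5, before_commit=None):
    for _ in range(max_attempts):
        version, counter = store.snapshot()
        invoice_id = counter["next_id"]
        changes = {
            "gapless/invoices": {"next_id": invoice_id + 1},
            f"invoices/{invoice_id}": {"customer": customer},
        }
        if before_commit:
            before_commit()  # test hook to simulate a competing writer
            before_commit = None
        try:
            store.commit(version, changes)
            return invoice_id
        except ConcurrencyError:
            continue  # reread the counter and try again
    raise ConcurrencyError("gave up after repeated conflicts")
```

The loser of a race simply rereads the counter and takes the next id, so two concurrent writers end up with consecutive numbers rather than a duplicate or a gap.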

In other words, to ensure the right behavior, you need to use a transaction. And if you need a distributed transaction, that is just a flag away with RavenDB.

