Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

time to read 1 min | 137 words

We have a design issue with the RavenDB Query Language. Consider the following queries:
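As an illustrative sketch (the exact syntax here is hypothetical, not the final RQL grammar), the two styles look like this:

```sql
-- Option 1: SQL-like, select comes first
SELECT Name, Age FROM Users WHERE Age > 21

-- Option 2: from comes first, select comes last
FROM Users WHERE Age > 21 SELECT Name, Age
```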

There are two different ways to express the same concept. The first version is what we have now, and it is modeled after SQL. The problem is that it makes it very hard to build good IntelliSense for this.

The second option is much easier, because the format of the query also follows the flow of writing, and by the time you are in the select, you already know what you are querying on. This is the reason why C# and VB.NET have their LINQ syntax in this manner.
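For comparison, here is how C#'s LINQ query syntax puts from first (the data here is illustrative):

```csharp
using System;
using System.Diagnostics;
using System.Linq;

var users = new[]
{
    new { Name = "A", Age = 30 },
    new { Name = "B", Age = 15 },
};

// "from" comes first, so by the time you type "select",
// the compiler (and IntelliSense) already knows the element type.
var names = (from u in users
             where u.Age > 21
             select u.Name).ToList();
```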

The problem is that it is pretty common to want to define aliases for fields in the select and then refer to them afterward. That becomes awkward in the second option.

Any thoughts?

time to read 3 min | 463 words

Transactions are important for a database, even if it feels strange to talk about something so basic, a bit like taking out an ad to state the obvious.

We pretty much treat RavenDB’s transactional nature as a baseline, the same as the safe assumption that any employee we hire will have a pulse. (Sorry, we discriminate against zombies and vampires because they create a hostile work environment; see here for details.)

Back to transactions, and why I’m bringing up such a basic requirement. Consider the case where you need to pay someone. That operation is composed of two distinct operations: first the bank debits your account, then the bank credits the other account. You generally want these to happen as a transactional unit; either both of them happen, or neither does. In practice, that isn’t how banks work at all, but it is the simplest way to explain transactions, so we’ll go with that.

With RavenDB, that is how it has always worked. You can save multiple documents in a single transaction, and we guarantee that we either save all of them or none. There is a problem, however, when we start talking about distributed clusters. When you save data to RavenDB, it goes to a single node and is then replicated from there. In RavenDB 4.0 we now also guarantee that replicating the data between nodes respects the transaction boundary, so across the cluster you can rely on us never breaking apart transactions.

Let us consider the following transfer:

  • Deduct $5 from accounts/1234
  • Credit $5 to accounts/4321

These changes will also be replicated as a single unit, so the total amount of money in the system remains the same.
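As a sketch, here is what such a transfer looks like with the RavenDB client API. The Account class is hypothetical, and `store` stands for an already-initialized document store:

```csharp
// Sketch: a money transfer as a single RavenDB transaction.
// Account is an illustrative document class with a Balance field.
using (var session = store.OpenSession())
{
    var debit = session.Load<Account>("accounts/1234");
    var credit = session.Load<Account>("accounts/4321");

    debit.Balance -= 5;
    credit.Balance += 5;

    // One call, one transaction: either both documents are
    // persisted, or neither is.
    session.SaveChanges();
}
```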

But a more complex set of operations can be:

  • Deduct $5 from accounts/1234
  • Credit $5 to accounts/4321
  • Deduct $5 from accounts/1234
  • Credit $5 to accounts/5678

In this case, we have a document that is involved in multiple transactions. When we need to replicate to another node, we replicate the current state, and we do that in batches. So it is possible that we’ll replicate just:

  • Credit $5 to accounts/4321

We don’t replicate accounts/1234 yet, because it changed in a later transaction. That means that on one server we suddenly, magically, have more money. While in general that would be a good thing, I’m told that certain parties aren’t in favor of such things, so we have another feature that interacts with this. You can enable document revisions, in which case even if your documents are modified multiple times in different transactions, we’ll always send the transactions across the wire as they were saved.

This gives you transaction boundaries on a distributed database, and allows you to reason about your data in a saner fashion.


time to read 2 min | 233 words

During a performance problem investigation, we discovered that the following innocent-looking code was killing our performance.

This is part of the code that allows users to subscribe to changes in the database using a WebSocket. Subscriptions are pretty rare, so we check whether there are any and skip all the work if there aren’t.

We had a bunch of code that runs on many threads and ended up calling this method. Since there are no subscribers, this should be very cheap, but it wasn’t. The problem was that the call to Count was locking, and that created a convoy that killed our performance.
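The method itself isn’t shown above; here is a minimal sketch of the pattern and one possible fix (all names are hypothetical):

```csharp
using System.Collections.Concurrent;
using System.Threading;

// Sketch of the pattern: checking for subscribers on every operation.
// ConcurrentDictionary.Count acquires all of the dictionary's internal
// locks, so calling it from many threads at once creates a lock convoy
// even when the dictionary is empty.
class ChangesNotifier
{
    private readonly ConcurrentDictionary<long, object> _subscribers = new();
    private int _count;   // maintained separately with Interlocked

    public void Subscribe(long id, object connection)
    {
        if (_subscribers.TryAdd(id, connection))
            Interlocked.Increment(ref _count);
    }

    // Problematic: locks the entire dictionary on every call.
    public bool HasSubscribersSlow() => _subscribers.Count > 0;

    // One possible fix: read a counter we maintain ourselves,
    // which is a single volatile read instead of taking locks.
    public bool HasSubscribers() => Volatile.Read(ref _count) > 0;
}
```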

We did some trawling through our code base and through the framework code and came up with the following conclusions.

ConcurrentQueue:

  • Clear locks
  • CopyTo, GetEnumerator, ToArray, Count create a snapshot (consuming more memory)
  • TryPeek, IsEmpty are cheap

Here is a piece of problematic code, we are trying to keep the last ~25 items that we looked at:
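A sketch of that kind of code, with hypothetical names, following the behavior listed above:

```csharp
using System.Collections.Concurrent;

// Sketch: keep roughly the last 25 items we looked at. Per the list
// above, Count creates a snapshot of the queue, so doing this on
// every recorded item generates a steady stream of snapshots.
class RecentItems
{
    private readonly ConcurrentQueue<string> _recent = new();

    public void Record(string item)
    {
        _recent.Enqueue(item);
        while (_recent.Count > 25)      // snapshot on every check
            _recent.TryDequeue(out _);
    }

    public int Count => _recent.Count;
}
```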

The problem is that this kind of code ensures that there will be a lot of snapshots, increasing memory utilization.

ConcurrentDictionary:

  • Count, Clear, IsEmpty, ToArray  - lock the entire thing
  • TryAdd, TryUpdate, TryRemove – lock the bucket for this entry
  • GetEnumerator does not lock
  • Keys, Values both lock the table and force a temporary collection

If you need to iterate over the keys of a concurrent dictionary, there are two options: iterate over the Keys property, or enumerate the dictionary itself and use only the key from each pair.

Iterating over the entire dictionary is much better than iterating over just the keys.
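A minimal sketch of the two options (illustrative code, not the original):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

var dict = new ConcurrentDictionary<string, int>();
dict["a"] = 1;
dict["b"] = 2;

// Option 1: Keys locks the table and materializes a temporary collection.
foreach (var key in dict.Keys)
    Console.WriteLine(key);

// Option 2: enumerating the dictionary itself does not lock;
// just ignore the Value and use the Key.
var seen = new List<string>();
foreach (var kvp in dict)
    seen.Add(kvp.Key);
```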

time to read 1 min | 155 words

When building a cache, I need a way to generate a hash code from a query. A query is a complex object that has many properties. My first attempt to do so looked like this:

And it failed, disastrously. All queries always had the same hash code, and I couldn’t figure out why until I actually debugged through the code.

Here is a hint: this only happened when WaitForNonStaleResultsTimeout was null. It might be better to look at the way the compiler parses the expression:
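A minimal reconstruction of the kind of expression involved (the class and property names are illustrative):

```csharp
using System;

class Query
{
    public TimeSpan? WaitForNonStaleResultsTimeout;
    public int PageSize;

    public int BrokenHash()
    {
        var hashCode = PageSize;
        // ?? binds looser than ^, so this parses as:
        //   (hashCode * 397 ^ WaitForNonStaleResultsTimeout?.GetHashCode()) ?? 0
        // When the timeout is null, the whole XOR is null, and we
        // return 0, discarding everything accumulated so far.
        hashCode = hashCode * 397 ^ WaitForNonStaleResultsTimeout?.GetHashCode() ?? 0;
        return hashCode;
    }

    public int FixedHash()
    {
        var hashCode = PageSize;
        // Parenthesizing defaults only the null term to 0:
        hashCode = hashCode * 397 ^ (WaitForNonStaleResultsTimeout?.GetHashCode() ?? 0);
        return hashCode;
    }
}
```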

Basically, if WaitForNonStaleResultsTimeout is null, the entire expression evaluates to null, so we return zero, losing all the previous hash values that we accumulated.

The solution was to parenthesize the expression. This passed a couple of rounds of scrutiny without showing up at all, because it looked fine, and I don’t keep C#’s operator precedence rules in my head.

time to read 2 min | 262 words

When running a full cluster on a single machine, you end up with a lot of console windows running instances of RavenDB, and it can be a bit hard to identify which is which.

We had the same issue with multiple browser tabs: it takes effort to map URLs to the specific node. We solved that particular problem by making the node very explicit in the browser, and that turned out to be a great feature.

So I created an issue to also print the node id in the console, so we can quickly match a console window to the node id that we want. The task is literally to write something like:

Console.WriteLine(server.Cluster.NodeId);

And the only reason I created an issue for that is that I didn’t want to be sidetracked by figuring out where to put this line.

Instead of a one-line change, I got a full implementation.

Now, here is the difference between a drive-by line of code and an actual resolution.

The final version handles nodes that haven’t been assigned an ID yet and colors the nodes differently based on their topology, so they can easily be told apart. It also makes the actually important information quite visible at a glance.
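A hypothetical sketch of what the fuller version might do; the method and the "assigned" flag here are illustrative, not RavenDB’s actual code:

```csharp
using System;

// Print the node ID in a color that depends on the node's state,
// so the console windows are easy to tell apart at a glance.
static void PrintNodeId(string nodeId, bool assigned)
{
    Console.ForegroundColor = assigned
        ? ConsoleColor.Green      // already part of the cluster topology
        : ConsoleColor.DarkGray;  // no ID assigned yet
    Console.WriteLine($"Node: {(assigned ? nodeId : "(unassigned)")}");
    Console.ResetColor();
}

PrintNodeId("A", true);
```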

time to read 7 min | 1231 words

This isn’t actually about a production system, per se. This is about a performance regression we noticed in a scenario for 4.0. We recently started doing structured performance testing as part of our continuous integration system.

There goes the sad trombone sound, because what that told us is that at some point we had a major performance regression. To put that in numbers, we used to be at around 50,000 writes per second (transactional, fully ACID, hitting the disk, data is safe) and we dropped to around 35,000 writes per second. Fluctuations in performance tests are always annoying, but that big a drop couldn’t be explained by a random load pattern causing some variance in the results.

This was a consistent and major difference from the previous record. To explain things: the last firm numbers we had for this performance test were from Dec 2016. Part of the release process for 4.0 calls for structured and repeatable performance testing, and this is exactly the kind of thing it is meant to catch. So I’m happy we caught it, not so happy that we had to go through about 7 months of changes to figure out exactly what drove our performance down.

The really annoying thing is that we had put a lot of effort into improving performance, and we couldn’t figure out what went wrong. Luckily, we could scan the history and track our performance over time. I’m extremely oversimplifying here, of course. In practice this meant running bisect on all the changes in the past 7-8 months and running benchmarks on each point. And performance is usually not as cut and dried as a pass / fail test.

Eventually we found the commit, and it was small, self-contained, and had a huge impact on our performance. The code in question?

ThreadPool.SetMinThreads(MinThreads, MinThreads);

Ironically, this change was part of a performance push focused on increasing the responsiveness of the server under a specific load. More specifically, it handled the case where, under a harsh load, we would run into thread starvation. The thing is, that specific load can only ever happen in our test suite. At some point we ran up to 80 parallel tests, each of which might spawn multiple servers. Because we run them all in a single process, there is a lot of sharing between the different tests.

That was actually quite intentional, because it allowed us to see how the code behaves in a bad environment. It exposed a lot of subtle issues, but eventually it got very tiring to have tests fail because there just weren’t enough threads to run them fast enough. So we set this value to a number high enough that we didn’t have to worry about it.

The problem is that we set it in the server, not in the tests. And that led to the performance issue. But why would this setting cause such a big drop? The actual per-request time went from 20 μs to 28.5 μs, which isn’t that much when you think about it. The reason this happened is a bit complex, and requires understanding several different aspects of the scenario under test.

In this particular performance scenario, we are testing what happens when a lot of clients are writing to the database all at once. Basically, this is a 100% small-writes scenario. Our aim here is sustained throughput and latency throughout the scenario. Given that we ensure that every write to RavenDB actually hits the disk and is transactional, every single request needs to physically hit the disk. If we did that for each individual request, that would lead to about 100 requests a second on a good disk, if that.

In order to avoid the cost of going to the physical disk for each request, we do a lot of batching. Basically, we read the request from the network, do some basic processing, and then send it to a queue where it will hit the disk along with all the other concurrent requests. We make heavy use of async to consume as few system resources as possible. This means that at any given point we have a lot of requests in flight, waiting for the disk to acknowledge the write. This is pretty much the perfect setup for a high-throughput server, because we don’t have to hold up a thread per request.
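The batching described above can be sketched as a simplified, hypothetical model (TransactionMerger and its members are illustrative, not RavenDB’s code), with a manual FlushBatch standing in for the dedicated disk-writer loop:

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Many concurrent writers enqueue their work; a single flusher commits
// everything queued so far as one disk transaction and then completes
// all the waiters.
class TransactionMerger
{
    private readonly ConcurrentQueue<(string Doc, TaskCompletionSource Done)> _pending = new();
    public int Commits;   // number of physical disk transactions

    public Task WriteAsync(string doc)
    {
        var tcs = new TaskCompletionSource();
        _pending.Enqueue((doc, tcs));
        return tcs.Task;   // the request awaits here, holding no thread
    }

    // In a real server this runs in a dedicated loop; one call here
    // stands in for one expensive, fsync'ed disk write.
    public void FlushBatch()
    {
        var batch = new List<(string Doc, TaskCompletionSource Done)>();
        while (_pending.TryDequeue(out var item))
            batch.Add(item);
        if (batch.Count == 0)
            return;
        Commits++;                       // one disk write for the whole batch
        foreach (var (_, done) in batch)
            done.SetResult();            // acknowledge every waiter at once
    }
}
```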

So why would increasing the amount of minimum threads hurt us so much?

To answer that, we need to understand what exactly SetMinThreads controls. The .NET thread pool is a lazy one; even if you told it that the minimum number of threads is 1000, it will do nothing with this value up front. Instead, it will just accept work and distribute it among its threads. So far, so good. The problem starts when there is more work at hand than threads to process it. At this point the thread pool needs to make a decision: it can either wait for one of the existing threads to become available, or it can create a new thread to handle the new load.

This is when SetMinThreads comes into play. Until the thread pool has created enough threads to reach the min count, whenever there is more work in the pool than there are threads, a new thread will be created. Once the minimum number of threads has been reached, the thread pool will wait a bit before adding new threads. This allows the existing threads time to finish whatever they were doing and take work from the thread pool. The idea is that, given a fairly constant workload, the thread pool will eventually adjust to the right number of threads, ensuring that no work sits in the queue for too long.
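These are the real ThreadPool APIs involved; the doubling below is just for illustration:

```csharp
using System;
using System.Threading;

// Read the current minimums for worker and IO completion threads.
ThreadPool.GetMinThreads(out int workerMin, out int ioMin);
Console.WriteLine($"min worker threads: {workerMin}, min IO threads: {ioMin}");

// Raising the floor tells the pool to keep injecting threads eagerly
// up to this count instead of throttling thread creation, which is
// exactly the behavior that hurt throughput here.
bool ok = ThreadPool.SetMinThreads(workerMin * 2, ioMin);
```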

With the SetMinThreads change, however, we increased the minimum number of threads significantly, and that caused the thread pool to create a lot more threads than it would otherwise. Because our workload is perfect for the kind of tasks the thread pool is meant to handle (short: do something, schedule more work, then yield), it is actually very rare that the thread pool would need to create an additional thread, since a thread will be available soon enough.

This is not an accident; we very carefully set out to build it in this fashion, of course.

The problem is that with the high minimum, it was very easy to get the thread pool into a situation where it had more work than threads. Even though a thread would have been free in just a moment, the thread pool would create a new thread at that point (because we told it to do so).

That led to a lot of threads being created, and that led to a lot of context switches and thread contention as the threads fought over processing the incoming requests. In effect, that took a significant amount of the compute time we had for processing requests. By removing the SetMinThreads call, we reduced the number of threads and were able to get much higher performance.

time to read 2 min | 326 words

A customer called us with a problem. For the most part, RavenDB was very well behaved, but they were worried about the 99.99th percentile of request duration. While the 99.9th percentile was excellent (around a few milliseconds), the 99.99th was measured in seconds.

Usually this is a particular slow request, and we can handle that by figuring out what is slow about that request and fixing it. A common cause for such an issue is a request that returns a lot of unnecessary data, such as large full documents when it needs just a few fields.

In this case, however, our metrics told us that the problem was pretty widespread. There wasn’t one particular slow request; rather, at what appeared to be random times, certain requests would be slow. We also realized that it wasn’t a particular request that was slow, but all requests within a given time period.

That hinted quite strongly at the problem: it was very likely a GC issue that caused a long pause. We still had to do some work to rule out other factors, such as I/O issues or noisy neighbors, but we narrowed in on GC as the root cause.

The problem in this case was that the customer was running multiple databases in the same RavenDB process, and each of them was fairly large. The total amount of managed memory that the RavenDB process was using was around 60GB. At that level, anything that causes a GC can create a significant pause and impact operations.

The solution in this case was to break that into multiple separate processes, one for each database. In this manner, we didn’t have a single huge managed heap for the GC to traverse; each heap was much smaller, and the GC pause times were both greatly reduced and spaced much further apart.
