Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database


time to read 8 min | 1446 words

We recently tackled performance improvements for vector search in RavenDB. The core challenge was identifying performance bottlenecks. Details of specific changes are covered in a separate post.

        TODO: link to the blog post on that

In this post, I don’t want to talk about the actual changes we made, but the approach we took to figure out where the cost is.

Take a look at the following flame graph, showing where our code is spending the most time.

As you can see, almost the entire time is spent computing cosine similarity. That would be the best target for optimization, right?

I spent a bunch of time writing increasingly complicated ways to optimize the cosine similarity function. And it worked: I was able to reduce the cost by about 1.5%!

That is something that we would generally celebrate, but it was far from where we wanted to go. The problem was elsewhere, but we couldn’t see it in the profiler output because the cost was spread around too much.

Our first task was to restructure the code so we could actually see where the costs were. For instance, loading the vectors from disk was embedded within the algorithm. By extracting and isolating this process, we could accurately profile and measure its performance impact.

This restructuring also eliminated the "death by a thousand cuts" issue, making hotspots evident in profiling results. With clear targets identified, we could then focus our optimization efforts effectively.

That major refactoring had two primary goals. The first was to actually extract the costs into highly visible locations, the second had to do with how you address them. Here is a small example that scans a social graph for friends, assuming the data is in a file.


from collections import deque
from typing import List, Set


def read_user_friends(file, user_id: int) -> List[int]:
    """Read friends for a single user ID starting at indexed offset."""
    pass # redacted


def social_graph(user_id: int, max_depth: int) -> Set[int]:
    if max_depth < 1:
        return set()
    
    all_friends = set() 
    visited = {user_id} 
    work_list = deque([(user_id, max_depth)])  
    
    with open("relations.dat", "rb") as file:
        while work_list:
            curr_id, depth = work_list.popleft()
            if depth <= 0:
                continue
                
            for friend_id in read_user_friends(file, curr_id):
                if friend_id not in visited:
                    all_friends.add(friend_id)
                    visited.add(friend_id)
                    work_list.append((friend_id, depth - 1))
    
    return all_friends

If you consider this code, you can likely see that there is an expensive part of it, reading from the file. But the way the code is structured, there isn’t really much that you can do about it. Let’s refactor the code a bit to expose the actual costs.


def read_users_friends(file, user_ids) -> List[int]:
    """Read friends for a batch of user IDs in one shot."""
    pass # redacted


def social_graph(user_id: int, max_depth: int) -> Set[int]:
    if max_depth < 1:
        return set()

    all_friends = set()
    visited = {user_id}

    with open("relations.dat", "rb") as file:
        work_list = read_users_friends(file, [user_id])
        while work_list and max_depth > 0:
            to_scan = set()
            for friend_id in work_list:  # gather all the items to read
                if friend_id in visited:
                    continue

                all_friends.add(friend_id)
                visited.add(friend_id)
                to_scan.add(friend_id)

            # read them all in one shot
            work_list = read_users_friends(file, to_scan)
            # reduce for next call
            max_depth = max_depth - 1

    return all_friends

Now, instead of scattering the reads whenever we process an item, we gather them all and then send a list of items to read all at once. The costs are far clearer in this model, and more importantly, we have a chance to actually do something about it.

Optimizing a lot of calls to read_user_friends(file, user_id) is really hard, but optimizing read_users_friends(file, users) is a far simpler task. Note that the actual costs didn’t change because of this refactoring, but it is now far easier to actually make the change.
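
To make that concrete, here is a minimal sketch (in C#, with hypothetical names - not the actual implementation) of the kind of optimization the batched signature opens up: once all the IDs arrive together, you can sort them by their file offsets and read them in a single forward pass instead of paying for a random seek on every individual call.


using System.Collections.Generic;
using System.IO;
using System.Linq;

static class FriendsReader
{
    // Hypothetical helper - parses a single friend-list record at the current position.
    static List<long> ReadFriendList(FileStream file) => new List<long>(); // redacted

    // Given an index of user id -> file offset, the batched call can order the requests
    // by offset and scan the file in one forward pass, instead of seeking randomly
    // for each individual user.
    public static Dictionary<long, List<long>> ReadUsersFriends(
        FileStream file,
        IReadOnlyDictionary<long, long> offsets,
        IEnumerable<long> userIds)
    {
        var results = new Dictionary<long, List<long>>();
        foreach (var userId in userIds.OrderBy(id => offsets[id]))
        {
            file.Seek(offsets[userId], SeekOrigin.Begin);
            results[userId] = ReadFriendList(file);
        }
        return results;
    }
}

The specific trick doesn’t matter much; the point is that the batched shape is what makes this kind of optimization possible at all.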

Going back to the flame graph above, the actual cost profile differed dramatically as the size of the data grew, even though the profiler output remained the same. Refactoring the code allowed us to see where the costs were and address them effectively.

Here is the end result as a flame graph. You can clearly see the preload section that takes a significant portion of the time. The key here is that the change allowed us to address this cost directly and in an optimal manner.

The end result for our benchmark was:

  • Before: 3 minutes, 6 seconds
  • After: 2 minutes, 4 seconds

So almost exactly ⅓ of the cost was removed because of the different approach we took, which is quite nice.

This technique, refactoring the code to make the costs obvious, is a really powerful one, mostly because it is likely the first step to take anyway in many performance optimizations (batching, concurrency, etc.).

time to read 1 min | 161 words

Join Our Community Discussion: Exploring the Power of AI Search in Modern Applications

We're excited to announce our second Community Open Discussion, focusing on a transformative feature in today's applications: AI search. This technology is rapidly becoming the new standard for delivering intelligent and intuitive search experiences.

Join Dejan from our DevRel team for an open and engaging discussion. Whether you're eager to learn, contribute your insights, or simply listen in, everyone is welcome!

We’ll talk about:

  • The growing popularity and importance of AI search.
  • A deep dive into the technical aspects, including embeddings generation, query term caching, and quantization techniques.
  • An open forum to discuss best practices and various approaches to implementing AI search.
  • A live showcase demonstrating how RavenDB AI Integration allows you to implement AI Search in just 5 minutes, with the same simplicity as our regular search API.

Event Details:

Bring your questions and your enthusiasm – we look forward to seeing you there!

time to read 1 min | 187 words

Orleans is a distributed computing framework for .NET. It allows you to build distributed systems with ease, taking upon itself all the state management, persistence, distribution, and concurrency.

The core aspect in Orleans is the notion of a “grain” - a lightweight unit of computation & state. You can read more about it in Microsoft’s documentation, but I assume that if you are reading this post, you are already at least somewhat familiar with it.
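
For a quick refresher, here is roughly what a grain looks like - a minimal sketch, not production code, and the "ravendb" storage provider name below is just a placeholder for whatever provider you register:


using System.Threading.Tasks;
using Orleans;
using Orleans.Runtime;

// Callers address a grain through its interface and key; Orleans handles
// activation, placement, and single-threaded execution for you.
public interface IUserGrain : IGrainWithStringKey
{
    Task SetName(string name);
    Task<string> GetName();
}

public class UserState
{
    public string Name { get; set; } = "";
}

public class UserGrain : Grain, IUserGrain
{
    private readonly IPersistentState<UserState> _state;

    // "users" is the state name; "ravendb" is a placeholder for the storage
    // provider name you register - which is where a RavenDB-backed store plugs in.
    public UserGrain([PersistentState("users", "ravendb")] IPersistentState<UserState> state)
    {
        _state = state;
    }

    public Task<string> GetName() => Task.FromResult(_state.State.Name);

    public async Task SetName(string name)
    {
        _state.State.Name = name;
        await _state.WriteStateAsync(); // persisted by the configured storage provider
    }
}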

We now support using RavenDB as the backing store for grain persistence, reminders, and clustering. You can read the official announcement about the release here, and the docs covering how to use RavenDB & Microsoft Orleans.

You can use RavenDB to persist and retrieve Orleans grain states, store Orleans timers and reminders, as well as manage Orleans cluster membership.

RavenDB is well suited for this task because of its asynchronous nature, schema-less design, and the ability to automatically adjust itself to different loads on the fly.

If you are using Orleans, or even just considering it, give it a spin with RavenDB. We would love your feedback.

time to read 3 min | 440 words

RavenDB is moving at quite a pace, and there is actually more stuff happening than I can find the time to talk about. I usually talk about the big-ticket items, but today I wanted to discuss some of what we like to call Quality of Life features.

The sort of things that help smooth the entire process of using RavenDB - the difference between something that works and something polished. That is something I truly care about, so with a great sense of pride, let me walk you through some of the nicest things that you probably wouldn’t even notice that we are doing for you.


RavenDB Node.js Client - v7.0 released (with Vector Search)

We updated the RavenDB Node.js client to version 7.0, with the biggest item being explicit support for vector search queries from Node.js. You can now write queries like these:


const docs = session.query<Product>({ collection: "Products" })
    .vectorSearch(x => x.withText("Name"),
        factory => factory.byText("italian food"))
    .all();

This is the famous example of using RavenDB’s vector search to find pizza and pasta in your product catalog, utilizing vector search and automatic data embeddings.


Converting automatic indexes to static indexes

RavenDB has auto indexes. Send a query, and if there is no existing index to run the query, the query optimizer will generate one for you. That works quite amazingly well, but sometimes you want to use this automatic index as the basis for a static (user-defined) index. Now you can do that directly from the RavenDB Studio, like so:

You can read the full details of the feature at the following link.


RavenDB Cloud - Incidents History & Operational Suggestions

We now expose the operational suggestions to you on the dashboard. The idea is that you can easily and proactively check the status of your instances and whether you need to take any action.

You can also see what happened to your system in the past, including things that RavenDB’s system automatically recovered from without you needing to lift a finger.

For example, take a look at this highly distressed system:


As usual, I would appreciate any feedback you have on the new features.

time to read 1 min | 92 words

Last week I did an hour long webinar showing AI integration in RavenDB. From vector search to RAG, from embedding generation to Gen AI inside of the database engine.

Most of those features are already released, but I would really love your feedback on the Gen AI integration story (it starts at around the 30-minute mark in the video).

Let me know what you think!

time to read 2 min | 342 words

I wrote the following code:


if (_items is [var single])
{
    // no point invoking thread pool
    single.Run();
}

And I was very proud of myself for writing such pretty and succinct C# code.

Then I got a runtime error:

I asked Grok about this because I did not expect this, and got the following reply:

No, if (_items is [var single]) in C# does not match a null value. This pattern checks if _items is a single-element array and binds the element to single. If _items is null, the pattern match fails, and the condition evaluates to false.

However, the output clearly disagreed with both Grok’s and my expectations. I decided to put that into SharpLab, which can quickly help identify what is going on behind the scenes for such syntax.

You can see three versions of this check in the associated link.


if(strs is [var s]) // no null check


if(strs is [string s]) //  if (s != null)


if(strs is [{} s]) //  if (s != null)

Turns out that there is a distinction between a var pattern (which allows null) and a non-var pattern. The third option is the non-null pattern, which performs the same null check as the type pattern but doesn’t require repeating the type. Usually var vs. type is a readability distinction, but here we have a real difference in behavior.
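
Here is a minimal standalone repro (my simplified version, not the original code; requires C# 11 list patterns) showing the difference with a single-element array whose only element is null:


using System;

class Program
{
    static void Main()
    {
        // One-element array whose only element is null - the case that bit me.
        string[] items = { null };

        // var pattern: matches any one-element array and binds the element, even when it is null.
        if (items is [var single])
            Console.WriteLine("var pattern matched, element is null: " + (single is null)); // True

        // type pattern: also requires the element to be a non-null string, so it does not match here.
        Console.WriteLine("type pattern matched: " + (items is [string _])); // False

        // empty property pattern: the same non-null check, without repeating the type.
        Console.WriteLine("property pattern matched: " + (items is [{ } _])); // False
    }
}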

Note that when I asked the LLM about it, I got the wrong answer. Luckily, I could get a verified answer by just checking the compiler output, and only then head out to the C# spec to see if this is a compiler bug or just a misunderstanding.

time to read 3 min | 420 words

I was just reviewing a video we're about to publish, and I noticed something in the subtitles. It said, "Six qubits are used for..."

I got all excited thinking RavenDB was jumping into quantum computing. But nope, it turned out to be a transcription error. What was actually said was, "Six kilobytes are used for..."

To be fair, I listened to the recording a few times, and honestly, "qubits" isn't an unreasonable interpretation if you're just going by the spoken words. Even with context, that transcription isn't completely out there. I wouldn't be surprised if a human transcriber came up with the same result.

Fixing this issue (and going over an hour of text transcription to catch other possible errors) is going to be pretty expensive. Honestly, it would be easier to just skip the subtitles altogether in that case.

Here's the thing, though. I think a big part of this is that we now expect transcription to be done by a machine, and we don't expect it to be perfect. Before, when it was all done manually, it cost so much that it was reasonable to expect near-perfection.

What AI has done is make it cheap enough to get most of the value, while also lowering the expectation that it has to be flawless.

So, the choices we're looking at are:

  • AI transcription - mostly accurate, cheap, and easy to do.
  • Human transcription - highly accurate, expensive, and slow.
  • No transcription - users who want subtitles would need to use their own automatic transcription (which would probably be lower quality than what we use).

Before, we really only had two options: human transcription or nothing at all. What I think the spread of AI has done is not just made it possible to do it automatically and cheaply, but also made it acceptable that this "Good Enough" solution is actually, well, good enough.

Viewers know it's a machine transcription, and they're more forgiving if there are some mistakes. That makes it way more practical to actually use it. And the end result? We can offer more content.

Sure, it's not as good as manual transcription, but it's definitely better than having no transcription at all (which is really the only other option).

What I find most interesting is that it is precisely because this is now so common that it has become practical to actually use it more.

Yes, we actually review the subtitles and fix any obvious mistakes for the video. The key here is that we can spend very little time actually doing that, since errors are more tolerated.

time to read 2 min | 218 words

RavenDB is a pretty big system, with well over 1 million lines of code. Recently, I had to deal with an interesting problem. I had a CancellationToken at hand, which I expected to remain valid for the duration of the full operation.

However, something sneaky was going on there. Something was cancelling my CancellationToken, and not in an expected manner. At last count, I had roughly 2 bazillion CancellationTokens in the RavenDB codebase. Per request, per database, global to the server process, time-based, operation-based, etc., etc.

Figuring out why the CancellationToken was canceled turned out to be a chore. Instead of reading through the code, I cheated.


token.Register(() =>
{
    Console.WriteLine("Cancelled!" + Environment.StackTrace);
});

I ran the code, tracked back exactly who was calling cancel, and realized that I had mixed the request-based token with the database-level token. A single line fix in the end. Until I knew where it was, it was very challenging to figure it out.
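
The mix-up itself is easy to fall into. Here is a minimal sketch (hypothetical token sources, not the actual RavenDB code) of how a request-scoped cancellation ends up firing a token you thought was database-scoped, and how the Register trick immediately points the finger:


using System;
using System.Threading;

class Program
{
    static void Main()
    {
        // Hypothetical stand-ins for the real thing: one source per database, one per request.
        using var databaseCts = new CancellationTokenSource();
        using var requestCts = new CancellationTokenSource();

        // A linked token fires when *either* source is cancelled.
        using var linked = CancellationTokenSource.CreateLinkedTokenSource(
            databaseCts.Token, requestCts.Token);

        // Same trick as above: make the token tell us who cancelled it.
        linked.Token.Register(() =>
            Console.WriteLine("Cancelled!" + Environment.StackTrace));

        // Cancelling the request-scoped source is enough to fire the linked token,
        // even though the database-scoped source is still alive.
        requestCts.Cancel();
    }
}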

This approach, making the code tell you what is wrong, is an awesome way to cut down debugging time by a lot.

time to read 1 min | 146 words

Cloud service costs can often be confusing and unpredictable. RavenDB Cloud's new feature addresses this by providing real-time cost predictions whenever you make changes to your system. This transparency allows you to make informed choices about your cluster and easily incorporate cost considerations into your decision loop, so you can take control of your cloud budget.

The implementation of cost transparency and visibility features within RavenDB Cloud has an outsized impact on cost management and FinOps practices. It empowers you to make informed decisions, optimize spending, and achieve better financial control.

The idea is to make it easier for you to spend your money wisely. I’m really happy with this feature. It may seem small, but it will make a difference. It also fits very well with our overall philosophy that we should take the burden of complexity off your shoulders and onto ours.

time to read 8 min | 1522 words

There are at least 3 puns in the title of this blog post. I’m sorry, but I’m writing this after several days of tracking an impossible bug. I’m actually writing a set of posts to wind down from this hunt, so you’ll have to suffer through my more prosaic prose.

This bug is the kind that leaves you questioning your sanity after days of pursuit, the kind that I’m sure I’ll look back on and blame for any future grey hair I have. I’m going to have another post talking about the bug since it is such a doozy. In this post, I want to talk about the general approach I take when dealing with something like this.

Beware, this process involves a lot of hair-pulling. I’m saving that for when the real nasties show up.

The bug in question was a race condition that defied easy reproduction. It didn’t show up consistently—sometimes it surfaced, sometimes it didn’t. The only “reliable” way to catch it was by running a full test suite, which took anywhere from 8 to 12 minutes per run. If the suite hung, we knew we had a hit. But that left us with a narrow window to investigate before the test timed out or crashed entirely. To make matters worse, the bug was in new C code called from a .NET application.

New C code is a scary concept. New C code that does multithreading is an even scarier concept. Race conditions there are almost expected, right?

That means that the feedback cycle is long. Any attempt we make to fix it is going to be unreliable - “Did I fix it, or did it just not happen this time?” - and there isn’t a lot of information to go on. The first challenge was figuring out how to detect the bug reliably.

Using Visual Studio as the debugger was useless here—it only reproduced in release mode, and even with native debugging enabled, Visual Studio wouldn’t show the unmanaged code properly. That left us blind to the C library where the bug resided. I’m fairly certain that there are ways around that, but I was more interested in actually getting things done than fighting the debugger.

We got a lot of experience with WinDbg, a low-level debugger and a real powerhouse. It is also about as friendly as a monkey with a sore tooth and an alcohol addiction. The initial process was all about trying to reproduce the hang and then attach WinDbg to it.

Turns out that we never actually generated PDBs for the C library. So we had to figure out how to generate them, then how to carry them all the way from the build to the NuGet package to the deployment for testing - to maybe reproduce the bug again. Then we could see in what area of the code we are even in.

Getting WinDbg attached is just the start; we need to sift through the hundreds of threads running in the system. That is where we actually started applying the proper process for this.

This piece of code is stupidly simple, but it is sufficient to reduce “what thread should I be looking at” from 1 - 2 minutes to 5 seconds.


SetThreadDescription(GetCurrentThread(), L"Rvn.Ring.Wrkr");

I had the thread that was hanging, and I could start inspecting its state. This was a complex piece of code, so I had no idea what was going on or what the cause was. This is when we pulled the next tool from our toolbox.


#include <windows.h>

void alert() {
    while (1) {
        Beep(800, 200);  // audible signal that the bad state was just observed
        Sleep(200);
    }
}

This isn’t a joke, it is a super important aspect. In WinDbg, we noticed some signs in the data that the code was working on, indicating that something wasn’t right. It didn’t make any sort of sense, but it was there. Here is an example:


enum state
{
  red,
  yellow,
  green
};


enum state _currentState;

And when we look at it in the debugger, we get:


0:000> dt _currentState
Local var @ 0x50b431f614 Type state
17 ( banana_split )

That is beyond a bug, that is some truly invalid scenario. But that also meant that I could act on it. I started adding things like this:


if(_currentState != red && 
   _currentState != yellow && 
   _currentState != green) {
   alert();
}

The end result of this is that instead of having to wait & guess, I would now:

  • Be immediately notified when the issue happened.
  • Inspect the problematic state earlier.
  • Hopefully glean some additional insight so I can add more of those things.

With this in place, we iterated. Each time we spotted a new behavior hinting at the bug’s imminent trigger, we put another call to the alert function to catch it earlier. It was crude but effective—progressively tightening the noose around the problem.

Race conditions are annoyingly sensitive; any change to the system—like adding debug code—alters its behavior. We hit this hard. For example, we’d set a breakpoint in WinDbg, and the alert function would trigger as expected. The system would beep, we’d break in, and start inspecting the state. But because this was an optimized release build, the debugging experience was a mess. Variables were often optimized away into registers or were outright missing, leaving us to guess what was happening.

I resorted to outright hacks like this function:


__declspec(noinline) void spill(void* ptr) {
    volatile void* dummy = ptr;
    dummy; // Ensure dummy isn't flagged as unused
}

The purpose of this function is to force the compiler to assign an address to a value. Consider the following code:


if (work->completed != 0) {
    printf("old_global_state : %p, current state: %p\n",
         old_global_state, handle_ptr->global_state);
    alert();
    spill(&work);
}

Because we are sending a pointer to the work value to the spill function, the compiler cannot just put that in a register and must place it on the stack. That means that it is much easier to inspect it, of course.

Unfortunately, adding those spill calls led to the problem being “fixed” - we could no longer reproduce it. Far more annoyingly, any time we added any sort of additional code to try to narrow down where this was happening, we had a good chance of either moving the behavior somewhere completely different or masking it completely.

Here are some of our efforts to narrow it down, if you want to see what the gory details look like.

At this stage, the process became a grind. We’d hypothesize about the bug’s root cause, tweak the code, and test again. Each change risked shifting the race condition’s timing, so we’d often see the bug vanish, only to reappear later in a slightly different form. The code quality suffered—spaghetti logic crept in as we layered hacks on top of hacks. But when you’re chasing a bug like this, clean code takes a back seat to results. The goal is to understand the failure, not to win a style award.

Bug hunting at this level is less about elegance and more about pragmatism. The more elusive the bug, the less code quality or any other structured approach to the project matters. The only thing on your mind is: how do I narrow it down? How do I get this chase to end?

Next time, I’ll dig into the specifics of this particular bug. For now, this is the high-level process: detect, iterate, hack, and repeat. No fluff—just the reality of the chase. The key in any of those bugs that we looked at is to keep narrowing the reproduction to something that you can get in a reasonable amount of time.

Once that happens, when you can hit F5 and get results, this is when you can start actually figuring out what is going on.


