Ayende @ Rahien

filter by tags archive

architecture (612) rss
bugs (451) rss
challanges (123) rss
community (380) rss
databases (481) rss
design (895) rss
development (642) rss
hibernating-practices (71) rss
miscellaneous (592) rss
performance (397) rss
programming (1085) rss
raven (1450) rss
ravendb.net (534) rss
reviews (184) rss

2025
- June (7)
- May (10)
- April (10)
- March (10)
- February (7)
- January (12)
2024
- December (3)
- November (2)
- October (1)
- September (3)
- August (5)
- July (10)
- June (4)
- May (6)
- April (2)
- March (8)
- February (2)
- January (14)
2023
- December (4)
- October (4)
- September (6)
- August (12)
- July (5)
- June (15)
- May (3)
- April (11)
- March (5)
- February (5)
- January (8)
2022
- December (5)
- November (7)
- October (7)
- September (9)
- August (10)
- July (15)
- June (12)
- May (9)
- April (14)
- March (15)
- February (13)
- January (16)
2021
- December (23)
- November (20)
- October (16)
- September (6)
- August (16)
- July (11)
- June (16)
- May (4)
- April (10)
- March (11)
- February (15)
- January (14)
2020
- December (10)
- November (13)
- October (15)
- September (6)
- August (9)
- July (9)
- June (17)
- May (15)
- April (14)
- March (21)
- February (16)
- January (13)
2019
- December (17)
- November (14)
- October (16)
- September (10)
- August (8)
- July (16)
- June (11)
- May (13)
- April (18)
- March (12)
- February (19)
- January (23)
2018
- December (15)
- November (14)
- October (19)
- September (18)
- August (23)
- July (20)
- June (20)
- May (23)
- April (15)
- March (23)
- February (19)
- January (23)
2017
- December (21)
- November (24)
- October (22)
- September (21)
- August (23)
- July (21)
- June (24)
- May (21)
- April (21)
- March (23)
- February (20)
- January (23)
2016
- December (17)
- November (18)
- October (22)
- September (18)
- August (23)
- July (22)
- June (17)
- May (24)
- April (16)
- March (16)
- February (21)
- January (21)
2015
- December (5)
- November (10)
- October (9)
- September (17)
- August (20)
- July (17)
- June (4)
- May (12)
- April (9)
- March (8)
- February (25)
- January (17)
2014
- December (22)
- November (19)
- October (21)
- September (37)
- August (24)
- July (23)
- June (13)
- May (19)
- April (24)
- March (23)
- February (21)
- January (24)
2013
- December (23)
- November (29)
- October (27)
- September (26)
- August (24)
- July (24)
- June (23)
- May (25)
- April (26)
- March (24)
- February (24)
- January (21)
2012
- December (19)
- November (22)
- October (27)
- September (24)
- August (30)
- July (23)
- June (25)
- May (23)
- April (25)
- March (25)
- February (28)
- January (24)
2011
- December (17)
- November (14)
- October (24)
- September (28)
- August (27)
- July (30)
- June (19)
- May (16)
- April (30)
- March (23)
- February (11)
- January (26)
2010
- December (29)
- November (28)
- October (35)
- September (33)
- August (44)
- July (17)
- June (20)
- May (53)
- April (29)
- March (35)
- February (33)
- January (36)
2009
- December (37)
- November (35)
- October (53)
- September (60)
- August (66)
- July (29)
- June (24)
- May (52)
- April (63)
- March (35)
- February (53)
- January (50)
2008
- December (58)
- November (65)
- October (46)
- September (48)
- August (96)
- July (87)
- June (45)
- May (51)
- April (52)
- March (70)
- February (43)
- January (49)
2007
- December (100)
- November (52)
- October (109)
- September (68)
- August (80)
- July (56)
- June (150)
- May (115)
- April (73)
- March (124)
- February (102)
- January (68)
2006
- December (95)
- November (53)
- October (120)
- September (57)
- August (88)
- July (54)
- June (103)
- May (89)
- April (84)
- March (143)
- February (78)
- January (64)
2005
- December (70)
- November (97)
- October (91)
- September (61)
- August (74)
- July (92)
- June (100)
- May (53)
- April (42)
- March (41)
- February (84)
- January (31)
2004
- December (49)
- November (26)
- October (26)
- September (6)
- April (10)

Couchbase vs RavenDB Performance at Rakuten Kobo Whitepaper

May 14 2025

Scaling HNSW in RavenDB: Optimizing for inadequate hardware

time to read 7 min | 1271 words

Tweet Share Share 2 comments

Tags:

RavenDB 7.0 introduces vector search using the Hierarchical Navigable Small World (HNSW) algorithm, a graph-based approach for Approximate Nearest Neighbor search. HNSW enables efficient querying of large vector datasets but requires random access to the entire graph, making it memory-intensive.

HNSW's random read and insert operations demand in-memory processing. For example, inserting 100 new items into a graph of 1 million vectors scatters them randomly, with no efficient way to sort or optimize access patterns. That is in dramatic contrast to the usual algorithms we can employ (B+Tree, for example), which can greatly benefit from such optimizations.

Let’s assume that we deal with vectors of 768 dimensions, or 3KB each. If we have 30 million vectors, just the vector storage is going to take 90GB of memory. Inserts into the graph require us to do effective random reads (with no good way to tell what those would be in advance). Without sufficient memory, performance degrades significantly due to disk swapping.

The rule of thumb for HNSW is that you want at least 30% more memory than the vectors would take. In the case of 90GB of vectors, the minimum required memory is going to be 128GB.

Cost reduction

You can also use quantization (shrinking the size of the vector with some loss of accuracy). Going to binary quantization for the same dataset requires just 3GB of space to store the vectors. But there is a loss of accuracy (may be around 20% loss).

We tested RavenDB’s HNSW implementation on a 32 GB machine with a 120 GB vector index (and no quantization), simulating a scenario with four times more data than available memory.

This is an utterly invalid scenario, mind you. For that amount of data, you are expected to build the HNSW index on a machine with 192GB. But we wanted to see how far we could stress the system and what kind of results we would get here.

The initial implementation stalled due to excessive memory use and disk swapping, rendering it impractical. Basically, it ended up doing a page fault on each and every operation, and that stalled forward progress completely.

Optimization Approach

Optimizing for such an extreme scenario seemed futile, this is an invalid scenario almost by definition. But the idea is that improving this out-of-line scenario will also end up improving the performance for more reasonable setups.

When we analyzed the costs of HNSW in a severely memory-constrained environment, we found that there are two primary costs.

Distance computations: Comparing vectors (e.g., cosine similarity) is computationally expensive.
Random vector access: Loading vectors from disk in a random pattern is slow when memory is insufficient.

Distance computation is doing math on two 3KB vectors, and on a large graph (tens of millions), you’ll typically need to run between 500 - 1,500 distance comparisons. To give some context, adding an item to a B+Tree of the same size will have fewer than twenty comparisons (and highly localized ones, at that).

That means reading about 2MB of data per insert on average. Even if everything is in memory, you are going to be paying a significant cost here in CPU cycles. If the data does not reside in memory, you have to fetch it (and it isn’t as neat as having a single 2MB range to read, it is scattered all over the place, and you need to traverse the graph in order to find what you need to read).

To address this issue, we completely restructured how we go about inserting nodes into the graph. We avoid serial execution and instead spawn multiple insert processes at the same time. Interestingly enough, we are single-threaded in this regard. We extract from the process the parts where it does the distance computation and loads the vectors.

Each process will run the algorithm until it reaches the stage where it needs to run distance computation on some vectors. In that case, it will yield to another process and let it run. We keep doing this until we have no more runnable processes to execute.

We can then scan the list of nodes that we need to process (run distance computation on all their edges), and we can then:

Gather all vectors needed for graph traversal.
Preload them from disk efficiently.
Run distance computations across multiple threads simultaneously.

The idea here is that we save time by preloading data efficiently. Once the data is loaded, we throw all the distance computations per node to the thread pool. As soon as any of those distance computations are done, we resume the stalled process for it.

The idea is that at any given point in time, we have processes moving forward or we are preloading from the disk, while at the same time we have background threads running to compute distances.

This allows us to make maximum use of the hardware. Here is what this looks like midway through the process:

As you can see, we still have to go to the disk a lot (no way around that with a working set of 120GB on a 32GB machine), but we are able to make use of that efficiently. We always have forward progress instead of just waiting.

Don’ttry this on the cloud - most cloud instances have a limited amount of IOPS to use, and this approach will burn through any amount you have quickly. We are talking about roughly 15K - 20K IOPS on a sustained basis. This is meant for testing in adverse conditions, on hardware that is utterly unsuitable for the task at hand. A machine with the proper amount of memory will not have to deal with this issue.

While still slow on a 32 GB machine due to disk reliance, this approach completed indexing 120 GB of vectors in about 38 hours (average rate of 255 vectors/sec). To compare, when we ran the same thing on a 64GB machine, we were able to complete the indexing process in just over 14 hours (average rate of 694 vectors/sec).

Accuracy of the results

When inserting many nodes in parallel to the graph, we risk a loss of accuracy. When we insert them one at a time, nodes that are close together will naturally be linked to one another. But when running them in parallel, we may “miss” those relations because the two nodes aren’t yet discoverable.

To mitigate that scenario, we preemptively link all the in-flight nodes to each other, then run the rest of the algorithm. If they are not close to one another, these edges will be pruned. It turns out that this behavior actually increased the overall accuracy (by about 1%) compared to the previous behavior. This is likely because items that are added at the same time are naturally linked to one another.

Results

On smaller datasets (“merely” hundreds of thousands of vectors) that fit in memory, the optimizations reduced indexing time by 44%! That is mostly because we now operate in parallel and have more optimized I/O behavior.

We are quite a bit faster, we are (slightly) more accurate, and we also have reasonable behavior when we exceed the system limits.

What about binary quantization?

I mentioned that binary quantization can massively reduce the amount of space required for vector search. We also ran tests with the same dataset using binary quantization. The total size of all the vectors was around 3GB, and the total index size (including all the graph nodes, etc.) was around 18 GB. That took under 5 hours to index on the same 32GB machine.

If you care to know what it is that we are doing, take a look at the Hnsw.Parallel file inside the repository. The implementation is quite involved, but I’m really proud of how it turned out in the end.

May 12 2025

Optimizing the cost of clearing a set

time to read 8 min | 1414 words

Tweet Share Share 6 comments

Tags:

Let’s say I want to count the number of reachable nodes for each node in the graph. I can do that using the following code:

void DFS(Node start, HashSet<Node> visited) 
{
    if (start == null || visited.Contains(start)) return;
    visited.Add(start);
    foreach (var neighbor in start.Neighbors) 
    {
        DFS(neighbor, visited);
    }
}


void MarkReachableCount(Graph g)
{
   foreach(var node in g.Nodes)
   {
       HashSet<Node> visited = [];
       DFS(node, visisted);
       node.ReachableGraph = visited.Count;
   }
}

A major performance cost for this sort of operation is the allocation cost. We allocate a separate hash set for each node in the graph, and then allocate whatever backing store is needed for it. If you have a big graph with many connections, that is expensive.

A simple fix for that would be to use:

void MarkReachableCount(Graph g)
{
   HashSet<Node> visited = [];
   foreach(var node in g.Nodes)
   {
       visited.Clear();
       DFS(node, visisted);
       node.ReachableGraph = visited.Count;
   }
}

This means that we have almost no allocations for the entire operation, yay!

This function also performs significantly worse than the previous one, even though it barely allocates. The reason for that? The call to Clear() is expensive. Take a look at the implementation - this method needs to zero out two arrays, and it will end up being as large as the node with the most reachable nodes. Let’s say we have a node that can access 10,000 nodes. That means that for each node, we’ll have to clear an array of about 14,000 items, as well as another array that is as big as the number of nodes we just visited.

No surprise that the allocating version was actually cheaper. We use the visited set for a short while, then discard it and get a new one. That means no expensive Clear() calls.

The question is, can we do better? Before I answer that, let’s try to go a bit deeper in this analysis. Some of the main costs in HashSet<Node> are the calls to GetHashCode() and Equals(). For that matter, let’s look at the cost of the Neighbors array on the Node.

Take a look at the following options:

public record Node1(List<Node> Neighbors);
public record Node2(List<int> NeighborIndexes);

Let’s assume each node has about 10 - 20 neighbors. What is the cost in memory for each option? Node1 uses references (pointers), and will take 256 bytes just for the Neighbors backing array (32-capacity array x 8 bytes). However, the Node2 version uses half of that memory.

This is an example of data-oriented design, and saving 50% of our memory costs is quite nice. HashSet<int> is also going to benefit quite nicely from JIT optimizations (no need to call GetHashCode(), etc. - everything is inlined).

We still have the problem of allocations vs. Clear(), though. Can we win?

Now that we have re-framed the problem using int indexes, there is a very obvious optimization opportunity: use a bit map (such as BitsArray). We know upfront how many items we have, right? So we can allocate a single array and set the corresponding bit to mark that a node (by its index) is visited.

That dramatically reduces the costs of tracking whether we visited a node or not, but it does not address the costs of clearing the bitmap.

Here is how you can handle this scenario cheaply:

public class Bitmap
{
    private ulong[] _data;
    private ushort[] _versions;
    private int _version;


    public Bitmap(int size)
    {
        _data = new ulong[(size + 63) / 64];
        _versions = new ushort[_data.Length];
    }


    public void Clear()
    {
        if(_version++ < ushort.MaxValue)
            return;
            
        Array.Clear(_data);
        Array.Clear(_versions);
    }


    public bool Add(int index)
    {
        int arrayIndex = index >> 6;
        if(_versions[arrayIndex] != _version)
        {
            _versions[arrayIndex] = _version;
            _data[arrayIndex] = 0;
        }


        int bitIndex = index & 63;
        ulong mask = 1UL << bitIndex;
        ulong old = _data[arrayIndex];
        _data[arrayIndex] |= mask;
        return (old & mask) == 0;
    }
}

The idea is pretty simple, in addition to the bitmap - we also have another array that marks the version of each 64-bit range. To clear the array, we increment the version. That would mean that when adding to the bitmap, we reset the underlying array element if it doesn’t match the current version. Once every 64K items, we’ll need to pay the cost of actually resetting the backing stores, but that ends up being very cheap overall (and worth the space savings to handle the overflow).

The code is tight, requires no allocations, and performs very quickly.

May 09 2025

NOT Sharding RavenDB Vector Search

time to read 5 min | 858 words

Tweet Share Share 0 comments

Tags:

I ran into an interesting post, "Sharding Pgvector," in which PgDog (provider of scaling solutions for Postgres) discusses scaling pgvector indexes (HNSW and IVFFlat) across multiple machines to manage large-scale embeddings efficiently. This approach speeds up searches and improves recall by distributing vector data, addressing the limitations of fitting large indexes into memory on a single machine.

That was interesting to me because they specifically mention this Wikipedia dataset, consisting of 35.1 million vectors. That… is not really enough to justify sharding, in my eyes. The dataset is about 120GB of Parquet files, so I threw that into RavenDB using the following format:

Each vector has 768 dimensions in this dataset.

33 minutes later, I had the full dataset in RavenDB, taking 163 GB of storage space. The next step was to define a vector search index, like so:

from a in docs.Articles
select new 
{
    Vector = CreateVector(a.Embedding)
}

That index (using the HNSW algorithm) is all that is required to start doing proper vector searches in RavenDB.

Here is what this looks like - we have 163GB for the raw data, and the index itself is 119 GB.

RavenDB (and PgVector) actually need to store the vectors twice - once in the data itself and once in the index.

Given the size of the dataset, I used a machine with 192 GB of RAM to create the index. Note that this still means the total data size is about ⅓ bigger than the available memory, meaning we cannot compute it all in memory.

This deserves a proper explanation - HNSW is a graph algorithm that assumes you can cheaply access any part of the graph during the indexing process. Indeed, this is effectively doing pure random reads on the entire dataset. You would generally run this on a machine with at least 192 GB of RAM. I assume this is why pgvector required sharding for this dataset.

I decided to test it out on several different machines. The key aspect here is the size of memory, I’m ignoring CPU counts and type, they aren’t the bottleneck for this scenario. As a reminder, we are talking about a total data size that is close to 300 GB.

RAM	RavenDB indexing time:
192 GB	2 hours, 20 minutes
64 GB	14 hours, 8 minutes
32 GB	37 hours, 40 minutes

Note that all of those were run on a single machine, all using local NVMe disk. And yes, that is less than two days to index that much data on a machine that is grossly inadequate for it.

I should note that on the smaller machines, query times are typically ~5ms, so even with a lot of data indexed, the actual search doesn’t need to run on a machine with a lot of memory.

In short, I don’t see a reason why you would need to use sharding for that amount of data. It can comfortably fit inside a reasonably sized machine, with room to spare.

I should also note that the original post talks about using the IVFFlat algorithm instead of HNSW. Pgvector supports both, but RavenDB only uses HNSW. From a technical perspective, I would love to be able to use IVFFlat, since it is a much more traditional algorithm for databases.

You run k-means over your data to find the centroids (so you can split the data into reasonably sized chunks), and then just do an efficient linear search on that small chunk as needed. It fits much more nicely into the way databases typically work. However, it also has some significant drawbacks:

You have to have the data upfront, you cannot build it incrementally.
The effectiveness of the IVFFlat index degrades over time with inserts & deletes, because the original centroids are no longer as accurate.

Because of those reasons, we didn’t implement that. HNSW is a far more complex algorithm, both in terms of the actual approach and the number of hoops we had to go through to implement that efficiently, but as you can see, it is able to provide good results even on large datasets, can be built incrementally and doesn’t degrade over time.

Head-to-head comparison

I decided to run pgvector and RavenDB on the same dataset to get some concrete ideas about their performance. Because I didn’t feel like waiting for hours, I decided to use this dataset. It has 485,859 vectors and about 1.6 GB of data.

RavenDB indexed that in 1 minute and 17 seconds. My first attempt with pgvector took over 7 minutes when setting maintenance_work_mem = '512MB'. I had to increase it to 2GB to get more reasonable results (and then it was 1 minute and 49 seconds).

RavenDB is able to handle it a lot better when there isn’t enough memory to keep it all in RAM, while pgvector seems to degrade badly.

Summary

In short, I don’t think that you should need to go for sharding (and its associated complexity) for that amount of data. And I say that as someone whose database has native sharding capabilities.

For best performance, you should run large vector indexes on machines with plenty of RAM, but even without that, RavenDB does an okay job of keeping things ticking.

May 08 2025

Making the costs visible, then fixing them

time to read 8 min | 1458 words

Tweet Share Share 0 comments

Tags:

We recently tackled performance improvements for vector search in RavenDB. The core challenge was identifying performance bottlenecks. Details of specific changes are covered in a separate post. The post is already written, but will be published next week, here is the direct link to that.

In this post, I don’t want to talk about the actual changes we made, but the approach we took to figure out where the cost is.

Take a look at the following flame graph, showing where our code is spending the most time.

As you can see, almost the entire time is spent computing cosine similarity. That would be the best target for optimization, right?

I spent a bunch of time writing increasingly complicated ways to optimize the cosine similarity function. And it worked, I was able to reduce the cost by about 1.5%!

That is something that we would generally celebrate, but it was far from where we wanted to go. The problem was elsewhere, but we couldn’t see it in the profiler output because the cost was spread around too much.

Our first task was to restructure the code so we could actually see where the costs were. For instance, loading the vectors from disk was embedded within the algorithm. By extracting and isolating this process, we could accurately profile and measure its performance impact.

This restructuring also eliminated the "death by a thousand cuts" issue, making hotspots evident in profiling results. With clear targets identified, we can now focus optimization efforts effectively.

That major refactoring had two primary goals. The first was to actually extract the costs into highly visible locations, the second had to do with how you address them. Here is a small example that scans a social graph for friends, assuming the data is in a file.

def read_user_friends(file, user_id: int) -> List[int]:
    """Read friends for a single user ID starting at indexed offset."""
    pass # redacted


def social_graph(user_id: int, max_depth: int) -> Set[int]:
    if max_depth < 1:
        return set()
    
    all_friends = set() 
    visited = {user_id} 
    work_list = deque([(user_id, max_depth)])  
    
    with open("relations.dat", "rb") as file:
        while work_list:
            curr_id, depth = work_list.popleft()
            if depth <= 0:
                continue
                
            for friend_id in read_user_friends(file, curr_id):
                if friend_id not in visited:
                    all_friends.add(friend_id)
                    visited.add(friend_id)
                    work_list.append((friend_id, depth - 1))
    
    return all_friends

If you consider this code, you can likely see that there is an expensive part of it, reading from the file. But the way the code is structured, there isn’t really much that you can do about it. Let’s refactor the code a bit to expose the actual costs.

def social_graph(user_id: int, max_depth: int) -> Set[int]:
    if max_depth < 1:
        return set()
    
    all_friends = set() 
    visited = {user_id} 
    
    with open("relations.dat", "rb") as file:
        work_list =  read_user_friends(file, [user_id])
        while work_list and max_depth >= 0:
            to_scan = set()
            for friend_id in work_list: # gather all the items to read
                if friend_id in visited:
                    continue
                
                all_friends.add(friend_id)
                visited.add(friend_id)
                to_scan.add(curr_id)


            # read them all in one shot
            work_list = read_users_friends(file, to_scan)
            # reduce for next call
            max_depth = max_depth - 1    
    
    return all_friends

Now, instead of scattering the reads whenever we process an item, we gather them all and then send a list of items to read all at once. The costs are far clearer in this model, and more importantly, we have a chance to actually do something about it.

Optimizing a lot of calls to read_user_friends(file, user_id) is really hard, but optimizing read_users_friends(file, users) is a far simpler task. Note that the actual costs didn’t change because of this refactoring, but the ability to actually make the change is now far easier.

Going back to the flame graph above, the actual cost profile differs dramatically as the size of the data rose, even if the profiler output remained the same. Refactoring the code allowed us to see where the costs were and address them effectively.

Here is the end result as a flame graph. You can clearly see the preload section that takes a significant portion of the time. The key here is that the change allowed us to address this cost directly and in an optimal manner.

The end result for our benchmark was:

Before: 3 minutes, 6 seconds
After: 2 minutes, 4 seconds

So almost exactly ⅓ of the cost was removed because of the different approach we took, which is quite nice.

This technique, refactoring the code to make the costs obvious, is a really powerful one. Mostly because it is likely the first step to take anyway in many performance optimizations (batching, concurrency, etc.).

May 07 2025

Event: Exploring the Power of AI Search in Modern Applications (TODAY)

time to read 1 min | 161 words

Tweet Share Share 0 comments

Tags:

Join Our Community Discussion: Exploring the Power of AI Search in Modern Applications

We're excited to announce our second Community Open Discussion, focusing on a transformative feature in today's applications: AI search.This technology is rapidly becoming the new standard for delivering intelligent and intuitive search experiences.

Join Dejan from our DevRel team for an open and engaging discussion.Whether you're eager to learn, contribute your insights, or simply listen in, everyone is welcome!

We’ll talk about:

The growing popularity and importance of AI search.
A deep dive into the technical aspects, including embeddings generation, query term caching, and quantization techniques.
An open forum to discuss best practices and various approaches to implementing AI search.
A live showcase demonstrating how RavenDB AI Integration allows you to implement AI Search in just 5 minutes, with the same simplicity as our regular search API.

Event Details:

Date: Wednesday, May 7th, 19:00 CET
Location:RavenDB Developers Community Discord

Bring your questions and your enthusiasm – we look forward to seeing you there!

May 05 2025

RavenDB Storage Provider for Orleans

time to read 1 min | 187 words

Tweet Share Share 0 comments

Tags:

Orleans is a distributed computing framework for .NET. It allows you to build distributed systems with ease, taking upon itself all the state management, persistence, distribution, and concurrency.

The core aspect in Orleans is the notion of a “grain” - a lightweight unit of computation & state. You can read more about it in Microsoft’s documentation, but I assume that if you are reading this post, you are already at least somewhat familiar with it.

We now support using RavenDB as the backing store for grain persistence, reminders, and clustering. You can read the official announcement about the release here, and the docs covering how to use RavenDB & Microsoft Orleans.

You can use RavenDB to persist and retrieve Orleans grain states, store Orleans timers and reminders, as well as manage Orleans cluster membership.

RavenDB is well suited for this task because of its asynchronous nature, schema-less design, and the ability to automatically adjust itself to different loads on the fly.

If you are using Orleans, or even just considering it, give it a spin with RavenDB. We would love your feedback.

May 02 2025

RavenDB NewsMay 2025

time to read 3 min | 441 words

Tweet Share Share 0 comments

Tags:

RavenDB is moving at quite a pace, and there is actually more stuff happening than I can find the time to talk about. I usually talk about the big-ticket items, but today I wanted to discuss some of what we like to call Quality of Life features.

The sort of things that help smooth the entire process of using RavenDB - the difference between something that works and something polished. That is something I truly care about, so with a great sense of pride, let me walk you through some of the nicest things that you probably wouldn’t even notice that we are doing for you.

RavenDB Node.js Client - v7.0 released (with Vector Search)

We updated the RavenDB Node.js client to version 7.0, with the biggest item being explicit support for vector search queries from Node.js. You can now write queries like these:

const docs = session.query<Product>({collection: "Products"})
   .vectorSearch(x => x.withText("Name"),
      factory => factory.byText("italian food"))
  .all();

This is the famous example of using RavenDB’s vector search to find pizza and pasta in your product catalog, utilizing vector search and automatic data embeddings.

Converting automatic indexes to static indexes

RavenDB has auto indexes. Send a query, and if there is no existing index to run the query, the query optimizer will generate one for you. That works quite amazingly well, but sometimes you want to use this automatic index as the basis for a static (user-defined) index. Now you can do that directly from the RavenDB Studio, like so:

You can read the full details of the feature at the following link.

RavenDB Cloud - Incidents History & Operational Suggestions

We now expose the operational suggestions to you on the dashboard. The idea is that you can easily and proactively check the status of your instances and whether you need to take any action.

You can also see what happened to your system in the past, including things that RavenDB’s system automatically recovered from without you needing to lift a finger.

For example, take a look at this highly distressed system:

As usual, I would appreciate any feedback you have on the new features.

Apr 30 2025

RecordingPractical AI Integration with RavenDB

time to read 1 min | 92 words

Tweet Share Share 0 comments

Tags:

Last week I did an hour long webinar showing AI integration in RavenDB. From vector search to RAG, from embedding generation to Gen AI inside of the database engine.

Most of those features are already released, but I would really love your feedback on the Gen AI integration story (starts at around to 30 minutes mark in the video).

Let me know what you think!

Apr 28 2025

The null check that didn't check for nulls

time to read 2 min | 342 words

Tweet Share Share 6 comments

Tags:

I wrote the following code:

if (_items is [var single])
{
    // no point invoking thread pool
    single.Run();
}

And I was very proud of myself for writing such pretty and succinct C# code.

Then I got a runtime error:

I asked Grok about this because I did not expect this, and got the following reply:

No, if (_items is [var single]) in C# does not match a null value. This pattern checks if _items is a single-element array and binds the element to single. If _items is null, the pattern match fails, and the condition evaluates to false.

However, the output clearly disagreed with both Grok’s and my expectations. I decided to put that into SharpLab, which can quickly help identify what is going on behind the scenes for such syntax.

You can see three versions of this check in the associated link.

if(strs is [var s]) // no null check


if(strs is [string s]) //  if (s != null)


if(strs is [{} s]) //  if (s != null)

Turns out that there is a distinction between a var pattern (allows null) and a non-var pattern. The third option is the non-null pattern, which does the same thing (but doesn’t require redundant type specification). Usually var vs. type is a readability distinction, but here we have a real difference in behavior.

Note that when I asked the LLM about it, I got the wrong answer. Luckily, I could get a verified answer by just checking the compiler output, and only then head out to the C# spec to see if this is a compiler bug or just a misunderstanding.

Apr 25 2025

AI Impact - normalizing Good Enough

time to read 3 min | 420 words

Tweet Share Share 0 comments

Tags:

I was just reviewing a video we're about to publish, and I noticed something in the subtitles. It said, "Six qubits are used for..."

I got all excited thinking RavenDB was jumping into quantum computing. But nope, it turned out to be a transcription error. What was actually said was, "Six kilobytes are used for..."

To be fair, I listened to the recording a few times, and honestly, "qubits" isn't an unreasonable interpretation if you're just going by the spoken words. Even with context, that transcription isn't completely out there. I wouldn't be surprised if a human transcriber came up with the same result.

Fixing this issue (and going over an hour of text transcription to catch other possible errors) is going to be pretty expensive. Honestly, it would be easier to just skip the subtitles altogether in that case.

Here's the thing, though. I think a big part of this is that we now expect transcription to be done by a machine, and we don't expect it to be perfect. Before, when it was all done manually, it cost so much that it was reasonable to expect near-perfection.

What AI has done is make it cheap enough to get most of the value, while also lowering the expectation that it has to be flawless.

So, the choices we're looking at are:

AI transcription - mostly accurate, cheap, and easy to do.
Human transcription - highly accurate, expensive, and slow.
No transcription - users who want subtitles would need to use their own automatic transcription (which would probably be lower quality than what we use).

Before, we really only had two options: human transcription or nothing at all. What I think the spread of AI has done is not just made it possible to do it automatically and cheaply, but also made it acceptable that this "Good Enough" solution is actually, well, good enough.

Viewers know it's a machine translation, and they're more forgiving if there are some mistakes. That makes it way more practical to actually use it. And the end result? We can offer more content.

Sure, it's not as good as manual transcription, but it's definitely better than having no transcription at all (which is really the only other option).

What I find most interesting is that it's the fact that this is so common now that makes it possible to actually use it more.

Yes, we actually review the subtitles and fix any obvious mistakes for the video. The key here is that we can spend very little time actually doing that, since errors are more tolerated.

Oren Eini

Oren Eini

CEO of RavenDB

Scaling HNSW in RavenDB: Optimizing for inadequate hardware

Cost reduction

Optimization Approach

Accuracy of the results

Results

What about binary quantization?

Optimizing the cost of clearing a set

NOT Sharding RavenDB Vector Search

Head-to-head comparison

Summary

Making the costs visible, then fixing them

Event: Exploring the Power of AI Search in Modern Applications (TODAY)

RavenDB Storage Provider for Orleans

RavenDB NewsMay 2025

RavenDB Node.js Client - v7.0 released (with Vector Search)

Converting automatic indexes to static indexes

RavenDB Cloud - Incidents History & Operational Suggestions

RecordingPractical AI Integration with RavenDB

The null check that didn't check for nulls

AI Impact - normalizing Good Enough

FUTURE POSTS

RECENT SERIES

RECENT COMMENTS

Syndication

Main feed
Comments feed