I commented that we should move the Increment() operation outside of the loop because if two threads are calling Register() at the same time, we’ll have a lot of contention here.
The reply was that this was intentional since calling Interlocked.CompareExchange() to do the update in a batch manner is more complex. The issue was a lack of familiarity with the Interlocked.Add() function, which allows us to write the function as:
Both options have essentially the same performance characteristics for a single item, but if we need to register a large batch of items, the second option drastically reduces the contention.
In this case, we don’t actually care about having an accurate count as items are added, so there is no reason to avoid the optimization.
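The difference is easy to see in a small model. The sketch below is in Python, which has no `Interlocked` API, so a lock-protected counter stands in for `Interlocked.Increment()` / `Interlocked.Add()`. The `AtomicCounter` class and the two `register_*` functions are hypothetical names for illustration. The `operations` field counts how often the shared counter is touched, which is a rough proxy for how often concurrent callers can contend on it.

```python
import threading
from typing import Iterable, List


class AtomicCounter:
    """Lock-based stand-in for Interlocked.Add (hypothetical helper)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.value = 0
        self.operations = 0  # how many times the shared counter was updated

    def add(self, amount: int) -> int:
        with self._lock:
            self.value += amount
            self.operations += 1
            return self.value


def register_per_item(counter: AtomicCounter, items: Iterable[object], registry: List[object]) -> None:
    # One shared-counter update per item, like calling Increment() inside the loop.
    for item in items:
        registry.append(item)
        counter.add(1)


def register_batch(counter: AtomicCounter, items: Iterable[object], registry: List[object]) -> None:
    # One shared-counter update for the whole batch, like a single Add() after the loop.
    added = 0
    for item in items:
        registry.append(item)
        added += 1
    counter.add(added)


per_item, batched = AtomicCounter(), AtomicCounter()
register_per_item(per_item, range(1_000), [])
register_batch(batched, range(1_000), [])
print(per_item.value, per_item.operations)  # 1000 1000
print(batched.value, batched.operations)    # 1000 1
```

The trade-off is the one accepted above: other threads don't see the count move while a batch is being registered, only once it is done.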
I care about the performance of RavenDB, enough that I would go to epic lengths to improve it. Here I use “epic” both in the Agile sense of a multi-month journey and in terms of the actual amount of work required. See my recent posts about the RavenDB 7.1 I/O work.
There hasn’t been a single release in the past 15 years that didn’t improve the performance of RavenDB in some way. We have an entire team whose sole task is to find bottlenecks and fix them, to the point where assembly language is a high-level concept at times (yes, we design some pieces of RavenDB with CPU microcode for performance).
When we ran into this issue, I was… quite surprised, to say the least. The problem was that whenever we serialized a document in RavenDB, we would compile some LINQ expressions.
That is expensive and utterly wasteful. It is the sort of thing that we should never do, especially since there was no actual need for it.
Here is the essence of this fix:
We ran a performance test on the before & after versions, just to know what kind of performance we left on the table.
Before (ms)    After (ms)
33,782         20
The fixed version is 1,689 times faster, if you can believe that.
So here is a fix that is both great to have and quite annoying. We focused so much effort on optimizing the server, and yet we missed something that obvious? How can that be?
Well, the answer is that this isn’t an actual benchmark. The problem is that this code is invoked once per instance created instead of once globally, and an instance is created once per thread. In any situation where the number of threads is more or less fixed (most production scenarios, where you’ll be using a thread pool, as well as most benchmarks), you are never going to see this problem.
It is when you have threads dying and being created (such as when you handle spikes) that you’ll run into this issue. Make no mistake, it is an actual issue. When your load spikes, the thread pool will spin up new threads, and they will consume a lot of CPU initially for absolutely no reason.
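To see why a steady-state benchmark never catches this, here is a small Python model (the actual fix is in C# and isn’t reproduced here). `expensive_compile()` is a hypothetical stand-in for compiling LINQ expressions, and serializers are cached one per thread. With a fixed set of threads, the per-instance cost is paid once per thread and disappears into warm-up; with thread churn, it is paid again for every new thread, while the shared version compiles only once per process.

```python
import threading

COMPILE_CALLS = {"per_instance": 0, "shared": 0}
_calls_lock = threading.Lock()


def expensive_compile(kind):
    # Hypothetical stand-in for compiling LINQ expressions: assume this is costly.
    with _calls_lock:
        COMPILE_CALLS[kind] += 1
    return lambda doc: sorted(doc.items())


class PerInstanceSerializer:
    def __init__(self):
        # Problem pattern: every new instance pays the compilation cost again.
        self._accessor = expensive_compile("per_instance")

    def serialize(self, doc):
        return self._accessor(doc)


_shared_accessor = None
_shared_lock = threading.Lock()


def get_shared_accessor():
    # Fixed pattern: compile once per process (double-checked locking), then reuse.
    global _shared_accessor
    if _shared_accessor is None:
        with _shared_lock:
            if _shared_accessor is None:
                _shared_accessor = expensive_compile("shared")
    return _shared_accessor


class SharedSerializer:
    def __init__(self):
        self._accessor = get_shared_accessor()

    def serialize(self, doc):
        return self._accessor(doc)


_per_thread = threading.local()


def serializer_for_current_thread(cls):
    # One serializer instance per thread, created lazily on first use.
    instance = getattr(_per_thread, cls.__name__, None)
    if instance is None:
        instance = cls()
        setattr(_per_thread, cls.__name__, instance)
    return instance


def simulate_thread_churn(cls, thread_count):
    # Each short-lived thread serializes one document and exits,
    # similar to a thread pool growing during a load spike.
    for _ in range(thread_count):
        t = threading.Thread(
            target=lambda: serializer_for_current_thread(cls).serialize({"b": 2, "a": 1}))
        t.start()
        t.join()


simulate_thread_churn(PerInstanceSerializer, 50)
simulate_thread_churn(SharedSerializer, 50)
print(COMPILE_CALLS)  # {'per_instance': 50, 'shared': 1}
```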
In short, we managed to miss this entirely (the code dates to 2017!) for a long time. It never appeared in any benchmark. The fix itself is trivial, of course, and we are unlikely to be able to show any real benefits from it in a benchmark, but that is yet another step in making RavenDB better.
One of the more interesting developments in terms of kernel API surface is the IO Ring. On Linux, it is called IO Uring, and Windows copied it shortly afterward. The idea started as a way to batch multiple IO operations at once but has evolved into a generic mechanism for making system calls more cheaply. On Linux, a large portion of the kernel’s features is exposed through the IO Uring API, while Windows exposes a far more limited API (basically, just reading and writing).
The reason this matters is that you can use IO Ring to reduce the cost of making system calls, using both batching and asynchronous programming. As such, most new database engines have jumped on that sweet nectar of better performance results.
As part of the overall re-architecture of how Voron manages writes, we have done the same. I/O for Voron is typically composed of writes to the journals and to the data file, so that makes it a really good fit, sort of.
An ironic aspect of IO Uring is that despite it being an asynchronous mechanism, it is inherently single-threaded. There are good reasons for that, of course, but that means that if you want to use the IO Ring API in a multi-threaded environment, you need to take that into account.
A common way to handle that is to use an event-driven system, where all the actual calls are generated from a single “event loop” thread or similar. This is how the Node.js API works, and how .NET itself manages IO for sockets (there is a single thread that listens to socket events by default).
The whole point of IO Ring is that you can submit multiple operations for the kernel to run as efficiently as possible. Consider, for example, the part of the code where we write the modified pages to the data file:
using (fileHandle)
{
    for (int i = 0; i < pages.Length; i++)
    {
        int numberOfPages = pages[i].GetNumberOfPages();
        var size = numberOfPages * Constants.Storage.PageSize;
        var offset = pages[i].PageNumber * Constants.Storage.PageSize;
        var span = new Span<byte>(pages[i].Pointer, size);
        // a separate, synchronous write call for every page
        RandomAccess.Write(fileHandle, span, offset);

        written += numberOfPages * Constants.Storage.PageSize;
    }
}
Actually, those aren’t threads in the normal sense. Those are kernel tasks, generated by the IO Ring at the kernel level directly. It turns out that internally, IO Ring may spawn worker threads to do the async work at the kernel level. When we had a separate IO Ring per file, each one of them had its own pool of threads to do the work.
The way it usually works is really interesting. The IO Ring will first attempt to complete the operation synchronously. For example, if you are writing to a file using buffered writes, the kernel can just copy the buffer to the page cache and move on; no actual I/O takes place. So the IO Ring will run through that directly, in a synchronous manner.
However, if your operation requires actual blocking, it will be sent to a worker queue to actually execute it in the background. This is one way that the IO Ring is able to complete many operations so much more efficiently than the alternatives.
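A rough way to picture this behavior is the Python model below. It is a conceptual sketch only and does not reflect how the kernel implements it: an operation that can finish without blocking completes inline, anything that would block is handed to a worker pool, and either way the result lands on a shared completion queue. `WriteOp`, `would_block`, and `submit()` are hypothetical names; in the model, `would_block` is simply given, whereas the IO Ring finds out by trying the operation first.

```python
import queue
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class WriteOp:
    tag: str
    data: bytes
    would_block: bool  # hypothetical: True when the write can't just be copied into the page cache


def perform(op):
    # Stand-in for the actual write; returns the number of bytes "written".
    return len(op.data)


def submit(op, workers, completions):
    """Complete an operation inline when possible, otherwise punt it to a worker.

    Results go to `completions`, loosely playing the role of the completion queue.
    """
    if not op.would_block:
        completions.put((op.tag, perform(op), "inline"))
    else:
        workers.submit(lambda: completions.put((op.tag, perform(op), "worker")))


completions = queue.Queue()
with ThreadPoolExecutor(max_workers=2) as workers:
    submit(WriteOp("a", b"abc", would_block=False), workers, completions)
    submit(WriteOp("b", b"abcd", would_block=True), workers, completions)
# Leaving the `with` block waits for the worker-side operations to finish.
results = {}
while not completions.empty():
    tag, written, path = completions.get()
    results[tag] = (written, path)
print(results)  # {'a': (3, 'inline'), 'b': (4, 'worker')}
```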
In our scenario, we have a pretty simple setup: we want to write to the file, making fully buffered writes. At the very least, being able to push all the writes to the OS in one shot (versus many separate system calls) is going to reduce our overhead. More interestingly, the OS will eventually want to start writing to the disk, so if we write a lot of data, some of the requests will block. At that point, the IO Ring will switch them to a worker thread and continue executing.
The problem we had was that when we had a separate IO Ring per data file and put a lot of load on the system, we started seeing contention between the worker threads across all the files. Basically, each ring had its own separate pool, so there was a lot of work for each pool but no sharing.
If the IO Ring is single-threaded, but having many separate rings (each with its own worker threads) leads to wasted resources, what can we do? The answer is simple: we’ll use a single global IO Ring and manage the threading concerns directly.
Here is the setup code for that (I removed all error handling to make it clearer):
void *do_ring_work(void *arg)
{
    int rc;
    if (g_cfg.low_priority_io)
    {
        // lowest priority level (7) within the best-effort I/O class
        syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, 0,
                IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, 7));
    }
    pthread_setname_np(pthread_self(), "Rvn.Ring.Wrkr");

    struct io_uring *ring = &g_worker.ring;
    struct workitem *work = NULL;
    while (true)
    {
        do
        {
            // wait for any writes on the eventfd
            // completion on the ring (associated with the eventfd)
            eventfd_t v;
            rc = read(g_worker.eventfd, &v, sizeof(eventfd_t));
        } while (rc < 0 && errno == EINTR);

        bool has_work = true;
        while (has_work)
        {
            int must_wait = 0;
            has_work = false;
            if (!work)
            {
                // we may have _previous_ work to run through
                work = atomic_exchange(&g_worker.head, 0);
            }
            while (work)
            {
                has_work = true;

                struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
                if (sqe == NULL)
                {
                    must_wait = 1;
                    goto submit_and_wait; // will retry
                }
                io_uring_sqe_set_data(sqe, work);
                switch (work->type)
                {
                case workitem_fsync:
                    io_uring_prep_fsync(sqe, work->filefd, IORING_FSYNC_DATASYNC);
                    break;
                case workitem_write:
                    io_uring_prep_writev(sqe, work->filefd, work->op.write.iovecs,
                                         work->op.write.iovecs_count, work->offset);
                    break;
                default:
                    break;
                }
                work = work->next;
            }
        submit_and_wait:
            rc = must_wait ? io_uring_submit_and_wait(ring, must_wait) : io_uring_submit(ring);

            struct io_uring_cqe *cqe;
            uint32_t head = 0;
            uint32_t i = 0;
            io_uring_for_each_cqe(ring, head, cqe)
            {
                i++;
                // force another run of the inner loop,
                // to ensure that we call io_uring_submit again
                has_work = true;
                struct workitem *cur = io_uring_cqe_get_data(cqe);
                if (!cur)
                {
                    // can be null if it is:
                    // * a notification about eventfd write
                    continue;
                }
                switch (cur->type)
                {
                case workitem_fsync:
                    notify_work_completed(ring, cur);
                    break;
                case workitem_write:
                    if (/* partial write */)
                    {
                        // queue again
                        continue;
                    }
                    notify_work_completed(ring, cur);
                    break;
                }
            }
            io_uring_cq_advance(ring, i);
        }
    }
    return 0;
}
What does this code do?
We start by checking whether we want to use lower-priority I/O, because we don’t actually care how long those operations take; we only need the data to reach the disk eventually. RavenDB has two types of writes:
Journal writes (durable updates to the write-ahead log, required to complete a transaction).
Data flush / data sync (background updates to the data file, currently buffered in memory; no user is waiting for those).
As such, we are fine with explicitly prioritizing the journal writes (which users are waiting for) in favor of all other operations.
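The `IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, 7)` call in the worker packs a scheduling class and a level into a single integer. Within the best-effort class, levels run from 0 (highest) to 7 (lowest), so the worker asks for the lowest best-effort priority; how strongly the kernel honors it depends on the I/O scheduler in use. A Python sketch of the encoding, using the constants from the kernel’s `<linux/ioprio.h>` header (the helper function names are our own):

```python
# Constants from the Linux <linux/ioprio.h> header.
IOPRIO_CLASS_SHIFT = 13
IOPRIO_CLASS_NONE, IOPRIO_CLASS_RT, IOPRIO_CLASS_BE, IOPRIO_CLASS_IDLE = 0, 1, 2, 3
IOPRIO_PRIO_MASK = (1 << IOPRIO_CLASS_SHIFT) - 1


def ioprio_prio_value(prio_class, level):
    # Mirrors the IOPRIO_PRIO_VALUE(class, data) macro: class in the high bits, level in the low bits.
    return (prio_class << IOPRIO_CLASS_SHIFT) | level


def ioprio_decode(value):
    # Split an ioprio value back into (class, level).
    return value >> IOPRIO_CLASS_SHIFT, value & IOPRIO_PRIO_MASK


value = ioprio_prio_value(IOPRIO_CLASS_BE, 7)
print(value, ioprio_decode(value))  # 16391 (2, 7)
```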
What is this C code? I thought RavenDB was written in C#
RavenDB is written in C#, but for very low-level system details, we found that it is far easier to write a Platform Abstraction Layer to hide system-specific concerns from the rest of the code. That way, we can simply submit the data to write and have the abstraction layer take care of all of that for us. This also ensures that we amortize the cost of PInvoke calls across many operations by submitting a big batch to the C code at once.
After setting the IO priority, we start reading from what is effectively a thread-safe queue. We wait for eventfd() to signal that there is work to do, and then we grab the head of the queue and start running.
The idea is that we fetch items from the queue, then we write those operations to the IO Ring as fast as we can manage. The IO Ring size is limited, however. So we need to handle the case where we have more work for the IO Ring than it can accept. When that happens, we will go to the submit_and_wait label and wait for something to complete.
Note that there is some logic there to handle what is going on when the IO Ring is full. We submit all the work in the ring, wait for an operation to complete, and in the next run, we’ll continue processing from where we left off.
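The queue-plus-ring-capacity logic can be modeled in a few lines of Python. This is a sketch under our own assumptions: a lock stands in for the `atomic_exchange()` on the list head, items stay in FIFO order (a simplification), and `WorkQueue` / `submission_batches()` are hypothetical names. The worker takes everything queued so far in one step, feeds the ring until it is full, and carries the leftovers into the next round.

```python
import threading
from collections import deque


class WorkQueue:
    """Producers push work; the worker takes everything queued so far in one step.

    The C code uses atomic_exchange() on a linked-list head; a lock stands in for it here.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._items = []

    def push(self, item):
        with self._lock:
            self._items.append(item)

    def take_all(self):
        with self._lock:
            items, self._items = self._items, []
        return items


def submission_batches(work_queue, ring_capacity):
    """Yield batches no larger than the ring, carrying leftovers into the next round."""
    pending = deque()
    while True:
        if not pending:
            pending.extend(work_queue.take_all())
            if not pending:
                return  # nothing left: the worker would go back to waiting on its eventfd
        batch = []
        while pending and len(batch) < ring_capacity:
            batch.append(pending.popleft())
        # In the C worker, each batch corresponds to one io_uring_submit() (or
        # io_uring_submit_and_wait() when the ring filled up), then reaping completions.
        yield batch


q = WorkQueue()
for i in range(10):
    q.push(i)
print(list(submission_batches(q, ring_capacity=4)))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```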
The rest of the code is processing the completed operations and reporting the result back to their origin. This is done using the following function, which I find absolutely hilarious:
Remember that when we submit writes to the data file, we must wait until they are all done. The async nature of IO Ring is meant to help us push the writes to the OS as soon as possible, as well as push writes to multiple separate files at once. For that reason, we use another eventfd() to wait (as the submitter) for the IO Ring to complete the operation. I love the code above because it actually uses the IO Ring itself to do the work we need here, saving us an actual system call in most cases.
Here is how we submit the work to the worker thread:
This function handles the submission of a set of pages to write to a file. Note that we protect against concurrent work on the same file. That isn’t actually needed since the caller code already handles that, but an uncontended lock is cheap, and it means that I don’t need to think about concurrency or worry about changes in the caller code in the future.
We ensure that we have sufficient buffer space, and then we create a work item. A work item is a single write to the file at a given location. However, we are using vectored writes, so we merge writes to consecutive pages into a single write operation. That is the purpose of the huge for loop in the code. The pages arrive already sorted, so a single scan & merge pass is enough.
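The scan & merge over sorted pages can be sketched in Python. `Page`, `WorkItem`, and `merge_pages()` are hypothetical names, and `PAGE_SIZE = 8192` is an assumption standing in for `Constants.Storage.PageSize`. Each resulting work item is one vectored write: a file offset plus the list of buffers that would become its iovecs. On Linux, `os.pwritev(fd, item.buffers, item.offset)` could issue it, though partial writes would still need handling, as in the worker’s completion code.

```python
from dataclasses import dataclass, field
from typing import List

PAGE_SIZE = 8192  # assumption: stands in for Constants.Storage.PageSize


@dataclass
class Page:
    page_number: int
    number_of_pages: int  # > 1 when a page spans several pages
    data: bytes


@dataclass
class WorkItem:
    offset: int                                   # byte offset in the data file
    buffers: List[bytes] = field(default_factory=list)
    end_page: int = 0                             # first page number after this run


def merge_pages(pages: List[Page]) -> List[WorkItem]:
    """Merge runs of consecutive pages into single vectored writes.

    `pages` must already be sorted by page_number, so one pass is enough.
    """
    items: List[WorkItem] = []
    for page in pages:
        if items and items[-1].end_page == page.page_number:
            run = items[-1]  # contiguous with the previous page: extend the run
        else:
            run = WorkItem(offset=page.page_number * PAGE_SIZE)
            items.append(run)
        run.buffers.append(page.data)
        run.end_page = page.page_number + page.number_of_pages
    return items


pages = [Page(1, 1, b"a"), Page(2, 2, b"bb"), Page(4, 1, b"c"), Page(10, 1, b"d")]
for item in merge_pages(pages):
    print(item.offset, item.buffers)
# 8192 [b'a', b'bb', b'c']
# 81920 [b'd']
```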
Pay attention to the fact that the struct workitem actually belongs to two different linked lists. We have the next pointer, which is used to send work to the worker thread, and the prev pointer, which is used to iterate over the entire set of operations we submitted on completion (we’ll cover this in a bit).
The idea is pretty simple. We first wake the worker thread by writing to its eventfd(), and then we wait on our own eventfd() for the worker to signal us that at least some of the work is done.
Note that we handle the submission of multiple work items by iterating over them in reverse order, using the prev pointer. Only when all the work is done can we return to our caller.
The end result of all this behavior is that we have a completely new way to deal with background I/O operations (remember, journal writes are handled differently). We can control the load we put on the system both by adjusting the size of the IO Ring and by changing its priority.
The fact that we have a single global IO Ring means that we can get much better usage out of the worker thread pool that IO Ring utilizes. We also give the OS a lot more opportunities to optimize RavenDB’s I/O.
The code in this post shows the Linux implementation, but RavenDB also supports IO Ring on Windows if you are running a recent version of Windows.
We aren’t done yet, mind, I still have more exciting things to tell you about how RavenDB 7.1 is optimizing writes and overall performance. In the next post, we’ll discuss what I call the High Occupancy Lane vs. Critical Lane for I/O and its impact on our performance.
A good lesson I learned about being a manager is that the bigger the organization, the more important it is for me to be silent. If we are discussing a set of options, I have to talk last, and usually, I have to make myself wait until the end of a discussion before I can weigh in on any issues I have with the proposed solutions.
Speaking last isn’t something I do to have the final word or as a power play, mind you. I do it so my input won’t “taint” the discussion. The bigger the organization, the more pressure there is to align with management. If I want to get unbiased opinions and proper input, I have to wait for it. That took a while to learn because the gradual growth of the company meant that the tipping point basically snuck up on me.
One day, I was working closely with a small team. They would argue freely and push back if they thought I was wrong without hesitation. The next day, the company grew to the point where I would only rarely talk to some people, and when I did, it was the CEO talking, not me.
It’s a subtle shift, but once you see it, you can’t unsee it. I keep wondering whether I need to literally get a couple of hats and walk around the office wearing different ones at different times.
To deal with this issue, I went out of my way to get a few “no-men” (the opposite of yes-men), who can reliably tell me when what I’m proposing is… let’s call it an idealistic view of reality. These are the folks who’ll look at my grand plan to, say, overhaul our entire CRM in a week and say, “Hey, love the enthusiasm, but have you considered the part where we all spontaneously combust from stress?” There may have been some pointing at grey hair and receding hairlines as well.
The key here is that I got these people specifically because I value their opinions, even when I disagree with them. It’s like having a built-in reality check—annoying in the moment, but worth its weight in gold when it keeps you from driving the whole team off a cliff.
This ties into one of the trickier parts of managerial duties: knowing when to steer and when to step back. Early on, I thought being a manager was about having all the answers and making sure everyone knew it. But the reality? It’s more like being a gardener—you plant the seeds (the vision), water them (with resources and support), and then let the team grow into it.
My job isn’t to micromanage every leaf; it’s to make sure the conditions are right for the whole thing to thrive. That means trusting people to do their jobs, even if they don’t do it exactly how I would.
Of course, there’s another side to this gig: the ability to move the goalposts that measure what’s required. Changing the scope of a problem is a really good way to make something that used to be impossible a reality. I’m reminded of that XKCD comic (you know the one) where changing the problem just enough turns a “no way” into a “huh, that could work.” That’s a manager’s superpower.
You’re not just solving problems; you’re redefining them so the team can win. Maybe the deadline’s brutal, but if you shift the focus from “everything” to “we don’t need this feature for launch,” suddenly everyone’s breathing again.
It is a very strange feeling because you move from doing things yourself, to working with a team, to working at a distance of once or twice removed. On the one hand, you can get a lot more done, but on the other hand, it can be really frustrating when it isn’t done the way (and with the speed) that I could do it.
This isn’t a motivational post; it is not a fun aspect of my work. I only have so many hours in the day, and being careful about where I put my time is important. At the same time, it means that I have to take into account that what I say matters, and if I say something first, it puts a pretty big hurdle in front of other people if they disagree with me.
In other words, I know it can come off as annoying, but not giving my opinion on something is actually a well-thought-out strategy to get the raw information without influencing the output. When I have all the data, I can give my own two cents on the matter safely.
When we build a new feature in RavenDB, we either have at least some idea about what we want to build or we are doing something that is pure speculation. In either case, we will usually spend only a short amount of time trying to plan ahead.
A good example of that can be found in my RavenDB 7.1 I/O posts, which cover more than six months of work on a major overhaul of the system. That was done mostly through a series of discussions between team members, guidance from the profiler, and our experience, seeing where the path would lead us. In that case, it led us to a five-fold performance improvement (and we’ll do better still by the time we are done there).
That particular set of changes is one of the more complex and hard-to-execute changes we have made in RavenDB over the past 5 years or so. It touched a lot of code, it changed a lot of stuff, and it was done without any real upfront design. There wasn’t much point in designing, we knew what we wanted to do (get things faster), and the way forward was to remove obstacles until we were fast enough or ran out of time.
I re-read the last couple of paragraphs, and it may look like cowboy coding, but that is very much not the case. There is a process there, it is just not something we would find valuable to put down as a formal design document. The key here is that we have both a good understanding of what we are doing and what needs to be done.
RavenDB 4.0 design document
The design document we created for RavenDB 4.0 is probably the most important one in the project’s history. I just went through it again, it is over 20 pages of notes and details that discuss the current state of RavenDB at the time (written in 2015) and ideas about how to move forward.
It is interesting because I remember writing this document. And then we set out to actually make it happen, and that wasn’t a minor update. It took close to three years to complete the process, which gives you some context about the complexity and scale of the task.
To give some further context, here is an image from that document:
And here is the sharding feature in RavenDB right now:
This feature is called prefixed sharding in our documentation. It is the direct descendant of the image from the original 4.0 design document. We shipped that feature sometime last year. So we are talking about 10 years from “design” to implementation.
I’m using “design” in quotes here because when I go through this v4.0 design document, I can tell you that pretty much nothing that ended up in that document was implemented as envisioned. In fact, most of the things there were abandoned because we found much better ways to do the same thing, or we narrowed the scope so we could actually ship on time.
Comparing the design document to what RavenDB 4.0 ended up being is really interesting, but it is very notable that there isn’t much similarity between the two. And yet that design document was a fundamental part of the process of moving to v4.0.
What Are Design Documents?
A classic design document details the architecture, workflows, and technical approach for a software project before any code is written. It is the roadmap that guides the development process.
For RavenDB, we use them as both a sounding board and a way to lay the foundation for our understanding of the actual task we are trying to accomplish. The idea is not so much to build the design for a particular feature, but to have a good understanding of the problem space and map out various things that could work.
Recent design documents in RavenDB
I’m writing this post because I found myself writing multiple design documents in the past 6 months. More than I have written in years. Now that RavenDB 7.0 is out, most of those are already implemented and available to you. That gives me the chance to compare the design process and the implementation with recent work.
Vector Search & AI Integration for RavenDB
This was written in November 2024. It outlines what we want to achieve at a very high level. Most importantly, it starts by discussing what we won’t be trying to do, rather than what we will. Limiting the scope of the problem can be a huge force multiplier in such cases, especially when dealing with new concepts.
That document lays out the external-facing aspect of vector search in RavenDB: the vector.search() method in RQL, a discussion of how it works in other systems, and some ideas about vector generation and usage.
It doesn’t cover implementation details or how it will look from the perspective of RavenDB. This is at the level of the API consumer, what we want to achieve, not how we’ll achieve it.
AI Integration with RavenDB
Given that we have vector search, the next step is how to actually get and use it. This design document was a collaborative process, mostly written during and shortly after a big design discussion we had (which lasted for hours).
The idea there was to iron out everyone’s overall understanding of what we want to achieve. We considered things like caching and how it plays into the overall system, and some of the notes go down to the level of what the field names should be.
That work has already been implemented. You can access it through the new AI button in the Studio. Check out this icon on the sidebar:
That was a much smaller task in scope, but you can see how even something that seemed pretty clear changed as we sat down and actually built it. Concepts we didn’t even think to consider were raised, handled, and implemented (without needing another design).
Voron HNSW Design Notes
This design document details our initial approach to building the HNSW implementation inside Voron, the basis for RavenDB’s new vector search capabilities.
That one is really interesting because it is a pure algorithmic implementation, completely internal to our usage (so no external API is needed), and I wrote it after extensive research.
The end result is similar to what I planned, but there are still significant changes. In fact, pretty much all the actual implementation details are different from the design document. That is both expected and a good thing because it means that once we dove in, we were able to do things in a better way.
Interestingly, this is often the result of other constraints forcing you to do things differently. And then everything rolls down from there.
“If you have a problem, you have a problem. If you have two problems, you have a path for a solution.”
In the case of HNSW, a really complex part of the algorithm is handling deletions. In our implementation, each vector has an associated posting list with all the index entries that use it. That means we can implement deletion simply by emptying the associated posting list. An entire section in the design document (and hours spent pondering) is gone, just like that.
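A small Python model of this idea, under our own assumptions (the actual Voron data structures are not shown in this post): the graph keeps the vector node, each node owns a posting list of index entries, and deleting an entry only removes it from that list. A vector whose list is empty simply stops producing results. `VectorPostings` is a hypothetical name, and the graph search is assumed to hand over candidate vector ids, closest first.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


class VectorPostings:
    """Maps each vector (graph node) to the index entries that use it.

    Hypothetical model: the HNSW graph itself is not shown; a graph search is
    assumed to return candidate vector ids, closest first.
    """

    def __init__(self) -> None:
        self._postings: Dict[int, Set[str]] = defaultdict(set)

    def add(self, vector_id: int, entry_id: str) -> None:
        self._postings[vector_id].add(entry_id)

    def delete(self, vector_id: int, entry_id: str) -> None:
        # Deletion never touches the graph: it only removes the entry from the list.
        self._postings[vector_id].discard(entry_id)

    def resolve(self, candidates: Iterable[int], k: int) -> List[str]:
        """Turn graph candidates into up to k live entries, skipping emptied vectors."""
        results: List[str] = []
        for vector_id in candidates:
            for entry_id in sorted(self._postings.get(vector_id, ())):
                results.append(entry_id)
                if len(results) == k:
                    return results
        return results


postings = VectorPostings()
postings.add(1, "docs/1")
postings.add(2, "docs/2")
postings.delete(1, "docs/1")
print(postings.resolve([1, 2], k=5))  # ['docs/2']
```

Emptied nodes still sit in the graph and are still visited during traversal; whether and how they are eventually cleaned up is outside the scope of this sketch.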
If the design document doesn’t reflect the end result of the system, are they useful?
I would unequivocally state that they are tremendously useful. In fact, they are crucial for us to be able to tackle complex problems. The most important aspect of design documents is that they capture our view of what the problem space is.
Beyond their role in planning, design documents serve another critical purpose: they act as a historical record. They capture the team’s thought process, documenting why certain decisions were made and how challenges were addressed. This is especially valuable for a long-lived project like RavenDB, where future developers may need context to understand the system’s evolution.
Imagine a design document that explores a feature in detail—outlining options, discussing trade-offs, and addressing edge cases like caching or system integrations. The end result may be different, but the design document, the feature documentation (both public and internal), and the issue & commit logs serve to capture the entire process very well.
Sometimes, looking at the road not taken can give you a lot more information than looking at what you did.
I consider design documents to be a very important part of the way we design our software. At the same time, I don’t find them binding; we’ll write the software and see where it leads us in the end.
What are your expectations and experience with writing design documents? I would love to hear additional feedback.
At the time, no other logging framework was able to sustain the kind of performance that we required. The .NET community has come a long way since then, and it has become clear that we need to revisit this decision. Performance has a much higher priority, and the API at all levels supports that (spans, avoiding allocations, etc.).
The move to NLog gives users a much simpler way to integrate RavenDB logs into their monitoring & observability pipeline.
RavenDB 7.0 adds Snowflake integration to the set of ETL targets it supports.
Snowflake is a data warehouse solution, designed for analytics and data at scale. RavenDB is aimed at transactional scenarios and has a really good story around data distribution and wide geographical deployments.
You can check out the documentation to read the details about how you can use this integration to push data from RavenDB to your Snowflake database. In this post, I want to introduce one usage scenario for such integration.
RavenDB is commonly deployed on the edge, running on site in grocery stores, restaurants’ self-serve kiosks, supermarket checkout counters, etc. Such environments have to be tough and resilient to errors, network problems, mishandling, and much more.
We had to field support calls in the style of “there is ketchup all over the database”, for example.
In such environments, you must operate completely independently of the cloud, both because of latency and performance issues and because you must keep running even if the Internet is down. RavenDB does very well in such scenarios because of its internal architecture, the ability to run in a multi-master configuration, and its replication capabilities.
From a business perspective, that is a critical capability, to push data to the edge and be able to operate independently of any other resource. At the same time, this represents a significant challenge since you lose the ability to have an overall view of what is going on.
RavenDB’s Snowflake integration is there to bridge this gap. The idea is that you can define Snowflake ETL processes that push the data from all your branches to a single shared Snowflake account. Your headquarters can then run queries, analyse the data, and in general have near real-time analytics without hobbling the branches by having to manage their data remotely.
The Grocery Store Scenario
In our grocery store, we manage the store using RavenDB as the backend, with documents such as this one to record sales:
These documents are repeated many times for each store, recording the movement of inventory, tracking sales, etc. Now we want to share those details with headquarters.
There are two ways to do that. One is to use the Snowflake ETL to push the data itself to the HQ’s Snowflake account. You can see an example of that when we push the raw data to Snowflake in this article.
The other way is to make use of RavenDB’s map/reduce capabilities to do some of the work in each store and only push the summary data to Snowflake. This can be done in order to reduce the load on Snowflake if you don’t need a granular level of data for analytics.
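To make the trade-off concrete, here is the shape of such a summary in plain Python rather than as a RavenDB map/reduce index. Each sale is mapped to a (store, product, day) key, and the reduce step sums the partial totals, so headquarters receives one row per key instead of every sale. The field names and sample values are hypothetical, since the sales document itself isn’t shown here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_sale(sale: dict) -> Tuple[Tuple[str, str, str], Tuple[int, float]]:
    # Map: one sale document -> (grouping key, partial totals).
    key = (sale["store"], sale["product"], sale["date"])
    return key, (sale["quantity"], sale["quantity"] * sale["unit_price"])


def reduce_sales(sales: Iterable[dict]) -> List[dict]:
    # Reduce: sum the partial totals per key; only these rows would be pushed to Snowflake.
    totals: Dict[Tuple[str, str, str], list] = defaultdict(lambda: [0, 0.0])
    for sale in sales:
        key, (qty, amount) = map_sale(sale)
        totals[key][0] += qty
        totals[key][1] += amount
    return [
        {"store": s, "product": p, "date": d, "quantity": q, "revenue": round(r, 2)}
        for (s, p, d), (q, r) in sorted(totals.items())
    ]


sales = [
    {"store": "store-1", "product": "milk", "date": "2025-03-01", "quantity": 2, "unit_price": 1.5},
    {"store": "store-1", "product": "milk", "date": "2025-03-01", "quantity": 1, "unit_price": 1.5},
    {"store": "store-1", "product": "bread", "date": "2025-03-01", "quantity": 1, "unit_price": 3.0},
]
for row in reduce_sales(sales):
    print(row)
```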
With that in place, each branch would push details about its sales, inventory discarded, etc., to the Snowflake account. And headquarters can run their queries and get a real-time view and understanding about what is going on globally.
The big-ticket item for RavenDB 7.0 may be the new vector search and AI integration, but those aren’t the only new features this new release brings to the table.
AWS SQS ETL allows you to push data directly from RavenDB into an SQS queue. You can read the full details in our documentation, but the basic idea is that you can supply your own logic that would push data from RavenDB documents to SQS for additional processing.
For example, suppose you need to send an email to the customer when an order is marked as shipped. You write the code to actually send the email as an AWS Lambda function, like so:
import json

import boto3

sqs_client = boto3.client('sqs')
# shippedOrdersQueue (the queue URL) and send_email() are assumed to be defined elsewhere.


def lambda_handler(event, context):
    for record in event['Records']:
        message_body = json.loads(record['body'])
        email_body = """
Subject: Your Order Has Shipped!
Dear {customer_name},
Great news! Your order #{order_id} has shipped. Here are the details:
Track your package here: {tracking_url}
""".format(
            customer_name=message_body.get('customer_name'),
            order_id=message_body.get('order_id'),
            tracking_url=message_body.get('tracking_url'))
        send_email(
            message_body.get('customer_email'),
            email_body
        )
        sqs_client.delete_message(
            QueueUrl=shippedOrdersQueue,
            ReceiptHandle=record['receiptHandle'])
The next step is to get the data into the queue. This is where the new AWS SQS ETL inside of RavenDB comes into play.
You can specify a script that reacts to changes inside your database and sends a message to SQS as a result. Look at the following, on the Orders collection:
Like our previous ETL processes for queues, you can also use RavenDB in the Outbox pattern to gain transactional capabilities on top of the SQS queue. You write the messages you want to reach SQS as part of your normal RavenDB transaction, and the RavenDB SQS ETL will ensure that they reach the queue if the transaction was successfully committed.
Before discussing the actual feature, I want to show you what we have done:
$query='I feel like italian today'
from Products
where vector.search(embedding.text(Name),$query)
Try that in the public instance using the sample database. Here is what you get back!
I wrote parts of the vector search for RavenDB, and even so, I was pretty amazed when I realized that the query above just works. Note that there is no actual setup to be done here. You just issue a query and ask it to use vector.search() during execution. RavenDB handles everything else.
You can read more about the vector search feature in our documentation, but the gist of it is that you can run a piece of data (text, image, or even a video) through a large language model and get a vector back. That vector allows you to query using the model’s understanding of the data.
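Under the hood, querying by meaning comes down to comparing vectors. Here is a minimal Python sketch of exact nearest-neighbor search with cosine similarity, which simply scores every stored vector against the query. The three-dimensional vectors are toy values made up for illustration; real embeddings have hundreds of dimensions, and approximate indexes such as HNSW exist precisely to avoid scoring every vector.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def exact_search(query: Sequence[float], vectors: Dict[str, Sequence[float]],
                 k: int) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and return the k most similar."""
    scored = [(key, cosine_similarity(query, vec)) for key, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings", invented for illustration only.
products = {"pizza": [0.9, 0.1, 0.0], "pasta": [0.8, 0.3, 0.1], "sushi": [0.1, 0.2, 0.9]}
print([name for name, _ in exact_search([1.0, 0.2, 0.0], products, k=2)])  # ['pizza', 'pasta']
```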
The idea is that this moves you beyond querying with keywords or even full-text search. You are now able to search for meaning and intent. You can leverage models such as OpenAI, Ollama, Grok, Mistral, or DeepSeek to give your users deep insight into their data inside RavenDB.
RavenDB embeds a small model (bge-micro-v2) and can apply it during auto-indexes, both for generating embeddings for your data and for queries. As you can see, even with a tiny model, the results are spectacular.
Naturally, you can also use larger models, including OpenAI, Ollama, Grok, and more. Using a large model means it has a better contextual understanding of the relationships between the data and the query and can provide more accurate results.
Approximate nearest neighbor search using the HNSW algorithm.
Exact nearest neighbor search.
Support for vectors using float arrays, base64 encoded, and binary attachments.
RavenVector type for optimizing disk space and improving the read speed of vectors.
Using full vectors or providing quantized ones to reduce disk space.
Support for auto-quantization of vectors during indexing & queries.
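To give a feel for what quantization trades away, here is one common scheme, symmetric int8 scalar quantization, sketched in Python. This is a generic illustration; the exact quantization formats RavenDB uses are described in its documentation, not here. Each float32 dimension (4 bytes) becomes a single signed byte plus one shared scale factor per vector, at the cost of a rounding error of at most half the scale per dimension.

```python
import struct
from typing import List, Sequence, Tuple


def quantize_int8(vector: Sequence[float]) -> Tuple[float, List[int]]:
    """Symmetric scalar quantization: map floats onto integers in [-127, 127]."""
    peak = max((abs(x) for x in vector), default=0.0)
    scale = peak / 127 if peak > 0 else 1.0
    return scale, [round(x / scale) for x in vector]


def dequantize_int8(scale: float, values: Sequence[int]) -> List[float]:
    # Approximate reconstruction of the original floats.
    return [v * scale for v in values]


vector = [0.12, -0.5, 0.33, 0.9]
scale, q = quantize_int8(vector)
full_size = len(struct.pack(f"{len(vector)}f", *vector))  # 4 bytes per float32
quantized_size = len(struct.pack(f"{len(q)}b", *q))      # 1 byte per int8
print(full_size, quantized_size)  # 16 4
```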
Our aim with the new RavenDB 7.0 release is to make it possible for you to find meaning - to be able to leverage vectors, embeddings, large language models, and AI without fuss or trouble. I’m really proud to say that we have exceeded all my expectations.
There are a lot more exciting announcements about RavenDB & AI integration in the pipeline, and I’m really excited to share them with you in the near future.
There are actually other features that are part of RavenDB 7.0, but AI is eating the world, so we’ll cover them in a separate blog post instead of giving them a tertiary spot in this one.