Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

time to read 4 min | 763 words

We are a database company, and many of our customers and users are running in the cloud. Fairly often, we field questions about the recommended deployment pattern for RavenDB.

Given the… rich landscape of DevOps options, RavenDB supports all sorts of deployment models:

  • Embedded in your application
  • Physical hardware (from a Raspberry Pi to massive servers)
  • Virtual machines in the cloud
  • Docker
  • AWS / Azure marketplaces
  • Kubernetes
  • Ansible
  • Terraform

As well as some pretty fancy permutations of the above in every shape and form.

With so many choices, the question is: what do you recommend? In particular, we were recently asked about deployment to a “naked machine” in the cloud versus using Kubernetes. The core requirements are to ensure high performance and high availability.

Our short answer is almost always the same: go with direct VMs and skip Kubernetes for RavenDB.

While Kubernetes has revolutionized the deployment of stateless microservices, deploying stateful applications, particularly databases, on K8s introduces significant complexities that often outweigh the benefits, especially when performance and operational simplicity are paramount.

A great quote in the DevOps world is “cattle, not pets”, in reference to how you should manage your servers. That works great if you are dealing with stateless services. But when it comes to data management, your databases are cherished pets, and you should treat them as such.

The Operational Complexity of Kubernetes for Stateful Systems

Using an orchestration layer like Kubernetes complicates the operational management of persistent state. While K8s provides tools for stateful workloads, they require a deep understanding of storage classes, Persistent Volumes (PVs), and Persistent Volume Claims (PVCs).

Consider a common, simple maintenance task: Changing a VM's disk type or size.

On a VM, this is typically a very easy operation that can be done with no downtime. The process is straightforward, well-documented, and often takes minutes.

For K8s, this becomes a significantly more complex task. You have to go deep into Kubernetes storage primitives to figure out how to properly migrate the data to a new disk specification.

There is an allowVolumeExpansion: true option that should make it work, but the details matter, and for databases, that is usually something DBAs are really careful about.

Databases tend to care about their disk. So what happens if we don’t want to just change the size of the disk, but also its type? Such as moving from Standard to Premium. Doing that using VMs is as simple as changing the size. You may need to detach, change, and reattach the disk, but that is a well-trodden path.

In Kubernetes, you need to run a migration, delete the StatefulSets, make the configuration change, and reapply (crossing your fingers and hoping everything works).

Database nodes are not homogeneous

Databases running in a cluster configuration often require granular control over node upgrades and maintenance. I may want to designate a node as “this one is doing backups”, so it needs a bigger disk. Easy to do if each node is a dedicated VM, but much harder in practice inside K8s.

A recent example we ran into is controlling the upgrade process of a cluster. As any database administrator can tell you, upgrades are something you approach cautiously. RavenDB has great support for running cross-version clusters.

In other words, take a node in your cluster, upgrade that to an updated version (including across major versions!), and it will just work. That allows you to dip your toes into the waters with a single node, instead of doing a hard switch to the new version.

In a VM environment: Upgrading a single node in a RavenDB cluster is a simple, controlled process. You stop the database on the VM, perform the upgrade (often just replacing binaries), start the database, and allow the cluster to heal and synchronize. This allows you to manage the cluster's rolling upgrades with precision.

In K8s: Performing a targeted upgrade on just one node of the cluster is hard. The K8s deployment model (StatefulSets) is designed to manage homogeneous replicas. While you can use features like the "on delete" update strategy, blue/green deployments, or canary releases, they add layers of abstraction and complexity that make sense for stateless services but are actively harmful for stateful systems.

Summary

For mission-critical database infrastructure where high performance, high availability, and operational simplicity are non-negotiable, the added layer of abstraction introduced by Kubernetes for managing persistence often introduces more friction than value.

While Kubernetes is an excellent platform for stateless services, we strongly recommend deploying RavenDB directly on dedicated Virtual Machines. This provides a cleaner operational surface, simpler maintenance procedures, and more direct control over the underlying resources—all critical factors for a stateful, high-performance database cluster.

Remember, your database nodes are cherished pets; don’t make them sleep in the barn with the cattle.

time to read 7 min | 1362 words

A really interesting problem for developers building agentic systems is moving away from chatting with the AI model. For example, consider the following conversation:

This is a pretty simple scenario where we need to actually step out of the chat and do something else. This seems like an obvious request, right? But it turns out to be a bit complex to build.

The reason for that is simple. AI models don’t actually behave the way you would expect if your primary experience of them is through a chat interface. Here is a typical invocation of a model in code:


from typing import Callable, List, NamedTuple


class MessageTuple(NamedTuple):
    role: str
    content: str


def call_model(
    message_history: List[MessageTuple],
    tools: List[Callable] = None
):
    pass # redacted

In other words, it is the responsibility of the caller to keep track of the conversation and send the entire conversation to the agent on each round. Here is what this looks like in code:


conversation_history = [
    {
        "role": "user",
        "content": "When do I get my anniversary gift?"
    },
    {
        "role": "agent",
        "content": "Based on our records, your two-year anniversary is in three days. This milestone means you're eligible for a gift card as part of our company's recognition program.\nOur policy awards a $100 gift card for each year of service. Since you've completed two years, a $200 gift card will be sent to you via SMS on October 1, 2025."
    },
    {
        "role": "user",
        "content": "Remind me to double check I got that in a week"
    }
]

Let’s assume that we have a tool call for setting up reminders for users. In RavenDB, this looks like the screenshot below (more on agentic actions in RavenDB here):

And in the backend, we have the following code:


conversation.Handle<CreateReminderArgs>("CreateReminder", async (args) =>
{
    using var session = _documentStore.OpenAsyncSession();
    var at = DateTime.Parse(args.at);
    var reminder = new Reminder
    {
        // `request` is the enclosing API request that started this conversation round
        EmployeeId = request.EmployeeId,
        ConversationId = conversation.Id,
        Message = args.msg,
    };
    await session.StoreAsync(reminder);
    // Schedule the document to be picked up again at the reminder time
    session.Advanced.GetMetadataFor(reminder)["@refresh"] = at;
    await session.SaveChangesAsync();

    return $"Reminder set for {at} {reminder.Id}";
});

This code uses several of RavenDB’s features to perform its task. First, we have the conversation handler, which is the backend handler for the tool call we just saw. Next, we have the use of the @refresh feature of RavenDB. I recently posted about how you can use this feature for scheduling.

In short, we set up a RavenDB Subscription Task to be called when those reminders should be raised. Here is what the subscription looks like:


from Reminders as r
where r.'@metadata'.'@refresh' != null

And here is the client code to actually handle it:


async Task HandleReminder(Reminder reminder)
{
    var conversation = _documentStore.AI.Conversation(
        agentId: "smartest-agent",
        reminder.ConversationId,
        creationOptions: null
    );
    conversation.AddArtificialActionWithResponse(
        "GetRaisedReminders", reminder);
    var result = await conversation.RunAsync();
    await MessageUser(conversation, result);
}
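For completeness, HandleReminder is fed by a subscription worker that consumes the subscription above. A minimal sketch of that wiring, assuming the subscription was created under the name "Reminders":

// Pump matching documents from the subscription into HandleReminder.
// "Reminders" is an assumed subscription name.
var worker = _documentStore.Subscriptions
    .GetSubscriptionWorker<Reminder>("Reminders");

await worker.Run(async batch =>
{
    foreach (var item in batch.Items)
    {
        await HandleReminder(item.Result);
    }
});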

The question now is, what should we do with the reminder?

Going back to the top of this post, we know that we need to add the reminder to the conversation. The problem is that this isn’t part of the actual model of the conversation. This is neither a user prompt nor a model answer. How do we deal with this?

We use a really elegant approach here: we inject an artificial tool call into the conversation history. This makes the model think that it checked for reminders and received one in return, even though this happened outside the chat. This lets the agent respond naturally, as if the reminder were part of the ongoing conversation, preserving the full context.

Finally, since we’re not actively chatting with the user at this point, we need to send a message prompting them to check back on the conversation with the model.

Summary

This is a high-level post, meant specifically to give you some ideas about how you can take your agentic systems to a higher level than a simple chat with the model. The reminder example is pretty straightforward, but it is a truly powerful one. It transforms a simple chat into a much more complex interaction model with the AI.

RavenDB’s unique approach of "inserting" a tool call back into the conversation history effectively tells the AI model, "I've checked for reminders and found a reminder for this user." This allows the agent to handle the reminder within the context of the original conversation, rather than initiating a new one. It also allows the agent to maintain a single, coherent conversational thread with the user, even when the system needs to perform background tasks and re-engage with them later.

You can also use the same infrastructure to create a new conversation, if that makes sense in your domain, and use the previous conversation as “background material”, so to speak. There is a wide variety of options available to fit your exact scenario.

time to read 1 min | 88 words

I gave the following talk at Microsoft Ignite 2025:

Connecting LLMs to your secure, operational database involves complexity, security risks, and hallucinations. This session shows how to build context-aware AI agents directly on your existing data, going from live database to production-ready, secure AI agent in hours. You'll see how to ship personalized experiences that will define the next generation of software. RavenDB's CEO will demonstrate this approach.

time to read 1 min | 76 words

Want to see how modern applications handle complexity, scale, and cutting-edge features without becoming unmanageable? In this deep-dive webinar, we move From CRUD to AI Agents, showcasing how RavenDB, a high-performance document database, simplifies the development of a complex Property Management application.

time to read 3 min | 441 words

When building AI Agents, one of the challenges you have to deal with is the sheer amount of data that the agent may need to go through. A natural way to deal with that is not to hand the information directly to the model, but rather to allow it to query for the information as it sees fit.

For example, in the case of a human resource assistant, we may want to expose the employer’s policies to the agent, so it can answer questions such as “What is the required holiday request time?”.

We can do that easily enough using the following agent-query mechanism:

If the agent needs to answer a question about a policy, it can use this tool to get the policies and find out what the answer is.

That works if you are a mom & pop shop, but what happens if you are a big organization, with policies on everything from requesting time off to bring-your-own-device to the prohibition of modern slavery? Is calling this tool going to hand all of those policies to the model?

That is going to be incredibly expensive, since you have to burn through a lot of tokens that are simply not relevant to the problem at hand.

The next step is to stop returning all of the policies and instead filter them down to what is relevant. We can do that using vector search, utilizing the model’s understanding of the data to help us find exactly what we want.
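In RQL, such a search might look roughly like the following sketch, which assumes a Policies collection with automatic text embedding on a Content field:

from 'Policies'
where vector.search(embedding.text(Content), $searchTerms)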

That is much better, but a search for “confidentiality contract” will get you the Non-Disclosure Agreement as well as the processes for hiring a new employee when their current employer isn’t aware they are looking, etc.

That can still be a lot of text to go through. It isn’t everything, but it is still pretty heavyweight.

A nice alternative to this is to break it into two separate operations, as you can see below:

The model will first run the FindPolicies query to get the list of potential policies. It can then decide, based on their titles, which ones it is actually interested in reading the full text of.

You need to perform two tool calls in this case, but it actually ends up being both faster and cheaper in the end.
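To give a rough idea of the shape of those two tools, here is a sketch in client code. The names (FindPolicies, GetPolicy), the Policy class, and the PolicySummary projection are all illustrative assumptions; the actual definitions live in the agent configuration shown above:

// First stage: return only ids and titles, so the model can pick what to read.
async Task<List<PolicySummary>> FindPolicies(string searchTerms)
{
    using var session = _documentStore.OpenAsyncSession();
    return await session.Advanced
        .AsyncRawQuery<PolicySummary>(@"
            from 'Policies'
            where vector.search(embedding.text(Content), $q)
            select id() as Id, Title")
        .AddParameter("q", searchTerms)
        .ToListAsync();
}

// Second stage: fetch the full text of a single policy the model chose.
async Task<string> GetPolicy(string policyId)
{
    using var session = _documentStore.OpenAsyncSession();
    var policy = await session.LoadAsync<Policy>(policyId);
    return policy.Content;
}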

This is a surprisingly elegant solution, because it matches roughly how people think. No one is going to read a dozen books cover to cover to answer a question. We continuously narrow our scope until we find enough information to answer.

This approach gives your AI model the same capability to narrowly target the information it needs to answer the user’s query efficiently and quickly.

time to read 3 min | 445 words

When using an AI model, one of the things that you need to pay attention to is the number of tokens you send to the model. They literally cost you money, so you have to balance the amount of data you send to the model against how much of it is relevant to what you want it to do.

That is especially important when you are building generic agents, which may be assigned a bunch of different tasks. The classic example is the human resources assistant, which may be tasked with checking your vacation days balance or called upon to get the current number of overtime hours that an employee has worked this month.

Let’s assume that we want to provide the model with a bit of context. We want to give the model all the recent HR tickets by the current employee. These can range from onboarding tasks to filling out the yearly evaluation, etc.

That sounds like it can give the model a big hand in understanding the state of the employee and what they want. Of course, that assumes the user is going to ask a question related to those issues.

What if they ask about the date of the next bank holiday? If we just unconditionally fed all the data to the model preemptively, that would be:

  • Quite confusing to the model, since it will have to sift through a lot of irrelevant data.
  • Pretty expensive, since we’re going to send a lot of data (and pay for it) to the model, which then has to ignore it.
  • Compounding effect as the user & the model keep the conversation going, with all this unneeded information weighing everything down.

A nice trick that can really help is to not expose the data directly, but rather provide it to the model as a set of actions it can invoke. In other words, when defining the agent, I don’t bother providing it with all the data it needs.

Rather, I provide the model a way to access the data. Here is what this looks like in RavenDB:

The agent is provided with a bunch of queries that it can call to find out various interesting details about the current employee. The end result is that the model will invoke those queries to get just the information it wants.
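A minimal sketch of the same idea, reusing the Handle pattern from the reminder example earlier; the tool name, the argument shape, and the OvertimeEntry class are illustrative assumptions:

// Hypothetical tool: the model invokes it only when overtime data is relevant.
conversation.Handle<OvertimeArgs>("GetOvertimeHours", async (args) =>
{
    using var session = _documentStore.OpenAsyncSession();
    var entries = await session.Query<OvertimeEntry>()
        .Where(o => o.EmployeeId == args.employeeId && o.Month == args.month)
        .ToListAsync();
    // Only the answer goes back to the model, not the raw records.
    return $"{entries.Sum(o => o.Hours)} overtime hours in {args.month}";
});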

The overall number of tokens that we are going to consume will be greatly reduced, while the ability of the model to actually access relevant information is enhanced. We don’t need to go through stuff we don’t care about, after all.

This approach gives you a very focused model for the task at hand, and it is easy to extend the agent with additional information-retrieval capabilities.

time to read 3 min | 504 words

Building an AI Agent in RavenDB is very much like defining a class: you define all the things that it can do, the initial prompt to the AI model, and the parameters the agent requires. Like a class, you can create an instance of an AI agent by starting a new conversation with it. Each conversation is a separate instance of the agent, with different parameters, an initial user prompt, and its own history.

Here is a simple example of a non-trivial agent. For the purpose of this post, I want to focus on the parameters that we pass to the model.


var agent = new AiAgentConfiguration(
    "shopping assistant",
    config.ConnectionStringName,
    "You are an AI agent of an online shop...")
{
    Parameters =
    [
        new AiAgentParameter("lang",
            "The language the model should respond with."),
        new AiAgentParameter("currency",
            "Preferred currency for the user"),
        new AiAgentParameter("customerId", null, sendToModel: false),
    ],
    Queries = [ /* redacted... */ ],
    Actions = [ /* redacted... */ ],
};

As you can see in the configuration, we define the lang and currency parameters as standard agent parameters. These are defined with a description for the model and are passed to the model when we create a new conversation.

But what about the customerId parameter? It is marked as sendToModel: false. What is the point of that? To understand this, you need to know a bit more about how RavenDB deals with the model, conversations, and memory.

Each conversation with the model is recorded using a conversation document, and part of this includes the parameters you pass to the conversation when you create it. In this case, we don’t need to pass the customerId parameter to the model; it doesn’t hold any meaning for the model and would just waste tokens.

The key is that you can query based on those parameters. For example, if you want to get all the conversations for a particular customer (to show them their conversation history), you can use the following query:


from "@conversations" 
where Parameters.customerId = $customerId

This is also very useful when you have data that you genuinely don’t want to expose to the model but still want to attach to the conversation. You can set up a query that the model may call to get the most recent orders for a customer, and RavenDB will do that (using customerId) without letting the model actually see that value.
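For completeness, here is roughly what that lookup looks like from the C# client; ConversationSummary is a hypothetical DTO for whatever fields you want back:

// `customerId` is assumed to hold the customer's document id.
using var session = _documentStore.OpenAsyncSession();
var conversations = await session.Advanced
    .AsyncRawQuery<ConversationSummary>(@"
        from '@conversations'
        where Parameters.customerId = $customerId")
    .AddParameter("customerId", customerId)
    .ToListAsync();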

time to read 1 min | 98 words

The RavenDB team will be at Microsoft Ignite in San Francisco next week, as will be yours truly in person 🙂. We are going to show off RavenDB and its features both new and old.

I'll be hosting a session demonstrating how to build powerful AI Agents using RavenDB. I’ll show practical examples and the features that make RavenDB suitable for AI-driven applications.

If you're at Microsoft Ignite or in the San Francisco area next week, I'd like to meet up. Feel free to reach out to discuss RavenDB, AI, architecture or anything else.

time to read 1 min | 155 words

RavenDB Ltd (formerly Hibernating Rhinos) has been around for quite some time! In its current form, we've been building the RavenDB database for over 15 years now. In late 2010, we officially moved into our first real offices.

Our first place was a small second-story office space deep in the industrial section, a bit out of the way, but it served us incredibly well until we grew and needed more space. Then we grew again, and again, and again! Last month, we moved offices yet again.

This new location represents our fifth office, with each relocation necessitated by our growth exceeding the capacity of the previous premises.

If you ever pass by Hadera, where our offices now proudly reside, you'll spot a big sign as you enter the city!

You can also see what it looks like from the inside:

time to read 5 min | 849 words

I ran into this tweet from about a month ago:

dax @thdxr

programmers have a dumb chip on their shoulder that makes them try and emulate traditional engineering there is zero physical cost to iteration in software - can delete and start over, can live patch our approach should look a lot different than people who build bridges

I have to say that I strongly disagree with this statement. Using the building example, moving a window in an already built house is obviously expensive, and it is clearly going to be cheaper to move this window during the planning phase.

The answer is that it may be cheaper, but it won’t necessarily be cheap. Let’s say that I want to move the window by 50 cm to the right. Would it be up to code? Is there any wiring that needs to be moved? Do I need to consider the placement of the air conditioning unit? What about the emergency escape? Any structural impact?

This is when we are at the blueprint stage - the equivalent of editing code on screen. And it is obvious that such changes can be really expensive. Similarly, in software, every modification demands a careful assessment of the existing system, long-term maintenance, compatibility with other components, and user expectations. This intricate balancing act is at the core of the engineering discipline.

A civil engineer designing a bridge faces tangible constraints: the physical world, regulations, budget limitations, and environmental factors like wind, weather, and earthquakes. While software designers might not grapple with physical forces, they contend with equally critical elements such as disk usage, data distribution, rules & regulations, system usability, operational procedures, and the impact of expected future changes.

Evolving an existing software system presents a substantial engineering challenge. Making significant modifications without causing the system to collapse requires careful planning and execution. The notion that one can simply "start over" or "live deploy" changes is incredibly risky. History is replete with examples of major worldwide outages stemming from seemingly simple configuration changes. A notable instance is the Google outage of June 2025, where a simple missing null check brought down significant portions of GCP. Even small alterations can have cascading and catastrophic effects.

I’m currently working on a codebase that is approaching legal drinking age. It also has close to 1.5 million lines of code and a big team operating on it. Being able to successfully run, maintain, and extend that over time requires discipline.

In such a project, you face issues such as different versions of the software deployed in the field, backward compatibility concerns, etc. For example, I may have a better idea of how to structure the data to make a particular scenario more efficient. That would require updating the on-disk data, which is a 100% engineering challenge. We have to take into consideration physical constraints (updating a multi-TB dataset without downtime is a tough problem).

The moment you are actually deployed, you have so many additional concerns to deal with. A good example of this may be that users are used to stuff working in a certain way. But even for software that hasn’t been deployed to production yet, the cost of change is high.

Consider the effort associated with this update to a JobApplication class:
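(The change itself appears in the original post as a screenshot. Judging from the discussion below, it is along these lines; the class shape and property names are an illustrative reconstruction:)

public class JobApplication
{
    public string CandidateId { get; set; }

    // Before: an application targets exactly one position.
    // public string PositionId { get; set; }

    // After: an application can target several positions at once.
    public List<string> PositionIds { get; set; }
}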

This looks like a simple change, right? It just requires that you (partial list):

  • Set up database migration for the new shape of the data.
  • Migrate the existing data to the new format.
  • Update any indexes and queries on the position.
  • Update any endpoints and decide how to deal with backward compatibility.
  • Create a new user interface to match this whenever we create/edit/view the job application.
  • Consider any existing workflows that inherently assume that a job application is for a single position.
  • Can you be partially rejected? What is your status if you interviewed for one position but received an offer for another?
  • How does this affect the reports & dashboard?

This is a simple change, no? Just a few characters on the screen. No physical cost. But it is also a full-blown Epic Task for the project - even if we aren’t in production, have no data to migrate, or integrations to deal with.

Software engineers operate under constraints similar to other engineers, including severe consequences for mistakes (global system failure because of a missing null check). Making changes to large, established codebases presents a significant hurdle.

The moment that you need to consider more than a single factor, whether in your code or in a bridge blueprint, there is a pretty high cost to iterations. Going back to the bridge example, the architect may have a rough idea (is it going to be a Roman-style arch bridge or a suspension bridge) and have a lot of freedom to play with various options at the start. But the moment you begin to nail things down and fill in the details, the cost of change escalates quickly.

Finally, just to be clear, I don’t think that the cost of changing software is equivalent to changing a bridge after it was built. I simply very strongly disagree that there is zero cost (or indeed, even low cost) to changing software once you are past the “rough draft” stage.
