Ayende @ Rahien

Oren Eini aka Ayende Rahien CEO of Hibernating Rhinos LTD, which develops RavenDB, a NoSQL Open Source Document Database.

You can reach me by:


+972 52-548-6969

Posts: 7,105 | Comments: 49,934

filter by tags archive
time to read 2 min | 333 words

I am really proud with the level of transparency and visibility that RavenDB gives out of the box to its users. The server dashboard gives you all the critical information about what a node is doing and can serve as a great mechanism to tell at a glance what is the health of a node.

A repeated customer request is to take that up a notch. Not just showing a single server status, but showing the entire cluster state. This isn’t a replacement for a full monitoring solution, but it is meant to provide you with a quick overview of exactly what is going on in your cluster.

I want to present some early prototypes of how we are thinking about showing this information, and I wanted to get your feedback about those, as well as any other information that you think should be relevant for the cluster dashboard.

Here is the resource utilization portion of the dashboard. We aren’t settled yet on the graphs, but we’ll likely have CPU usage and memory (including memory breakdowns).


Some of this information may be hidden by default, and you can expand it:


You can get more details about what the cluster is doing here:


And finally, the overall view of task assignment in the cluster:


You can also drill down to a particular server status:


This is early stages yet, we pretty much have just the mockup, so this is the right time to ask for what you want to see there.

time to read 1 min | 165 words

Among the advantages of a highly distributed system with endless edge points are that you can outsource data collection to a universe of locations, and even include them in your workflow, thereby expanding your operations. The challenges are when you have endpoints that contribute to your organization and systems, but you don’t exactly trust. They can be newcomers that you don’t know enough about, or entities with a history of misusing the data inclusion to your systems give them access to. You want the value they create, the information they amass and gather to be copied from the edge up the levels of your system, but you don’t want to give too much for that value or pay for it in the form of greater risk. Filtered replication is the art of enabling nontrusted edge points to access your system in a limited manner, replicating the information they produce in a nontrusted format.

time to read 2 min | 221 words

On an otherwise uneventful morning, the life of the operations guy got… interesting.

What were supposed to be a routine morning got hectic because the database refused to operate normally. To be more exact, the database refused to load a file. RavenDB is generally polite when it run into issues, but this time, it wasn’t playing around. Here is the error it served:

---> System.IO.IOException: Could not set the size of file  D:\RavenData\Databases\Purple\Raven.voron to 820 GBytes

---> System.ComponentModel.Win32Exception (665): The requested operation could not be completed due to a file system limitation

Good old ERROR_FILE_SYSTEM_LIMITATION, I never knew you, because we have never run into an error with this in the past.

The underlying reason was simple, we had a large file (820GB) that was too fragmented. At some point, the number of fragments of the file bypassed the maximum size of the file system.

The KB article about this issue is here. You might be able to move forward more quickly by using the contig.exe tool to defrag a single file.

The root cause here was probably backing up to the same drive as the database, which forced the file system to break the database file into fragements.

Just a reminder that there are always more layers into the system and that we need to understand them all when they break.

time to read 2 min | 287 words

We got an interesting question a few times in recent weeks. How can I manually create a document revision with RavenDB? The answer for that is that you can use the ForceRevisionCreationFor() method to do so. Here is how you’ll typically use this API:

This is the typical expected usage for the API. We intended this to make it so users can manually triggers revisions, for example, when moving a document from draft mode to public and the like.

It turns out that there is another reason to want to use this API, when you migrate data to RavenDB and want to create historical revisions. The API we envisioned isn’t suitable for this, but the layered API in RavenDB means that we can still get the desired behavior.

Here is how we can achieve this:

Basically, we manually create the transaction steps that will run on the server, and we can apply the command to the same document multiple times.

Note that RavenDB requires a document to create a revision from it, so we set the document, create a revision and overwrite the document again, as many times as we need to.

Another issue that was brought up is that the @last-modified property on the document is set to the date of the revision creation. In some cases, they want to do this to get the revision to be created at the right time it was originally created during the migration period.

That is not supported by RavenDB, because the @last-modified is tracking the time that RavenDB modified the document or revision. If you need to track the time a document was modified in the domain, you need to keep that as part of your actual domain model.

time to read 1 min | 100 words

Alex has been writing a sample application in RavenDB and has been getting deep into the details of how to architect a non trivial system.

He recently published: Power of Dynamic fields for indexing dictionaries and collections in RavenDB – how to deal with dynamic fields, which joins the previous post:

This makes for an interesting read and walks you through the entire process. There are more in the pipeline…

time to read 2 min | 273 words

I’m very happy to announce that the TypeScript / Node.js client API for RavenDB was recently updated to 5.0. This release updates the API to support Time Series API and bulk insert. Beyond the new API and functionality, we have also put a lot of effort into the ergonomics of this release.

One of the major changes that was made was to the way you use indexes in the API. Thanks to Yawar Jamal are due, for suggesting this improvement and sending the initial PR. What does this means? Well, here is an index definition in the new version:

The actual index definition isn’t that interesting. You can see a longer explanation of exactly what I’m doing in this post. What is really interesting is that I can define this using code, no messing about with strings. This is checked by the compiler and is going to give you a similar developer experience as using Linq in .NET.

I also mentioned ergonomics, right? Let’s look at some of the other features that you now get from the client. It’s funny, because this has nothing to do with code execution, but is very important to just Getting Things Done.

Take a look at this:

image (1)

Even though we are passing a string to the query, we have intellisense to assist us and warn about typos.

That applies all over the API, so you don’t really have to make an effort, It Just Works.

image (2)

time to read 6 min | 1121 words

RavenDB is a distributed database. You can run it on a single node, in a cluster on a single data center on as a geo distributed cluster. Separately, you can also run RavenDB in a multi master configuration. In this case, you don’t have a single cluster spanning the globe, but multiple cooperating clusters working together. The question is, when should I use a geo distributed cluster and when should I setup a multi master configuration with multiple coordinating clusters?

Here is an example of a global cluster:


As you can see, we have nodes all over the world, all joined into a single cluster. In this mode, the nodes will select a leader (denoted by the crown) which will manage all the behavior of the cluster. To ensure that we can properly detect failures, we setup a timeout interval that is appropriate for the distances involved.  Note that even in this mode, most of the actual writes to RavenDB will be done on a pure node-local basis and gossiped between the nodes. You can select the appropriate level of write assurance that you want (confirm after it was written to two additional locations, for example).

Such a setup suffer from one issue, coordinating across that distance (and latencies) means that we are need to account for the inherent delay in the decision loop. For most operations, this doesn’t actually impact your system, since most of the work is done in the background and not user visible. It does mean that RavenDB is going to take longer to detect and recover from failures. In the case of the setup in the image, we are talking about the difference between less than 300ms (the default when running in a single data center) to detecting failure in around 5 seconds or so.

Because RavenDB favors availability, it usually doesn’t matter. But there are cases where it does. Any time that you have to wait for a cluster operation, you’ll feel the additional latency. This applies not just to failure detection but when everything is running smoothly. A cluster operation in the above setup will require confirmation from two additional nodes aside from the leader. Ping time in this case would be 200 – 300ms between the nodes, in most cases. That means that at best, any such operation would be completed in 750ms or so.

What operations would this apply to? Creation of new databases and indexes is done as a cluster operation, but they are rarely latency sensitive. The primary issue for this sort of setup is if you are using:

  • Cluster wide transactions
  • Compare exchange operations

In those cases, you have to account for higher latency as part of your overall deployment. Such operations are inherently more expensive. If you are running in a single data center, with ping times that are usually < 1 ms, that is not very noticeable. When you are running in a geo distributed environment, that matters a lot more.

One consideration that we haven’t yet taken into account is what happens during failure. Let’s assume that I have a web application deployed in Brazil, which is aimed at the local RavenDB instance. If the Brazilian’s RavenDB instance decide to visit the carnival and not respond, what is going to happen? On the one hand, the other nodes in the cluster will simply pick up the slack. But for the web application in Brazil, that means that instead of using the local instance, we need to go wide to reach the alternative nodes. GitHub had a similar issue, but between east and west coast and the additional latency inherent in such a setup took them down.

To be honest, beyond the additional latency that you have with cluster wide operations in such a setup, I think that this is the biggest disadvantage of such a system. You can avoid that by running multiple nodes in each location, all joined into a big cluster, of course. Then you can set things up that each client will use the nearest nodes. That gives you local failover, but you still need to consider how to handle total outage in one location.

There is another alternative, in which case you have separate clusters in each location (may be a single instance, but I’m showing a cluster there because you’ll want local high availability). Instead of having a single cluster, we set things up so there are multiple such clusters. Then we use RavenDB’s multi-master capabilities to tie them all together.


In this setup, the different clusters will gossip between themselves about the data, but that is the only thing that is truly shared. Each cluster will manage its own nodes and failover and work is done only on the local cluster level, not globally.

Other things, indexes, ETL, subscriptions etc are all defined at the cluster level, and you’ll need to consider whatever you’ll have them in each cluster (likely for indexes, for example), or only on a single location. Something like ETL would probably have a designated location that would be pushing the data to its destination, rather than duplicated in each local cluster.

The most interesting question, however, is how do you handle cluster wide transactions or compare exchange operation in such an environment?

A cluster wide transaction is… well, cluster wide. That means that if you have multiple clusters, your consistency scope is only within a single one. That is probably the major limit for breaking apart the cluster in such a system.

There are ways to handle that, of course. You can split your compare exchanges so they will have a particular cluster that will own them, in this manner, you can direct certain operations to a particular cluster regardless of where the operation originated. In many environments, this is already something that will naturally happen. For example, if you are running in such an environment to deal with local clients, it is natural to hold their compare exchange values for the cluster they are using, even if the data is globally replicated.

Another factor to consider is that RavenDB replicates documents  and their data, but compare exchange values aren’t considered in this case. They are global to the cluster, and as such aren’t sent via replication.

I’m afraid that I don’t have one answer to the question how to geo distribute your RavenDB based system. You need to account for several factors about your application, your needs and the system as a whole. But I hope that you’ll now have the appropriate background to make an informed decision.


  1. Looking at Parler specs and their architecture - 6 hours from now

There are posts all the way to Jan 21, 2021


  1. Webinar recording (12):
    15 Jan 2021 - Filtered Replication in RavenDB
  2. Production postmortem (30):
    07 Jan 2021 - The file system limitation
  3. Open Source & Money (2):
    19 Nov 2020 - Part II
  4. re (27):
    27 Oct 2020 - Investigating query performance issue in RavenDB
  5. Reminder (10):
    25 Oct 2020 - Online RavenDB In Action Workshop tomorrow via NDC
View all series


Main feed Feed Stats
Comments feed   Comments Feed Stats