Ayende @ Rahien

Oren Eini aka Ayende Rahien CEO of Hibernating Rhinos LTD, which develops RavenDB, a NoSQL Open Source Document Database.

Get in touch with me:


+972 52-548-6969

Posts: 7,252 | Comments: 50,429

Privacy Policy Terms
filter by tags archive
time to read 4 min | 780 words

imageIn architecture (physical building) there is a term called Desire Lanes. The idea is that users will take the path of least resistance, regardless of the intention of the architect. The image on the right is one that I have seen many times, and I got a chuckle out of that each time. That is certainly something that I have seen over and over again.

I had the chance recently to see how the exact same thing happens in two very different software systems. There was a need and the system didn’t allow it. The users found a way.

In the first instance, we are talking about a high security environment. The kind where you leave your phones and smart watches at the door, outside devices are absolutely prohibited. So far, this makes sense and there is a real need for that for their scenario.

The problem is that they also have a high degree of people who are working on that environment on a very transient basis. You may get people that show up for a day or two mostly (meeting, briefings, training) or for a couple of weeks at most. Those people need to be able to do… stuff with computers (take notes, present, plan, etc).

Given the high security environment, creating a user in the system takes a few days at least (involves security briefing, guidance, etc). Note that all involved have the right security clearances, that isn’t the issue. But before you can get a user account, policy dictates that you need to be briefed, login is done via smart cards + password only, etc. You can’t make that work if you have hundreds of people coming and going all the time.

The solution? There are a bunch of smart cards in a drawer belonging to former employees whose accounts were purposefully not deactivated. You get handed the card + password and can use the account for basic needs.

I assume that those accounts are locked, but I didn’t bother to verify that. It wouldn’t surprise me if they still had all their permissions and privileges.

From an IT security standpoint, I am horrified. That is a Bad Idea, but it is a solution to the issue at hand, providing computer access for short terms visitors without having to go through all the hoops the security policy dictates.

This is sadly a very widespread tactic in that organization, I have seen this in multiple branches in separate locations.

In the second scenario, there is a system to reserve appointments with doctors. The system has an app, where users can register for their appointments themselves. There is also the administration team that may also reserve appointments for patients. The system allows a doctor to define their hours of operations and then (as far as the system is concerned) it is first come & first served basis. The administration team, on the other hand, has to deal with a more complex situation.

For example, a common issue that I run into is that you can only set an appointment if you are registered in the system. What would you do with first time visitors? They are routinely setting things up through the administration team, but while they can reserve an appointment, they have to put someone that is registered in the system.

The solution for that is to use other people, typically the administration team will use their own accounts and set the appointment for themselves, to reserve the spot for the new patient. That can lead to some issues, for example, if the doctor has to cancel, the system will send notices to the scheduled patients. But the scheduled patient and the real patient are distinct. It also means that from a medical file perspective, certain people are “ditching” a lot of appointments.

In both cases, we can see that there is a need to do something that the system doesn’t allow (or actively trying to prevent). The end result is a solution, a sub optimal one, for sure, but something that works.

One of the key aspects for building proper systems for the long term is to listen and implement proper solutions for those sort of issues. In many cases, they are of pivotal importance for the end user, note that this is very much distinct from the customer. The customer is the one who pays, the end user is the one who is using the system. There is often a major disconnect between the two.

This is where you get Desire Features and workaround that become Official, to the point where some of those solutions are literally in the employee handbook.

time to read 2 min | 260 words

For many business domains, it is common to need to deal with hierarchies or graphs. The organization chart is one such common scenario, as is the family tree. It is common to want to use graph queries to deal with such scenarios, but I find that it is usually much easier to explicitly build your own queries..

Consider the following query, which will give me the entire hierarchy for a particular employee:

I can define my own logic for traversing from the document to the related documents, and I can do whatever I want there. You can also see that I’m including the related documents, here is how this looks like when I execute the query:


A single query gives me all the details I need to show the user with one roundtrip to the server.

Let’s go with a more complex example. The above scenario had a single path to follow, which is trivial. What happens if I have a more complex system, such as a family tree? I took the Games of Thrones data (easiest to work with for this demo) and threw that into RavenDB, and then executed the following query:

And that gives me the following output:


This is a pretty fun technique to explore, because you can run any arbitrary logic you need, and expressing things in an imperative manner is typically much more straightforward.

time to read 2 min | 303 words

RavenDB has the notion of Custom Sorters, basically, we allow you to inject your own logic into the sorting process. That allows you to run any complex logic you have around sorting.  There are rarely good reasons to want to use that. A good use case for that is when you need to sort by an external value that mutates outside of your control. Let’s say that you have invoices in multiple currencies. You want to sort them by their value in USD. The catch, you need them sorted on the current exchange rate. For that reason, you can use the custom sorter that would use the current value of the currency as the sorting mechanism.

I should point out that from a business perspective, you’ll typically want to use the value that you had for the order at the time the order was made, but that is not related the the custom sorting feature.

Let’s take another example, however. Consider the following Enum:

We want to sort by the education level of our candidates, but by default, we’ll be sorting using the textual value of the field. That isn’t what we want. We can define a custom sorter for that, but there is a far better option, just tell us what the order should be in the index.

Here is a good example:

What we are doing here is simple, we translate the textual value to a numeric one. When we query the index, we can filter by the textual value and sort by the sort value, giving us what we want. This is far simpler and more robust. If you need to add additional values down the line, it is obvious where they need to go. A custom sorter, on the other hand, is far more capable, but also more complex to operate.

time to read 2 min | 306 words

RavenDB is a database, not a queue or a service bus. That said, you can make use of RavenDB subscriptions to get a very similar behavior to a service bus. Let’s see how much effort it will take us to implement backend processing using RavenDB only.

We assume that we have commands or messages, that are written to the Commands collection and are handled via a subscription (which may have multiple concurrent workers). In terms of your messaging models, we have:

The CommandBase we have here defines the following infrastructure properties:

  • Status – enum [Initial, Processing, Failed, Completed] – default value is Initial
  • RetriesCount – int – default value is 3
  • Error – string – null by default

We can now define our subscription using the following query:

from Commands as c
where c.RetriesCount > 0  and c.Status != 'Completed' and c.’@metadata’.’@refresh’ == null

This query is pretty simple, but it allows me to get all the documents that haven’t exceeded their retry count. The @refresh option allows me to register a command to be executed at a later point in time. See the documentation here, this is a feature that exists specifically to allow you to schedule commands with subscriptions.

In my subscription workers, I can now execute:

The code above is sufficient to get most of the way toward a robust message handling system.

I can easily see what messages are being processing, I can see how long they take, etc. I can see what failed and why. And I can see the history of commands.

That handles scenarios such as error handling and retries, introspection on the state of the system and you can derive from here all the relevant numbers on throughput, capacity, etc.

It isn’t a complete solution, but for very little code, you can take this quite a long way.

time to read 2 min | 313 words

Enabling RavenDB’s revisions allows you to ask RavenDB to keep immutable copies of a document. We originally envisioned this feature as a way to have easy audit trails and a time travel feature. Revisions were meant to be something that you’ll typically access as the administrator, not something that we expected to be used in normal course of events.

Usage in the field showed that users often want to make revisions a core part of the domain model. We have a user that uses revisions to mark the Approved (and thus, locked) version of a Plan document, for example. Another example is a payroll processing system where the Contract for a particular employee isn’t pointing to a document, but to a specific revision of that contract. Modifications to the contract have no impact on the employee unless they are explicitly moved to a new version of the contract.

Seeing all those use cases popping up for revisions, we added more features around revisions to make it easier to work with them (such as allowing to explicitly create a revision on command or enforcing revisions policies after the fact).

In RavenDB 5.3 we have added the ability to include revisions as part of the query, so you get the same benefit of reduction in the number of remote calls as you get when working with document references. Let’s say that I want to calculate payroll for a set of employees, here is how I can do that (the Contract property contains the change vector of the specific version of the contract):


In terms of API, this is how you use it:

This smooths out a scenario where you had to deal with making multiple remote calls into a single one for the whole process. Just the kind of improvement that I like to see.

time to read 1 min | 150 words

JSON Patch is a feature that allows the frontend to send a set of changes on documents. If you are working with complex documents, that can result in a significant reduction in bandwidth. There are many scenarios where the client can modify a document on the browser, then produce the JSON Patch to make the server match the changes.

In RavenDB 5.3, we added direct support for implementing JSON Patch inside of RavenDB. The frontend code can forward the patch operations directly to RavenDB, where they will be executed by the database. Concurrent patches on the same document aren’t going to contend with one another and will be processed correctly. For example, in this post I’m using patch scripts to modify a document. I can do the same using JSON patch as well.

Here is how you can use the new ability:

Small improvement, but can make some scenarios much easier.

time to read 3 min | 424 words

I like to think about myself as a database guy. My go to joke about building user interfaces is that a <table> is all I need for layout (it’s not a joke). About a decade ago I just gave up on trying to follow what is going on in the frontend land and accepted that I’ll reside in the backend from here on after.

Being ignorant of the ways you’ll write a modern frontend doesn’t affect the fact that I like to use a good user interface. I have seriously mixed feelings about the importance of RavenDB Studio to the project. On the one hand, I care that it is easy to use, obvious and functional. I love that it is beautiful and will generally make your life easier. And at the same time, I abhor the fact that it has such an impact on people’s decisions. I mean, the backend of RavenDB is absolutely beautiful, from a technical perspective. But everyone always talk about the studio.

Leaving aside my mini rant, we spend quite a lot of time and effort on the studio and the User Experience in general. This release is not an exception and we have a couple of major new updates to the studio.

One of the most common things you’ll do in the studio is run queries. In this release we have done a complete revamp of the automatic code completion for the client-side RQL queries written in the studio.
The new code assistance is available when writing any query in the Query view, Patch view, and in the Subscription Query. That was actually quite interesting, from a computer science perspective. We have formal grammar for RQL now, for example, which means that we can provide much better experience for query editing. For example, take a look:


Full code completion assistance and better error handling directly at the studio makes it easier to work with RavenDB for both developers and operations.

The second feature is the Identities page:


Identities has been a feature in RavenDB for a long time, and somehow they have never been front and center. Maybe the discoverability of the feature suffered? You can now create, edit and modify the identities directly in the studio, not just through the API.


time to read 2 min | 378 words

mono-010-presentation-compressedMost of the time, you’ll communicate with RavenDB using HTTP, making REST calls. When you are doing that, you can take advantage of request compression. If the client indicates that it is able to by sending a Content-Encoding: gzip, RavenDB will send the data to you compressed. Given that we are working with JSON texts, which compress very well, we are looking at pretty significant savings in network bandwidth. This has been the case for RavenDB for many years (I didn’t check, but at least a decade, I believe).

There are certain cases, however, where RavenDB will use a binary protocol instead of HTTP. Those are usually scenarios where we are communicating directly with another RavenDB instance. All internal communications between RavenDB nodes will use direct TCP connections and when using Subscriptions, the client will open a TCP connection for the server and use that on a long term basis.

One of the fallacies of distributed computing is that bandwidth is infinite. One of the realities of cloud computing, on the other hand, is that you are paying for bandwidth. Even when you are running inside the same cloud region, cross availability zone network traffic is still charged. As you can imagine, on active systems, you may notice that you are spending a lot of bandwidth on inter cluster communication.

With RavenDB 5.3, we have added compression support for the replication and subscription connections . That means that replication and subscriptions will default for compressing the data. We are using the Zstd algorithm. In our tests, it produced both a higher compression ratio and faster performance than GZip. You don’t have to do anything for this to work (although there is a configuration option "Server.Tcp.Compression.Disable" to disable that if you really want to). When you upgrade to RavenDB 5.3, the cluster will automatically start compressing all traffic.

In our tests, we are seeing 85% (!) reduction in the amount of network traffic that we send out. That is something that I’m very much looking to seeing in our metrics once this is rolled out completely.

This is a RavenDB 5.3 feature (expected mid November) and will be available in the Professional and Enterprise editions of RavenDB.

time to read 3 min | 499 words

imageRavenDB is an OLTP database, it is meant to be the backend of business applications. There are some features in RavenDB that are meant for reporting purposes, but that is quite explicitly not our main focus. That is part of why RavenDB has such good integration with the rest of the environment, to give you the ability to use the best tool for the job.

With RavenDB 5.3, we are now allowing you to integrate directly with Power BI, so you can pull data from RavenDB to Power BI, write reports and in general utilize the full power of Power BI with your RavenDB data.

The image on the right, for example, is a report generated inside of Power BI on top of the sample data from RavenDB.

As you can imagine, I’m particularly stoked about this feature. Not only does it make reporting integration with RavenDB a lot simpler, the way that we do it is quite interesting. Instead of diving to the technical details, it would probably be more fun to show you how it works, from the perspective of Power BI.

The first thing we need to do is connect Power BI to RavenDB, using your existing Power BI system, you can simple click on Get Data and select:


You’ll then need to provide the connection details:


You’ll then be presented with the following dialog:


As you can see, we are translating the JSON documents inside of RavenDB to a columnar format for ease of processing inside of Power BI.

You can even take this further and issue RQL queries directly from inside of Power BI and transform the data. That means that you can utilize indexes, map/reduce operations, etc. Take a look:


And the result inside of Power BI:


As you can imagine, this is going to be a powerful tool in how you can work with your RavenDB data. You can also take this further and integrate that with Power BI on Azure, of course.

The way this works behind the scene is that we can now understand PostgreSQL wire protocol. That means that we can now be accessed from anywhere that can connect to Postgres. While the PostgreSQL protocol implementation is still marked as experimental, we have put the Power BI integration through its paces and we consider that stable enough for regular use.

Happy reporting Smile.

This feature is part of the RavenDB 5.3 release (expected in mid November) and is available in the Enterprise edition of RavenDB.


  1. An optimization story:–27% runtime costs for 8 lines of code - 16 hours from now
  2. Cumulative computation with RavenDB queries - about one day from now
  3. Feature Design: ETL for Queues in RavenDB - 3 days from now
  4. re: Why IndexedDB is slow and what to use instead - 4 days from now
  5. Implementing a file pager in Zig: What do we need? - 5 days from now

And 8 more posts are pending...

There are posts all the way to Dec 22, 2021


  1. Challenge (63):
    03 Nov 2021 - The code review bug that gives me nightmares–The fix
  2. Talk (6):
    23 Apr 2020 - Advanced indexing with RavenDB
  3. Production postmortem (32):
    17 Sep 2021 - The Guinness record for page faults & high CPU
  4. re (29):
    23 Jun 2021 - The performance regression odyssey
View all series


Main feed Feed Stats
Comments feed   Comments Feed Stats