Ayende @ Rahien

My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.

Security analysis on error reporting

time to read 5 min | 973 words

When talking about the security errors we generate in RavenDB 4.0, we got some really good comments, which are worth discussing.  The following are some of the interesting tidbits from the comments there.

  • can this behavior help some malevolent user trying to hack the system?
  • at least make it an admin setting so by default it just gives a 403 and you have to turn on the detailed error reporting.
  • I'm not questioning the technical decision, but simply the fact to returning info to the user. simply logging this error server-side, or activate this behavior to a specific admin setting as Paul suggests, looks more "safe" to me.
  • often when you log to an account with wrong user or password, error message don't specify of password or username is wrong. just an example.

Let me start from scratch and outline the problem. We are using an x509 certificate to authenticate against the server. The user may use a wrong / expired / invalid certificate, the user may use no certificate, or the user may use a valid certificate but attempt to do something that they don’t have access to.

By default, we can just reject the connection attempt, which will result in the following error:


We consider such an error utterly unacceptable. So we have to accept the connection, figure out what the higher level protocol is (HTTP, usually, but sometimes can be our own internal TCP connection) and send an error that the user can understand. In particular, we send a 403 HTTP error back with a message saying what the error is.

The worry is that there is some information disclosure inherent in the error message. Let us analyze that:

  • A request without a certificate: we error and announce that they didn’t use a certificate, and that one is required. There is no information here that the user is not already in possession of. They may not be aware of it, but they are in possession of the fact that they aren’t using a certificate.
  • A request with an expired certificate: we error and let the user know that. The user already has the certificate, and therefore can already figure out that it is expired. We don’t disclose any new information here, except that the user may try to use this to figure out what the server time is. This is generally not sensitive information, and it is safe to assume that it is close to what the rest of the world believes to be the current time, so I don’t see anything here that should worry us.
  • A request with a certificate that the server doesn’t know about: this results in an error saying that the server doesn’t know about the certificate. Here we get into an actual information disclosure issue. We let the user know that the certificate isn’t known. But what can they gain from this?

In the case of username / password, we always say that the username or password is incorrect; we shouldn’t let the user know that a username exists and that just the password is wrong. This is because it provides the user with information that they didn’t have (that the username is correct). Now, in practical terms, that is almost never the case, since you have password reset links and they tend to tell you if the email / username you want to reset the password for is valid.

However, with a certificate, there aren’t two pieces of information here, there is just one, so we don’t provide any additional information.

A request with a certificate to do an operation that the certificate isn’t authorized to do. This will result in one of the following errors:

  • image
  • image

There are a few things to note here. First, we don’t disclose any information beyond what the user has already provided us. The operation and the database are both provided by the user, and we use the FriendlyName so the user can tell what the certificate in question was.

Note that this check runs before we check whether the database actually exists on the server, so there isn’t any information disclosed about the existence of databases either.
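To make the ordering concrete, here is a minimal sketch in Python (not RavenDB's actual code). The `Certificate` class, the `authorize` function, the 404 response and the message texts are all hypothetical names and values chosen for illustration. The only point is that the permission check comes first, so a caller without access gets the same kind of answer whether or not the database exists.

```python
from dataclasses import dataclass, field


@dataclass
class Certificate:
    # Hypothetical stand-in for a registered client certificate.
    friendly_name: str
    allowed_databases: set = field(default_factory=set)


# Hypothetical server state: the databases that actually exist.
EXISTING_DATABASES = {"orders", "payroll"}


def authorize(cert, database, operation):
    """Return (status, message). Permissions are checked before existence."""
    if database not in cert.allowed_databases:
        # Everything echoed here was supplied by the caller.
        return 403, (
            f"Certificate '{cert.friendly_name}' is not authorized to "
            f"{operation} on database '{database}'"
        )
    if database not in EXISTING_DATABASES:
        # Only reachable by callers that hold permissions on this name.
        return 404, f"Database '{database}' does not exist"
    return 200, "OK"


cert = Certificate("reporting-app", {"orders"})
exists_but_forbidden = authorize(cert, "payroll", "read")
does_not_exist = authorize(cert, "ghost", "read")
```

Both `exists_but_forbidden` and `does_not_exist` come back as a 403 that differs only in the database name the caller typed, so the response cannot be used to probe which databases exist.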

Given that the user tried to perform an operation that they are not allowed to, we need to reject that operation (honey pot behavior is a totally separate issue and probably shouldn’t be part of any sane API design). Given that we reject the operation, we need to provide clear and concise errors for the user, rather than the CONNECTION REFUSED error. Given the kind of errors we generate, I believe that they provide sufficient information for the user to figure out what is going on.

As to whether we should log this / hide this behind a configuration setting: that is really confusing to me. We are already logging rejected connections, because the admin may be interested in reviewing this. But requiring the user to call the admin and look at some obscure log file is bad design in terms of usability. The same is true for hiding this behind a configuration option. Either this is a secured solution, and we can report these errors, or we have to put the system into a known unsecured state (in production) just to be able to debug a connection issue. I would be far more afraid of that scenario, especially since that would be the very first thing that an admin would do any time that there is a connection issue.

So this is the default (and only) behavior, because we have verified that this is both a secured solution and a sane one.

RavenDB 4.0: Interlocked distributed operations

time to read 3 min | 440 words

We couldn’t make unique constraints work in RavenDB 4.0 in a way that made sense for distributed operations; there were just too many hurdles at that level of abstraction. The problem, in essence, boils down to having to do an atomic operation in a distributed environment. When we need to do this in a multi-threaded environment, we can rely on interlocked operations to help us. In the same manner, RavenDB offers the notion of interlocked distributed operations.

Let us take a look at what this looks like, shall we?


The output of this code would be:

Success: True, Val: users/1, Index: 13

In other words, we were able to atomically set the value of “ayende” to “users/1”. At the most basic level, this gives us the ability to create unique constraints, because we are able to reserve values for particular documents, and it gives us a much more explicit manner in which to do so. If a user wants to change their username, we first try to grab the new name, change the username and then release the old username. And we have the ability to recover if there are errors midway.

This design is modeled after the Interlocked operations, and is meant to be used in the same manner. You submit such an operation to the cluster, which will run it through the Raft state machine. This means that you can rely on RavenDB’s own distributed state machine to get reliable distributed operations, including the full consistency that this provides. The key arguments here are the name of the value and the index used. If the index you provide matches the index on the state machine, we’ll replace the value in an atomic fashion. Otherwise, the operation will fail and you’ll get the current value and index so you can decide if you want to try again.
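The index-matching rule can be modeled in a few lines. The following is a single-process Python sketch of the semantics only, not the RavenDB client API: `CompareExchangeStore` and its methods are hypothetical names, and the convention that index 0 means "the value does not exist yet" is an assumption of this sketch. In the real system the ordering of operations comes from the Raft log, which a plain dictionary obviously does not give you.

```python
class CompareExchangeStore:
    """In-memory model of index-guarded put/get (hypothetical, for illustration)."""

    def __init__(self):
        self._values = {}      # key -> (value, index)
        self._next_index = 1   # stands in for the state machine's growing index

    def get(self, key):
        # A missing key is reported as (None, 0).
        return self._values.get(key, (None, 0))

    def put(self, key, value, index):
        """Replace the value only if `index` matches the stored index.

        Returns (success, value, index). On failure, the current value and
        index are returned so the caller can decide whether to retry.
        """
        current_value, current_index = self.get(key)
        if index != current_index:
            return False, current_value, current_index
        new_index = self._next_index
        self._next_index += 1
        self._values[key] = (value, new_index)
        return True, value, new_index


store = CompareExchangeStore()
first = store.put("ayende", "users/1", 0)     # reserve the name
second = store.put("ayende", "users/2", 0)    # someone else tries the same name
third = store.put("ayende", "users/2", first[2])  # update using the known index
```

Here `first` succeeds, `second` fails and reports that "users/1" already holds the name, and `third` succeeds because it passes the index that the first operation returned.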

The example with the unique constraints is just the tip of the iceberg. You can also use this for things like distributed locking, by registering an owner for a particular lock key and ensuring that everyone who needs the lock will race to acquire it. Note that the value that we use here can be anything, including complex objects. In other words, you can do things like set a lock descriptor that would include timeout information, owner, etc.

The interface here is pretty small, in addition to PutCompareExchangeValueOperation there is also just GetCompareExchangeValueOperation, but it is enough for you to be able to lean on RavenDB’s distributed state machine in your own application.

RavenDB 4.0: Node.JS client is now in beta

time to read 1 min | 164 words

I’m happy to announce that the RavenDB node.js client is now publicly available in beta. Following our Python client (and obviously the .NET one), this is the newest client for RavenDB on the block, with additional clients for the JVM, Go and Ruby quickly reaching critical stage.

Here is some code using it (I’m using async/await here, but the RavenDB node.js client supports any Node 6.0 or higher):


And here is how you do some basic CRUD:


A full sample app can be found here:

You can just get the code and then run:

npm run serve

And you’ll get a server running against the public live instance.

Unique constraints didn’t make the cut for RavenDB 4.0

time to read 3 min | 504 words

Unique Constraints is a bundle in RavenDB 3.x that allowed you to… well, define unique constraints. Here is the classic example:


It was always somewhat awkward to use (you had to mess around with configuration on both the server and the client side), but it worked pretty well. As long as you were running on a single server.

With RavenDB 4.0, we put a lot more emphasis on running in a cluster, and when the time came to discuss how we are going to handle unique constraints in 4.0 it was very obvious that this is going to be neither easy nor performant.

The key here is that in a distributed database using multi masters, there is no real good way to provide a unique constraint. Imagine the following database topology:


Now, what will happen if we create two different User documents with the same username in node C? It is easy for node C to detect and reject this, right?

But what happens if we create one document in node B and another in node A? This is now a much harder problem to deal with. And this is without even getting into the harder aspects of how to deal with failure conditions.

The main problem here is that it is very likely that at the point we’ve discovered that we have a duplicate, there were already actions taken based on that information, which is generally not a good thing.
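A toy model shows why the local check is not enough. This is a Python sketch with hypothetical names (`Node`, `create_user`, `replicate`); it is not how RavenDB replication is implemented, it only illustrates the timing problem: each node accepts the write on its own, and the duplicate is noticed only when the documents meet.

```python
class Node:
    """A toy database node that enforces uniqueness only against local data."""

    def __init__(self, name):
        self.name = name
        self.users = {}  # document id -> username

    def create_user(self, doc_id, username):
        if username in self.users.values():
            raise ValueError(f"username '{username}' already taken on node {self.name}")
        self.users[doc_id] = username


def replicate(source, target):
    """Copy documents from source to target and report duplicate usernames."""
    conflicts = []
    for doc_id, username in source.users.items():
        for other_id, other_name in list(target.users.items()):
            if other_name == username and other_id != doc_id:
                conflicts.append((username, doc_id, other_id))
        target.users[doc_id] = username
    return conflicts


node_a, node_b = Node("A"), Node("B")
node_a.create_user("users/1", "ayende")   # accepted locally on A
node_b.create_user("users/2", "ayende")   # also accepted locally on B
conflicts = replicate(node_a, node_b)     # the duplicate shows up only now
```

Both writes were acknowledged before `conflicts` became non-empty, so anything the application did in between (sending a welcome email, for example) was based on a constraint that did not actually hold.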

As a result, we were left in somewhat of a lurch. We didn’t want to have a feature that would work only on a single node, or contain a hidden broken invariant. The way to handle this properly in a distributed environment is to use some form of consensus algorithm. And what do you know, we actually have a consensus algorithm implementation at hand, the Raft protocol that is used to manage the RavenDB cluster.

That, however, led to a different problem. The process of using a unique constraint would now be broken into two distinct parts. First, we would verify that the value is indeed unique, then we would save the document. This can lead to issues if there is a failure just between these two operations, and it puts a heavy burden on the system to always check the unique constraint across the cluster on every update.

The interesting thing about unique constraints is that they rarely change once created. And if they do, they are typically part of a very explicit workflow. That isn’t something that is easy to handle without a lot of context. Therefore, we decided that we can’t reliably implement them and dropped the feature.

However… reliable and atomic distributed operations are now on the table, and they allow you to achieve the same thing, and usually in a far better manner. The full details will be in the next post.

re: Entity Framework Core performance tuning–Part III

time to read 1 min | 166 words

I mentioned in the previous post that I’ll take the opportunity to show off some interesting queries. The application itself is here, and you can see how the UI looks in the following screenshot:


I decided to see what would be the best way to come up with the information we need for this kind of query. Here is what I got.


This is using a select object style to get a complex projection back from the server. Here are the results:


As you can see, we are able to get all the data we want, in a format that is well suited to just sending directly to the UI with very little work and with tremendous speed.

re: Different I/O Access Methods for Linux

time to read 6 min | 1063 words

The “Different I/O Access Methods for Linux, What We Chose for Scylla, and Why” is quite fascinating. It is a pleasure to be able to read in depth into another database implementation strategy and design decisions. In particular where they don’t match what we are doing, because we can learn from the differences.

The article is good both in discussing the general I/O approaches on Linux and in covering the design decisions and implications for ScyllaDB in particular.

ScyllaDB chose to use AIO/DIO (async direct I/O) using a dedicated library called Seastar. And they are effectively managing their own memory and I/O scheduling internally, skipping the kernel entirely. Given how important I/O is for a database, I can say that I strongly resonate with this approach, and it is something that we have tried for a while with RavenDB. That included paying careful attention to how we are sending I/O, controlling our own caching, etc.

We are in a different position from Scylla, of course, since we are using managed code, which introduced a different set of problems (and benefits), but during the design of RavenDB 4.0, we chose to go in a very different direction.

But first, let me show you the single statement in the post that caused me to write this blog post:

The great advantage of letting the kernel control caching is that great effort has been invested by the kernel developers over many decades into tuning the algorithms used by the cache. Those algorithms are used by thousands of different applications and are generally effective. The disadvantage, however, is that these algorithms are general-purpose and not tuned to the application. The kernel must guess how the application will behave next, and even if the application knows differently, it usually has no way to help the kernel guess correctly.

This statement is absolutely correct. The kernel caching algorithms (and the kernel behavior in general) are tailored to suit a generic set of requirements, which means that if you deviate from what the kernel expects, it is going to be doing extra work and you can experience significant problems (performance and otherwise).

Another great resource that I want to point you to is the following article: “You’re Doing It Wrong” which had a major effect on the way RavenDB 4.0 is designed.

What we did, basically, is to look at how the kernel is doing things, and then see how we can fit our own behavior to what the kernel is expecting. Actually, I’m lying here, because there is no one kernel here. RavenDB runs on Windows, Linux and OSX (Darwin kernel). So we have three very different systems with wildly different optimizations that we need to run optimally on. Actually, to be fair, we consider Windows & Linux as the main targets for deployment, but that still gives us very different stacks to work on.

The key here was to be predictable in all things and be sure that whatever operations we make, the kernel can successfully predict them. This can be a lot of work, but not something that you’ll usually see in the code. It involves laying out the data so it is nearby in the file and ensuring that we have hotspots that the kernel can recognize and optimize for us, etc. And it involves a lot of work with the guts of the system to make sure that we match what the kernel expects.

For example, consider this statement from the article:

…application-level caching allows us to cache not only the data read from disk but also the work that went into merging data from multiple files into a single cache item.

This is a good example of the different behavior. ScyllaDB is using an LSM model, which means that in order to read data, they typically need to look it up in multiple files. RavenDB uses a different model (B+Tree with MVCC), which typically means that we store all the data in a single file. Furthermore, the way we store the information, we can access it directly via memory map without doing any work to prepare it ahead of time. That means that we can lean entirely on the page cache and gain all the benefits thereof.
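As a rough illustration of what "access it directly via memory map" means, here is a Python sketch that is unrelated to RavenDB's actual storage engine. The file layout (fixed-size records holding two 64-bit integers) and the function names are assumptions made up for this example. The file is mapped read-only and a single record is decoded in place at its offset; which pages are brought into memory, and how long they stay cached, is left to the kernel's page cache.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical on-disk layout: fixed-size records of (id, value), little endian.
RECORD = struct.Struct("<qq")


def write_records(path, count):
    with open(path, "wb") as f:
        for i in range(count):
            f.write(RECORD.pack(i, i * i))


def read_record(path, n):
    """Decode record n straight from the mapping, without reading the whole file."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # Only the pages touched by this access need to be resident.
            return RECORD.unpack_from(mapped, n * RECORD.size)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")
    write_records(path, 1000)
    record = read_record(path, 42)
```

There is no application-level cache in this sketch at all. A second read of the same record is served from the page cache if the kernel still holds the page, which is the behavior the paragraph above leans on.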

The ScyllaDB approach also limits them to running only on Linux and only on specific configurations. Because they rely on async direct I/O, which is a… fiddly beast in Linux at the best of times, you need to make sure that everything matches just so in order to be able to get it working. Just running this on a stock Ubuntu won’t work, since ext4 will block for many operations. Another problem in my view is that this assumes that they are the only player on the machine. If you need to run with additional software on the machine, that can cause fights over resources. For production, this is less of a problem, but for running on a developer machine, that is frequently something that you need to take into account. The kernel will already do that for you (which is useful even in production when people put SQL Server & RavenDB on the same box) so you don’t have to worry about it too much. I’m not sure that this concern is even valid for ScyllaDB, since they tend to be deployed in clusters of dedicated machines (or at least Docker instances) so they have better control over the environment. That certainly makes it easier if you can dictate such things.

Another consideration for the RavenDB approach is that we want to be, as much as possible, friendly to the administrator. When we lean on what the kernel does, we usually get that for free. The administrator can usually dictate policies to the kernel and have it follow them, and good sys admins both know how and know when to do that. On the other hand, if we wrote it all ourselves, we would also need to provide the hooks to modify the behavior (and monitor it, and train users in how it works, etc).

Finally, it is not a small thing to remember that if you let the kernel cache your data, that cache is still around if you restart the database (but not the machine), which means that you mostly alleviate the issue of slow cold start if you needed to do things like update configuration or the database binaries.

re: Entity Framework Core performance tuning–Part II

time to read 4 min | 701 words

After looking at this post detailing how to optimize data queries in EF Core, I obviously decided that I need to test how RavenDB handles the same load.

To make things fair, I tested this on my laptop, running on battery mode. The size of the data isn’t that much, only 100,000 books and half a million reviews, so I decided to increase that by an order of magnitude.


The actual queries we make from the application are pretty simple and static. We can sort by votes / publication date / price (ascending /descending) and we can filter by number of votes and the publication year.


This means that we don’t have an explosion of querying options, so that simplifies the kind of work we are doing. To make things simple for myself, I kept the same model of books / authors and reviews as separate collections. This isn’t the best model for a document database, but it allows us to compare apples to apples between the work that the EF Core based solution and the RavenDB solution need to do.

A major cost in Jon’s solution is the need to aggregate the reviews for a book (so the average for the review can be computed). In the end, the only way to get the solution required was to just manually calculate the average reviews for each book and store the computation in the book. We’ll discuss this a bit more in a few minutes, for now, I want to turn our eyes toward the simplest possible query in this page, getting 100 books sorted by the book id.

Because we aren’t running on the same machine, it is hard to make direct parallels, but on Jon’s machine he got 80 ms for this kind of query on 100,000 books. When increasing the data to half a million books, the query time rose to 150 ms. Running the same query on RavenDB gives us the results instantly (zero ms). Querying and sorting by the title, for example, gives us the results in 19 ms for a page size of 100 books.

Now, let us look at the major complexity for this system, sorting and filtering by the number of votes in the system. This is hard because the reviews are stored separately from the books. With EF Core, there is the need to join between the tables, which is quite expensive and eventually led Jon to take upon himself the task of manually maintaining the values. With RavenDB, we can use a map/reduce index to handle this all for us. More specifically, we are going to use a multi map/reduce index.

Here is what the index definition looks like:


We map the results from both the Books and the BookReviews into the same shape, and then reduce them together into the final output, which contains the relevant aggregation.
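The shape of that computation can be sketched outside of RavenDB. The following Python code is not the index definition (which is not reproduced here); the field names `book_id`, `votes` and `total_stars` and the sample documents are assumptions for illustration. Two map functions project books and reviews into one common shape, and a single reduce step groups by book and aggregates.

```python
from collections import defaultdict


def map_book(book):
    # A book contributes no votes by itself, but guarantees an entry exists.
    return {"book_id": book["id"], "votes": 0, "total_stars": 0}


def map_review(review):
    # Each review contributes one vote and its star rating.
    return {"book_id": review["book_id"], "votes": 1, "total_stars": review["stars"]}


def reduce_entries(entries):
    """Group the mapped entries by book and compute votes and average rating."""
    totals = defaultdict(lambda: {"votes": 0, "total_stars": 0})
    for entry in entries:
        group = totals[entry["book_id"]]
        group["votes"] += entry["votes"]
        group["total_stars"] += entry["total_stars"]
    return {
        book_id: {
            "votes": group["votes"],
            "average": group["total_stars"] / group["votes"] if group["votes"] else None,
        }
        for book_id, group in totals.items()
    }


# Hypothetical sample data.
books = [{"id": "books/1", "title": "First"}, {"id": "books/2", "title": "Second"}]
reviews = [{"book_id": "books/1", "stars": 5}, {"book_id": "books/1", "stars": 3}]

mapped = [map_book(b) for b in books] + [map_review(r) for r in reviews]
results = reduce_entries(mapped)
```

Because both maps emit the same shape, the reduce step does not need to know where an entry came from. In a real index this aggregation is maintained incrementally as documents change, rather than recomputed per query as it is in this sketch.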

Now, let us do some queries, shall we? Here is us querying over the entire dataset (an order of magnitude higher than the EF Core sample set), filtering by the published date and ordering by the computed votes average. In here, we get the first 100 items, and you can see that we got over 289,753 total results:


One very interesting feature of this query is that we are asking to include the book document for the results. This is handled after the query (so no need to do a join to the entire 289K+ results), and we are able to get everything we want in a very simple fashion.

Oh, and the total time? 17 ms. Compared to the 80ms result for EF with 1/10 of the data size. That is pretty nice (and yes, different machines, hard to compare, etc).

I’ll probably have another post on this topic, showing off some of the cool things that you can do with RavenDB and queries.

RavenDB 4.0 Unsung Heroes: The design of the security error flow

time to read 2 min | 351 words

This is again a feature that very few people will even notice exists, but a lot of time, effort and thinking went into building it. How should RavenDB handle a case where a user makes a request that they are not authorized to make? In particular, we need to consider the case of a user pointing the browser to a server or database that they aren’t authorized to see, or without having the x509 certificate properly registered.

To understand the problem we need to figure out what the default experience will be like. If we require a client certificate to connect to RavenDB, and the client does not provide it, by default the response is some variation of just closing the TCP connection. That results in the client getting an error that looks like this:

TCP connection closed unexpectedly

That is not conducive to a good error experience and will typically cause a user to spend a lot of time trying to figure out what the network problem is, while everything is working just fine; the server just doesn’t want to talk to the user.

The problem is that at the TLS level, there isn’t really a good way to give back some meaningful error. We are too low level, all we can do is just terminate the connection.

Instead of doing that, RavenDB will accept the connection, regardless of whether it has a valid certificate (or even any certificate), and pass the connection one level up in the chain. At that point, we can check whether the certificate is valid, and if it isn’t (or if it doesn’t have the permissions to do what we want it to do) we can use the protocol’s own mechanism to report errors.

With HTTP, that means we can return a 403 error to the user, including an explanation of why we rejected the connection (no certificate, expired certificate, certificate doesn’t have the right permissions, etc). This makes things much easier when you need to troubleshoot permissions issues.
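The decision made at the HTTP level can be sketched as a small function. This is illustrative Python, not RavenDB code: the dictionary fields `thumbprint` and `not_after`, the function name and the message texts are all assumptions. The TLS layer is assumed to have accepted the connection already and handed over whatever certificate the client presented, possibly none.

```python
from datetime import datetime, timezone


def check_client_certificate(cert, known_thumbprints, now=None):
    """Map the state of the client certificate to an HTTP status and an explanation."""
    now = now or datetime.now(timezone.utc)
    if cert is None:
        return 403, "This server requires a client certificate, but none was provided."
    if cert["not_after"] < now:
        return 403, f"The supplied client certificate expired on {cert['not_after']:%Y-%m-%d}."
    if cert["thumbprint"] not in known_thumbprints:
        return 403, "The supplied client certificate is not known to this server."
    return 200, "OK"


# Hypothetical data for the three failure cases and the success case.
now = datetime(2017, 10, 5, tzinfo=timezone.utc)
known = {"AB12"}
valid = {"thumbprint": "AB12", "not_after": datetime(2018, 1, 1, tzinfo=timezone.utc)}
expired = {"thumbprint": "AB12", "not_after": datetime(2017, 1, 1, tzinfo=timezone.utc)}
unknown = {"thumbprint": "FFFF", "not_after": datetime(2018, 1, 1, tzinfo=timezone.utc)}

responses = [
    check_client_certificate(c, known, now) for c in (None, expired, unknown, valid)
]
```

Each rejected case gets its own message, and every message restates only things that follow from the certificate the client itself sent (or failed to send).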

re: Entity Framework Core performance tuning–part I

time to read 3 min | 501 words

I ran into a really interesting article about performance optimizations with EF Core and I thought that it deserves a second & third look. You might have noticed that I have been putting a lot of emphasis on performance, and I have literally spent years on optimizing relational database access patterns, including building a profiler dedicated to inspecting what an OR/M is doing. I got the source and ran the application.

I have a small bet with myself, saying that in any application using a relational database, I’ll be able to find a SELECT N+1 issue within one hour. So far, I think that my rate is 92% or so. In this case, I found the SELECT N+1 issue on the very first page load.


Matching this to the code, we have:


Which leads to:


And here we can already tell that there is a problem, we aren’t accessing the authors. This actually happens here:


So we have the view that is generating 10 out of 12 queries. And the more results per page you have, the more this costs.
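For readers who have not met the pattern before, here is a small self-contained Python sketch of SELECT N+1 using the standard library's `sqlite3` module. It is not the EF Core application discussed above; the schema, the sample data and the helper names are made up for illustration. The first function loads a page of books and then issues one query per book to fetch its author, the second fetches the same data with a single join.

```python
import sqlite3


class CountingConnection:
    """Thin wrapper that counts how many queries are sent to the database."""

    def __init__(self, conn):
        self.conn = conn
        self.count = 0

    def query(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params).fetchall()


def make_db():
    # Hypothetical schema: every book has exactly one author.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER)")
    for i in range(1, 11):
        conn.execute("INSERT INTO authors VALUES (?, ?)", (i, f"Author {i}"))
        conn.execute("INSERT INTO books VALUES (?, ?, ?)", (i, f"Book {i}", i))
    return CountingConnection(conn)


def page_with_n_plus_one(db):
    books = db.query("SELECT title, author_id FROM books ORDER BY id LIMIT 10")
    # One extra query per row: this is the "+N" part.
    return [
        (title, db.query("SELECT name FROM authors WHERE id = ?", (author_id,))[0][0])
        for title, author_id in books
    ]


def page_with_join(db):
    return db.query(
        "SELECT b.title, a.name FROM books b "
        "JOIN authors a ON a.id = b.author_id ORDER BY b.id LIMIT 10"
    )


slow_db, fast_db = make_db(), make_db()
slow_page = page_with_n_plus_one(slow_db)
fast_page = page_with_join(fast_db)
```

With a page size of 10, the first version sends 11 queries and the second sends 1, for identical results. The gap grows linearly with the page size, which is the cost described above.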

But this is easily fixed once you know what you are looking at. Let us look at something else, the actual root query, it looks like this:


Yes, I too needed a minute to recover from this. We have:

  1. One JOIN
  2. Two correlated sub queries

Jon was able to optimize his code from 660ms to 80ms, which is pretty awesome. But that is all by making modifications to the access pattern in the database.

Given what I do for a living, I’m more interested in what it does inside the database, and here is what the query plan tells us:


There are only a few tens of thousands of records and the query is basically a bunch of index seeks and nested loop joins. But note that the way the query is structured forces the database to evaluate all possible results, then filter just the top few. That means that you have to wait until the entire result set has been processed, and as the size of your data grows, so will the cost of this query.

I don’t think that there is much that can be done here, given the relational nature of the data access (no worries, I’m intending to write another post in this series, you can guess what I’m going to write there, right?).

RavenDB 4.0 Unsung Heroes: The indexing threads

time to read 3 min | 473 words

A major goal in RavenDB 4.0 is to eliminate as much complexity as possible from the codebase. One of the ways we did that is to simplify thread management. In RavenDB 3.0 we used the .NET thread pool and in RavenDB 3.5 we implemented our own thread pool to optimize indexing based on our understanding of how indexes are used. This works, is quite fast and handles things nicely as long as everything works. When things stop working, we get into a whole different story.

A slow index can impact the entire system, for example, so we had to write code to handle that. Noisy indexing neighbors can impact overall indexing performance, and tracking costs when the indexing work is interleaved is anything but trivial. And all the indexing code must be thread safe, of course.

Because of that, we decided we are going to dramatically simplify our lives. An index is going to use a single dedicated thread, always. That means that each index gets its own thread and is only able to interfere with its own work. It also means that we can have much better tracking of what is going on in the system. Here are some stats from the live system.
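The thread-per-index idea is easy to sketch. This is illustrative Python rather than RavenDB's implementation: `IndexWorker`, its fields and the sample indexes are hypothetical. Each index owns one thread and one inbox, so its indexing function never runs concurrently with itself and its statistics can be attributed to it without any locking.

```python
import queue
import threading
import time


class IndexWorker(threading.Thread):
    """One dedicated thread per index (hypothetical sketch)."""

    def __init__(self, index_name, map_fn):
        super().__init__(name=f"index-{index_name}", daemon=True)
        self.index_name = index_name
        self.map_fn = map_fn
        self.inbox = queue.Queue()
        self.entries = []         # only ever touched by this index's thread
        self.docs_indexed = 0
        self.busy_seconds = 0.0   # wall-clock time spent inside map_fn

    def run(self):
        while True:
            doc = self.inbox.get()
            if doc is None:       # sentinel: stop this index
                break
            started = time.perf_counter()
            self.entries.append(self.map_fn(doc))
            self.busy_seconds += time.perf_counter() - started
            self.docs_indexed += 1


workers = [
    IndexWorker("Users/ByName", lambda doc: doc["name"].lower()),
    IndexWorker("Users/ByAge", lambda doc: doc["age"]),
]
for worker in workers:
    worker.start()

documents = [{"name": "Oren", "age": 36}, {"name": "Jon", "age": 40}]
for doc in documents:
    for worker in workers:
        worker.inbox.put(doc)

for worker in workers:
    worker.inbox.put(None)
    worker.join()
```

After the threads are joined, each worker carries its own `docs_indexed` and `busy_seconds`. A slow `map_fn` in one index delays only that index's inbox, which is the isolation property described above.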


And here is another:


What this means is that we have a fantastically detailed view of what each index is doing, in terms of CPU, memory and even I/O utilization if needed. We can also now define fine grained priorities for each index:


The indexing code itself can now assume that it is single threaded, which frees us from a lot of complications and in general makes things easier to follow.

There is the worry that a user might want to run 100 indexes per database and 100 databases on the same server, resulting in thousands of indexing threads. But given that this is not a recommended configuration and given that we tested it and it works (not ideal and not fun, but works), I’m fine with this, especially given the alternative that we have today, where all these indexes fight over the same limited number of threads and stall indexing globally.

The end result is that thread per index allows us to have fine grained control over the indexing priorities, account for memory and CPU costs, as well as simplify the code and improve the overall performance significantly. A win all around, in my book.

