Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by email or phone:

ayende@ayende.com

+972 52-548-6969




Connection handling and authentication in RavenDB 4.0

time to read 3 min | 535 words

An interesting question has popped up in the mailing list about the behavior of RavenDB: when will the RavenDB client send the certificate to the server for authentication? An SSL handshake typically takes multiple round trips to negotiate the connection, and a certificate can be a fairly large object. It makes sense that understanding this aspect of RavenDB’s behavior is going to be important for users.

In the mailing list, I gave the following answer:

RavenDB doesn’t send the certificate on a per request basis; instead, it sends the certificate at the start of each connection.

I was asked for a follow up, because my answer wasn’t clear to the user. This is a problem: I was answering from my perspective, which is quite different from the way that a RavenDB user on the outside will look at things. Therefore, this post, and hopefully a more complete explanation of how it all works.

RavenDB uses X509 client certificates for authentication, using SSL to both authenticate the remote client to the server (and the server to the client, using PKI) and to ensure that the communication between client and server is private. RavenDB uses TLS 1.2 for the actual low level wire protocol. Given that .NET Core doesn’t yet implement TLS 1.3 or TCP Fast Open, that means that we need to do the full negotiation on each connection.

Now, what exactly is a connection in this regard? Is this going to be every call to OpenSession? The answer is emphatically not. RavenDB manages a connection pool internally (actually, we rely on the HttpClient’s pool to do that). This means that we are only ever going to have as many TCP connections to the server as you have concurrent requests. A session will effectively borrow a connection from the pool whenever it needs to talk to the server.

The connections in the pool are going to be re-used, potentially for a long time. This allows us to alleviate the cost of actually doing the TCP & SSL handshake and amortize it over many requests. It also means that the entire cost of authentication isn’t paid on a per request basis, but per connection. What actually happens is that at the beginning of the connection, the RavenDB server will validate the client certificate and remember what permissions are granted to it. Any and all requests on this connection can then just use the cached permissions for the lifetime of the connection. This stateful approach reduces the overall cost of authentication, because we don’t need to run the full validation on every request.

This also means that OpenSession, for example, is basically free. All it does is allocate a bunch of dictionaries and some other data structures for the session. There is no wire traffic when the session is created, only when you actually make a request to the server (Load, Query, SaveChanges, etc). Most of the time, we don’t need to create a new connection for that, but can use a pre-existing one from the pool. The entire system was explicitly designed to take advantage of best practices to optimize your overall performance.
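This is also why the standing recommendation is to create a single DocumentStore per application and share it: the store owns the connection pool, so the handshake and authentication costs are paid per pooled connection, not per session. A minimal sketch (the URL, database name and certificate path are placeholders):

```csharp
using System.Security.Cryptography.X509Certificates;
using Raven.Client.Documents;

// One DocumentStore per application; it owns the HttpClient connection pool,
// so TCP + TLS handshakes (and certificate validation) are amortized
// across all the requests made through it.
var store = new DocumentStore
{
    Urls = new[] { "https://my-server:443" },        // placeholder URL
    Database = "Northwind",                          // placeholder database
    Certificate = new X509Certificate2("client.pfx") // client certificate
}.Initialize();

using (var session = store.OpenSession())
{
    // OpenSession did no network work; the request below borrows an
    // already authenticated connection from the pool.
    var order = session.Load<object>("orders/1-A");
}
```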

RavenDB 4.1 Features: This document is included in your subscription

time to read 2 min | 374 words

Subscriptions in RavenDB allow you to build persistent queries and batch operations, and to respond immediately to changes in your data. You can read more about them in this post, and I have dedicated a full chapter to discussing them in the book.

In RavenDB 4.1 we improved subscription support by adding the ability to include related documents directly as part of the subscription. Consider the following subscription:

[Image: the subscription definition]

The output of this subscription is going to be orders where the geographical coordinates of the shipping address are not known. We use that to enrich the data by adding the missing location data, resolved from the shipping address. This is important for features such as spatial searches, planning deliveries, etc. For the purpose of this post, we’ll assume that we accept users’ addresses, which do not have spatial information on them, and that we use a background process to fill it in.
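Creating such a subscription might look like the following (a sketch; the query and field names are modeled on the sample Northwind data, and store is an initialized DocumentStore):

```csharp
using Raven.Client.Documents.Subscriptions;

// Subscribe to orders that have no location data yet, and include the
// related company so the worker can Load it without a server round trip.
var name = store.Subscriptions.Create(new SubscriptionCreationOptions
{
    Query = @"from Orders as o
              where o.ShipTo.Location = null
              include o.Company"
});
```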

On the one hand, we do want to add the location data as soon as possible, but on the other hand, we want to avoid making too many address resolution requests, to avoid having to pay for them. Here is what we came up with to resolve this.
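The worker code from the post was a screenshot; a sketch of the same flow might look like this (“OrdersWithoutLocation” is the subscription name, GetLocationFor stands in for whichever geo-coding service is used, and the Order/Company model classes are assumed):

```csharp
// store is an initialized DocumentStore.
var worker = store.Subscriptions
    .GetSubscriptionWorker<Order>("OrdersWithoutLocation");

await worker.Run(async batch =>
{
    using (var session = batch.OpenSession())
    {
        foreach (var item in batch.Items)
        {
            var order = item.Result;
            // Served from the included documents, no network call here:
            var company = session.Load<Company>(order.Company);
            if (company.Address == null)
            {
                // The company has no address yet; fill it in for the next run.
                company.Address = order.ShipTo;
                continue;
            }
            // Only go to the expensive geo-location service if the
            // company's address doesn't already carry a location.
            order.ShipTo.Location = company.Address.Location
                ?? await GetLocationFor(company.Address); // hypothetical call
        }
        // No Store() calls: all changes to loaded documents are saved
        // as a single transaction.
        session.SaveChanges();
    }
});
```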

You can see that there are a bunch of interesting things in this code:

  • We can access the related company on the order using Load, and it will not trigger any network call.
  • If the company already has this address, we can use its known location without having to make an expensive geo-location call.
  • If the company doesn’t have an address, we’ll fill it in for the next run.
  • Only if the company’s address doesn’t have a location will we make a remote call to get the actual location data.
  • We don’t call Store() anywhere, because we are using the session instance from the batch; when we call SaveChanges(), all the changes to the loaded documents (either orders or companies) are saved as a single transaction.
  • Because we update the Location field on the order, we won’t be called again with the updated order, since the subscription filters it out.

All in all, this is a pretty neat way to handle the scenario in a very efficient manner.

RavenDB 4.1 Features: Detailed query timing details

time to read 1 min | 190 words

Queries are one of the most important things you do in RavenDB, and analyzing queries to understand their costs is an important step in many optimization tasks.

I’m proud to say that for the most part, you don’t need to do this very often with RavenDB. Usually the query optimizer just works and gives you query speed that is fast enough that you don’t care how it is achieved. With RavenDB 4.1, we made it even easier to figure out what the costs of a query are. Here is a good example:

[Image: a query using include timings()]

With the inclusion of “include timings()”, RavenDB will provide you with detailed stats on the costs of the relevant pieces of the query. The output in the studio looks like this:

[Image: the timings breakdown in the studio]

You can see at a glance exactly how much time RavenDB spent on each part of your query. Armed with that information, you can set out to improve things a lot faster (pun intended).
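The query from the screenshot isn’t reproduced here, but the shape is simply a normal query with the timings projection added (illustrative, against the sample data):

```
from Orders
where ShipTo.City = 'London'
include timings()
```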

RavenDB 4.1 Features: Of course I know ya, dude

time to read 2 min | 355 words

We put a lot of effort into making RavenDB’s default setup a secured one. The effort wasn’t so much about securing RavenDB itself, although we certainly spent a lot of time on that. Instead, a lot of the work went into making sure that the security will be usable. In other words, if you build a lock that no one can open, that isn’t a good lock, it is a horrible one. No one will use it. Indeed, the chief challenge in the design of the security mechanisms in RavenDB was making sure that they are both secure and usable.

I think that we hit the right spot for the most part. You can see that it takes barely any effort to set up a secured RavenDB cluster in a few minutes. That works if you are setting up RavenDB yourself. But what happens when you need an unattended setup? What happens if you are building a Docker container? What happens if you aren’t setting up a single RavenDB instance, but three hundred of them?

There are ways to handle this, by making sure that you are preparing the certificates and configuration ahead of time. But that is complex, and you still end up with the chicken and egg problem: how do you create a new node securely, in such a way that you know you can trust it?

The feature itself is pretty small:

--Security.WellKnownCertificates.Admin=a909502dd82ae41433e6f83886b00d4277a32a7b

You can pass this argument as part of the RavenDB command line, in the settings.json file, in a Docker environment variable, etc.

The idea is simple. This tells RavenDB that the certificate matching this well known thumbprint is an administrator. As such, it will be trusted to perform all operations. A simple use case may be spawning three containers with this setting and using the well known certificate to connect them together and create a full cluster out of them.
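Assuming RavenDB’s usual convention of mapping configuration keys to environment variables (a RAVEN_ prefix, dots replaced by underscores), the container scenario might be sketched like this (image name and thumbprint are placeholders):

```shell
# Each container starts trusting the same well known admin certificate,
# so one client certificate can drive the whole cluster setup.
docker run -d \
  -e RAVEN_Security_WellKnownCertificates_Admin=a909502dd82ae41433e6f83886b00d4277a32a7b \
  ravendb/ravendb
```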

For automated deployments, this issue keeps popping up, because setting things up properly is hard. We hope that this feature will make it easier.

RavenDB 4.1 Features: Running RavenDB embedded

time to read 2 min | 286 words

A fun feature we have back in RavenDB is the ability to run RavenDB as part of your own application. Zero deployment, setup or hassle.

The following is the list of steps you will need:

  • Create a new project (.NET Core, .NET Framework, whatever).
  • Grab the pre-release bits from MyGet.
    Install-Package RavenDB.Embedded -Version 4.1.0 -Source https://www.myget.org/F/ravendb/api/v3/index.json
  • Ask the embedded client for a document store, and start working with documents.

And here is the code:
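A minimal sketch (assuming the Raven.Embedded package; the database name is a placeholder):

```csharp
using Raven.Embedded;

// Start the embedded server; RavenDB runs as a child process of your app.
EmbeddedServer.Instance.StartServer();

// Ask for a document store for a database, then use the normal client API.
using (var store = EmbeddedServer.Instance.GetDocumentStore("Embedded"))
using (var session = store.OpenSession())
{
    session.Store(new { Name = "Arava" }, "dogs/1");
    session.SaveChanges();
}
```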

This is it. Your process has a dedicated RavenDB server running alongside your application. When your process is done, RavenDB will shut down and nothing is left running. You don’t need to set up anything, install services or configure permissions.

For extra fun (and reliability), RavenDB is actually running as a separate process. This means that if you are debugging an application that is using an embedded RavenDB, you can stop in the debugger and open the studio, inspecting the current state of the database, running queries, etc.

You have the full capabilities of RavenDB at your disposal. This means being able to use the studio, replicate to other nodes, backup & restore, database encryption, the works. The client API used is the usual one, which means that if you need to switch from running in embedded to server mode, you only need to change your document store initialization and you are done.

We are also working now on bringing this feature to additional platforms. You’ll soon be able to write the following:
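At the time of writing, the packages for the other platforms hadn’t shipped, so the exact API is speculative; the intent is something along these lines (names are illustrative only):

```python
# Purely illustrative: these names are guesses at the final API shape
# of the future ravendb embedded package for Python.
from ravendb_embedded import EmbeddedServer

server = EmbeddedServer()
server.start_server()
with server.get_document_store("Embedded") as store:
    with store.open_session() as session:
        session.store({"Name": "Arava"}, "dogs/1")
        session.save_changes()
```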

This is a Python application that is using RavenDB in embedded mode, and you’ll be able to do so across the board (Java, Node.js, Go, Ruby, etc).

RavenDB 4.1 Features: Can you explain that choice?

time to read 2 min | 204 words

One of the features that RavenDB exposes is the ability to get results ordered by how well they match. In other words, you can write a query like this:

from Employees 
where 
    boost(Address.City = 'London', 10) 
or  boost(Address.City = 'Seattle', 5)
order by score()
include explanations()

In other words, find me all the employees in London or Seattle, but I want to get the London employees first. This is a pretty simple example. But there are cases where you may involve multiple clauses, full text search, matches on arrays, etc. Figuring out why the documents came back in the order that they did can be complex.

Luckily, RavenDB 4.1 gives you a new feature just for that. Look at the last line in the query: “include explanations()”. This tells RavenDB that you want to dig into the actual details of the query, and in the user interface you’ll get:

[Image: the query results with their scores]

And the full explanation for each is:

[Image: the explanation for a single result]

I want to see the QA process that would catch this bug!

time to read 2 min | 344 words

When we get bug reports from the field, we routinely also do a small assessment to figure out why we missed the issue in our own internal tests and on the way to production.

We just got a bug report like that: RavenDB is not usable at all on a Raspberry Pi because of an error about non-ASCII usage.

This is strange. To start with, we test on the Raspberry Pi. To be rather more exact, we test on the same hardware and software combination that the user was running on. And what is this non-ASCII stuff? We don’t have any such thing in our code.

As we investigated, we figured out that the root cause was that we were trying to pass a non-ASCII value in the headers of the request. That didn’t make sense; the only things we write to the request in this case are well defined values, such as numbers and constant strings, all of which should be ASCII. What was going on?

After a while, the mystery cleared. In order to reproduce this bug, you needed the following preconditions:

  • A file hashed to a negative Int64 value.
  • A system whose culture settings were set to sv-SE (Swedish).
  • Run on Linux.

This is detailed in this issue. On Linux (and not on Windows), when using the Swedish culture, negative numbers are formatted as ”−1” and not “-1”.

For those of you with sharp eyes, you noticed that this is U+2212 (minus sign), and not U+002D (hyphen-minus). On Linux, for Unicode knows what reason, this is used as the negative sign. I would complain, but my native language has „.

Anyway, the fix was to force the usage of the invariant culture when converting the Int64 to a string for the header, which is pretty obvious. We are also exploring how to fix this in a more global manner.
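A short illustration of the difference (on Windows, or without ICU data, the Swedish result may be an ordinary hyphen; the fix is the invariant conversion either way):

```csharp
using System;
using System.Globalization;

long hash = -1;

// On Linux, the ICU data for sv-SE uses U+2212 MINUS SIGN ('−'),
// which is not ASCII and therefore not legal in an HTTP header value.
string swedish = hash.ToString(new CultureInfo("sv-SE"));

// The fix: protocol-level values always go through the invariant
// culture, which guarantees U+002D HYPHEN-MINUS ('-').
string invariant = hash.ToString(CultureInfo.InvariantCulture); // "-1"

Console.WriteLine($"sv-SE: {swedish}, invariant: {invariant}");
```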

But I keep coming back to the set of preconditions that is required. Sometimes I wonder why we missed a bug; in this case, I can only say that I would have been surprised if we had found it.

RavenDB 4.1 Features: Cluster wide ACID transactions

time to read 5 min | 903 words

One of the major features coming up in RavenDB 4.1 is the ability to do a cluster wide transaction. Up until this point, RavenDB’s transactions were applied at each node individually, and then sent over to the rest of the cluster. This follows the distributed model outlined in the Dynamo paper: writes are important, always accept them. This works great for most scenarios, but there are a few cases where the user might wish to explicitly choose consistency over availability. RavenDB 4.1 brings this to the table in what I consider to be a very natural manner.

This feature builds on the already existing compare exchange feature in RavenDB 4.0. The idea is simple. You can package a set of changes to documents and send them to the cluster. This set of changes will be applied to all the cluster nodes (in an atomic fashion) if they have been accepted by a majority of the nodes in the cluster. Otherwise, you’ll get an error and the changes will never be applied.

Here is the command that is sent to the server.

[Image: the cluster transaction command]

RavenDB ensures that this transaction will only be applied after a majority confirmation. So far, that is nice, but you could do pretty much the same thing with write assurance, a feature RavenDB has had for over five years. Where it gets interesting is the fact that you can make the operations in the transaction conditional: they will not be executed unless a certain (cluster wide) state has an expected value.

Remember that I said that cluster wide transactions build upon the compare exchange feature? Let’s see what we can do here. What happens if we want to state that a user’s name must be unique, cluster wide? Previously, we had the unique constraints bundle, but that didn’t work so well in a cluster and was removed in 4.0. Compare exchange was meant to replace it, but it was hard to use with document modifications, because you didn’t have a single transaction boundary. Well, now you do.

Let’s see what I mean by this:
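A sketch of the client side (store is an initialized DocumentStore; the User class is the usual sample model):

```csharp
using Raven.Client.Documents.Session;

using (var session = store.OpenSession(new SessionOptions
{
    TransactionMode = TransactionMode.ClusterWide
}))
{
    var user = new User { Name = "Arava" };
    session.Store(user);
    // Reserve the username cluster wide; the whole transaction commits
    // only if this creation is accepted by a majority of the nodes.
    session.Advanced.ClusterTransaction.CreateCompareExchangeValue(
        "usernames/Arava", user.Id);
    session.SaveChanges();
}
```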

As you can see, we have a new command there: “ClusterTransaction.CreateCompareExchangeValue”. This is adding another command to the transaction. A compare exchange command. In this case, we are saying that we want to create a new value named “usernames/Arava” and set its value to the document id.

Here is the command that is sent to the server:

[Image: the command sent to the server]

At this point, the server will accept this transaction and run it through the cluster. If a majority of the nodes are available, it will be accepted. This is just like before. The key here is that we are going to run all the compare exchange commands first. Here is the end result of this code:

[Image: the stored document and compare exchange value]

We add both the compare exchange and the document (and the project document not shown) here as a single operation.

Here is the kicker: what happens if we run this code again?

You’ll get the following error:

Raven.Client.Exceptions.ConcurrencyException: Failed to execute cluster transaction due to the following issues: Concurrency check failed for putting the key 'usernames/Arava'. Requested index: 0, actual index: 1243

Nothing is applied and the transaction is rolled back.

In other words, you now have a way to run consistent concurrency checks cluster wide, even in a distributed system. We made sure that a common scenario like uniqueness checks would be trivial to implement. The feature allows you to do in-transaction manipulation of the compare exchange values and ensures that document changes will only be applied if all the compare exchange operations (and you can have more than one) have passed.

We envision this being used for uniqueness, of course, but also for high value operations where consistency is more important than availability. A good example would be creating an order for a seat in a play. Multiple customers might try to purchase the same seat at the same time, and you can use this feature to ensure that you don’t double book it*. If you manage to successfully claim the seat, your order document is updated and you can proceed. Otherwise, the whole thing rolls back.

This can significantly simplify workflows where you might have a failure mid-operation, by giving you transactional guarantees across the whole cluster.

A cluster transaction can only put or delete documents; you cannot use a patch. This is because the result of the cluster transaction must be self-contained and repeatable. A document modified by a cluster transaction may also take part in replication (including external replication). In fact, documents modified by cluster transactions behave just like normal documents. However, conflicts between documents modified by cluster transactions and modifications that weren’t made by a cluster transaction are always resolved in favor of the cluster transaction’s modifications. Note that there can never be a conflict between modifications made by cluster transactions; they are guaranteed proper sequence and ordering by the nature of running them through the consensus protocol.

* Yes, I know that this isn’t how it actually works, but it is a nice example.

RavenDB 4.1 Features: Explain that choice

time to read 2 min | 277 words

One of the things that we do in RavenDB is try to expose as much as possible of the internal workings and logic inside RavenDB. In this case, the relevant feature we are trying to expose is the inner workings of the query optimizer.

Consider the following query, running on a busy system.

[Image: the query]
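The screenshot isn’t reproduced here; an illustrative stand-in for such a query (against the sample data) would be:

```
from Employees
where FirstName = 'Anne'
include explanations()
```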

This will go to the query optimizer, which needs to select the appropriate index to run the query on. However, this process is somewhat of a black box from the outside. Let me show you how RavenDB externalizes that decision.

[Image: the query optimizer’s index selection report]

You can see that there were initially three index candidates for this query. The first one doesn’t index FirstName, so it was ruled out immediately. That left a choice of two suitable indexes.

The query optimizer selected the index that has the higher number of fields. This is done to route queries away from narrower indexes, so they can be retired sooner.

This is a simple case; there are many other factors that may play into the query optimizer’s decision, such as when an index is stale because it was just created. The query optimizer will then choose another index until the stale index catches up with all its work.

To be honest, I mostly expect this to be of use when we explain how the query optimizer works. Of course, if you are investigating “why did you use this index and not that one” in production, this feature is going to be invaluable.

RavenDB Security Vulnerability Advisory

time to read 3 min | 533 words

You can read the full details here. The short of it is that we discovered a security vulnerability in RavenDB. This post tells a story. For actionable operations, see the previous link and upgrade your RavenDB instance to a build that includes the fix.

Timeline:

  • June 6 – A routine code review inside RavenDB exposes a potential flaw in sanitizing external input. It is escalated and confirmed to be a security bug. Further investigation classifies it as a CRITICAL issue. A lot of sad faces show up on our Slack channels. The issue has the trifecta of security problems:
    • It is remotely exploitable.
    • It is on in the default configuration.
    • It provides privilege escalation (and hence, remote code execution).
  • June 6 – A fix is implemented. This is somewhat complicated by the fact that we don’t want the commit to look like a security fix, to avoid drawing attention to the issue.
  • June 7 – The fix goes through triple code review by independent teams.
  • June 7 – An ad hoc team goes through all related functionality to see if similar issues are still present.
  • June 8 – Fixed version is deployed to our production environment.

We had to make a choice here: whether to alert all users immediately (opening them up to attacks before a fix was available), or to first provide the fix and urge them to upgrade. We also wanted to avoid the fix, re-fix, for-real-this-time cycle that comes from rushing too often.

As this was discovered internally and there are no indications that it is known and/or exploited in the wild, we chose the more conservative approach and ran our full “pre release” cycle, including a full 72-96 hours in a production environment serving live traffic.

  • June 12 – The fix is now available in a publicly released version (4.0.5).
  • June 13 – Begin notification of customers. This was done by:
    • Emailing all RavenDB 4.0 users. One of the reasons that we ask for registration even for the free community edition is exactly this: we want to be able to notify users when such an event occurs.
    • Publishing security notice on our website.
    • Pushing a notification to all vulnerable RavenDB nodes warning about this issue. Here is what this looks like:
      [Image: the notification as shown in the studio]
  • Since June 13 – Monitoring of deployed versions and checking for vulnerable builds still in use.
  • June 18 – This blog post and a public notice on the mailing list, to raise awareness of this issue. The website will also carry the following notice for the next couple of weeks, to make sure that everyone knows they should upgrade:
    [Image: the notice on the website]

We are also going to implement a better method to push urgent notices like this in the future, to make sure that we can better alert users. We have also inspected the same areas of the code in earlier versions and verified that this is a new issue that does not impact older versions.

I would be happy to hear what more we can do to improve both our security and our security practices.

And yes, I’ll discuss the actual vulnerability in detail in a month or so.
