Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by email or phone:

ayende@ayende.com

+972 52-548-6969

, @ Q c

Posts: 6,522 | Comments: 47,973

filter by tags archive

Reducing the cost of support with anticipatory errors

time to read 2 min | 262 words

I speak a lot about good error handlings and our perspective on support. We consider support to be a cost center (as in, we don’t want or expect to make money from support), I spoke at this at more length here. Today I run into what is probably the best example for exactly what this means in a long while.

A user got an error:

image

The error is confusing, because they are able to access this machine and URL. The actual issue, if you open the show details is here:

Connection test failed: An exception was thrown while trying to connect to http://zzzzzzzz.com:8081 : System.Net.Internals.SocketExceptionFactory+ExtendedSocketException (0x80004005): A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 143.29.63.128:53329

And that is good enough to explain to me, what is going on. RavenDB is usually using HTTP for most things, but it is using TCP connections to handle performance critical stuff, such as clustering, replication, etc. In this case, the server is listening on port 53329 for TCP connections, but since this is a public facing instance, the port is not accessible from the outside world.

This issue has generated a support call, but a better message, explaining that we could hit the HTTP endpoint and not the TCP endpoint would have led the user knowing exactly what the issue is and solving the problem on their own, much faster.

Queries++ in RavenDBFacets of information

time to read 3 min | 536 words

image

RavenDB has a lot of functionality that is available just underneath the surface. In addition to just finding documents, you can use RavenDB to find a lot more about what is going on in your database. This series of posts is aimed at exposing some of the more fun things that you can do with RavenDB that you are probably not aware of.

One of the those things is the idea of not just querying for information, but also querying for the facets of the results. This can be very useful if you are likely to search for something that would return a lot of results and you want to quickly filter these out without having the user do a lot of trial an error. This is one of those cases where it is much easier to explain what is going on with a picture.

Imagine that you are searching for a phone. You might have a good idea what you are looking for a phone on eBay. I just did that and it gave me over 300 thousands results. The problem is that if I actually want to buy one of them, I’m not going to scroll through however many pages of product listings. I need a way to quickly narrow down the selection, and facets allow me to do that, as you can see in the image. Each of these is a facet and I can filter out things so only the stuff that I’m interested in will be shown, allowing me to quickly make a decision (and purchase).

Using the sample dataset in RavenDB, we’ll explore how we can run faceted searches in RavenDB. First, we’ll define the “Products/Search” index:

Using this index, we can now ask RavenDB to give us the facets from this dataset, like so:

image

This will give us the following results:

image

And we can inspect each of them in turn:

image     image

These are easy, because they give us the count of matching products for each category and supplier. Of more interest to us is the Prices facet.

image

And here we can see how we sliced and diced the results. We can narrow things further with the user’s choices, of course, let’s check out this query:

image

Which gives us the following Prices facet:

image

This means that you can, in a very short order, produce really cool search behavior for your users.

Time handling and user experience, Backup Scheduling in RavenDB as a case study

time to read 3 min | 597 words

Time handling in software sucks. In one of the RavenDB conferences a few years ago we had a fantastic talk for over an hour that talked about just that.  It sucks because what a computer think about as time and what we think about as time are two very different things. This usually applies to applications, since that is where you are typically working with dates & times in front of the users, but we had an interesting case with backups scheduling inside RavenDB.

RavenDB allow you to schedule full and incremental backups and it used the CRON format to set things up. This make things very easy to setup and is highly configurable.

It is also very confusing. Consider the following initial user interface:

image

It’s functional, does exactly what it needs to do and allow the administrator complete freedom. It is also pretty much opaque, requiring the admin to either know the CRON format by heart (possible, but not something that we want to rely on) or find a webpage that would translate that.

The next thing we did was to avoid the extra step and let you know explicitly what this definition meant.

image

This is much better, but we can still do better. Instead of just an abstract description, let us let the user know when the next backup is going to happen. If you run backups each Friday, you probably want to change that to the before or after Black Friday, for example. So date & time matter.

image

This lead us to the next issue, what time? In particular, backups are done on the server’s local time, on the assumption that most of the time this is what the administrator will expects. This make it easier to do things like schedule backup to happen in the off hours. We thought about doing that always in UTC, but this would require you to always do date math in your head.

That does lead to the issue of what to do when the admin’s clock and the server clock are out of sync? This is how this will look like in that case.


image

We let the user know that the backup will run in the local server time and when that will happen in the user’s time.

We also provide on the fly translation from CRON format to a human readable form.

image

And finally, to make sure that we cover all the basis, in addition to giving you the time specification, the server time and local time, we also give you the time duration to the next backup.

image

I think that this covers up pretty much every scenario that I can think of.

Except getting the administrator to do practice restores to ensure that they are familiar with how to do this. Smile

Update: Here is how the field looks like when empty:

image

API DesignThe lack of a method was intentional forethought

time to read 4 min | 639 words

imageOne of the benefits of having a product in the market for a decade is that you gain some experience in how people are using it. This lead to interesting design decisions over time. Some of them are obvious. Such as the setup process for RavenDB. Some aren’t, such as the surface of the session. It is kept small and focused on CRUD operations to make it easy to understand and use in the common cases.

And sometimes, the design is in the fact that the code isn’t there at all. Case in point, the notion of connection strings in RavenDB 4.0. This feature was removed in its entirety in this release and users are expected to provide the connection parameters to the document store on their own. How they do that is not something that we concern ourselves with. A large part of the reasoning behind this decision was around our use of X509 certificates for authentication. In many environments there are strict rules about the usage and deployment of certificates and having a connection string facility would force us to always chase the latest ones. For that matter, where you store the connection string is also a problem. We have seen configuration stored in app.config, environment variables, json configuration, DI configuration and more. And each time we were expected to support this new method of getting the connection string.  By not having any such mechanism, we are able to circumvent the problem entirely.

This sounds like a copout, but it isn’t. Consider this thread in the RavenDB mailing list. It talks about how to setup RavenDB 4.0 in Azure in a secure manner. Just reading the title of the thread made me cringe, thinking that this is going to be a question that would take a long time to answer (setup time, mostly). But that isn’t it at all. instead, this is a walk through showing you how you can setup things properly in an environment where you cannot load a certificate from a file and need to do that directly from the Azure certificate store.

This is quite important, since this is one of the things that I keep having to explain to team members. We want to be a very clear demarcation about the kind of things that we support and the kinds we don’t. Mostly because I’m not willing to do half ass job in supporting things. So saying something like: Oh, we’ll just support a file path and we’ll let the user do the rest for more complex stuff is not going to fly with this design philosophy.

If we do something, a user reasonably expects us to do a complete job in doing that and puts the entire onus of responsibility on us. On the other hand, if you don’t do something, there is usually no expectation that you’ll handle that. There is also the issue that is many cases, solving the general problem is nearly impossible while solving a particular user scenario is trivial. So letting them have full responsibility works much better. At a minimum, they don’t need to circumvent the things we do for the stuff that we do support, but can start from a clear ground.

Coming back to the certificate example, if we would have a Certificate property and a CertificatePath property, allowing for each setup for a common scenario, then it is easy down the line to just assume that the CertificatePath is set if we have a certificate, and suddenly a user that doesn’t use a certificate from a file is going to need to be aware of this and handle the issue. If there is no such property, the behavior is always going to be correct.

Carefully performing invalid operations to get the wrong error and the right result

time to read 2 min | 285 words

The RavenDB setup process happens in the browser, and the last part involves restarting the server and then redirecting you to the new server. Along the way you have also specified a certificate to use and the other configurations.

We got a bug report about that when the admin configured us with a self signed certificate. During the time the server restarts, the browser will ping the server periodically, waiting for it to come back up with the new configuration. That can be a problem when using self signed certificates, because the browser will reject them as untrusted.

From the point of view of the client side running in the browser, there is no way to tell the difference from a server that is down and a server that is using a self signed certificate. But we wanted to get the nice feature of showing you when you can move to the new URL. So how can you do that?

Remember that RavenDB have auto detection for invalid HTTP access when using HTTPS? And that this error is raised at the HTTP level?

That means that we can carefully construct an HTTP request such as “http://my.ravendb.sever:443” and check what the result is. If the server is up, the request will fail with a bad request error, and that means that we can distinguish between the server being down and the server being up (but maybe with bad cert).

In fact, once we know the server is up, we can check if the certificate is valid, and show something about it.

This is convoluted, requires us to do things at several places at once at very different levels of the stack. But it is quite amazing to see, it just works!

The best features are the ones you never knew were thereYou can’t do everything

time to read 2 min | 241 words

imageIn the previous posts in this series, I talked about the kind of features that we build into RavenDB. Things that you never even notice making your life easier.

One feature we don’t have is doing HTTPS to HTTP downgrade. What do I mean by that? Assume that you have a RavenDB instance that is running using HTTP, and a client attempts to connect to it using HTTPS. Remember that we are assuming that the access it made on the same port. So the client wrote "https://my.raven.database:8080” instead of “http://my.raven.database:8080”.

If the other thing would happen, we would detect that and give a clear error to the user. But the other way around? We don’t do that, but why?

Well, the reasoning is very simple. If you connect to an HTTP endpoint using HTTPS, the first packet on the wire wants to do SSL negotiation. However, we don’t have a certificate that we can use here, so we can’t even start the negotiation process.

We could try generating a self signed certificate on the fly and answer the request with an error. But at this point, the client will likely already error at a low level because of the self signed certificate not being trusted.

Another point against implementing this feature is that HTTP endpoints typically become HTTPS, but rarely the other way around.

RavenDB SetupHow the automatic setup works

time to read 8 min | 1456 words

imageOne of the coolest features in the RC2 release for RavenDB is the automatic setup, in particular, how we managed to get a completely automated secured setup with minimal amount of fuss on the user’s end.

You can watch the whole thing from start to finish, it takes about 3 minutes to go through the process (if you aren’t also explaining what you are doing) and you have a fully secured cluster talking to each other over secured TLS 1.2 channels.  This was made harder because we are actually running with trusted certificates. This was a hard requirement, because we use the RavenDB Studio to manage the server, and that is a web application hosted on RavenDB itself. As such, it is subject to all the usual rules of browser based applications, including scary warnings and inability to act if the certificate isn’t valid and trusted.

In many cases, this lead people to chose to use HTTP. Because at least with that model, you don’t have to deal with all the hassle. Consider the problem. Unlike a website, that has (at least conceptually) a single deployment, RavenDB is actually deployed on customer sites and is running on anything from local developer machines to cloud servers. In many cases, it is hidden behind multiple layers of firewalls, routers and internal networks. Users may chose to run it in any number of strange and wonderful configurations, and it is our job to support all of them.

In such a situation, defaulting to HTTP only make things easy. Mostly because things work. Using HTTPS require that we’ll use a certificate. We can obviously use a self signed certificate, and have the following shown to the user on the first access to the website:

image

As you can imagine, this is not going to inspire confidence with users. In fact, I can think of few other ways to ensure the shortest “download to recycle bin” path. Now, we could ask the administrator to generate a certificate an ensure that this certificate is trusted. And that would work, if we could assume that there is an administrator. I think that asking a developer that isn’t well versed in security practices to do that is likely to result in an even shorter “this is waste of my time” reaction than the unsecured warning option.

We considered the option of installing a (locally generated) root certificate and generating a certificate from that. This would work, but only on the local machine, and RavenDB is, by nature, a distributed database. So that would make for a great demo, but it would cause a great deal of hardships down the line. Exactly the kind of feature and behavior that we don’t want. And even if we generate the root certificate locally and throw it away immediately afterward, the idea still bothered me greatly, so that was something that we considered only in times of great depression.

So, to sum it all up, we need a way to generate a valid certificate for a random server, likely running in a protected network, inaccessible from the outside (as  in, pretty much all corporate / home networks these days). We need to do without requiring the user to do things like setup dynamic DNS, port forwarding in router or generating their own certificates. We also need to to be fast enough that we can do that as part of the setup process. Anything that would require a few hours / days is out of the question.

We looked into what it would take to generate our own trusted SSL certificates. This is actually easily possible, but the cost is prohibitive, given that we wanted to allow this for free users as well, and all the options we got always had a per generated certificate cost associated with it.

Let’s Encrypt is the answer for HTTPS certificate generation on the public web, but the vast majority all of our deployments are likely to be inside the firewall, so we can’t verify a certificate using Let’s Encrypt. Furthermore, doing so will require users to define and manage DNS settings as part of the deployment of RavenDB. That is something that we wanted to avoid.

This might require some explanation. The setup process that I’m talking about is not just to setup a production instance. We consider any installation of RavenDB to be worth a production grade setup. This is a lesson from the database ransomware tales. I see no reason why we should learn this lesson again on the backs of our users, so a high priority was given to making sure that the default install mode is also the secure and proper one.

All the options that are ruled out in this post (provide your own certificate, setup DNS, etc) are entirely possible (and quite easily) with RavenDB, if an admin so chose, and we expect that many will want to setup RavenDB in a manner that fits their organization policies. But here we are talkingh about the base line (yes, dear) install and we want to make it as simple and straightforward as we possibly can.


There is another problem with Let’s Encrypt for our situation, we need to generate a lot of certificates, significantly more than the default rate limit that Let’s Encrypt provides. Luckily, they provide a way to request an extension to this rate limit, which is exactly what we did. Once this was granted, we were almost there.

imageThe way RavenDB generates certificates as part of the setup process is a bit involved. We can’t just generate any old hostname, we need to provide proof to Let’s Encrypt that we own the hostname in question. For that matter, who is the we in question? I don’t want to be exposed to all the certificates that are generated for the RavenDB instances out there. That is not a good way to handle security.

The key for the whole operation is the following domain name: dbs.local.ravendb.net

During setup, the user will register a subdomain under that, such as arava.dbs.local.ravendb.net. We ensure that only a single user can claim each domain. Once they have done that, they let RavenDB what IP address they want to run on. This can be a public IP, exposed on the internet, a private one (such as 192.168.0.28) or even a loopback device (127.0.0.1).

The local server, running on the user’s machine then initiates a challenge to Let’s Encrypt for the hostname in question. With the answer to the challenge, the local server then call to api.ravendb.net. This is our own service, running on the cloud. The purpose of this service is to validate that the user “owns” the domain in question and to update the DNS records to match the Let’s Encrypt challenge.

The local server can then go to Let’s Encrypt and ask them to complete the process and generate the certificate for the server. At no point do we need to have the certificate go through our own servers, it is all handled on the client machine. There is another thing that is happening here. Alongside the DNS challenge, we also update the domain the user chose to point to the IP they are going to be hosted at. This means that the global DNS network will point to your database. This is important, because we need the hostname that you’ll use to talk to RavenDB to match the hostname on the certificate.

Obviously, RavenDB will also make sure to refresh the Let’s Encrypt certificate on a timely basis.

The entire process is seamless and quite amazing when you see it. Especially because even developers might not realize just how much goes on under the cover and how much pain was taken away from them.

We run into a few issues along the way and Let’s Encrypt support has been quite wonderful in this regard, including deploying a code fix that allowed us to make the time for RC2 with the full feature in place.

There are still issues if you are running on a completely isolated network, and some DNS configurations can cause issues, but we typically detect and give a good warning about that (allowing you to switch to 8.8.8.8 as a good workaround for most such issues). The important thing is that we achieve the main goal, seamless and easy setup with the highest level of security.

RavenDB Setupa secured cluster in 10 minutes or less

time to read 1 min | 86 words

One of the major features of the RC2 release for RavenDB has been the setup process. In particular, we worked on making sure that the default and easiest manner to install RavenDB will be the one with the highest level of security.

I’m excited enough by this feature that I recorded myself setting up a full blown cluster, including everything you need for production deployment in under 10 minutes, with a lot of my explanations in the middle. Take a look.

The best features are the ones you never knew were thereUnsecured SSL/TLS

time to read 3 min | 598 words

imageI wish I would have been sufficient to use HTTPS for security. With RavenDB 4.0’s move toward TLS as the security mechanism for encryption of data over the wire and authentication using x509 we had to learn way too much about how Transport Layer Security works.

In particular, it can be quite annoying when you realize that just because you use SSL (or more accurately, TLS) that isn’t sufficient. You need to use the proper version, and there are interoperability issues. Many of RavenDB’s users run it in an environments that are subject to strict scrutiny and high level of regulation and oversight. That means that we need to make sure that we are able to operate in such environment. One option would be to use something like a FIPS configuration. We have a “normal”configuration and one that is aimed at people that need stricter standards. For many reasons, this is a really bad idea. Not least of all is the problem that even if you don’t have to meet FIMS mandate, you still want to be secured. Amusingly enough, many FIPS certified stacks are actually less secured (because they can’t get patches to the certified binaries).

So the two options mode was rejected. That meant that we should run in a mode that is can be match the requirements of the most common deployment regulations. In particular interest to us is PCI compliance, since we are often deployed in situations that involve money and payment processing.

That can be a bit of a problem. PCI requires that your communication will use TLS, obviously. But it also requires it to use TLS 1.2. That is great and with .NET it is easily supported. However, not all the tools are aware of this. This put us back in the same state as with HTTP vs. HTTPS. If your client does not support TLS 1.2 and your server require TLS 1.2, you end up in with a with a connection error.

image

Such a thing can be maddening for the user.

Therefor, RavenDB will actually allow Tls and Tls11 connections, but instead of processing the request, it will give you an error that give you something to work with.

image

Updated: I forgot to actually read the message. The reason you are getting the error about no certificate is because there isn’t a certificate here. In order for this to work, we need to actually pass the certificate, in which case we’ll get the appropriate error. I apologize for the error handling, but PowerShell:
image

Armed with this information, you can now do a simple web search and realize that you actually need to do this:

image

And that saves us a lot of TCP level debugging. It took a bit of time to set this (and the other) errors properly, and they are exactly the kind of things that will save you hours or days of frustration, but you’ll never realize that they were there even if you run into them unless you know the amount of effort that went into setting this up.

The bare minimum a distributed system developer should know aboutBinding to IP addresses

time to read 3 min | 543 words

It is easy to think about a service that listen to the network as just that, it listens to the network. In practice, this is often quite a bit more complex than that.

For example, what happens when I’m doing something like this?

image

In this case, we are setting up a web server with binding to the local machine name. But that isn’t actually how it works.

At the TCP level, there is no such thing as machine name. So how can this even work?

Here is what is going on. When we specify a server URL in this manner, we are actually doing something like this:

image

And then the server is going to bind to each and every one of them. Here is an interesting tidbit:

image

What this means is that this service doesn’t have a single entry point, you can reach it through multiple distinct IP addresses.

But why would my machine have so may IP addresses? Well, let us take a look. It looks like this machine has quite a few network adapters:

image

I got a bunch of virtual ones for Docker and VMs, and then the Wi-Fi (writing on my laptop) and wired network.

Each one of these represent a way to bind to the network. In fact, there are also over 16 million additional IP addresses that I’m not showing, the entire 127.x.x.x range. (You probably know that 127.0.0.1 is loopback, right? But so it 127.127.127.127, etc.).

All of this is not really that interesting, until you realize that this has real world implications for you. Consider a server that has multiple network cards, such as this one:

image

What we have here is a server that has two totally separate network cards. One to talk to the outside world and one to talk to the internal network.

When is this useful? In pretty much every single cloud provider you’ll have very different networks. On Amazon, the internal network gives you effectively free bandwidth, while you pay for the external one. And that is leaving aside the security implications

It is also common to have different things bound to different interfaces. Your admin API endpoint isn’t even listening to the public internet, for example, it will only process packets coming from the internal network. That adds a bit more security and isolation (you still need encryption, authentication, etc of course).

Another deployment mode (which has gone out of fashion) was to hook both network cards to the public internet, using different routes. This way, if one went down, you could still respond to requests, and usually you could also handle more traffic. This was in the days where the network was often the bottleneck, but nowadays I think we have enough network bandwidth that program efficiency is of more importance and this practice somewhat fell out of favor.

FUTURE POSTS

  1. Queries++ in RavenDB: Gimme more like this - 3 hours from now
  2. Setting unrealistic goals, then exceeding them - about one day from now
  3. Queries++ in RavenDB: I suggest you can do better - 2 days from now
  4. The married couple component design pattern - 3 days from now
  5. Queries++ in RavenDB: Spatial searches - 4 days from now

And 2 more posts are pending...

There are posts all the way to Dec 19, 2017

RECENT SERIES

  1. PR Review (9):
    08 Nov 2017 - Encapsulation stops at the assembly boundary
  2. Queries++ in RavenDB (4):
    07 Dec 2017 - Facets of information
  3. Production postmortem (21):
    06 Dec 2017 - data corruption, a view from INSIDE the sausage
  4. API Design (9):
    04 Dec 2017 - The lack of a method was intentional forethought
  5. The best features are the ones you never knew were there (5):
    27 Nov 2017 - You can’t do everything
View all series

RECENT COMMENTS

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats