Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by email or phone:

ayende@ayende.com

+972 52-548-6969




RavenDB Setup: How the automatic setup works

time to read 8 min | 1456 words

One of the coolest features in the RC2 release for RavenDB is the automatic setup; in particular, how we managed to get a completely automated, secured setup with a minimal amount of fuss on the user's end.

You can watch the whole thing from start to finish; it takes about 3 minutes to go through the process (if you aren't also explaining what you are doing) and you end up with a fully secured cluster, its members talking to each other over secured TLS 1.2 channels. This was made harder because we are actually running with trusted certificates. That was a hard requirement, because we use the RavenDB Studio to manage the server, and the Studio is a web application hosted on RavenDB itself. As such, it is subject to all the usual rules of browser-based applications, including scary warnings and the inability to proceed if the certificate isn't valid and trusted.

In many cases, this leads people to choose to use HTTP, because at least with that model you don't have to deal with all the hassle. Consider the problem: unlike a website, which has (at least conceptually) a single deployment, RavenDB is deployed on customer sites and runs on anything from local developer machines to cloud servers. In many cases, it is hidden behind multiple layers of firewalls, routers, and internal networks. Users may choose to run it in any number of strange and wonderful configurations, and it is our job to support all of them.

In such a situation, defaulting to HTTP makes things easy, mostly because things just work. Using HTTPS requires a certificate. We could obviously use a self-signed certificate, and have the following shown to the user on first access to the website:

[Screenshot: the browser warning shown for a self-signed certificate]

As you can imagine, this is not going to inspire confidence in users. In fact, I can think of few other ways to ensure the shortest "download to recycle bin" path. Now, we could ask the administrator to generate a certificate and ensure that this certificate is trusted. And that would work, if we could assume that there is an administrator. Asking a developer who isn't well versed in security practices to do that is likely to result in an even shorter "this is a waste of my time" reaction than the unsecured warning option.

We considered the option of installing a (locally generated) root certificate and generating a certificate from that. This would work, but only on the local machine, and RavenDB is, by nature, a distributed database. So that would make for a great demo, but it would cause a great deal of hardship down the line; exactly the kind of feature and behavior that we don't want. And even if we generated the root certificate locally and threw it away immediately afterward, the idea still bothered me greatly, so that was something we considered only in times of great depression.

So, to sum it all up: we need a way to generate a valid certificate for a random server, likely running in a protected network, inaccessible from the outside (as in, pretty much all corporate / home networks these days). We need to do that without requiring the user to do things like set up dynamic DNS, configure port forwarding on their router, or generate their own certificates. It also needs to be fast enough to run as part of the setup process; anything that would require a few hours or days is out of the question.

We looked into what it would take to generate our own trusted SSL certificates. This is actually quite possible, but the cost is prohibitive, given that we wanted to allow this for free users as well, and all the options we found had a per-generated-certificate cost associated with them.

Let's Encrypt is the answer for HTTPS certificate generation on the public web, but the vast majority of our deployments are likely to be inside the firewall, where Let's Encrypt cannot reach the server to verify a certificate request. Furthermore, doing so would require users to define and manage DNS settings as part of deploying RavenDB. That is something we wanted to avoid.

This might require some explanation. The setup process that I'm talking about is not just for setting up a production instance. We consider any installation of RavenDB to be worth a production-grade setup. This is a lesson from the database ransomware tales; I see no reason why we should learn it again on the backs of our users, so a high priority was given to making sure that the default install mode is also the secure and proper one.

All the options that are ruled out in this post (provide your own certificate, set up DNS, etc.) are entirely possible (and quite easy) with RavenDB, if an admin so chooses, and we expect that many will want to set up RavenDB in a manner that fits their organization's policies. But here we are talking about the baseline install, and we want to make it as simple and straightforward as we possibly can.


There is another problem with Let's Encrypt for our situation: we need to generate a lot of certificates, significantly more than the default rate limit that Let's Encrypt provides. Luckily, they provide a way to request an extension to this rate limit, which is exactly what we did. Once this was granted, we were almost there.

The way RavenDB generates certificates as part of the setup process is a bit involved. We can't just generate any old hostname; we need to provide proof to Let's Encrypt that we own the hostname in question. For that matter, who is the "we" in question? I don't want to be exposed to all the certificates that are generated for the RavenDB instances out there. That is not a good way to handle security.

The key for the whole operation is the following domain name: dbs.local.ravendb.net

During setup, the user will register a subdomain under that, such as arava.dbs.local.ravendb.net. We ensure that only a single user can claim each domain. Once they have done that, they tell RavenDB what IP address they want to run on. This can be a public IP, exposed on the internet, a private one (such as 192.168.0.28), or even a loopback device (127.0.0.1).

The local server, running on the user's machine, then initiates a challenge to Let's Encrypt for the hostname in question. With the answer to the challenge, the local server then calls api.ravendb.net. This is our own service, running in the cloud. Its purpose is to validate that the user "owns" the domain in question and to update the DNS records to match the Let's Encrypt challenge.

The local server can then go back to Let's Encrypt and ask it to complete the process and generate the certificate for the server. At no point does the certificate go through our own servers; it is all handled on the client machine. There is another thing happening here: alongside the DNS challenge, we also update the domain the user chose to point to the IP they are going to be hosted at. This means that the global DNS network will point to your database. This is important, because the hostname that you'll use to talk to RavenDB needs to match the hostname on the certificate.
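To make the moving parts concrete, here is a rough sketch of the flow. The AcmeClient and RavenApi types and all of their methods are hypothetical names for illustration only; neither Let's Encrypt's protocol nor api.ravendb.net actually exposes this exact API:

    // Hypothetical sketch of the setup flow described above.
    var acme = new AcmeClient();   // stand-in for a Let's Encrypt (ACME) client
    var ravenApi = new RavenApi(); // stand-in for a client of api.ravendb.net

    // 1. Ask Let's Encrypt for a DNS challenge for the chosen subdomain.
    var challenge = acme.RequestDnsChallenge("arava.dbs.local.ravendb.net");

    // 2. Our cloud service validates that this user owns the subdomain,
    //    writes the challenge TXT record, and points the A record at the
    //    IP address the user chose.
    ravenApi.UpdateDns(
        domain: "arava.dbs.local.ravendb.net",
        challengeToken: challenge.Token,
        serverIp: "192.168.0.28");

    // 3. Let's Encrypt validates the DNS record and issues the certificate.
    //    The certificate is generated here, on the local machine; it never
    //    passes through our servers.
    var certificate = acme.CompleteChallenge(challenge);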

Obviously, RavenDB will also make sure to refresh the Let’s Encrypt certificate on a timely basis.

The entire process is seamless and quite amazing when you see it, especially because even developers might not realize just how much goes on under the covers and how much pain was taken away from them.

We ran into a few issues along the way, and Let's Encrypt support has been quite wonderful in this regard, including deploying a code fix that allowed us to make it in time for RC2 with the full feature in place.

There are still issues if you are running on a completely isolated network, and some DNS configurations can cause problems, but we typically detect those and give a good warning (allowing you to switch to 8.8.8.8 as a workaround for most such issues). The important thing is that we achieve the main goal: a seamless and easy setup with the highest level of security.

RavenDB Setup: A secured cluster in 10 minutes or less

time to read 1 min | 86 words

One of the major features of the RC2 release for RavenDB has been the setup process. In particular, we worked on making sure that the default and easiest manner to install RavenDB will be the one with the highest level of security.

I’m excited enough by this feature that I recorded myself setting up a full blown cluster, including everything you need for production deployment in under 10 minutes, with a lot of my explanations in the middle. Take a look.

The best features are the ones you never knew were there: Unsecured SSL/TLS

time to read 3 min | 598 words

I wish it had been sufficient to just use HTTPS for security. With RavenDB 4.0's move toward TLS as the security mechanism, for both encryption of data over the wire and authentication using x509 certificates, we had to learn way too much about how Transport Layer Security works.

In particular, it can be quite annoying when you realize that just using SSL (or, more accurately, TLS) isn't sufficient. You need to use the proper version, and there are interoperability issues. Many of RavenDB's users run it in environments that are subject to strict scrutiny and a high level of regulation and oversight. That means we need to make sure that we are able to operate in such environments. One option would be to ship something like a FIPS configuration: a "normal" configuration and one aimed at people who need stricter standards. For many reasons, this is a really bad idea. Not least of all is the problem that even if you don't have to meet a FIPS mandate, you still want to be secure. Amusingly enough, many FIPS-certified stacks are actually less secure (because they can't get patches to the certified binaries).

So the two-mode option was rejected. That meant we should run in a mode that can match the requirements of the most common deployment regulations. Of particular interest to us is PCI compliance, since we are often deployed in situations that involve money and payment processing.

That can be a bit of a problem. PCI requires that your communication use TLS, obviously, but it also requires TLS 1.2. That is great, and with .NET it is easily supported. However, not all tools are aware of this. That puts us back in the same state as with HTTP vs. HTTPS: if your client does not support TLS 1.2 and your server requires TLS 1.2, you end up with a connection error.

[Screenshot: the opaque connection error when client and server disagree on the TLS version]

Such a thing can be maddening for the user.

Therefore, RavenDB will actually accept Tls and Tls11 connections, but instead of processing the request, it will give you an error that gives you something to work with.
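Conceptually, the server-side trick looks something like this (a sketch, not RavenDB's actual code): accept the handshake even for the older protocol versions, then answer with an explicit error instead of processing the request.

    using System.Net.Security;
    using System.Net.Sockets;
    using System.Security.Authentication;
    using System.Security.Cryptography.X509Certificates;
    using System.Text;

    static void HandleConnection(TcpClient client, X509Certificate2 certificate)
    {
        var ssl = new SslStream(client.GetStream());
        // Accept TLS 1.0 and 1.1 as well, so the handshake itself succeeds
        // and the client is actually able to read the error we send back.
        ssl.AuthenticateAsServer(certificate,
            clientCertificateRequired: false,
            enabledSslProtocols: SslProtocols.Tls | SslProtocols.Tls11 | SslProtocols.Tls12,
            checkCertificateRevocation: false);

        if (ssl.SslProtocol != SslProtocols.Tls12)
        {
            var body = "This server requires TLS 1.2, but this connection used " +
                       ssl.SslProtocol + ". Enable TLS 1.2 on the client and retry.";
            var response = "HTTP/1.1 400 Bad Request\r\nContent-Length: " +
                           body.Length + "\r\n\r\n" + body;
            var bytes = Encoding.UTF8.GetBytes(response);
            ssl.Write(bytes, 0, bytes.Length);
            ssl.Close();
            return;
        }
        // ...otherwise, hand the stream over to normal request processing...
    }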

[Screenshot: the error a PowerShell client gets when connecting with an older TLS version]

Updated: I forgot to actually read the message. The reason you are getting the error about a missing certificate is that there isn't a certificate here. In order for this to work, we need to actually pass the certificate, in which case we'll get the appropriate error. I apologize for the error handling, but PowerShell:
[Screenshot: the PowerShell call, now passing the client certificate, and the resulting error]

Armed with this information, you can now do a simple web search and realize that you actually need to do this:

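The screenshot shows the usual fix: opting the client in to TLS 1.2 before making the request. In .NET terms (PowerShell is setting the same property under the covers) it is a one-liner:

    using System.Net;

    // Allow the client to negotiate TLS 1.2 before making any requests.
    ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;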

And that saves us a lot of TCP-level debugging. It took a bit of time to set this error (and the others) up properly, and they are exactly the kind of thing that will save you hours or days of frustration. But you'll never realize they were there, even when you run into them, unless you know the amount of effort that went into setting them up.

The bare minimum a distributed system developer should know about: Binding to IP addresses

time to read 3 min | 543 words

It is easy to think about a service that listens to the network as just that: it listens to the network. In practice, this is often quite a bit more complex.

For example, what happens when I’m doing something like this?

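Something along these lines (a minimal Kestrel sketch; the hostname, port, and Startup class are made up for illustration):

    using Microsoft.AspNetCore.Hosting;

    // Bind the web server to the machine name rather than an explicit IP.
    var host = new WebHostBuilder()
        .UseKestrel()
        .UseUrls("http://my-machine-name:8080")
        .UseStartup<Startup>()
        .Build();

    host.Run();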

In this case, we are setting up a web server with binding to the local machine name. But that isn’t actually how it works.

At the TCP level, there is no such thing as a machine name. So how can this even work?

Here is what is going on. When we specify a server URL in this manner, we are actually doing something like this:

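Roughly this (a sketch): the hostname is resolved, and the server binds to every address that comes back.

    using System.Net;

    // Resolve the machine name into the full set of addresses it maps to;
    // the server will then listen on each of these endpoints.
    foreach (var address in Dns.GetHostAddresses("my-machine-name"))
    {
        var endpoint = new IPEndPoint(address, 8080);
        // ...bind a listener to this endpoint...
    }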

And then the server is going to bind to each and every one of them. Here is an interesting tidbit:

[Listing: the same service listening on multiple distinct IP addresses]

What this means is that this service doesn't have a single entry point; you can reach it through multiple distinct IP addresses.

But why would my machine have so many IP addresses? Well, let us take a look. It looks like this machine has quite a few network adapters:

[Listing: the machine's network adapters, physical and virtual, with their IP addresses]

I've got a bunch of virtual ones for Docker and VMs, and then the Wi-Fi (I'm writing this on my laptop) and the wired network.

Each one of these represents a way to bind to the network. In fact, there are also over 16 million additional IP addresses that I'm not showing: the entire 127.x.x.x range. (You probably know that 127.0.0.1 is loopback, right? But so is 127.127.127.127, etc.)

All of this is not really that interesting, until you realize that this has real world implications for you. Consider a server that has multiple network cards, such as this one:

[Diagram: a server with two network cards, one public-facing and one internal]

What we have here is a server that has two totally separate network cards. One to talk to the outside world and one to talk to the internal network.

When is this useful? In pretty much every single cloud provider you'll have very different networks. On Amazon, the internal network gives you effectively free bandwidth, while you pay for the external one. And that is leaving aside the security implications.

It is also common to have different things bound to different interfaces. Your admin API endpoint, for example, isn't even listening to the public internet; it will only process packets coming from the internal network. That adds a bit more security and isolation (you still need encryption, authentication, etc., of course).

Another deployment mode (which has gone out of fashion) was to hook both network cards to the public internet, using different routes. This way, if one went down, you could still respond to requests, and usually you could also handle more traffic. This was in the days when the network was often the bottleneck; nowadays I think we have enough network bandwidth that program efficiency is of more importance, and this practice has somewhat fallen out of favor.

The best features are the ones you never knew were there: Protocol fix-ups

time to read 4 min | 755 words

RavenDB uses HTTP for most of its communication. It can be used in unsecured mode, using HTTP, or in secured mode, using HTTPS. So far, this is pretty standard. Let us look at a couple of URLs:

  • http://github.com
  • https://github.com

If you try to go to GitHub using HTTP, it will redirect you to the HTTPS site. That is very easy to do, because the URLs above are actually:

  • http://github.com:80
  • https://github.com:443

In other words, by default when you are using HTTP you'll use port 80, while HTTPS defaults to port 443. This means that the server on port 80 can just read the request and immediately redirect you to the HTTPS endpoint.

RavenDB, however, is usually used in environments where you will explicitly specify a port. So the URL would look something like this:

  • http://a.orders.raven.local:8080
  • https://a.orders.raven.local:8080

It is very common for our users to start running with port 8080 in an unsecured mode, then later move to a secure mode with HTTPS but retain the same port. That can lead to some complications. For example, here is what happens in a similar situation if I’m trying to connect to an HTTPS endpoint using HTTP or vice versa.

[Screenshot: the error when connecting to an HTTPS endpoint using HTTP]

[Screenshot: the error when connecting to an HTTP endpoint using HTTPS]

This means that a common scenario (running on a non-native port and using the wrong protocol) will lead to a nasty error. We call it a nasty error because the user has no real way to figure out what the issue is from the error itself. In many cases, this will trigger an escalation to the network admin or a support ticket. This is the kind of issue that I hate: it is plainly obvious in hindsight, but it is so hard to figure out, and then you feel stupid for not realizing it upfront.

Let us see how we can resolve such an issue. I already gave some hints on how to do it earlier, but the technique in that post wasn't suitable for production use in our codebase. In particular, it introduced another Stream-wrapping instance and another allocation that would affect all input/output calls over the network. We really want to avoid that.

So we cheat (but we do that a lot, so this is fine). Kestrel allows us to define connection adapters, which give us a hook very early in the lifetime of the TCP connection. However, that leads to another problem. We want to sniff the first byte of the raw TCP request, but Stream doesn't provide a way to peek at a byte; any such attempt will consume it, which would reintroduce exactly the additional indirection that we wanted to avoid.

Therefore, we decided to take advantage of the way Kestrel handles things. It buffers data in memory, and if you dig a bit you can access that buffer in some very useful ways. Here is how we are able to sniff HTTP vs. HTTPS:
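The heart of the check is a one-byte classification, roughly this (a sketch; the reflection plumbing that borrows Kestrel's buffer, described below, is omitted):

    // Classify a connection by the very first byte it sends.
    static bool IsPlainHttp(byte firstByte)
    {
        // Plain-text HTTP always starts with an uppercase ASCII letter
        // ('G' from GET, 'P' from PUT or POST, and so on), while a TLS
        // handshake record starts with 22 (0x16) and SSLv2-style hellos
        // start with a byte of 128 or above; the two never overlap.
        return firstByte >= 'A' && firstByte <= 'Z';
    }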

The key here is that we use a bit of reflection emit magic to get the inner IPipeReader instance from Kestrel. We have to do it this way because that value isn't exposed externally. Once we have the pipe reader instance, we borrow the already-read buffer and inspect it. If the first character is a capital letter (G from GET, P from PUT, etc.), this is an HTTP connection (an SSL connection's first byte is either 22 or greater than 127, so there is no overlap). We then return the buffer to the stream and carry on. Kestrel will parse the request normally, but another portion of the pipeline will see the wrong-protocol request and return that error to the user. And obviously we'll skip doing the SSL negotiation.

This is important, because the client is speaking HTTP, and we can't magically upgrade it to HTTPS without causing errors such as the ones above. We need to speak the same protocol the client expects.

With this code, trying to use the wrong protocol gives us this error:

[Screenshot: the explicit error telling the client it is using the wrong protocol]

Now, if you are not reading the error message, that might still mean a support call, but it should be resolved as soon as someone actually reads it.

The bare minimum a distributed system developer should know about: HTTPS Negotiation

time to read 3 min | 586 words

I mentioned in a previous post that an SSL connection will typically use Server Name Indication in the initial (unencrypted) packet to let the server know which hostname it is interested in. This allows the server to do things such as selecting the appropriate certificate to answer the initial challenge.

A more interesting scenario is when you want to force your users to always use HTTPS. That is pretty trivial: you set up a website to listen on port 80 and port 443, and redirect all HTTP traffic from port 80 to HTTPS on port 443. Pretty much any web server under the sun already has some sort of easy-to-use configuration for that. Let us see how this would look if we were writing it using bare-bones Kestrel.
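Here is a simplified sketch of the idea. The post's snippet does the detection in a connection adapter; this version cheats and does it at the HTTP layer with middleware instead, and the certificate path and password are placeholders:

    using System.Net;
    using Microsoft.AspNetCore.Builder;
    using Microsoft.AspNetCore.Hosting;
    using Microsoft.AspNetCore.Http;

    var host = new WebHostBuilder()
        .UseKestrel(options =>
        {
            options.Listen(IPAddress.Any, 80);  // plain HTTP
            options.Listen(IPAddress.Any, 443, listenOptions =>
                listenOptions.UseHttps("server.pfx", "s3cr3t")); // HTTPS
        })
        .Configure(app => app.Use(async (context, next) =>
        {
            if (!context.Request.IsHttps)
            {
                // Anything arriving on the unencrypted listener is bounced
                // to the same host and path over HTTPS.
                context.Response.Redirect(
                    "https://" + context.Request.Host.Host + context.Request.Path,
                    permanent: true);
                return;
            }
            await next();
        }))
        .Build();

    host.Run();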

This is pretty easy, right? We set up a connection adapter on port 80, so we can detect that the connection is using the wrong port and just redirect it. Notice that there is some magic that we need to apply here: at the connection adapter level we are dealing with the raw TCP socket, and we don't want to mess around with that, so we just pass the decision up the chain until we get to the part that deals with HTTP and let it send the redirect.

Pretty easy, right? But what about when a user does something like this?

http://my-awesome-service:443

Note that in this case we are using the HTTP protocol, not the HTTPS protocol. At that point, things are a mess. The client will make a request and send a TCP packet containing HTTP request data, but the server will try to parse it as an SSL client hello message. What usually happens is that the server looks at the incoming packet, decides that it is garbage, and just closes the connection. That leads to some really hard-to-figure-out errors and much forehead slapping once you realize what the issue is.

Now, I'm sure you'll agree that anyone seeing a URL like the one listed above would be a bit suspicious. But what about these?

  • http://my-awesome-service:8080
  • https://my-awesome-service:8080

Unlike before, where we would probably notice that :443 is the HTTPS port while we are using HTTP, here there is no additional indication of what the problem is. So we need to try both. And if a user gets a connection-dropped error when trying to connect, there is very little chance that they'll consider switching to HTTPS. It is far more likely that they will start looking at the firewall rules.

So now we need to do protocol sniffing and figure out what to do from there. Let us see how this looks in code:
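A sketch of the shape of it, peeking at the socket directly for simplicity (the actual snippet hooks Kestrel's connection adapters, as in the previous post):

    using System.Net.Sockets;

    // Decide whether a freshly accepted connection is TLS or plain HTTP
    // without consuming any bytes.
    static bool IsTlsConnection(Socket socket)
    {
        var first = new byte[1];
        // SocketFlags.Peek leaves the byte in place for the real parser.
        socket.Receive(first, 0, 1, SocketFlags.Peek);

        // 22 (0x16) is a TLS handshake record; plain HTTP starts with an
        // uppercase ASCII letter such as 'G' (GET) or 'P' (POST).
        return first[0] == 22 || first[0] >= 128;
    }

If IsTlsConnection returns true, we hand the connection to the usual HTTPS pipeline; otherwise we parse it as HTTP and reply with the redirect.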

We read the first few bytes of the request and see if this is the start of an SSL connection. If it is, we forward the call to the usual Kestrel HTTPS behavior. If it isn't, we mark the request as requiring a redirect and pass it along as-is; once the request has been parsed and is ready for action, we send the redirect back.

In this way, any request on port 80 will be sent to port 443 and an HTTP request on a port that listens to HTTPS will be told that it needs to switch.

One note about the code in this post. This was written at 1:30 AM as a proof of concept only. I’m pretty sure that I’m heavily abusing the connection adapter system, especially with regards to the reflection bits there.

The best features are the ones you never knew were there: Company culture and incentive structure

time to read 4 min | 667 words

I introduced the notion of frictionless software in the previous post, but I wanted to dedicate some time to the deeper meaning behind this kind of thinking. RavenDB is an open source product. There are a lot of business models around OSS projects, and the most common ones include charging for support and services.

Hibernating Rhinos was founded because I wanted to write code. And the way we structured the company is primarily around writing software and the tooling around it. We provide support and consulting services, certainly, but we aren't looking at them as the money makers. From my perspective, we want to sell people RavenDB licenses, not have them pay us to help them do things with RavenDB.

That means that from the company perspective, support is a cost center, not a revenue center. In other words, the more support calls I have, the sadder I become.

This meshes well with my professional pride. I want to create things that are useful, awesome, and friction free. I want our users to take what we do and blast off, not to have them double check that their support contracts are up to date and that the support lines are open. I did a lot of study around this early on, and similar to Conway's law, the structure of a company and its culture have a deep impact on the software it produces.

With support seen as a cost center, there is a ripple effect on the structure of the software. It means that error messages are clearer, because if you give the user a good error message, maybe with some indication of how to fix the issue, they can resolve things on their own without having to call support. It means that configuration and tuning should be minimal and mostly self-service, instead of users having to open a support ticket asking "what should my configuration settings be for this or that scenario".

It also means that we want to reduce, as much as possible, anything that might trip users up as they set up and use our software. You can see that with the RavenDB Studio, where we spend a tremendous amount of time and effort to make information accessible and actionable for the user. Be it the overall dashboard, the deep insight into the internals, the various graphs and metrics we expose, and so on. The whole idea is to make sure that users and admins have all the information and tooling they need to make things work without having to call support.

Now, to be clear, we have a support hotline with 24/7 availability, because at our scale, and with the kind of software that we provide, you need that. But we are able to reduce the support load by an order of magnitude with such techniques. It also means that, by and large, our support, when you need it, is going to be excellent (because we don't need to deal with a lot of low-level support issues). We don't need a many-tiered support system, and it takes very little time to actually get to an engineer who has deep familiarity with the system and how to troubleshoot it.

There are a bunch of reasons why we went this route, treating support as a necessary overhead to be reduced as much as possible. Building new features is much more interesting than fielding support calls, so we do our best to build things so that we won't have to spend much time on support. But mostly, it is about creating a product that is well rounded and complete. It's about taking pride not only in having all the bells and whistles, but also in taking care to ensure that things work, and that the friction you'll run into using our products is as low as possible.

The best features are the ones you never knew were there: Comfortable shoes & friction removal

time to read 3 min | 428 words

We are currently at the stage of the RavenDB release cycle where most of what we do is friction removal: analyzing what is going on and removing friction along the way. This isn't about performance; we are pretty much done with that for this release cycle.

Removing friction means figuring out all the myriad ways in which users are going to use RavenDB and run into small annoyances. Things that work exactly as they should, but that can add a tiny bump in the road toward success. In other words, not only do I want to drop you into the pit of success, I want to make sure that you get a cushioned landing.

I take pride in my work, and I think that the sand-and-polish stage, removing splinters and ensuring a truly frictionless experience, is one of the most important stages in creating awesome products. It is also quite an arduous one, and it has very little visible impact on the product itself. If you are successful, no one will ever know that you did any work at all.

I was explaining this to my wife the other day, and I think I came up with a good metaphor for it. Think about wearing a pair of comfortable shoes. If they are truly comfortable, you won't notice them. In fact, their being comfortable is not anything to remark upon; it is just there. Now, turn it around and imagine a pair of shoes that are uncomfortable.

You do notice them, and they can be quite painful. But what would you do if you were used to all shoes being painful? Take high heels as a good example. It is standard practice, I understand, to just assume that they will be painful. So if a shoe looks great but is painful to wear, many will wear it anyway, accepting the pain. It is only when you wear comfortable shoes after wearing uncomfortable ones that you really notice the difference.

You feel the lack of pain where there used to be pain.

Coming back to software from high fashion, these kinds of features are hard, and they often go unnoticed, but they jell together to create an awesome experience and a smooth, professional feel for the product. Even if you need to look at what is going on on the other side of the fence to realize how much is being done for you.

The bare minimum a distributed system developer should know about: DNS

time to read 4 min | 737 words

DNS is used to resolve a hostname to an IP. This is something that most developers already know. What is not widely known, or at least not talked about so much, is the structure of the DNS network. To the right you can find the map of root servers, at least from a historical point of view, but I'll get to that.

If we have root servers, then we also have non-root servers below them. In fact, the whole DNS system is based on 13 well-known root servers, which delegate authority to servers that own the relevant portions of the namespace; you can see that in the diagram below. It goes down like that for pretty much forever.

[Diagram: the DNS server hierarchy, from the root servers on down]

Things become a lot more interesting when you consider that traversing the full DNS path is fast, but it is done trillions of times per day. Because of that, there are always caching DNS servers in the middle. This is where the TTL (time to live) aspect of DNS records comes into play.

DNS is basically just a distributed database with very slow updates. The root servers allow you to reach the owner of a piece of the namespace, and from that you can extract the relevant records for that namespace. All of that is backed by the premise that DNS values change rarely and that you can cache them for long durations, typically minutes at the low end and usually for days.

This means that a DNS query will most often hit a cache along the way and not have to traverse the entire path. For that matter, portions of the path are also cached. For example, the delegation for the ".com" domain is usually cached for 48 hours. So even if you are using a new hostname, you'll typically be able to skip the whole "let's go to the root server" dance and start somewhere along the way.

For developers, the most common interaction with DNS is editing "/etc/hosts" to enable some scenario (such as local development against the real URLs). But most organizations have their own DNS servers (if only so you'll be able to find other machines on the organization's network). This includes the ability to override the results of the public DNS, although that is mostly done at coffee shops.

I also mentioned earlier that the map above is a historical view of how things used to be. This is where things get really confusing. Remember when I said that DNS maps a hostname to an IP? Well, the common view of an IP as a pointer to a single server is actually false. Welcome to the wonderful world of IP Anycast. Using anycast, you can basically specify multiple servers with the same IP. You'll typically route to the nearest node, and you'll usually only do that for connectionless protocols (such as DNS). This is one of the ways the 13 root servers are actually implemented: the IPs are routed to multiple locations.

This misdirection is done by effectively laying down multiple paths to the same IP address using the low-level routing protocols (a developer will rarely need to concern themselves with this; it is the realm of infrastructure and network engineers). This is how the internet usually works: you have multiple paths along which you can send a packet, and you choose the best one. In this case, instead of all the paths terminating in a single location, they each terminate in a different one, but they all behave in the same manner. This is typically only useful for UDP, since each packet may reach a totally different server, so you cannot use TCP or any connection-oriented protocol.

Another really interesting aspect of DNS is that there really isn't any limitation on the kind of answers it returns. In other words, querying "localtest.me" will give you 127.0.0.1 back, even though this is an entry that resides on the global internet, not in your own local network. There are all sorts of fun games that one can play with this approach, by making a global address point to a local IP address. One of them is the possibility of issuing an SSL certificate for a local server, one that isn't exposed to the internet. But that is a hack for another time.
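You can see this for yourself (assuming the public localtest.me service is still running):

    using System;
    using System.Net;

    class Program
    {
        static void Main()
        {
            // A name on the public DNS that deliberately resolves to loopback.
            foreach (var address in Dns.GetHostAddresses("localtest.me"))
                Console.WriteLine(address); // 127.0.0.1
        }
    }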

The bare minimum a distributed system developer should know about: Certificates

time to read 5 min | 848 words

After explaining all the ways that trust can be subverted by DNS, CAs, and a random wave to wipe away the writing in the sand, let us get down to the actual details about what matters here.

HTTPS / SSL / TLS, whatever it is called this week, provides confidentiality over the wire for the messages you are sending. What it doesn't provide is confidentiality about who you talked to. This may seem non-obvious at first: the entire communication is encrypted, so how can a third party know who I'm talking to?

Well, there are two main ways. One is through a DNS query. If you need to go to "http://my-awesome-service", you need to know its IP, and for that you need to do a DNS query. There are DNS systems that are encrypted, but they aren't widely deployed, and in general you can assume that people can listen to your DNS queries and figure out what you are doing. If you go to "that-bad-place", it is probably visible in someone's logs somewhere.

But the other way that someone can know who you are talking to is that you told them so. How did you do that?

Well, let's consider one of the primary reasons we have HTTPS: a user has to validate that the hostname they used matches the hostname on the certificate. That seems pretty reasonable, right? But that single requirement pretty much invalidates the notion of confidentiality about who I'm talking to.

Consider the following steps:

  • I go to “https://my-awesome-service”
  • This is resolved to IP address 28.23.155.123
  • I’m starting an SSL connection to that IP, at port 443. Initially, of course, the connection is not encrypted, but I’ve just initiated the SSL connection.

At that point, any outside observer that can listen to the raw network traffic knows what site you have visited. But how can this be? Well, at this point the server needs to return a reply, and it needs to do that using a certificate.

Let us go with the "secure" option and say that all we send over the wire is "open an SSL connection to 28.23.155.123". What does this tell the listener? Well, since at this point the server doesn't know what the client wants, it must reply with a certificate. That certificate must be the same for all such connections, and the user will abort the connection if the certificate does not match the expected hostname.

What are the implications? Well, even assuming that I don't have a database matching IP addresses to their hostnames (which I would most assuredly have), I can just connect to the remote server myself and get the certificate. At that point, I can inspect the hostname on the certificate and know what site the user wanted to visit. This is somewhat mitigated by the fact that a certificate may contain multiple hostnames or even wildcards, but even that gives me quite a lot of information about who you are talking to.

However, not sending who I want to talk to in the initial connection has a huge cost associated with it. If the server doesn't know who you want, each IP address may serve only a single hostname (otherwise it may reply with the wrong certificate). Indeed, one of the reasons HTTPS used to be expensive was this tying up of a whole IP address for a single hostname. On the other hand, if we send the hostname we are interested in, the server can host multiple HTTPS websites on the same machine and select the right certificate at handshake time.

There are two ways to do that. One is called SNI (Server Name Indication), which is basically a field in the SSL handshake that says what the hostname is. The other is ALPN (Application-Layer Protocol Negotiation), which allows you to select how you want to talk to the server. This can be very useful when one client wants to connect with HTTP/1.1 and another with HTTP/2.0; those have totally different semantics, so routing based on ALPN can make things much easier.
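On the client side, the SNI value is simply whatever hostname you hand to the TLS stack. A minimal .NET sketch (my-awesome-service is the made-up hostname from earlier):

    using System.Net.Security;
    using System.Net.Sockets;

    // The targetHost argument is sent in the clear as the SNI field of the
    // client hello; it is how the server picks the matching certificate.
    using (var tcp = new TcpClient("my-awesome-service", 443))
    using (var ssl = new SslStream(tcp.GetStream()))
    {
        ssl.AuthenticateAsClient("my-awesome-service");
        // ...from here on, the stream is encrypted...
    }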

At this point, the server can make all sorts of interesting decisions with regard to the connection. For example, based on the SNI field, it may forward the connection to another machine, either as the raw SSL stream or by stripping the SSL and sending the unencrypted data to the final destination. The first case, forwarding the raw SSL stream, is the more interesting scenario, because we can do that without having the certificate. We just need to inspect the raw stream header and extract the SNI value, at which point we can route the connection to the right location and send it on its merry way.

I might do more posts like this, but I would really appreciate feedback, both on whether the content is good and on what additional topics you would like me to cover.
