Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

Get in touch with me:

oren@ravendb.net +972 52-548-6969

Posts: 7,546
|
Comments: 51,161
Privacy Policy · Terms
filter by tags archive
time to read 2 min | 241 words

imageIn the previous posts in this series, I talked about the kind of features that we build into RavenDB. Things that you never even notice making your life easier.

One feature we don’t have is doing HTTPS to HTTP downgrade. What do I mean by that? Assume that you have a RavenDB instance that is running using HTTP, and a client attempts to connect to it using HTTPS. Remember that we are assuming that the access it made on the same port. So the client wrote "https://my.raven.database:8080” instead of “http://my.raven.database:8080”.

If the other thing would happen, we would detect that and give a clear error to the user. But the other way around? We don’t do that, but why?

Well, the reasoning is very simple. If you connect to an HTTP endpoint using HTTPS, the first packet on the wire wants to do SSL negotiation. However, we don’t have a certificate that we can use here, so we can’t even start the negotiation process.

We could try generating a self signed certificate on the fly and answer the request with an error. But at this point, the client will likely already error at a low level because of the self signed certificate not being trusted.

Another point against implementing this feature is that HTTP endpoints typically become HTTPS, but rarely the other way around.

time to read 3 min | 598 words

imageI wish I would have been sufficient to use HTTPS for security. With RavenDB 4.0’s move toward TLS as the security mechanism for encryption of data over the wire and authentication using x509 we had to learn way too much about how Transport Layer Security works.

In particular, it can be quite annoying when you realize that just because you use SSL (or more accurately, TLS) that isn’t sufficient. You need to use the proper version, and there are interoperability issues. Many of RavenDB’s users run it in an environments that are subject to strict scrutiny and high level of regulation and oversight. That means that we need to make sure that we are able to operate in such environment. One option would be to use something like a FIPS configuration. We have a “normal”configuration and one that is aimed at people that need stricter standards. For many reasons, this is a really bad idea. Not least of all is the problem that even if you don’t have to meet FIMS mandate, you still want to be secured. Amusingly enough, many FIPS certified stacks are actually less secured (because they can’t get patches to the certified binaries).

So the two options mode was rejected. That meant that we should run in a mode that is can be match the requirements of the most common deployment regulations. In particular interest to us is PCI compliance, since we are often deployed in situations that involve money and payment processing.

That can be a bit of a problem. PCI requires that your communication will use TLS, obviously. But it also requires it to use TLS 1.2. That is great and with .NET it is easily supported. However, not all the tools are aware of this. This put us back in the same state as with HTTP vs. HTTPS. If your client does not support TLS 1.2 and your server require TLS 1.2, you end up in with a with a connection error.

image

Such a thing can be maddening for the user.

Therefor, RavenDB will actually allow Tls and Tls11 connections, but instead of processing the request, it will give you an error that give you something to work with.

image

Updated: I forgot to actually read the message. The reason you are getting the error about no certificate is because there isn’t a certificate here. In order for this to work, we need to actually pass the certificate, in which case we’ll get the appropriate error. I apologize for the error handling, but PowerShell:
image

Armed with this information, you can now do a simple web search and realize that you actually need to do this:

image

And that saves us a lot of TCP level debugging. It took a bit of time to set this (and the other) errors properly, and they are exactly the kind of things that will save you hours or days of frustration, but you’ll never realize that they were there even if you run into them unless you know the amount of effort that went into setting this up.

time to read 4 min | 755 words

imageRavenDB uses HTTP for most of its communication. It can be used in unsecured mode, using HTTP or in secured mode, using HTTPS. So far, this is pretty standard. Let us look at a couple of URLs:

  • http://github.com
  • https://github.com

If you try to go to github using HTTP, it will redirect you to the HTTPS site. It is very easy to do, because the URLs above are actually:

  • http://github.com:80
  • https://github.com:443

In other words, by default when you are using HTTP, you’ll use port 80, while HTTPS will default to port 443. This means that the server in port 80 can just read the response and redirect you immediately to the HTTPS endpoint.

RavenDB, however, it usually used in environments where you will explicitly specify a port. So the URL would look something like this:

  • http://a.orders.raven.local:8080
  • https://a.orders.raven.local:8080

It is very common for our users to start running with port 8080 in an unsecured mode, then later move to a secure mode with HTTPS but retain the same port. That can lead to some complications. For example, here is what happens in a similar situation if I’m trying to connect to an HTTPS endpoint using HTTP or vice versa.

image

image

This means that a common scenario (running on a non native port and using the wrong protocol) will lead to a nasty error. We call this a nasty error because the user has no real way to figure out what the issue is from the error. In many cases, this will trigger an escalation to the network admin or support ticket. This is the kind of issue that I hate, it is plainly obvious, but it is so hard to figure out and then you feel stupid for not realizing this upfront.

Let us see how we can resolve such an issue. I already gave some hints on how to do it earlier, but the technique in that  post wasn’t suitable for production use in our codebase. In particular, we introduced another Stream wrapping instance and another allocation that would affect all input / output calls over the network. We would really want to avoid that.

So we cheat (but we do that a lot, so this is fine). Kestrel allow us to define connection adapters, which give us a hook very early in the process to how the TCP connection is managed. However, that lead to another problem. We want to sniff the first byte of the raw TCP request, but Stream doesn’t provide a way to Peek at a byte, any such attempt will consume it, which will result in the same problem on an additional indirection that we wanted to avoid.

Therefor, we decided to take advantage of the way Kestrel is handling things. It is buffering data in memory and if you dig a bit you can access that in some very useful ways. Here is how we are able to sniff HTTP vs. HTTPS:

The key here is that we use a bit of reflection emit magic to get the inner IPipeReader instance from Kestrel. We have to do it this way because that value isn’t exposed externally. Once we do have the pipe reader instance, we borrow the already read buffer and inspect it, if the first character is a capital character (G from GET, P from PUT, etc), this is an HTTP connection (SSL connection’s first byte is either 22 or greater than 127, so there is no overlap). We then return the buffer to the stream and carry on, Kestrel will parse the request normally, but another portion in the pipeline will get the wrong protocol message and throw that to the user. And obviously we’ll skip doing the SSL negotiation.

This is important, because the client is speaking HTTP, and we can’t magically upgrade it to HTTPS without causing errors such as the one above. We need to speak the same protocol as the client expect.

With this code, trying to use the wrong protocol give us this error:

image

Now, if you are not reading the error message that might still mean a support call, but it should be resolved as soon as someone actually read the error message.

time to read 4 min | 667 words

imageI introduced the notion of frictionless software in the previous post, but I wanted to dedicate some time to talk about the deeper meaning for this kind of thinking. RavenDB is an open source product. There are a lot of business models around OSS projects, and the most common ones includes charging for support and services.

Hibernating Rhinos was founded because I wanted to write code. And the way the way we structured the company is primarily to write software and the tooling around it. We provide support and consulting services, certainly, but we aren’t looking at them as the money makers. From my perspective, we want to sell people RavenDB licenses, not to have them pay us to help them do things with RavenDB.

That means that from the company perspective, support is a cost center, not a revenue center. In other words, the more support calls I have, the sadder I become.

This mesh well with my professional pride. I want to create stuff that are useful, awesome and friction free. I want our users to take what we do and blast off, not to have them double check that their support contracts are up to date and that the support lines are open. I did a lot of study around that early on, and similar to Conway’s law, the structure of the company and its culture has deep impact on the software that it produces.

With support seen as a cost center, this lead to a ripple effect on the structure of the software. It means that error message are clearer, because if you give the user a good error message, maybe with some indication of how to fix the issue, they can resolve things on their own, without having to call support. It means that configuration and tuning should be minimal and mostly self served, instead of having to open a support ticket with “what should be my configuration settings for this or that scenario”.

It also means that we want to reduce as much as possible anything that might trip users up as they setup and use our software. You can see that with the RavenDB Studio, how we spend a tremendous amount of time and effort to make information accessible and actionable for the user. Be it the overall dashboard, or the deep insight into the internals, various graphs and metrics we expose, etc. The whole idea is to make sure that the users and admins have all the information and tooling they need in order to make things works without having to call support.

Now, to be clear, we have a support hotline with 24/7 availability, because at our scale and with the kind of software that we provide, you need that. But we are able to reduce the support load by an order of magnitude with such techniques. And it means that by and far, our support, when you need it, is going to be excellent (because we don’t need to deal with a lot of low level support issues). That means that we don’t need a many tiered support system and it take very little time to actually get to an engineer that has deep familiarity with the system and how to troubleshoot it.

There are a bunch of reasons why we went this route, treating support as a necessary overhead that needs to be reduced as much as possible. Building new features is much more interesting than fielding support calls, so we do our best to develop things so we’ll not have to spend much time on support. But mostly, it is about creating a product that is well round and complete. It’s about taking pride in not only having all the bells and whistles but also taking care to ensure that things work and that the level of friction you’ll run into using our products is as low as possible.

time to read 3 min | 428 words

imageWe are currently at the stage of the RavenDB release cycle where most of what we do is friction removal. Analyzing what is going on and removing friction along the way. This isn’t about performance, we are pretty much done with this for this release cycle.

Removing friction is figuring out all the myriad of ways in which users are going to use RavenDB and run into small annoyances. Things that work exactly as they should, but it can often add a tiny bump in the road toward success. In other words, not only do I want you to drop you into the pit of success, I want to make sure that you’ll get a cushioned landing.

I take pride in my work, and I think that the sand & polish stage, removing splinters and ensuring true frictionless experience is one of the most important stages in creating awesome products. It is also quite arduous one and it has very little visible impact on the product itself. If you are successful, no one will ever even know that you had done any work at all.

I was explaining this to my wife the other day and I think that I came up with a good metaphor to explain it. Think about wearing a pair of comfortable shoes. If they are truly comfortable you’ll not notice them. In fact, them being comfortable will not be anything to remark upon, it is just there. Now, turn it around and imagine a pair of shoes that are not uncomfortable.

You do notice them, and it can be quite painful. But what would you do if you are used to all shoes being painful. Take high heels as a good example. It is standard practice, I understand, to just assume that they will be painful. So if a shoe looks great but it painful to wear, many would wear it, accepting that it is painful. It is only when you wear comfortable shoes after wearing an uncomfortable ones that you can really notice.

You feel the lack of pain, where there used to be one.

Coming back to software from high fashion, these kind of features are hard and they are often unnoticed, but they jell together to create an awesome experience and a smooth, professional feeling for the product. Even if you need to look at what is going on the other side of the fence to realize how much is being done for you.

FUTURE POSTS

  1. Partial writes, IO_Uring and safety - about one day from now
  2. Configuration values & Escape hatches - 5 days from now
  3. What happens when a sparse file allocation fails? - 7 days from now
  4. NTFS has an emergency stash of disk space - 9 days from now
  5. Challenge: Giving file system developer ulcer - 12 days from now

And 4 more posts are pending...

There are posts all the way to Feb 17, 2025

RECENT SERIES

  1. Challenge (77):
    20 Jan 2025 - What does this code do?
  2. Answer (13):
    22 Jan 2025 - What does this code do?
  3. Production post-mortem (2):
    17 Jan 2025 - Inspecting ourselves to death
  4. Performance discovery (2):
    10 Jan 2025 - IOPS vs. IOPS
View all series

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats
}