Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by email or phone:

ayende@ayende.com

+972 52-548-6969


Queries++ in RavenDB: Facets of information

time to read 3 min | 536 words

image

RavenDB has a lot of functionality that is available just underneath the surface. In addition to just finding documents, you can use RavenDB to find a lot more about what is going on in your database. This series of posts is aimed at exposing some of the more fun things that you can do with RavenDB that you are probably not aware of.

One of those things is the idea of not just querying for information, but also querying for the facets of the results. This can be very useful when you are searching for something that returns a lot of results and you want to quickly narrow them down without putting the user through a lot of trial and error. This is one of those cases where it is much easier to explain what is going on with a picture.

Imagine that you are searching for a phone. You might have a good idea of what you are looking for, so you search for a phone on eBay. I just did that and it gave me over 300 thousand results. The problem is that if I actually want to buy one of them, I'm not going to scroll through however many pages of product listings. I need a way to quickly narrow down the selection, and facets allow me to do that, as you can see in the image. Each of these is a facet, and I can filter things so only the stuff that I'm interested in will be shown, allowing me to quickly make a decision (and purchase).

Using the sample dataset in RavenDB, we'll explore how to run faceted searches. First, we'll define the "Products/Search" index:
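The index definition in the original post is an image that isn't reproduced here. As a rough sketch, assuming the Northwind sample data (the Product shape and the field choices below are my assumptions, not the exact index from the post), it would look something like this:

```csharp
using System.Linq;
using Raven.Client.Documents.Indexes;

// Minimal stand-in for the Northwind sample document (assumed shape).
public class Product
{
    public string Name { get; set; }
    public string Category { get; set; }
    public string Supplier { get; set; }
    public decimal PricePerUnit { get; set; }
}

// A sketch of a "Products/Search" style index exposing the fields we want to facet on.
public class Products_Search : AbstractIndexCreationTask<Product>
{
    public Products_Search()
    {
        Map = products => from p in products
                          select new
                          {
                              p.Name,
                              p.Category,
                              p.Supplier,
                              p.PricePerUnit
                          };
    }
}
```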

Using this index, we can now ask RavenDB to give us the facets from this dataset, like so:

image
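The query itself is also shown as an image in the original. An approximate version using the C# client's aggregation API (the exact method names and the price ranges here are my best guesses, not the query from the post) would be along these lines:

```csharp
using Raven.Client.Documents;
using Raven.Client.Documents.Linq;

// Sketch: facet the index results by Category, Supplier and a few price ranges.
using (var store = new DocumentStore { Urls = new[] { "http://localhost:8080" }, Database = "Northwind" }.Initialize())
using (var session = store.OpenSession())
{
    var facets = session
        .Query<Product, Products_Search>()
        .AggregateBy(f => f.ByField(x => x.Category))
        .AndAggregateBy(f => f.ByField(x => x.Supplier))
        .AndAggregateBy(f => f.ByRanges(
            x => x.PricePerUnit < 25,
            x => x.PricePerUnit >= 25 && x.PricePerUnit < 50,
            x => x.PricePerUnit >= 50))
        .Execute();

    // facets["Category"], facets["Supplier"] and the price facet each hold per-value counts.
}
```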

This will give us the following results:

image

And we can inspect each of them in turn:

image     image

These are easy, because they give us the count of matching products for each category and supplier. Of more interest to us is the Prices facet.

image

And here we can see how we sliced and diced the results. We can narrow things further with the user's choices, of course. Let's check out this query:

image

Which gives us the following Prices facet:

image
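For reference, the narrowed version of the query might look roughly like this in the client API, continuing the session from the sketch above (the actual filter the post applies isn't visible here, so the values are made up):

```csharp
// Sketch: apply the user's current choices as a Where clause, then facet what remains.
var narrowed = session
    .Query<Product, Products_Search>()
    .Where(x => x.Category == "categories/1" && x.PricePerUnit < 50)
    .AggregateBy(f => f.ByRanges(
        x => x.PricePerUnit < 10,
        x => x.PricePerUnit >= 10 && x.PricePerUnit < 25,
        x => x.PricePerUnit >= 25 && x.PricePerUnit < 50))
    .Execute();
```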

This means that you can, in very short order, produce really cool search behavior for your users.

Production postmortem: data corruption, a view from INSIDE the sausage

time to read 10 min | 1883 words

In terms of severity, there are very few things that we take more seriously than data integrity. In fact, the only thing that pops to mind as higher priority is security issues. A user reported an error when using a pre-release 4.0 database that certainly looked like data corruption, so we were very concerned when we got the report, and quite happy about the actual error. If this sounds strange, let me explain.

Storage bugs are nasty. I suggest reading this article to understand how tricky these can be. The article talks about memory allocators (even though it calls them storage) but the same rules apply. And the most important rule from this article?

WHEN A MEMORY DAMAGE BUG IS OBSERVED, IT TAKES PRIORITY OVER ALL OTHER BUG FIXES, ENHANCEMENTS, OR ANY OTHER DEVELOPMENT ACTIVITY.  ALL DEVELOPMENT CEASES UNTIL IT IS FOUND.

You can read the article for the full reasoning why, but basically it is about being able to reproduce and fix the bug, not making it “go away” with a hammer approach. We do the same with data corruption. One of our developers stops doing anything else and investigates just that, as a top priority issue. Because we take this so seriously, we have built several layers of defense in depth into RavenDB.

All the data is signed, and we compare hashes when reading from disk to validate that it hasn't been modified. This also helps us catch an enormous amount of problems with storage devices and react to them early. There are other checks that are run to verify the integrity of the system, from debug asserts to walking the structure of the data and verifying its correctness.
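RavenDB's actual on-disk format and hash function are different, but the verify-on-read idea described here can be sketched like this (purely illustrative, with SHA-256 standing in for whatever the storage engine really uses):

```csharp
using System;
using System.Security.Cryptography;

public static class BlockIntegrity
{
    // Illustrative only: prepend a hash to each block on write,
    // and recompute it on every read so silent corruption is caught early.
    public static byte[] WriteBlock(byte[] payload)
    {
        using (var sha = SHA256.Create())
        {
            var hash = sha.ComputeHash(payload);
            var block = new byte[hash.Length + payload.Length];
            Buffer.BlockCopy(hash, 0, block, 0, hash.Length);
            Buffer.BlockCopy(payload, 0, block, hash.Length, payload.Length);
            return block; // this is what goes to disk
        }
    }

    public static byte[] ReadBlock(byte[] block)
    {
        var stored = new byte[32];
        var payload = new byte[block.Length - 32];
        Buffer.BlockCopy(block, 0, stored, 0, 32);
        Buffer.BlockCopy(block, 32, payload, 0, payload.Length);

        using (var sha = SHA256.Create())
        {
            var actual = sha.ComputeHash(payload);
            for (var i = 0; i < stored.Length; i++)
            {
                if (actual[i] != stored[i])
                    throw new InvalidOperationException("Hash mismatch - the block was modified on disk");
            }
        }
        return payload;
    }
}
```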

In this case, analysis of the data the user provided showed that we were failing the hash validation, which should usually only happen if there is a physical file corruption. While we were rooting for that (since this would mean no issues with our code), we also looked into the error in detail. What we found was that we were somehow starting to read a document from the middle, instead of the beginning. Somehow we managed to mess up the document offset and that caused us to think that the document was corrupted.

At this point, we had a confirmed data corruption issue, since obviously we shouldn't lose track of where we put the documents. We pulled another developer into this, to try to reproduce the behavior independently while checking if we could salvage the user's data from the corrupted files. This deserves some explanation. We don't assume that our software is perfect, so we took steps in advance. Hashing the data and validating it is one such step, but another is building, upfront, the recovery tools for when the inevitable happens. That meant that the way we lay out the data on disk was designed, upfront and deliberately, to allow us to recover the data in the case of corruption.

Admittedly, I was mostly thinking about corruption of the data as a result of physical failure, but the way we lay out the data on disk also protects us from errors in record keeping such as this one. This meant that we were able to extract the data and recover everything for the user.

At this time, we had a few people trying to analyze the issue and attempting to reproduce it. The problem with trying to figure out this sort of issue from the resulting file is that by the time you have found the error, it is too late; the data is already corrupted and you have been operating in a silently bad state for a while, until it finally got to the point where it became visible.

We had the first break in the investigation when we managed to reproduce this issue locally on a new database. That was great, because it allowed us to rule out some possible issues related to upgrading from an earlier version, which was one of the directions we looked at. The bad part was that this was reproduced mostly by the developer in question repeatedly hitting the keyboard with his head in frustration. So we didn't have a known way to reproduce this.

Yes, I know that animated GIFs are annoying; so was this bug, and I needed a way to share the pain. At one point we got something that could reliably generate an error: it was on the 213th write to the system. It didn't matter what the write was, but the 213th write would always produce an error. There is nothing magical about 213, by the way; I remember this value because we tried so very hard to figure out what was magical about it.

At this point we had four or five developers working on this (we needed a lot of heads banging on keyboards to reproduce this error). The code had been analyzed over and over. We found a few places where we could have detected the data corruption earlier, because it violated invariants and we didn't check for that. That was the first real break we had, because it allowed us to catch the error earlier, which led to less head banging before the problem could be reproduced. The problem was that we always caught it too late. We kept going backward in the code, each time really excited that we were going to be able to figure out what was going on there, only to realize that the invariants this code relied on were already broken.

Because these are invariants, we didn't check them; they couldn't possibly be broken. That sounds bad, because obviously you need to validate your input and output, right? Allow me to demonstrate a sample of a very simple storage system:
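The sample itself is an image in the original post and isn't reproduced here, so the following is a reconstruction of the kind of class being described; the names and exact shape are my assumptions:

```csharp
using System.Collections.Generic;

public class User
{
    public long Id;
    public string Name;
    public string Email;
}

// A deliberately naive storage class: a primary index by id and a secondary index by name.
public class SimpleStorage
{
    private readonly Dictionary<long, User> _byId = new Dictionary<long, User>();
    private readonly Dictionary<string, long> _byName = new Dictionary<string, long>();

    public void Put(User user)
    {
        _byId[user.Id] = user;
        _byName[user.Name] = user.Id;
    }

    public void Remove(User user)
    {
        _byId.Remove(user.Id);
        _byName.Remove(user.Name); // assumes user.Name still matches what was Put
    }

    public string GetUserEmail(string name)
    {
        var id = _byName[name];
        _byId.TryGetValue(id, out var user);
        return user.Email; // NullReferenceException if the secondary index is stale
    }
}
```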

 

There isn’t anything wrong with the code here at first glance, but look at the Remove method, and now at this code that uses the storage:
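Again, the original snippet is an image; here is one way a caller could break the storage's assumptions, matching the description that follows:

```csharp
var storage = new SimpleStorage();
var user = new User { Id = 1, Name = "arava", Email = "arava@example.com" };
storage.Put(user);

// The caller mutates the user before removing it, which the storage never expected.
user.Name = "oscar";
storage.Remove(user);                      // _byName still maps "arava" -> 1

var email = storage.GetUserEmail("arava"); // throws NullReferenceException
```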

The problem we have here is not with the code in the Remove or the GetUserEmail methods. Instead, the problem is that the caller did something that it wasn't supposed to, and we proceeded on the assumption that everything was okay.

The end result is that the _byName index contained a reference to a deleted document, and calling GetUserEmail will throw a null reference exception. The user visible problem is the exception, but the problem was actually caused much earlier. The invariant that we were violating could have been caught in the Remove method, though, if we did something like this:
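A sketch of such a check, reconstructed rather than copied from the post:

```csharp
public void Remove(User user)
{
    // Validate the invariant instead of assuming it: the name index must point
    // at the user we are about to remove, otherwise the caller broke the rules.
    if (_byName.TryGetValue(user.Name, out var id) == false || id != user.Id)
        throw new InvalidOperationException(
            $"The name index for '{user.Name}' does not match user {user.Id}; " +
            "the caller violated a storage invariant");

    _byId.Remove(user.Id);
    _byName.Remove(user.Name);
}
```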

These sorts of changes allow us to catch the problem earlier and earlier, closer to the original location where it first occurred. Eventually we were able to figure out that a particular pattern of writes would put the internal index inside RavenDB into a funny state. In particular, here is how this looks from the inside.

image

What you see here is the internal structure of the tree inside RavenDB used to map between document etags and their location on disk. In this case, we managed to get into a situation where we would be deleting the last item from a page that is the leftmost page in a tree that has 3 or more levels, whose parent is the rightmost page in the grandparent and is less than 25% full, while the sibling to its left is completely full.

In this case, during the rebalancing operation, we were forgetting to reset the downward references, and we ended up messing up the sort order of the tree. That worked fine, most of the time, but it would slowly poison our behavior, as we made binary searches on data that was supposed to be sorted but wasn't.

Timeline (note: despite the title, this is pre-release software and not a production system; the timeline reflects this):

  • T-9 days, first notice of this issue on the mailing list. Database size exceeds 400GB. Back and forth with the user on figuring out exactly what is going on, validating that the issue is indeed corruption and getting the data.
  • T-6 days, we start detailed analysis of the data in parallel to verifying that we can recover the data.
  • T-5 days, user has the data back and can resume working normally, investigation proceeds.
  • T-4 days, we have managed to reproduce this on our own system, no idea how yet.
  • T-3 days, head banging on keyboards, adding invariant validations and checks everywhere we can think of.
  • T-2 days, managed to trap the root cause of the issue, tests added, pruning investigation code for inclusion in the product for earlier detection of faults.
  • Issue fixed.
  • T – this blog post is written.
  • T + 3 days, code for detecting this error and automatically resolving it is added to the next release.

For reference, here is the fix:

image

The last change in the area in question happened two years ago, by yours truly, so this is a pretty stable part of the code.

In retrospect, there are a few really good things that we learned from this.

  • In a real world situation, we were able to use the recovery tools we built and get the user back up in a short amount of time. We also found several issues with the recovery tool itself, mostly the fact that its default logging format was verbose, which on a 400GB database means an enormous amount of logs that slowed down the process.
  • No data was lost, and these kinds of issues wouldn’t be able to cross a machine boundary so a second replica would have been able to proceed.
  • Early error detection was able to find the issue; the investment in hashing and validating the data paid off handsomely here. More work was done around making the code more paranoid, not for the things that it is supposed to be responsible for, but to ensure that other pieces of the code are not violating invariants.
  • The use of internal debug and visualization tools (such as the one above, showing the structure of the internal low level tree) was really helpful with resolving the issue.
  • We focused too much on the actual error that we got from the system (the hash check that failed); one of the things we should have done is verify the integrity of the whole database at the start, which would have led us to figure out what the problem was much earlier. Instead, we suspected the wrong root cause all along, all the way to the end. We assumed that the issue was because of modifications to the size of the documents, increasing and decreasing them in a particular pattern to cause a specific fragmentation issue that was the root cause of the failure. It wasn't, but we were misled about it for a while because that was the way we were able to reproduce this eventually. It turned out that the pattern of writes (to which documents) was critical here, not the size of the documents.

Overall, we spent a lot of time figuring out what the problem was, and the fix was two lines of code. I wrote this post independently of this investigation, but it hit the nail straight on.

Production postmortem: The random high CPU

time to read 2 min | 253 words

A customer complained that every now and then RavenDB hits 100% CPU and stays there. They were kind enough to provide a minidump, and I started the investigation.

I loaded the minidump into WinDbg and started debugging. The first thing you do with high CPU is run the “!runaway” command, which sorts the threads by how busy they are:

image

I switched to the first thread (39) and asked for its stack; I highlighted the interesting parts:

image

This is enough to have a strong suspicion of what is going on. I checked some of the other high CPU threads and my suspicion was confirmed, but even this single stack trace is enough.

Pretty much whenever you see a thread burning CPU inside the Dictionary class, it means that you are accessing it in a concurrent manner. This is unsafe and may lead to strange effects, one of them being an infinite loop.

In this case, several threads were caught in this infinite loop. The stack trace also told us where in RavenDB we were doing this, and from there we could confirm that indeed, there is a rare set of circumstances that can cause a timer to fire again before the previous timer callback has had a chance to complete, with both timer callbacks modifying the same dictionary and causing the issue.
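The RavenDB code involved isn't shown in the post; as a hypothetical illustration of the pattern and the usual fix, consider something like this:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public class StatsCollector : IDisposable
{
    // A plain Dictionary plus overlapping timer callbacks is undefined behavior
    // (including the infinite loop seen in the stack traces above).
    // ConcurrentDictionary, or a lock around every access, avoids it.
    private readonly ConcurrentDictionary<string, long> _counters =
        new ConcurrentDictionary<string, long>();

    private readonly Timer _timer;

    public StatsCollector()
    {
        // If a tick takes longer than a second, the next callback can run
        // concurrently with the previous one - exactly the scenario described above.
        _timer = new Timer(_ => Flush(), null, TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(1));
    }

    public void Increment(string name) => _counters.AddOrUpdate(name, 1, (_, v) => v + 1);

    private void Flush()
    {
        foreach (var kvp in _counters)
        {
            // publish kvp.Key / kvp.Value somewhere...
        }
    }

    public void Dispose() => _timer.Dispose();
}
```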

RavenDB Setup: How the automatic setup works

time to read 8 min | 1456 words

One of the coolest features in the RC2 release for RavenDB is the automatic setup, in particular, how we managed to get a completely automated secured setup with a minimal amount of fuss on the user's end.

You can watch the whole thing from start to finish; it takes about 3 minutes to go through the process (if you aren't also explaining what you are doing) and you end up with a fully secured cluster talking to each other over secured TLS 1.2 channels. This was made harder because we are actually running with trusted certificates. This was a hard requirement, because we use the RavenDB Studio to manage the server, and that is a web application hosted on RavenDB itself. As such, it is subject to all the usual rules of browser based applications, including scary warnings and the inability to act if the certificate isn't valid and trusted.

In many cases, this leads people to choose to use HTTP, because at least with that model you don't have to deal with all the hassle. Consider the problem. Unlike a website, which has (at least conceptually) a single deployment, RavenDB is actually deployed on customer sites and is running on anything from local developer machines to cloud servers. In many cases, it is hidden behind multiple layers of firewalls, routers and internal networks. Users may choose to run it in any number of strange and wonderful configurations, and it is our job to support all of them.

In such a situation, defaulting to HTTP only makes things easy, mostly because things work. Using HTTPS requires that we use a certificate. We can obviously use a self signed certificate, and have the following shown to the user on the first access to the website:

image

As you can imagine, this is not going to inspire confidence with users. In fact, I can think of few other ways to ensure the shortest “download to recycle bin” path. Now, we could ask the administrator to generate a certificate and ensure that this certificate is trusted. And that would work, if we could assume that there is an administrator. I think that asking a developer who isn't well versed in security practices to do that is likely to result in an even shorter “this is a waste of my time” reaction than the unsecured warning option.

We considered the option of installing a (locally generated) root certificate and generating a certificate from that. This would work, but only on the local machine, and RavenDB is, by nature, a distributed database. So that would make for a great demo, but it would cause a great deal of hardship down the line. Exactly the kind of feature and behavior that we don't want. And even if we generate the root certificate locally and throw it away immediately afterward, the idea still bothered me greatly, so that was something that we considered only in times of great depression.

So, to sum it all up, we need a way to generate a valid certificate for a random server, likely running in a protected network, inaccessible from the outside (as in, pretty much all corporate / home networks these days). We need to do that without requiring the user to do things like set up dynamic DNS, port forwarding in their router or generating their own certificates. We also need it to be fast enough that we can do it as part of the setup process. Anything that would require a few hours / days is out of the question.

We looked into what it would take to generate our own trusted SSL certificates. This is actually easily possible, but the cost is prohibitive, given that we wanted to allow this for free users as well, and all the options we found had a per-certificate cost associated with them.

Let's Encrypt is the answer for HTTPS certificate generation on the public web, but the vast majority of our deployments are likely to be inside the firewall, so we can't verify a certificate using Let's Encrypt in the usual way. Furthermore, doing so would require users to define and manage DNS settings as part of the deployment of RavenDB. That is something that we wanted to avoid.

This might require some explanation. The setup process that I'm talking about is not just for setting up a production instance. We consider any installation of RavenDB to be worth a production grade setup. This is a lesson from the database ransomware tales. I see no reason why we should learn this lesson again on the backs of our users, so a high priority was given to making sure that the default install mode is also the secure and proper one.

All the options that are ruled out in this post (provide your own certificate, set up DNS, etc.) are entirely possible (and quite easy) with RavenDB, if an admin so chooses, and we expect that many will want to set up RavenDB in a manner that fits their organization's policies. But here we are talking about the base line (yes, dear) install, and we want to make it as simple and straightforward as we possibly can.


There is another problem with Let’s Encrypt for our situation, we need to generate a lot of certificates, significantly more than the default rate limit that Let’s Encrypt provides. Luckily, they provide a way to request an extension to this rate limit, which is exactly what we did. Once this was granted, we were almost there.

The way RavenDB generates certificates as part of the setup process is a bit involved. We can't just generate any old hostname; we need to provide proof to Let's Encrypt that we own the hostname in question. For that matter, who is the “we” in question? I don't want to be exposed to all the certificates that are generated for the RavenDB instances out there. That is not a good way to handle security.

The key for the whole operation is the following domain name: dbs.local.ravendb.net

During setup, the user will register a subdomain under that, such as arava.dbs.local.ravendb.net. We ensure that only a single user can claim each domain. Once they have done that, they let RavenDB know what IP address they want to run on. This can be a public IP, exposed on the internet, a private one (such as 192.168.0.28) or even a loopback device (127.0.0.1).

The local server, running on the user's machine, then initiates a challenge to Let's Encrypt for the hostname in question. With the answer to the challenge, the local server then calls api.ravendb.net. This is our own service, running in the cloud. The purpose of this service is to validate that the user “owns” the domain in question and to update the DNS records to match the Let's Encrypt challenge.

The local server can then go to Let's Encrypt and ask them to complete the process and generate the certificate for the server. At no point does the certificate need to go through our own servers; it is all handled on the client machine. There is another thing happening here: alongside the DNS challenge, we also update the domain the user chose to point to the IP they are going to be hosted at. This means that the global DNS network will point to your database. This is important, because we need the hostname that you'll use to talk to RavenDB to match the hostname on the certificate.

Obviously, RavenDB will also make sure to refresh the Let’s Encrypt certificate on a timely basis.

The entire process is seamless and quite amazing when you see it, especially because even developers might not realize just how much goes on under the covers and how much pain was taken away from them.

We ran into a few issues along the way, and Let's Encrypt support has been quite wonderful in this regard, including deploying a code fix that allowed us to make it in time for RC2 with the full feature in place.

There are still issues if you are running on a completely isolated network, and some DNS configurations can cause issues, but we typically detect and give a good warning about that (allowing you to switch to 8.8.8.8 as a good workaround for most such issues). The important thing is that we achieve the main goal, seamless and easy setup with the highest level of security.

RavenDB Setup: a secured cluster in 10 minutes or less

time to read 1 min | 86 words

One of the major features of the RC2 release for RavenDB has been the setup process. In particular, we worked on making sure that the default and easiest manner to install RavenDB will be the one with the highest level of security.

I’m excited enough by this feature that I recorded myself setting up a full blown cluster, including everything you need for production deployment in under 10 minutes, with a lot of my explanations in the middle. Take a look.

The best features are the ones you never knew were there: Unsecured SSL/TLS

time to read 3 min | 598 words

I wish it would have been sufficient to just use HTTPS for security. With RavenDB 4.0's move toward TLS as the security mechanism for encryption of data over the wire, and authentication using x509 certificates, we had to learn way too much about how Transport Layer Security works.

In particular, it can be quite annoying when you realize that just because you use SSL (or more accurately, TLS), that isn't sufficient. You need to use the proper version, and there are interoperability issues. Many of RavenDB's users run it in environments that are subject to strict scrutiny and a high level of regulation and oversight. That means that we need to make sure that we are able to operate in such environments. One option would be to use something like a FIPS configuration: have a “normal” configuration and one that is aimed at people who need stricter standards. For many reasons, this is a really bad idea. Not least of all is the problem that even if you don't have to meet a FIPS mandate, you still want to be secure. Amusingly enough, many FIPS certified stacks are actually less secure (because they can't get patches to the certified binaries).

So the two-mode option was rejected. That meant that we should run in a mode that can match the requirements of the most common deployment regulations. Of particular interest to us is PCI compliance, since we are often deployed in situations that involve money and payment processing.

That can be a bit of a problem. PCI requires that your communication use TLS, obviously. But it also requires it to use TLS 1.2. That is great, and with .NET it is easily supported. However, not all the tools are aware of this. This puts us back in the same state as with HTTP vs. HTTPS. If your client does not support TLS 1.2 and your server requires TLS 1.2, you end up with a connection error.

image

Such a thing can be maddening for the user.

Therefore, RavenDB will actually allow Tls and Tls11 connections, but instead of processing the request, it will give you an error that gives you something to work with.

image

Updated: I forgot to actually read the message. The reason you are getting the error about no certificate is because there isn’t a certificate here. In order for this to work, we need to actually pass the certificate, in which case we’ll get the appropriate error. I apologize for the error handling, but PowerShell:
image

Armed with this information, you can now do a simple web search and realize that you actually need to do this:

image
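The screenshot of the fix isn't legible here. Assuming it shows the usual remedy of forcing the client to TLS 1.2, the .NET equivalent is a one-liner:

```csharp
// System.Net - opt the client in to TLS 1.2 explicitly; older clients default to SSL3/TLS 1.0.
// The PowerShell form would be: [Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
System.Net.ServicePointManager.SecurityProtocol = System.Net.SecurityProtocolType.Tls12;
```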

And that saves us a lot of TCP level debugging. It took a bit of time to get this (and the other) errors right, and they are exactly the kind of thing that will save you hours or days of frustration. But you'll never realize they were there, even if you run into them, unless you know the amount of effort that went into setting this up.

RavenDB 4.0 Release Candidate 2 is out

time to read 5 min | 867 words

It has been two months since the first release candidate of RavenDB 4.0, and the team has been hard at work. Looking at the issues resolved in that time frame, there are over 500 of them, and I couldn't be happier about the result.

RavenDB 4.0 RC2 is out: Get it here (Windows, Linux, OSX, Raspberry PI, Docker).

When we were going through the list of issues for this release, I noticed something really encouraging. The vast majority of them were things that would never make it into the release highlights. These are the kind of issues that are all about spit and polish, anything from giving better error messages to improving the first few minutes of your setup and installation to just getting things done. This is a really good thing at this stage in the release cycle. We are done with features and big ticket stuff. Now it is time to finish grinding through all the myriad details and small fixes that make a product really shine.

That said, there is still a bunch of really cool stuff that was cooking for a long time and that we could only now really call complete. This list includes:

  • Authentication and authorization – the foundation for that was laid a long time ago, with X509 client certificates used for authenticating clients against RavenDB 4.0 servers. The past few months had us building the user interface to manage these certificates, define permissions and access across the cluster.
  • Facet and MoreLikeThis queries – this is a feature that was available in RavenDB for quite some time and is now available as an integral part of the RavenDB Query Language. I'm going to have separate posts to discuss these, but they are pretty cool, albeit specialized, ways to look at your data.
  • RQL improvements – we made RQL a lot smarter, allowing more complex queries and projections. Spatial support has been improved and is now much easier to work with and reason about using just raw RQL queries.
  • Server dashboard – allows you to see exactly what your servers are doing and is meant to be something that the ops team can just hang on the wall and stare at in amazement, realizing how much the database can do.
  • Operations – the operations team generally has a lot of new things to look at in this release. SNMP monitoring is back, and a significant amount of work was spent on errors. That is, making sure that an admin will have clear and easy to understand errors and a path to fix them. Traffic monitoring and live tracing of logs are also available directly in the studio now. CSV import / export is also available in the studio, as well as Excel integration for the business people. Automatic backup processes are also available now for scheduled backups, to both local and cloud targets, and an admin has more options to control the database. This includes compaction of databases after large deletes to restore space to the system.
  • Patching, querying and expiring UI – this was mostly exposing existing functionality and improving the amount of detail that we provide by default, allowing users to define an auto expiration policy for documents with a time to live. On the querying side, we are showing a lot more information. My favorite feature there is that the studio can now show the result of including documents, which makes it easy to show how this feature can save you network roundtrips. Queries & patching now have a much, much nicer UI and also support some really cool intellisense.
  • Performance – most of the performance work was already done, but we were able to identify some bottlenecks on the client side and significantly reduce the amount of work it takes to save data to the database. This especially affects bulk insert operations, but the effect is actually widespread enough to impact most of the client operations.
  • Advanced Linq support – a lot of work has been put into the Linq provider (again) to enable more advanced scenarios and more complex queries.
  • ETL Processes – these are now exposed and allow you to define both RavenDB and SQL databases as targets for automatic ETL from a RavenDB instance.
  • Cluster wide atomic operations – dubbed cmpxchng after the similar assembly instruction, this basic building block allows you to build very complex behaviors in a distributed environment without any hassle, relying on RavenDB's consensus to verify that such operations are truly atomic.
  • Identity support – identities are now fully supported in the client and operate as a cluster wide operation. This means that you can rely on them being unique cluster wide.

Users provided really valuable feedback, finding a lot of pitfalls and stuff that didn’t make sense or flow properly. And that was a lot of help in reducing friction and getting things flowing smoothly.

There is another major feature that we worked on during this time, the setup process. And it may sound silly, but this is probably the one that I’m most excited about in this release. Excited enough that I’ll have a whole separate post for it, coming soon.

The best features are the ones you never knew were there: Protocol fix-ups

time to read 4 min | 755 words

RavenDB uses HTTP for most of its communication. It can be used in unsecured mode, using HTTP, or in secured mode, using HTTPS. So far, this is pretty standard. Let us look at a couple of URLs:

  • http://github.com
  • https://github.com

If you try to go to github using HTTP, it will redirect you to the HTTPS site. It is very easy to do, because the URLs above are actually:

  • http://github.com:80
  • https://github.com:443

In other words, by default when you are using HTTP, you’ll use port 80, while HTTPS will default to port 443. This means that the server in port 80 can just read the response and redirect you immediately to the HTTPS endpoint.

RavenDB, however, is usually used in environments where you will explicitly specify a port. So the URL would look something like this:

  • http://a.orders.raven.local:8080
  • https://a.orders.raven.local:8080

It is very common for our users to start running with port 8080 in an unsecured mode, then later move to a secure mode with HTTPS but retain the same port. That can lead to some complications. For example, here is what happens in a similar situation if I’m trying to connect to an HTTPS endpoint using HTTP or vice versa.

image

image

This means that a common scenario (running on a non-default port and using the wrong protocol) will lead to a nasty error. We call it a nasty error because the user has no real way to figure out what the issue is from the error itself. In many cases, this will trigger an escalation to the network admin or a support ticket. This is the kind of issue that I hate: it is plainly obvious, but it is so hard to figure out, and then you feel stupid for not realizing it upfront.

Let us see how we can resolve such an issue. I already gave some hints on how to do it earlier, but the technique in that post wasn't suitable for production use in our codebase. In particular, we introduced another Stream wrapping instance and another allocation that would affect all input / output calls over the network. We would really want to avoid that.

So we cheat (but we do that a lot, so this is fine). Kestrel allows us to define connection adapters, which give us a hook very early in the process of how the TCP connection is managed. However, that leads to another problem. We want to sniff the first byte of the raw TCP request, but Stream doesn't provide a way to Peek at a byte; any such attempt will consume it, which would result in the same problem of an additional indirection that we wanted to avoid.

Therefore, we decided to take advantage of the way Kestrel is handling things. It is buffering data in memory, and if you dig a bit you can access that in some very useful ways. Here is how we are able to sniff HTTP vs. HTTPS:

The key here is that we use a bit of reflection emit magic to get the inner IPipeReader instance from Kestrel. We have to do it this way because that value isn't exposed externally. Once we have the pipe reader instance, we borrow the already read buffer and inspect it. If the first character is a capital letter (G from GET, P from PUT, etc.), this is an HTTP connection (an SSL connection's first byte is either 22 or greater than 127, so there is no overlap). We then return the buffer to the stream and carry on; Kestrel will parse the request normally, but another portion of the pipeline will get the wrong protocol message and throw that to the user. And obviously we'll skip doing the SSL negotiation.
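The code itself is an image in the original post; stripped of the Kestrel plumbing, the decision it describes comes down to a first-byte check along these lines (a simplified sketch, not the production code):

```csharp
// TLS always starts a connection with a handshake record: first byte 22 (0x16),
// or a value above 127 for the old SSLv2-style hello. HTTP, on the other hand,
// starts with an ASCII verb - GET, PUT, POST, DELETE - so the first byte is an
// uppercase letter. The two ranges never overlap, so one byte is enough to decide.
static bool LooksLikePlainHttp(byte firstByte)
{
    return firstByte >= (byte)'A' && firstByte <= (byte)'Z';
}
```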

This is important, because the client is speaking HTTP, and we can't magically upgrade it to HTTPS without causing errors such as the one above. We need to speak the same protocol the client expects.

With this code, trying to use the wrong protocol gives us this error:

image

Now, if you are not reading the error message, that might still mean a support call, but it should be resolved as soon as someone actually reads the error message.

Random perf results that make me happy

time to read 2 min | 212 words

Michael Yarichuk is one of the core developers of RavenDB. He is going to do a talk and a workshop at Oredev this week, and I just got his latest slides for review.

His talk is about how you can reduce your GC load and improve performance and it includes the following slide:

image

On the left you have RavenDB 4.0 and on the right RavenDB 3.5 running the same load under a profiler. Leaving aside that RavenDB 4.0 is much faster overall, look at the numbers. The 3.5 version spent a lot of time in GC, and a lot of that was blocking GC calls. The 4.0 version barely did any GC, and all of that was in the background.

This scenario wasn't part of any performance work; it was to show the result of about two years of work, and it is amazing to look back and see a concrete example of the results so clearly.

Michael will be talking about some of the techniques we use to get there, so I highly recommend you come to his talk. He’ll also be doing a full day workshop on modeling data with documents.

RavenDB 4.0 book update is available

time to read 2 min | 388 words

A new update to the Inside RavenDB book is available. I'm up to chapter 9 (although chapter 8 is just a skeleton). You can read it here.

In particular, the details about running RavenDB in a cluster and the distributed technologies and approaches it uses are now fully covered. I still have to get back to discussing ETL strategies, but there are two full chapters discussing how RavenDB clusters and replication work in detail. I would dearly appreciate any feedback on that part.

This is a complex topic, and I want to get additional eyes on this to make sure that it is understandable, especially if you are new to distributed system design and how such systems work.

Another major advantage is that we now had a professional editor go through chapters 1 – 7, so the usage of the English language probably leveled up at least twice. Errors, awkward phrasing and outright mistakes remain my own, and I would love to hear about any issues you find.

Also new in this drop is a full chapter talking about how to query RavenDB and dive into the new RQL language. There is still a lot to cover about indexes, and this chapter hasn’t been edited yet, but I think that this should give a good insight into how we are actually doing things and what you can do with the new query language.

In addition to that, we are ramping up documentation work as we start closing things down toward the actual final release. We are currently aiming for the end of the year, so it is right around the corner. I would also like to remind people that we are currently giving a 30% discount on purchases of RavenDB licenses, for the duration of the Release Candidate. This offer will go away after the RTM release.

Another source of confusion seems to be the community license. I wanted to clarify that you can absolutely use the community license for production usage, including using features such as high availability and running in a cluster.

So grab a license, or just grab the bits and run with them. But most importantly, grab the book (https://github.com/ravendb/book/releases) and let me know what you think.
