Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by email or phone:

ayende@ayende.com

+972 52-548-6969


Breaking the language barrier

time to read 5 min | 916 words

In RavenDB 4.0, we decided to finally bite the bullet and write our own query language. That led to a lot of really complex decisions that we had to make. I already posted about the language and you saw some first drafts. RQL is meant to be instantly familiar to anyone who used SQL before.

At the same time, this led to issues. In particular, the major issue is that SQL as a language is meant to handle rectangular tables, and RavenDB isn’t storing data in tables. For that matter, we also want to give our users the ability to express themselves fully in the query language, which means support for complex queries and complex projections.

For a while, we explored the option of supporting nested selects as the way to express these semantics, but that was pretty horrible, both in terms of the complexity of the implementation and in terms of the complexity of the language. Instead, we decided to take only the good parts out of SQL.

What do I mean by that? Well, here is a pretty simple query:

[image: a simple query]
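
As a rough sketch (the collection name is illustrative), such a query can be as simple as:

from Orders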

And here is how we can ask for specific fields:

[image: selecting specific fields]
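
With the same illustrative names, asking for specific fields might look roughly like:

from Orders
select Company, OrderedAt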

You’ll note that we moved the select clause to the end of the query. The idea is that the syntax hints to the user about the order of operations in the pipeline.

Next we add some filtering, giving us:

[image: adding a where clause]
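
A sketch of the filtered version, with a hypothetical field and value:

from Orders
where Company = 'companies/1'
select Company, OrderedAt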

This isn’t really interesting except for the part where implementing this was a lot of fun. Things become both more interesting and more obvious when we look at a full query, with almost everything there:

[image: a full query]
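
A sketch of a fuller query along those lines, all names illustrative:

from Orders
where Company = 'companies/1' and Freight > 5
order by OrderedAt desc
select Company, OrderedAt, Freight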

And again, this is pretty boring, because except for the location of the clauses, you might as well write SQL. So let us see what happens when we start mixing in some of RavenDB’s own operations.

Here is how we can traverse the object graph on the database side:

[image: traversing related documents]
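
A rough sketch of this kind of query, using a load clause and illustrative names:

from Orders as o
where o.Company = 'companies/1'
load o.Company as c
select o.OrderedAt, c.Name, c.Address.City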

This is an interesting example, because you can see that we have traversed the path of the relation and are able to fetch not just the data from the documents we are looking for, but also the related documents.

It is important to note that this happens after the where, so you can’t filter on it (though you can plug this into the index, but that is a story for another time). I like this syntax, because it is very clear about what is going on, but at the same time, it is also very limited. If I want anything that isn’t rectangular, I need to jump through hoops. Instead, let us try something else…

[image: projection using a JavaScript object literal]
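
A sketch of the object literal form, with illustrative collection and field names; note the duplicated name formatting, which is discussed next:

from Employees as e
where e.Address.City = 'London'
load e.ReportsTo as boss
select {
    Name: e.FirstName + ' ' + e.LastName,
    Manager: boss.FirstName + ' ' + boss.LastName
}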

The idea is that we can use a JavaScript object literal in place of the select. Instead of fighting with the language, we have a very natural way to express projections from RavenDB.

The cool part is that you can apply logic as well, so things like string concatenation or logic in the select clause are absolutely permitted. However, take a look at the example: we have code duplication in the formatting of the customer and contact names. We can do something about it, though:

[image: declaring a function in the query]
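
Continuing the same illustrative sketch, a declared function can hold the shared formatting logic:

declare function name(person) {
    return person.FirstName + ' ' + person.LastName;
}
from Employees as e
where e.Address.City = 'London'
load e.ReportsTo as boss
select {
    Name: name(e),
    Manager: name(boss)
}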


The idea is that we also have a way to define functions to hold common logic and handle the more complex details if we need to. In this manner, instead of having to define and deploy transformers, you can define that directly in your query.

Naturally, if you want to calculate the Mandelbrot set inside the query, that is… suspect, but the idea is that having JavaScript applied on the results of the query gives us a lot more freedom and ease of use.

The actual C# client side is going to remain unchanged (it has already been updated to talk to the new backend), but in our dynamic language clients, I think we’ll work to expose this more, since it is a much better fit for such environments.

Finally, here is the full query, with everything that we can do:

[image: the full query, with an include clause]
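
A sketch of a query that uses most of these pieces together, all names illustrative:

declare function fullName(p) {
    return p.FirstName + ' ' + p.LastName;
}
from Orders as o
where o.ShipTo.City = 'London'
load o.Employee as e
select {
    OrderedAt: o.OrderedAt,
    HandledBy: fullName(e)
}
include o.Company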

Don’t focus too much on the actual content of the queries; they are mostly there to show off the syntax. The last query now has the notion of include directly in the query. As an aside, you can now use load and include to handle tertiary includes. I didn’t actually set out to build them, but they came naturally from the syntax, and they are probably going to stay.

I have a lot more to say about this, but I’m so excited about this feature that I just had to get it out there. And yes, as you can see, you can declare several functions, and you can also call them from one another, so there is a lot of power on the table right now.

RavenDB 4.0: Maintaining transaction boundary integrity in a distributed cluster

time to read 3 min | 463 words

Transactions are important for a database, even if it feels strange to talk about it. Sometimes it feels like taking out this ad:

[image: the ad]

We pretty much treat RavenDB’s transactional nature as a baseline, same as the safe assumption that any employee we hire will have a pulse. (Sorry, we discriminate against Zombies and Vampires because they create a hostile work environment, see here for details).

Back to transactions, and why I’m bringing up a basic requirement like that. Consider the case when you need to pay someone. That operation is composed of two distinct operations. First the bank debits your account and then the bank credits the other account. You generally want these to happen as a transactional unit, either both of them happened, or none of them did. In practice, that isn’t how banks work at all, but that is the simplest way to explain transactions, so we’ll go with that.

With RavenDB, that is how it always worked. You can save multiple documents in a single transaction, and we guarantee that we are either able to save all of them, or none. There is a problem, however, when we start talking about distributed clusters. When you save data to RavenDB, that goes into a single node, and then it is replicated from there. In RavenDB 4.0 we are now also guaranteeing that replicating the data between nodes will respect the transaction boundary, so across the cluster you can rely on the fact that we’ll never break transactions apart.

Let us consider the following transfer:

  • Deduct 5$ from accounts/1234
  • Credit  5$ to accounts/4321

These changes will also be replicated as a single unit, so the total amount of money in the system remains the same.

But a more complex set of operations can be:

  • Deduct 5$ from accounts/1234
  • Credit  5$ to accounts/4321
  • Deduct 5$ from accounts/1234
  • Credit  5$ to accounts/5678

In this case, we have a document that is involved in multiple transactions. When we need to replicate to another node, we’ll replicate the current state and we do that in batches. So it is possible that we’ll replicate just:

  • Credit  5$ to accounts/4321

We don’t replicate accounts/1234 yet, because it has changed in a later transaction. That means that on one server, we suddenly magically have more money. While in general that would be a good thing, I’m told that certain parties aren’t in favor of such things, so we have another feature that interacts with this. You can enable document revisions, in which case even if your documents are modified multiple times in different transactions, we’ll always send the transactions across the wire as they were saved.

This gives you transaction boundaries on a distributed database, and allows you to reason about your data in a saner fashion.


Production postmortem: 30% boost with a single line change

time to read 7 min | 1231 words

This isn’t actually about a production system, per se. This is about a performance regression we noticed in a scenario for 4.0. We recently started doing structured performance testing as part of our continuous integration system.

There goes the sad trombone sound, because what that told us is that at some point, we had a major performance regression. To give you that in numbers, we used to be around 50,000 writes per second (transactional, fully ACID, hit the disk, data is safe) and we dropped to around 35,000 writes per second. Fluctuations in performance tests are always annoying, but that big a drop couldn’t be explained by just a random load pattern causing some variance in the results.

This was a consistent and major difference from the previous record. Now, to explain things, the last firm numbers we had about this performance test were from Dec 2016. Part of the release process for 4.0 calls for structured and repeatable performance testing, and this is exactly the kind of thing that it is meant to catch. So I’m happy we caught it, not so happy that we had to go through about 7 months of changes to figure out what exactly drove our performance down.

The really annoying thing is that we put a lot of effort into improving performance, and we couldn’t figure out what went wrong there. Luckily, we could scan the history and track our performance over time. I’m extremely oversimplifying here, of course. In practice this meant running bisect on all the changes in the past 7 – 8 months and running benchmarks on each point. And performance is usually not something as cut and dried as a pass / fail test.

Eventually we found the commit, and it was small, self-contained and had a huge impact on our performance. The code in question?

ThreadPool.SetMinThreads(MinThreads, MinThreads);
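// MinThreads was set to a high value so the test suite would not hit thread starvation; as explained below, this is the line that cost the write benchmark its throughput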

This change was, ironically, part of a performance run that focused on increasing the responsiveness of the server under a specific load. More specifically, it handled the case where under a harsh load we would run into thread starvation. The problem is that this specific load can only ever happen in our test suite. At some point we ran up to 80 parallel tests, each of which might spawn multiple servers. The problem is that we run them all in a single process, so we run into a lot of sharing between the different tests.

That was actually quite intentional, because it allowed us to see how the code behaves in a bad environment. It exposed a lot of subtle issues, but eventually it got very tiring to have tests fail because there just weren’t enough threads to run them fast enough. So we set this value to a high enough number to not have to worry about this.

The problem is that we set it in the server, not in the tests. And that led to the performance issue. But the question is why would this setting cause such a big drop? Well, the actual difference in performance was going from taking 20 μs to taking 28.5 μs, which isn’t that much when you think about it. The reason why this happened is a bit complex, and requires understanding several different aspects of the scenario under test.

In this particular performance scenario, we are testing what happens when a lot of clients are writing to the database all at once. Basically, this is a 100% small writes scenario. Our aim here is to get sustained throughput and latency throughout the scenario. Given that we ensure that every write to RavenDB actually hits the disk and is transactional, that means that every single request needs to physically hit the disk. If we tried doing that for each individual request, that would lead to about 100 requests a second on a good disk, if that.

In order to avoid the cost of having to go to the physical disk, we do a lot of batching. Basically, we read the request from the network, do some basic processing and then send it to a queue where it will hit the disk along with all the other concurrent requests. We make heavy use of async to allow us to consume as few system resources as possible. This means that at any given point, we have a lot of requests that are in flight, waiting for the disk to acknowledge the write. This is pretty much the perfect system for a high throughput server, because we don’t have to hold up a thread.

So why would increasing the amount of minimum threads hurt us so much?

To answer that we need to understand what it is exactly that SetMinThreads controls. The .NET thread pool is a lazy one; even if you told it that the minimum number of threads is 1000, it will do nothing with this value. Instead, it will just accept work and distribute it among its threads. So far, so good. The problem starts when we have more work at hand than threads to process it. At this point, the thread pool needs to make a decision: it can either wait for one of the existing threads to become available or it can create a new thread to handle the new load.

This is when SetMinThreads comes into play. Up until the point where the thread pool has created enough threads to exceed the min count, whenever there is more work in the pool than there are threads, a new thread will be created. Once the minimum number of threads has been reached, the thread pool will wait a bit before adding new threads. This allows the existing threads time to finish whatever they were doing and take work from the thread pool. The idea is that given a fairly constant workload, the thread pool will eventually adjust to the right amount of threads to ensure that there isn’t any work that is sitting in the queue for too long.

With the SetMinThreads change, however, we increased the minimum number of threads significantly, and that caused the thread pool to create a lot more threads than it would otherwise create. Because our workload is perfect for the kind of tasks that the thread pool is meant to handle (short, do something and schedule more work then yield), it is actually very rare that the thread pool would need to create an additional thread, since a thread will be available soon enough.

This is not an accident; we very carefully set out to build it in this fashion, of course.

The problem is that with the high number of minimum threads, it was very easy to get the thread pool into a situation where it had more work than threads. Even though we would have a free thread in just a moment, the thread pool would create a new thread at this point (because we told it to do so).

That led to a lot of threads being created, and that led to a lot of context switches and thread contention as the threads fought over processing the incoming requests. In effect, that would take a significant amount of the actual compute time that we have for processing the request. By removing the SetMinThreads call, we reduced the number of threads, and were able to get much higher performance.

RavenDB 4.0: Raven Query Language

time to read 3 min | 592 words

The last big feature for RavenDB 4.0 has landed, and it is a big one. You can see the details on the PR that implemented this feature below, but you probably care a lot more about what it is.

RavenDB uses Lucene as the underlying index technology, and until now we simply exposed (slightly modified) Lucene syntax to our clients. That was easy to do and quite effective, but it also meant that we were limited to a somewhat arcane query language and what it could do.

In RavenDB 4.0, we now have a full query language, and you can see how it looks below:

[image: a basic RQL query]
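
Assuming the sample Companies collection (the field name is a guess), such a query looks roughly like:

from Companies
where Address.City = 'London'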

This will produce the results that you expect, giving you all the companies residing in London in the database.

The rest of the system behaves just the same: this query is going to hit the query optimizer, an index will be created if one does not already exist, etc. It is just that our query language is both much nicer to look at and allows us to work with it in a much more structured manner (and yes, that is a pun).

We also support aggregation:

[image: an aggregation query]
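
A sketch of an aggregation query of this kind, with illustrative names:

from Orders
group by Company
order by count() desc
select count() as Count, key() as Company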

Which gives:

[image: the aggregation results]

This is automatically creating a map/reduce index and does all the work for you. We also have support for querying on indexes directly via:

[image: querying an index directly]
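
A sketch of querying a static index directly, with a hypothetical index name and field:

from index 'Orders/Totals'
where Total > 100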

If you are familiar with how we used to have to do range queries, you can see how big an improvement this is. This is actually a pretty significant feature, because you can define a static index to do whatever you want with the data, and then query on top of that.

You can also do the usual full text operations directly in the query language:

[image: a full text search query]
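
A sketch of the method-style full text search, with illustrative names and search terms:

from Companies
where search(Name, 'market group')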

We decided to go with the method abstraction for most operations, because it allows us a lot of freedom in the syntax and gives very readable queries.

Here is an example of us trying a more complex query. In this one, I want to find companies in London, the UK or France. But instead of just wanting to find them in that particular order, I want to get them with ranking.

[image: a query with boosting and a projection]
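
A rough sketch of such a query, using boost() for the ranking and a simple projection (field names and boost values are illustrative):

from Companies
where boost(Address.City = 'London', 10)
   or boost(Address.Country = 'UK', 5)
   or Address.Country = 'France'
order by score()
select Name, Address.City as City, Address.Country as Country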

I really want a company in London, so that should sort first, and then UK based companies and finally French companies. You can see the results of the query below. This query also shows how we can do projections, in a much nicer way.

[image: the query results]

The feature just landed in the main branch and we are now working through all of the rough edges, but it is very exciting, since it gives you a natural way to query RavenDB without losing any of the power.

I mentioned that this was a big change, right?

[image: the size of the pull request]

And that is just the C# work; we still have to update the other clients.

Error handling belongs at Layer 7 (policy)

time to read 3 min | 411 words

When you set up a RavenDB server securely, it doesn’t require a valid X509 certificate to be able to connect to it.

You might want to read that statement a few times to realize what I’m saying. You don’t need a certificate to connect to a RavenDB service. But on the other hand, we use client certificates to authenticate connections. So which is it?

The problem is with the specific OSI layer that we are talking about. When using SSL / TLS you can require a client certificate on connection. That ensures that only connections with a client certificate can be processed by the server. This is almost what we want, but it has the sad effect of having really poor error reporting capabilities.

For example, let us assume that you are trying to connect to a secured RavenDB server without a certificate. If we just require a client certificate, then the error would be raised at the SSL layer, giving you errors that look something like:

  • The connection has been closed
  • Could not establish trust relationship for the SSL/TLS

This can also happen if there is a firewall in the middle, if enough packets were dropped, if there is a solar storm or just because it is Monday.

Or, it can be because you don’t have a certificate, your certificate expired, it is not known or you are trying to access a resource you are not authorized to.

In all of those cases, the error you’ll receive if you require & validate the certificate at OSI Layer 6 is going, to put it plainly, to suck. Just imagine trying to debug something like that in a protocol that by design is intended to be hard / impossible to eavesdrop on. Instead, we can allow access to the server without a certificate and enforce the required certificate at a higher level.

That means that when you connect to RavenDB and you don’t have a certificate, we’ll accept the connection and then either give you an error that you can reason about (if you are using the API, because it will tell you what is wrong) or redirect you to a nice error page (if you are using the browser). Alternatively, if you have a certificate and it is not valid for any number of reasons, RavenDB will tell you all about it.

That reduces the amount of hair loss in the case of errors significantly.

Inside RavenDB 4.0: Chapter 6 is done

time to read 1 min | 73 words

I’ve just completed writing chapter 6 (distributed RavenDB) and pushed a preview up. This puts the page count at over 200 pages so far, with another two thirds or so left.

This chapter was really hard to write, and I would really appreciate any feedback that you have on the text and on the distributed nature of RavenDB 4.0 in general. It is both very similar to and an entirely different beast than 3.x.

RavenDB 4.0: The admin’s backdoor is piping hot

time to read 5 min | 805 words


We take security very seriously. With the move to X509 certificates only for authentication (on all RavenDB editions) I feel that we have a really good story around securing RavenDB and controlling access to it.

Almost. One of the more annoying things about security is that you also need to consider the hard cases, such as the administrators messing up badly. As in, losing the credentials that allow you to administer RavenDB. This can happen because the database has just run without issue for so long that no one can remember where the keys are. That isn’t supposed to happen, but RavenDB has been in production usage for close to a decade now, which means that we have seen our fair share of mess ups (both our own and by customers).

In some cases, we have had to help a customer manage a third system handover between different hosting providers, which felt very much half like forensics and half like hacking. In short, when we design a system now, we also consider the fact that as secure as we want the system to be, there must be a way for an authorized person to get in.

If this made you cringe, you are in good company. I both love and hate this feature. I love it because it is going to be very useful, I hate it because it was a headache to get right. But I’m jumping ahead of myself. What is this backdoor that I’m talking about?

Properly configured RavenDB will require a client certificate (that was registered in the cluster) to access the server. However, in addition to listening over HTTPS, RavenDB will also listen for commands on standard input. An admin can use the standard input / output as a way to talk with RavenDB without requiring any authentication. Basically, we expose a mini shell that you can use to enter commands and inspect and change our state.

Here is how it looks when running in console mode:

[image: the RavenDB admin console]

From a security point of view, if a user is able to access my standard input, that usually means that they are the one who ran this process or are able to do so. RavenDB obviously won’t have any setuid bits turned on, so there is no need to worry about a user tricking us into doing something that the user doesn’t have permission to do.

So using the console is a really nice way for us to offer the administrator an escape hatch to start messing with the internals of RavenDB in interesting ways. However, that only works if you are running RavenDB in interactive mode. What about when running as a service or daemon? They don’t have a standard input that is available to the admin. In fact, in most production deployments, you won’t have an easy time at all trying to connect to the console.

So that option is out, sadly. Or is it?

The nice thing about operating systems is that we can lean on them. In this case, we expose the exact same console that we have for stdin / stdout using Named Pipes (actually, Unix Sockets on Linux / Mac, but pretty much the same idea). The idea is that those are both methods for inter-process communication that are local to the machine and can be secured by the operating system directly. In this case, we make sure that the pipe is only accessible to the RavenDB user (and to root / Administrator, obviously). That means that an admin can log into the box, run a single command and land in the RavenDB admin shell where he can manage the server. For example, by registering a new certificate in the server.

Because only the user running the RavenDB process or an administrator / root can access the pipe (ensured by setting the proper ACL on the pipe during creation) we know that there isn’t any security risk here. An admin can already override any security in the box, and the permissions are always on the user level, not the process level, so if you are running as the same user as the RavenDB process you can already do anything that RavenDB can do.

After we ensured that our security isn’t harmed by this option, we can relax knowing that we have an easy (and safe) way for the administrator to manage the server in an emergency.

In fact, the most obvious usage of this feature is during initial cluster setup, when you don’t have anything yet. This allows you to enter the system as a trusted party and do the initial configuration.

RavenDB 4.0: Securing the keys to the kingdom

time to read 7 min | 1238 words

A major design goal for RavenDB is that it would be easy and convenient to use. A major constraint is that it must be secured. As you can imagine, those two quite often work against one another. Security is often anything but easy to use, and it is rarely convenient.

Previously, we have used Windows Authentication and OAuth to secure access to RavenDB. That works and has been deployed in the wild for quite some time. It is also a major pain whenever there is an issue. If the connection to the domain controller drops, we might have authentication delays of many seconds, and trying to debug Active Directory issues in production deployments can be… a bit of a pain, in the same way that an audit by the IRS that starts with a SWAT team bashing down your door is mildly annoying. OAuth, on the other hand, is much better, since it is under our control, and we can figure out exactly what is going on with it if need be.

Since RavenDB 4.0 is running on Windows, Linux & Mac, we decided to drop the Windows Authentication support and just use OAuth. The problem is that if we choose to support HTTP, we have to rely on extremely complex protocols that attempt to secure authentication using plain text, but don’t usually deliver good results and are typically a pain to debug and support. Or, we can use HTTPS and just let SSL/TLS handle it all for us. A good example of the difference can be seen in OAuth 1.0 vs OAuth 2.0.

When we built RavenDB 1.0, roughly around 2009, the operating environment was quite different. In 2017, not using HTTPS is pretty much a sin unto itself. As we started security modeling for RavenDB 4.0, it became obvious that we couldn’t really support any security on top of HTTP without effectively having to implement most of the properties of HTTPS ourselves. I’m many things, but I’m not a security expert, not by a long shot. Given the chance to implement my own security protocol, I would gladly do that, for a toy project or a weekend hackfest. But there is no way I would trust my own security in production against serious attacks. That pretty much led us to the realization that we have to require HTTPS for anything that requires security.

That includes running inside the organization, exposed to the public internet, running inside the cloud or in a shared datacenter, etc. Pretty much, unless you have HTTPS, there is no real point in talking about security. Given that, it meant that we could shift our baseline approach to security. If we are always going to require HTTPS for security, it means that we are operating in an environment in which it is much nicer for us to apply security.

Now, you can choose to run HTTP only, and avoid the need for certificate management, etc. However, at that point, you aren’t running a secure system, or you are already running it in a trusted and secured environment. In that case, we want to be clear that there isn’t any point in trying to apply a security policy (such as who can access what). Any network sniffer can figure out the access tokens and pretend to be whomever they want, if you are using HTTP.

With HTTPS required, we now move to the realm of having the admin take care of the certificates, securing them, renewal, etc. That is the part where it isn’t as easy or convenient as we could wish for. However, once we had that as a baseline, it opens an interesting path for security. Instead of relying on our own solution, we can use the built-in one and use x509 certificates from the client for authentication. This has the advantage that it is widely supported, standardized and secure. It is a bit less convenient than just a password, but the advantage is that any security system already in place knows how to deal with, store, authorize and manage access to certificates.

The idea is that you can go to RavenDB and either register or generate an x509 certificate. To that certificate an administrator can assign permissions (such as what dbs it is allowed to access). From that point on, a client (RavenDB, browser, curl, etc) can connect to RavenDB and just issue REST requests. There is no need to do anything else for the system to work. Contrast that with how you would typically have to deal with authentication using OAuth, by sending the token, keeping it fresh manually, etc.

Using x509 also has the distinct advantage that it is widely trusted. We intend to provide this level of security to all editions of RavenDB (so the Community Edition will also be able to use it).

A nice accidental feature of this decision is that we are going to be able to apply authentication at the connection level, and connection pooling means that we are likely going to have connections live for a long time. That means that we only need to pay the authentication cost once, instead of per request as with OAuth.

To simplify matters, we’ll likely just use the client certificates for authenticating the client, so we’ll not care if they are from a trusted root, etc. We’ll just require that the admin register the valid certificates with the cluster so they will be recognized. If you need to stop using a certificate, you can delete its registration or generate a new certificate to take its place. On the client side, it means that the DocumentStore will expose a X509Certificate property that you can set (or the equivalent in other clients). That means that you can use your own policies on the client to determine how to store the certificate.

On the server side, by the way, we’ll expose an extension point that will allow you to retrieve the certificate using your own policies. For example, if you are using Azure Key Vault or Hashicorp Vault or even your own HSM. This is done by invoking a process you specify, so you can write your own scripts / mini programs and apply whatever logic you need. This creates a clean separation between RavenDB and the secret store in use.

Authentication between servers is also done using SSL and certificates. We expect that we’ll commonly have all the servers running the same wildcard certificate, in which case they will obviously trust each other. Alternatively, you can also specify additional certificates that will be treated as servers. This is useful for when you are running with a separate certificate for each server, but it is also a critical part of certificate rotation. When your certificate is about to expire, the admin will register the new certificate as trusted, and then start replacing the certificates of each of the nodes in turn. This allows us to run with both old and new certificates concurrently during this process.

We considered relying on some properties of the certificate itself, but it seemed like an error prone process. It is better to have the admin explicitly state, both for client and server certificates, which ones we should actually trust, and at what level.

I would really appreciate any commentary you have about this feature, both in terms of ease of use, acceptability and obviously its security.

The ghost of the zombie of revisions past

time to read 3 min | 433 words

I talked about difficult naming decisions, and this one was certainly one of the more lively ones.

We bounced between zombies, orphans and ghosts, with a bunch of crazy stuff going in between. At one point it was suggested that we just make up a word, but my suggestion to use Welchet was sadly declined by all, including a rather rude comment by the author of this blog about what kind of jokes are appropriate for the workplace.

After we settled the discussion on ghosts, there was another discussion about whether we should use Inky, Blinky, Pinky and Clyde. I tell you, when we aren’t building distributed databases, the office is a hotbed for nerd references.

And then an idea came along, which I really liked, so we talked about this in the morning and I’m showing screenshots in a blog post a bit before midnight. The feature is called the revision bin.

In the UI, you can see it as one of the top level elements.

[image: the Revisions Bin in the studio menu]

In essence, this is a recycle bin for revisions. RavenDB can be configured to keep revisions of documents as they change, and even keep track of them after they were deleted. However, that presented a problem. If you deleted a document that had revisions, how would you tell that it was there in the first place? Just knowing the document id and looking for that wouldn’t work very well. So we created the revisions bin, whose content looks like this:

[image: the contents of the revisions bin]

And from there you can go to:

[image: the revisions of the deleted document]

For that matter, if we recreate this document again, you’ll be able to see its entire history, including across deletes.

[image: the document’s full revision history, across deletes]

Now admittedly this is a nice looking UI, and the skull on the menu is a nice touch, if a bit morbid. However, why make such a noise about such a feature?

The answer is that the revisions bin isn’t that important, but keeping track of deletes of documents using revisions is quite important, since it allows subscriptions and ETL to handle them in a clean and easy to grok manner. And in order to actually explain that, we needed to be able to show the users what we are talking about.
