Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.

RavenDB 4.0: The Server Dashboard

time to read 1 min | 184 words

One of the things that we have been saving up is hooking together all the work we have been doing to expose how RavenDB works into the operations dashboard. This has just landed in the nightly build and can give you a lot of insight into exactly what is going on inside your server.

You can see some of the screenshots below. The idea is that in addition to exposing all of these metrics over dedicated endpoints and SNMP, we will also save users the trouble of setting up monitoring and just show them what is going on directly.

Operators can just head to this page and see what is going on. It is meant to be left up in the background so that users can keep an eye on it during routine operations.

image

image

image

RavenDB 4.0: Interlocked distributed operations

time to read 3 min | 440 words

We couldn’t make unique constraints work in RavenDB 4.0 in a way that made sense for distributed operations; there were just too many hurdles at that level of abstraction. The problem, in essence, boils down to having to do an atomic operation in a distributed environment. When we need to do this in a multi-threaded environment, we can rely on interlocked operations to help us. In the same manner, RavenDB offers the notion of interlocked distributed operations.

Let us take a look at what this looks like, shall we?

 

The output of this code would be:

Success: True, Val: users/1, Index: 13

In other words, we were able to atomically set the value of “ayende” to “users/1”. At the most basic level, this gives us the ability to create unique constraints, because we are able to reserve values for particular documents, and it gives us a much more explicit manner in which to do so. If a user wants to change their username, we first try to grab the new name, change the username and then release the old username. And we have the ability to recover if there are errors midway.

This design is modeled after the Interlocked operations, and is meant to be used in the same manner. You submit such an operation to the cluster, which will run it through the Raft state machine. This means that you can rely on RavenDB’s own distributed state machine to get reliable distributed operations, including the full consistency that this provides. The key arguments here are the name of the value and the index used. If the index you provide matches the index on the state machine, we’ll replace the value in an atomic fashion. Otherwise, the operation will fail and you’ll get the current value and index so you can decide if you want to try again.

The example with the unique constraints is just the tip of the iceberg. You can also use this in your own code for things like distributed locking, by registering an owner for a particular lock key and having everyone who needs the lock race to acquire it. Note that the value that we use here can be anything, including complex objects. In other words, you can do things like set a lock descriptor that would include timeout information, owner, etc.
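To make the index check concrete, here is a minimal in-memory sketch of the compare-exchange semantics described above. It is not the RavenDB client API: the `CompareExchangeStore` class, the `try_acquire_lock` helper, the convention that index 0 means "no value yet", and the shape of the lock descriptor are all assumptions made for illustration, and a real cluster would run each operation through Raft rather than a local dictionary.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass
class CompareExchangeResult:
    success: bool
    value: Any
    index: int


class CompareExchangeStore:
    """Hypothetical single-process stand-in for the cluster state machine."""

    def __init__(self) -> None:
        self._values: Dict[str, Tuple[Any, int]] = {}
        self._next_index = 1

    def get(self, key: str) -> CompareExchangeResult:
        value, index = self._values.get(key, (None, 0))
        return CompareExchangeResult(index != 0, value, index)

    def put(self, key: str, value: Any, expected_index: int) -> CompareExchangeResult:
        current_value, current_index = self._values.get(key, (None, 0))
        if expected_index != current_index:
            # Index mismatch: report the current value so the caller can retry.
            return CompareExchangeResult(False, current_value, current_index)
        new_index = self._next_index
        self._next_index += 1
        self._values[key] = (value, new_index)
        return CompareExchangeResult(True, value, new_index)


def try_acquire_lock(store: CompareExchangeStore, lock_key: str, owner: str,
                     now: float, ttl_seconds: float) -> bool:
    """Race for a lock; an expired descriptor may be taken over atomically."""
    current = store.get(lock_key)
    if current.success and current.value["expires_at"] > now:
        return False  # the lock is held and has not expired yet
    descriptor = {"owner": owner, "expires_at": now + ttl_seconds}
    # Passing the index we just read means a concurrent taker makes us fail.
    return store.put(lock_key, descriptor, current.index).success
```

Reserving a username is then `store.put("ayende", "users/1", 0)`: the first caller succeeds, and any later caller passing index 0 gets back the existing value and its index instead.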

The interface here is pretty small. In addition to PutCompareExchangeValueOperation there is just GetCompareExchangeValueOperation, but that is enough for you to be able to lean on RavenDB’s distributed state machine in your own application.

RavenDB 4.0: Node.JS client is now in beta

time to read 1 min | 164 words

I’m happy to announce that the RavenDB node.js client is now publicly available in beta. Following our Python client (and obviously the .NET one), this is the newest client for RavenDB on the block, with additional clients for the JVM, Go and Ruby quickly reaching critical stage.

Here is some code using it (I’m using async/await here, but the RavenDB node.js client supports any Node 6.0 or higher):

image

And here is how you do some basic CRUD:

image

A full sample app can be found here:

You can just get the code and then run:

npm run serve

And you’ll get a server running against the public live instance.

RavenDB 4.0: Support options

time to read 2 min | 324 words

RavenDB 4.0 is going to have a completely free community edition that you can use to run production systems. We do this with the expectation that users will go with the community edition and will either be happy there or upgrade at some point to the commercial editions.

As part of the restructuring we are doing, we intend to also significantly simplify the support model. Our current support model is per RavenDB instance with professional support costing 2,000$ per instance and production (24/7) support costing 6,000$. We got a lot of feedback on this being complex to work with. In particular, the per instance cost meant that operations would need to talk to us during redeployments in order to maintain coverage of all their RavenDB instances.

As part of the Great Simplification we do in 4.0 we also want to tackle the issue of support. As a result, with the rollout of the RavenDB 4.0 RC we are going to move to flat support costs.

  • Professional Support will cost 15% of the license cost and give you access to our support engineers with a guaranteed next business day response time.
  • Production Support will cost 30% of the license cost and give you access to the core team members with 24/7 availability.

This is a significant reduction in price, because we are trying to encourage more people to get support and our previous approach was unbalanced.

The community support will continue to be offered, obviously, but we have no SLA around issues raised there.

The commercial support options will only be available for the Professional and Enterprise editions.

Here is how the costs change between RavenDB 3.x and RavenDB 4.0 for production support:

                                          RavenDB 3.x   RavenDB 4.0   Savings
Standard + Production Support             6,698$        5,843$        15% reduction
Enterprise 4 Cores + Production Support   9,152$        6,864$        33% reduction

RavenDB 4.0: Maintaining transaction boundary integrity in a distributed cluster

time to read 3 min | 463 words

Transactions are important for a database, even if it feels strange to talk about it. Sometimes it feels like taking out this ad:

image

We pretty much treat RavenDB’s transactional nature as a baseline, same as the safe assumption that any employee we hire will have a pulse. (Sorry, we discriminate against Zombies and Vampires because they create a hostile work environment, see here for details).

Back to transactions, and why I’m bringing up a basic requirement like that. Consider the case when you need to pay someone. That operation is composed of two distinct operations. First the bank debits your account and then the bank credits the other account. You generally want these to happen as a transactional unit: either both of them happen, or neither does. In practice, that isn’t how banks work at all, but that is the simplest way to explain transactions, so we’ll go with that.

With RavenDB, that is how it always worked. You can save multiple documents in a single transaction, and we guarantee that we are either able to save all of them, or none. There is a problem, however, when we start talking about distributed clusters. When you save data to RavenDB, that goes into a single node, and then it is replicated from there. In RavenDB 4.0 we are now also guaranteeing that replicating the data between nodes will respect the transaction boundary, so across the cluster you can now rely on us never breaking transactions apart.

Let us consider the following transfer:

  • Deduct 5$ from accounts/1234
  • Credit  5$ to accounts/4321

These changes will also be replicated as a single unit, so the total amount of money in the system remains the same.

But a more complex set of operations can be:

  • Deduct 5$ from accounts/1234
  • Credit  5$ to accounts/4321
  • Deduct 5$ from accounts/1234
  • Credit  5$ to accounts/5678

In this case, we have a document that is involved in multiple transactions. When we need to replicate to another node, we’ll replicate the current state and we do that in batches. So it is possible that we’ll replicate just:

  • Credit  5$ to accounts/4321

We don’t replicate accounts/1234 yet, because it has changed in a later transaction. That means that on one server, we suddenly, magically, have more money. While in general that would be a good thing, I’m told that certain parties aren’t in favor of such things, so we have another feature that interacts with this. You can enable document revisions, in which case even if your documents are modified multiple times in different transactions, we’ll always send the transactions across the wire as they were saved.

This gives you transaction boundaries on a distributed database, and allows you to reason about your data in a saner fashion.
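The difference between the two modes is easier to see with numbers. The following is a toy model of the scenario above, not RavenDB’s replication code: the starting balance of 100 per account, the batch size of one document, and all function names are assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

# Each transaction maps a document id to its balance after the transaction.
Transaction = Dict[str, int]

# Hypothetical starting balances: every account begins with 100$.
INITIAL = {"accounts/1234": 100, "accounts/4321": 100, "accounts/5678": 100}
TRANSACTIONS: List[Transaction] = [
    {"accounts/1234": 95, "accounts/4321": 105},  # first transfer
    {"accounts/1234": 90, "accounts/5678": 105},  # second transfer
]


def current_state_batches(transactions: List[Transaction],
                          batch_size: int) -> List[Dict[str, int]]:
    """Send only the latest version of each document, ordered by last write."""
    last_write: Dict[str, Tuple[int, int]] = {}  # doc id -> (write order, balance)
    order = 0
    for tx in transactions:
        for doc_id, balance in tx.items():
            order += 1
            last_write[doc_id] = (order, balance)
    docs = sorted(last_write.items(), key=lambda item: item[1][0])
    changes = [(doc_id, balance) for doc_id, (_, balance) in docs]
    return [dict(changes[i:i + batch_size])
            for i in range(0, len(changes), batch_size)]


def revision_batches(transactions: List[Transaction]) -> List[Dict[str, int]]:
    """With revisions, each transaction travels whole, in the order it was saved."""
    return [dict(tx) for tx in transactions]


def apply_batches(initial: Dict[str, int], batches: List[Dict[str, int]],
                  count: int) -> Dict[str, int]:
    """Destination state after receiving only the first `count` batches."""
    state = dict(initial)
    for batch in batches[:count]:
        state.update(batch)
    return state
```

In this model, the first current-state batch carries only accounts/4321, so the destination briefly holds more money than the source. With revision batches, the total stays the same after every batch.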


RavenDB 4.0: Raven Query Language

time to read 3 min | 592 words

The last big feature for RavenDB 4.0 has landed, and it is a big one. You can see the details on the PR that implemented this feature below, but you probably care a lot more about what it is.

RavenDB uses Lucene as the underlying index technology, and until now we simply exposed (slightly modified) Lucene syntax to our clients. That was easy to do and quite effective, but it also meant that we were limited to a somewhat arcane query language and what it could do.

In RavenDB 4.0, we now have a full query language, and you can see what this looks like below:

image

This will produce the results that you expect, giving you all the companies residing in London in the database.

The rest of the system behaves just the same: this query is going to hit the query optimizer, an index will be created if one does not already exist, etc. It is just that our query language is both much nicer to look at and allows us to work with it in a much more structured manner (and yes, that is a pun).

We also support aggregation:

image

Which gives:

image

This is automatically creating a map/reduce index and does all the work for you. We also have support for querying on indexes directly via:

image

If you are familiar with how we used to have to do range queries, you can see how big an improvement this is. This is actually a pretty significant feature, because you can define a static index to do whatever you want with the data, and then query on top of that.

You can also do the usual full text operations directly in the query language:

image

We decided to go with the method abstraction for most operations, because it allows us a lot of freedom in the syntax and gives very readable queries.

Here is an example of us trying a more complex query. In this one, I want to find companies in London, the UK or France. But instead of just wanting to find them in that particular order, I want to get them with ranking.

image

I really want a company in London, so that should sort first, then UK based companies and finally French companies. You can see the results of the query below. This query also shows how we can do projections, in a much nicer way.

image

The feature just landed in the main branch and we are now working through all of the rough edges, but it is very exciting, since it gives you a natural way to query RavenDB without losing any of the power.

I mentioned that this was a big change, right?

image

And that is just for the C# work, we still have to update the other clients.

RavenDB 4.0: The admin’s backdoor is piping hot

time to read 5 min | 805 words

image

We take security very seriously. With the move to X509 certificates only for authentication (on all RavenDB editions) I feel that we have a really good story around securing RavenDB and controlling access to it.

Almost. One of the more annoying things about security is that you also need to consider the hard cases, such as the administrators messing up badly. As in, losing the credentials that allow you to administer RavenDB. This can happen because the database has just run without issue for so long that no one can remember where the keys are. That isn’t supposed to happen, but RavenDB has been in production usage for close to a decade now, which means that we have seen our fair share of mess ups (both our own and by customers).

In some cases, we have had to help a customer manage a third system handover between different hosting providers, which felt half like forensics and half like hacking. In short, when we design a system now, we also consider the fact that as secure as we want the system to be, there must be a way for an authorized person to get in.

If this made you cringe, you are in good company. I both love and hate this feature. I love it because it is going to be very useful, I hate it because it was a headache to get right. But I’m getting ahead of myself. What is this backdoor that I’m talking about?

Properly configured RavenDB will require a client certificate (that was registered in the cluster) to access the server. However, in addition to listening over HTTPS, RavenDB will also listen for commands on standard input. An admin can use the standard input / output as a way to talk with RavenDB without requiring any authentication. Basically, we expose a mini shell that you can use to enter commands and inspect and change our state.

Here is what it looks like when running in console mode:

image

From a security point of view, if a user is able to access my standard input, that usually means that they are the one who ran this process, or are able to do so. RavenDB obviously won’t have any setuid bits turned on, so there is no need to worry about a user tricking us into doing something that the user doesn’t have permissions to do.

So using the console is a really nice way for us to offer the administrator an escape hatch to start messing with the internals of RavenDB in interesting ways. However, that only works if you are running RavenDB in interactive mode. What about when running as a service or daemon? They don’t have a standard input that is available to the admin. In fact, in most production deployments, you won’t have an easy time at all trying to connect to the console.

So that option is out, sadly. Or is it?

The nice thing about operating systems is that we can lean on them. In this case, we expose the exact same console that we have for stdin / stdout using Named Pipes (actually, Unix Sockets in Linux / Mac, but pretty much the same idea). The idea is that those are both methods for inter process communication that are local to the machine and can be secured by the operating system directly. In this case, we make sure that the pipe is only accessible to the RavenDB user (and to root / Administrator, obviously). That means that an admin can log into the box, run a single command and land in the RavenDB admin shell where he can manage the server. For example, by registering a new certificate in the server.

Because only the user running the RavenDB process or an administrator / root can access the pipe (ensured by setting the proper ACL on the pipe during creation) we know that there isn’t any security risk here. An admin can already override any security in the box, and the permissions are always on the user level, not the process level, so if you are running as the same user as the RavenDB process you can already do anything that RavenDB can do.
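As a rough illustration of the idea on Linux / Mac, here is a minimal sketch of a Unix socket that only its owner can connect to. It is not RavenDB’s implementation: the function names, the one-line command protocol and the `commands` table are all hypothetical, and the Windows side (a named pipe with an ACL) is not covered.

```python
import os
import socket
from typing import Callable, Dict


def create_admin_socket(path: str) -> socket.socket:
    """Bind a Unix socket whose file is readable and writable by its owner only."""
    if os.path.exists(path):
        os.unlink(path)
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    # The umask is process-wide, so restore it right after the bind.
    previous_umask = os.umask(0o177)  # socket file is created with mode 0600
    try:
        server.bind(path)
    finally:
        os.umask(previous_umask)
    server.listen(1)
    return server


def serve_one_command(server: socket.socket,
                      commands: Dict[str, Callable[[], str]]) -> None:
    """Accept one local connection, run the named command, send back the reply."""
    conn, _ = server.accept()
    with conn:
        name = conn.recv(1024).decode().strip()
        handler = commands.get(name)
        reply = handler() if handler else "unknown command: " + name
        conn.sendall((reply + "\n").encode())


def send_admin_command(path: str, name: str) -> str:
    """Client side: what an admin logged into the box would run."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(path)
        client.sendall((name + "\n").encode())
        return client.recv(1024).decode().strip()
```

No authentication happens inside the protocol itself. Access control is left entirely to the file permissions on the socket, which the operating system enforces.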

After we ensured that our security isn’t harmed by this option, we can relax knowing that we have an easy (and safe) way for the administrator to manage the server in an emergency.

In fact, the most obvious usage of this feature is during initial cluster setup, when you don’t have anything yet. This allows you to enter the system as a trusted party and do the initial configuration.

RavenDB 4.0: Securing the keys to the kingdom

time to read 7 min | 1238 words

A major design goal for RavenDB is that it would be easy and convenient to use. A major constraint is that it must be secured. As you can imagine, those two quite often work against one another. Security is often anything but easy to use, and it is rarely convenient.

Previously, we have used Windows Authentication and OAuth to secure access to RavenDB. That works and has been deployed in the wild for quite some time. It is also a major pain whenever there is an issue. If the connection to the domain controller drops, we might have authentication delays of many seconds, and trying to debug Active Directory issues in production deployments can be… a bit of a pain, in the same way that an audit by the IRS that starts with a SWAT team bashing down your door is mildly annoying. OAuth, on the other hand, is much better, since it is under our control, and we can figure out exactly what is going on with it if need be.

Since RavenDB 4.0 is running on Windows, Linux & Mac, we decided to drop the Windows Authentication support and just use OAuth. The problem is that if we choose to support HTTP, we have to rely on extremely complex protocols that attempt to secure authentication using plain text, but don’t usually deliver good results and are typically a pain to debug and support. Or, we can use HTTPS and just let SSL/TLS handle it all for us. A good example of the difference can be seen in OAuth 1.0 vs OAuth 2.0.

When we built RavenDB 1.0, roughly around 2009, the operating environment was quite different. In 2017, not using HTTPS is pretty much a sin in and of itself. As we started security modeling for RavenDB 4.0, it became obvious that we couldn’t really support any security on top of HTTP without effectively having to implement most of the properties of HTTPS ourselves. I’m many things, but I’m not a security expert, not by a long shot. Given the chance to implement my own security protocol, I would gladly do that, for a toy project or a weekend hackfest. But there is no way I would trust my own security in production against serious attacks. That pretty much led us to the realization that we have to require HTTPS for anything that requires security.

That includes running inside the organization, exposed to the public internet, running inside the cloud or in a shared datacenter, etc. Pretty much, unless you have HTTPS, there is no real point in talking about security. Given that, it meant that we could shift our baseline approach to security. If we are always going to require HTTPS for security, it means that we are operating in an environment that is much nicer for us to apply security.

Now, you can choose to run HTTP only, and avoid the need for certificate management, etc. However, at that point, you aren’t running a secure system, or you are already running it in a trusted and secured environment. In that case, we want to be clear that there isn’t any point to try to apply security policy (such as who can access what). Any network sniffer can figure out the access tokens and pretend to be whomever they want, if you are using HTTP.

With HTTPS required, we now move to the realm of having the admin take care of the certificates, securing them, renewal, etc. That is the part where it isn’t as easy or convenient as we could wish for. However, once we had that as a baseline, it opened an interesting path for security. Instead of relying on our own solution, we can use the builtin one and use x509 certificates from the client for authentication. This has the advantage that it is widely supported, standardized and secured. It is a bit less convenient than just a password, but the advantage is that any security system already in place knows how to deal with, store, authorize and manage access to certificates.

The idea is that you can go to RavenDB and either register or generate an x509 certificate. To that certificate an administrator can assign permissions (such as what dbs it is allowed to access). From that point on, a client (RavenDB, browser, curl, etc) can connect to RavenDB and just issue REST requests. There is no need to do anything else for the system to work. Contrast that with how you would typically have to deal with authentication using OAuth, by sending the token, keeping it fresh manually, etc.

Using x509 also has the distinct advantage that it is widely trusted. We intend to provide this level of security to all editions of RavenDB (so the Community Edition will also be able to use it).

A nice accidental feature of this decision is that we are going to be able to apply authentication at the connection level, and connection pooling means that we are likely going to have connections live for a long time. That means that we only need to pay the authentication cost once, instead of per request, with OAuth.

To simplify matters, we’ll likely just use the client certificates for authenticating the client, so we’ll not care if they are from a trusted root, etc. We’ll just require that the admin register the valid certificate with the cluster so it will be recognized. If you need to stop using a certificate, you can delete its registration or generate a new certificate to take its place. On the client side, it means that the DocumentStore will expose an X509Certificate property that you can set (or the equivalent in other clients). That means that you can use your own policies on the client to determine how to store the certificate.
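Trusting only explicitly registered certificates, rather than anything that chains to a trusted root, can be sketched as a lookup table keyed by the certificate’s thumbprint. This is an illustration only: the `CertificateRegistry` class, the choice of SHA-256 over the raw certificate bytes, and the per-database permission set are assumptions, not a description of how RavenDB stores registrations.

```python
import hashlib
from typing import Dict, Iterable, Set


def thumbprint(certificate_der: bytes) -> str:
    """Fingerprint of the certificate bytes presented during the TLS handshake."""
    return hashlib.sha256(certificate_der).hexdigest()


class CertificateRegistry:
    """Hypothetical registry: a certificate is trusted only if an admin added it."""

    def __init__(self) -> None:
        self._databases: Dict[str, Set[str]] = {}

    def register(self, certificate_der: bytes, databases: Iterable[str]) -> None:
        self._databases[thumbprint(certificate_der)] = set(databases)

    def revoke(self, certificate_der: bytes) -> None:
        self._databases.pop(thumbprint(certificate_der), None)

    def can_access(self, certificate_der: bytes, database: str) -> bool:
        # Unknown certificates get nothing, no matter which CA signed them.
        allowed = self._databases.get(thumbprint(certificate_der), set())
        return database in allowed
```

Rotation in this model is just registering the new certificate before revoking the old one, so both are accepted during the switch.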

On the server side, by the way, we’ll expose an extension point that will allow you to retrieve the certificate using your own policies. For example, if you are using Azure Key Vault or Hashicorp Vault or even your own HSM. This is done by invoking a process you specify, so you can write your own scripts / mini programs and apply whatever logic you need. This creates a clean separation between RavenDB and the secret store in use.

Authentication between servers is also done using SSL and certificates. We expect that we’ll commonly have all the servers running the same wildcard certificate, in which case they will obviously trust each other. Alternatively, you can also specify additional certificates that will be treated as servers. This is useful for when you are running with a separate certificate for each server, but it is also a critical part of certificate rotation. When your certificate is about to expire, the admin will register the new certificate as trusted, and then start replacing the certificates of each of the nodes in turn. This allows us to run with both old and new certificates concurrently during this process.

We considered relying on some properties of the certificate itself, but it seemed like an error-prone process. It is better to have the admin explicitly state, for both client and server certificates, which ones we should actually trust, and at what level.

I would really appreciate any commentary you have about this feature, both in terms of ease of use, acceptability and obviously its security.

RavenDB 4.0: Unbounded result sets

time to read 3 min | 503 words

Unbounded result sets are a pet peeve of mine. I have seen them destroy application performance more than once. With RavenDB, I decided to cut that problem off at the knees and placed a hard limit on the number of results that you can get from the server. Unless you configured it differently, you couldn’t get more than 1,024 results per query. I was very happy with this decision, and there have been numerous cases where this has been able to save an application from serious issues.

Unfortunately, users hated it. Even though it was configurable, and even though you could effectively turn it off, just the fact that it was there was enough to make people angry.

Don’t get me wrong, I absolutely understand some of the issues raised. In particular, if the data goes over a certain size we suddenly show wrong results or an error, leaving the app in a “we need to fix this NOW” state. It is an easy mistake to make. In fact, in this blog, I noticed a few months back that I couldn’t get entries from 2014 to show up in the archive. The underlying reason was exactly that: I’m getting the number of items per month, and I’ve been blogging for more than 128 months, so the data got truncated.

In RavenDB 4.0 we removed the limit. If you don’t specify a limit in a query, you’ll get all the matching results there are in the database. You can ask RavenDB to raise an error if you didn’t specify a limit clause, which is a way for you to verify that you won’t run into this issue in production, but it is off by default, which will probably better match what new users expect.

The underlying issue of loading too many results is still there, of course. And we still want to do something about it. What we did was raise alerts.
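The policy described here (return everything, optionally fail when no limit was given, and alert on very large results) can be sketched in a few lines. The function, the `throw_if_no_limit` option name and the threshold of 10,000 are all hypothetical; they only illustrate the shape of the behavior, not RavenDB’s actual configuration.

```python
from itertools import islice
from typing import Callable, Iterable, List, Optional


class MissingLimitError(Exception):
    """Raised when a query has no limit and the strict option is turned on."""


def run_query(
    results: Iterable[dict],
    limit: Optional[int] = None,
    *,
    throw_if_no_limit: bool = False,  # hypothetical option, off by default
    alert_threshold: int = 10_000,    # hypothetical threshold
    raise_alert: Callable[[str], None] = print,
) -> List[dict]:
    if limit is None and throw_if_no_limit:
        raise MissingLimitError("query has no limit clause")
    # No limit means no truncation: the caller gets every matching result.
    rows = list(results) if limit is None else list(islice(results, limit))
    if len(rows) > alert_threshold:
        # The query still succeeds; the alert is for the admin to review later.
        raise_alert(f"Query returned {len(rows)} results; consider paging")
    return rows
```

The application always gets its data, and the large result set becomes a notification for the admin rather than a production failure.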

I have made a query on a large set (160,000 results, about 400 MB in all) and the following popped up in the RavenDB Studio:

image

This tells the admin that there is some information that they need to look at. It is intentionally unobtrusive.

When you click on the notifications, you’ll get the following message.

image

And if you click on the details, you’ll see the actual details of the operations that triggered this warning.

image

I actually created an issue so we’ll supply you with more information (such as the index, the query, duration and the total size that it generated over the network).

I think that this gives the admin enough information to act upon, but will not cause hardship to the application. This makes it something that we Should Fix instead of Get the OnCall Guy.

RavenDB 4.0: The etag simplification

time to read 2 min | 357 words

A seemingly small change in RavenDB 4.0 is the way we implement the etag. In RavenDB 3.x and previous versions we used a 128-bit number, which was divided into 8 bits of type, 56 bits of restarts counter and 64 bits of changes within the current restart period. Visually, this looks like a GUID:  01000000-0000-0018-0000-000000000002.

The advantage of this format is that it is always increasing, very cheap to handle and requires very little persistent data. The disadvantage is that it is very big, not very human readable, and the fact that the number of changes resets on every restart means that you can’t make meaningful deductions about relative sizes between any two etags.

In RavenDB 4.0 we shifted to use a single 64-bit number for all etag calculations. That means that we can just expose a long (no need for the Etag class), which is more natural for most usages. This decision also means that we need to store a lot less information, and etags are one of those things that we go over a lot. A really nice side effect, which was totally planned, is that we can now take two etags and subtract them and get a pretty good idea about the range that needs to be traversed.
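To see the difference, here is a small sketch that splits the old 128-bit etag into its three parts and does the subtraction that the new 64-bit etag allows. It assumes the fields appear in the printed string in the order given above (type, then restarts, then changes); that layout and the function names are assumptions, not taken from the RavenDB source.

```python
from typing import NamedTuple


class LegacyEtag(NamedTuple):
    kind: int      # top 8 bits: type
    restarts: int  # next 56 bits: restarts counter
    changes: int   # low 64 bits: changes since the last restart


def parse_legacy_etag(text: str) -> LegacyEtag:
    """Split a GUID-looking 3.x etag, assuming the fields are in printed order."""
    value = int(text.replace("-", ""), 16)
    return LegacyEtag(
        kind=value >> 120,
        restarts=(value >> 64) & ((1 << 56) - 1),
        changes=value & ((1 << 64) - 1),
    )


def remaining_items(current_etag: int, last_etag: int) -> int:
    """With one 64-bit counter, the distance between two etags is a subtraction."""
    return last_etag - current_etag
```

With the old format, the `changes` part of two etags is only comparable when `restarts` is equal, which is why no such estimate was possible before.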

Another important decision is that everything uses the same etag range. So documents, revisions, attachments and everything else share the same etag, which makes it very simple to scan through and find the relevant item just based on a single number. This makes it very easy to implement replication, for example, because the wire protocol and persistence format remain the same.

I hadn’t thought to write about this, since it seemed like too small a topic for a post, but there was some interest in it on the mailing list, and after enumerating all the reasons, it suddenly seems like it isn’t such a small topic.

Update: I forgot to mention, a really important factor of this decision is that we can do this:

image

So we can give detailed information and expected timeframes easily.
