Ayende @ Rahien

Oren Eini, aka Ayende Rahien, is the CEO of Hibernating Rhinos LTD, which develops RavenDB, a NoSQL Open Source Document Database.

You can reach me by:

oren@ravendb.net

+972 52-548-6969



time to read 3 min | 441 words

The upgrade process from RavenDB 3.5 and earlier to RavenDB 4.x is not easy. This is because I made a conscious decision not to have backward compatibility between these versions. I made that decision because we had to be able to make massive changes internally in order to get to the targets that we set for ourselves. I actually discussed that decision in detail in a previous blog post and a talk.

Four years later, I still stand by that decision, but I also regret the spanner that it threw into the works. Migrating RavenDB applications to 4.x from previous versions is harder than it should be. In retrospect, we probably should have invested the time in a compatibility layer that would make it easier to migrate.

I wanted to take a moment and talk about RavenDB 5.0, expected in 2020, and our plans for that release. We are going to be doing some minor cleanup of the API. Methods and classes that are marked as [Obsolete] will be removed. These tend to be at the very edge of the explored API and have been marked as such for quite some time. Beyond these changes (for which you’ll have a clear and obvious alternative), you aren’t going to need to do much at all.

Our goal for converting an application from RavenDB 4.x to 5.x is that for 90% of the projects the process is: update NuGet packages, compile, you are done. For the remaining 10%, it may mean that you need to make some minor changes. For example, change DisableEntitiesTracking to NoTracking if you are using the low level query API.
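A rough before / after sketch of that particular rename (both names are as mentioned above; the exact call site is an assumption, so treat this as illustration only):

    // RavenDB 4.x (old name)
    var employees = await session.Advanced.AsyncDocumentQuery<Employee>()
        .DisableEntitiesTracking()
        .ToListAsync();

    // RavenDB 5.x (new name)
    var employees = await session.Advanced.AsyncDocumentQuery<Employee>()
        .NoTracking()
        .ToListAsync();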

We also intend to allow at least the vast majority of operations to just work between a 4.x client and a 5.x server. In other words, even when you upgrade server versions, you aren’t going to have to upgrade the client version unless you want to use the new features.

There are also additional considerations that we have to take into account:

  • RavenDB now has official clients for .NET, JVM, Go, Python, Node.js and C++, as well as a number of unofficial clients.
  • RavenDB Cloud instances are maintained by us, and will be upgraded to newer versions on a regular schedule.

The cost of making a backward incompatible change at this point is too high for us to take lightly, and we are going to try very hard to avoid it. The move from 3.5 to 4.x was a one time thing that we had to do in order to continue evolving the product, not something that we plan to do again anytime soon.

We are also offering migration services for clients who want to move their applications from 3.x to 4.x.

time to read 1 min | 180 words

We were asked about best practices for managing the RavenDB session (unit of work) in a .NET Core MVC application. I thought it was interesting enough to warrant its own post.

RavenDB’s client API is divided into the Document Store, which holds the overall configuration required to access a RavenDB cluster, and the Document Session, which is a short lived object implementing the Unit of Work pattern and typically used for only a single request.

We’ll start by adding the RavenDB configuration to the appsettings.json file, like so:
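Something along these lines should do (the section and property names here are just an example; they need to match whatever your configuration class expects):

    {
      "RavenDb": {
        "Urls": [ "https://your.ravendb.server:8080" ],
        "DatabaseName": "Orders",
        "CertificatePath": "client.certificate.pfx"
      }
    }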

We bind it to the following strongly typed configuration class:
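A minimal sketch of such a class, mirroring the JSON above (the names are an assumption, not a prescribed shape):

    public class RavenDbSettings
    {
        public string[] Urls { get; set; }
        public string DatabaseName { get; set; }
        public string CertificatePath { get; set; }
    }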

The last thing to do is to register this with the container for dependency injection purposes:
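A sketch of what that registration could look like in Startup.ConfigureServices, following the settings class and section name from the examples above (an unsecured setup can simply drop the certificate):

    // using Microsoft.Extensions.DependencyInjection;
    // using Raven.Client.Documents;
    // using Raven.Client.Documents.Session;
    // using System.Security.Cryptography.X509Certificates;

    public void ConfigureServices(IServiceCollection services)
    {
        var settings = Configuration.GetSection("RavenDb").Get<RavenDbSettings>();

        // a single Document Store for the lifetime of the application
        services.AddSingleton<IDocumentStore>(_ =>
        {
            var store = new DocumentStore
            {
                Urls = settings.Urls,
                Database = settings.DatabaseName,
                Certificate = new X509Certificate2(settings.CertificatePath)
            };
            return store.Initialize();
        });

        // a new Document Session per scope (i.e. per request)
        services.AddScoped<IAsyncDocumentSession>(sp =>
            sp.GetRequiredService<IDocumentStore>().OpenAsyncSession());

        services.AddMvc();
    }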

We register both the Document Store and the Document Session in the container, but note that the session is registered with a scoped lifetime, so each request will get a new session.

Finally, let’s make actual use of the session in a controller:
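A minimal controller sketch using the injected session (the Employee entity and the action itself are made up for the example):

    // using Microsoft.AspNetCore.Mvc;
    // using Raven.Client.Documents.Session;

    public class EmployeesController : Controller
    {
        private readonly IAsyncDocumentSession _session;

        public EmployeesController(IAsyncDocumentSession session)
        {
            _session = session;
        }

        public async Task<IActionResult> Promote(string id, string newTitle)
        {
            var employee = await _session.LoadAsync<Employee>(id);
            employee.Title = newTitle;

            // explicitly save the unit of work, see the note below
            await _session.SaveChangesAsync();

            return Ok(employee);
        }
    }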

Note that we used to recommend having SaveChangesAsync run for you automatically, but at this time, I think it is probably better to do this explicitly.

time to read 2 min | 368 words

Let’s consider the following data (which is actually RavenDB’s sample database). We have a collection of employees, and each one of them has an attachment with the employee’s photo. We want to display a table of the employees as well as the employees’ photos.

The problem is how to do that, exactly. One way of doing that is to loop over the employees, get the relevant attachments and send them all to the client for display. That works, but there are much better ways to go about doing this.

Instead of doing everything ourselves, we can rely on RavenDB and the browser to do things for us. Let’s look at the metadata we have for the employee in question, to see how RavenDB exposes the attachments to us:

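The document metadata carries an @attachments array; it looks roughly like this (field names as I recall them from a 4.x sample, the size value is illustrative):

    "@attachments": [
        {
            "Name": "photo.jpg",
            "Hash": "97S5UrejdZqHfel4i+/ts5orhNlp92DItxOUVow0maI=",
            "ContentType": "image/jpeg",
            "Size": 44606
        }
    ]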

This is interesting, because the hash means that we can do some interesting stuff. Instead of loading the attachment directly, we’ll create an endpoint that will provide access to the attachment, like so:

GET /employees/photos?id=employees/7-A&name=photo.jpg&hash=97S5UrejdZqHfel4i+/ts5orhNlp92DItxOUVow0maI=

So far, this just looks like we moved the data around, for no good reason. Instead of loading the attachments for the employees and sending them in one roundtrip to the client, we now force the client to generate N requests, one per employee. Surely that is much worse, no?

The key here is that the endpoint that we expose is going to use the Cache-Control header to ask the browser to cache this request for us forever. Because we have the hash of the file, we know that if we update the employee’s photo, we will get a new hash, so we don’t need to deal with cache control issues.
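A sketch of what such an endpoint could look like in ASP.NET Core (_session is an injected IAsyncDocumentSession; the attachment calls are the 4.x client API as I recall it, so treat the exact shapes as an assumption):

    [HttpGet("/employees/photos")]
    public async Task<IActionResult> GetPhoto(string id, string name, string hash)
    {
        var attachment = await _session.Advanced.Attachments.GetAsync(id, name);
        if (attachment == null)
            return NotFound();

        // the hash is part of the URL, so a new photo means a new URL;
        // we can safely tell the browser to cache this response forever
        Response.Headers["Cache-Control"] = "public, max-age=31536000, immutable";

        return File(attachment.Stream, attachment.Details.ContentType);
    }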

By making the browser cache the value, we can significantly speed up the system. Now showing the employee photo is much cheaper.

There is also another advantage: the browser will typically use multiple connections to get the data (either multiple actual TCP connections or multiple streams in HTTP/2), so we get additional benefits from this level of parallelization.

time to read 5 min | 946 words

This story started a few years ago, in a very non technical setting. We changed the accountant that we use for Hibernating Rhinos. We outgrew the office we were using at the time and needed better services. Among the changes that were implemented as a result of this move was the usage of new accounting software. Nothing really that interesting, to be frank. I like that my accounting is boring. However, the new accounting software was an on-premise solution. In other words, we are the ones running it. Which is perfectly fine, we provisioned a VM in our data center (a fancy name for the single rack that we had at the time) and let it run.

As you can imagine, we consider our accounting data to be mission critical, so to speak. I don’t mind not being able to access it for an hour, for example, but losing it is going to be Bad. So we had a backup, nothing really that interesting. We had a backup that went to local disk on the VM, to a remote disk in the office and, just to be safe, was uploaded to S3. I asked one of our developers to take care of this, and aside from specifying that I want backups in triplicate, I didn’t really pay attention. That was around 2017, I believe. I made sure that if the backup failed, we would get notified of that, and that was pretty much it.

One of the reasons that I like my accounting boring is that it simplifies my life and reduces stress. Unfortunately, it seems like my accounting practices have a cost. In particular, it means that I favor paying a bit too much to the taxman. That means that all of the taxes are going out immediately, and the company doesn’t end the year with a large tax bill that we need to cover. But I overdid it a time or two, and we overpaid on our taxes. Well, that was by design, extra money showing up from the taxman is much better than a surprise bill. But at a certain point, we were supposed to get a refund for a non trivial amount. At which point the tax authorities came a-calling and audited us.

Remember that I talked about boring accounting practices. The day we started the audit, I was having dinner with my wife and being audited was the third topic of the day, if I recall properly. They found a few things that we did wrong (we registered an invoice in the wrong currency, so we cancelled it and issued a new one, instead of refunding it and issuing a new one). That was a Thing, it seemed. But the end result was pretty much nothing. I loved it. Since then, we were audited a few more times, always with no repercussions.

The next audit is a question of when (usually every 18 – 30 months or so, it seems), not if, so I really care about my accounting data. Hence the triple backups policy. You might have been going through this post expecting to hear that we lost the accounting data, and the backup failed, and now my accounting outlook is decidedly not boring. I’m afraid that this is only half true. We did have a failed backup, but we caught it before we actually needed it.

At one point, I looked at our backup policies, and I noticed that the accounting backup was months old at this point. That was concerning, I gotta say. Here is the timeline, as I could piece it together:

  • Q2 2017 – Backup process is defined and tested. This is a one off process that we use only for the accounting database.
  • Q1 2018 – Routine key rotation is performed on some of our keys. Unbeknownst to us, the backup process loses the ability to report failure. But given that it doesn’t fail, no one notices.
  • Q4 2018 – The developer responsible for setting up the backup process leaves the company. As part of the outgoing employee process, we shut down the relevant user accounts.
  • Q1 2019 – The accounting server is rebooted. The backup process fails to start, because the user account is disabled.

You might notice the scale of this issue. The underlying problem was that the developer set up this one off process as a… well, one off process. That meant that it wasn’t hooked into any of our usual monitoring / alert systems. It did have a way to report errors, but the credentials for that went stale after a year. No one paid attention, since the backups continued to run.

The backup process was also running under the user account of the developer, not a service account. I guess it was easier than creating a user, but the end result was that when we deactivated the user account after the developer left the company, we also disabled the backup. But the process was running, and it continued to run for months. Only much later would the process fail to start, and by then there was no way to report errors, and we noticed it only because we looked for that during routine operations.

One of the reasons we built backups directly into the core of RavenDB was exactly this sort of situation. A backup process is not something that you cobble together (that’s on us, to be fair), it is something that should be part and parcel of the operations of your database, and being able to do something like get backups in triplicate is essential for a good operations experience.

time to read 5 min | 911 words

Recently I had the chance to work on what one could term a “business app”. After a very long time dealing with system level software, I got my hands dirty writing business level code. You know the kind, logging in a user, showing some data on a page, etc. I have been doing that for a long time, but in the past few years I was mostly dealing with storage engines, distributed systems and the like. Even though I’m writing both kinds of systems in the same environment, the feeling is quite different.

This is a stream of consciousness type post.

With the business app code, I was using controllers and services that are dynamically composed via dependency injection. For system level code, I have manual dependency management. The business code tends to be fairly short until it hits the database, but the system code tends to do a lot more inline.

A feature in both systems is composed of UI, data and behavior, but the way they are structured is very different. For that matter, the way we build them is very different.

For example, the business code accepts an order from a user by writing it to the database, and another component in the same system waits for such events and starts processing them in an asynchronous manner. This meant that we had a pretty good separation between the different parts. To the point where we pretty much built them in isolation and concurrently. The UI team was generally much faster, so they threw commands at the backend and had something that marked them as completed while the backend team (a hilarious term from our usual perspective, to be frank) worked on accepting the commands and actually implementing the functionality. When writing system code, we typically write the actual implementation first, and figure out what we want from the UI afterward. Sometimes the UI comes a few weeks or months after the code has already been written and merged.

The rate at which features got completed was also astounding. Some of them were minor stuff (this URL shouldn’t have a line break) but even major features got done much faster than I’m used to. Although, to be fair, implementing something such as “optimize I/O writes on Linux 32 bits” vs. “send an email when the user attempts to login but doesn’t actually have an account” are tasks of very different ranks.

Along the same path, the capability for concurrent work was much higher. We could work on different parts of the app with a much reduced chance of conflicts and stepping on each other’s toes. Even when we were working on roughly the same areas.

Readability and maintainability matter a lot more in business software. Performance trumps those when dealing with system software. That isn’t to say that perf isn’t important for business software, but we have so much added capacity for the things we want to do that it doesn’t usually matter.

I can’t write business level software without ReSharper, I can write system code without it, though.

JavaScript sucks regardless of the project type. There is something deeply wrong in the fact that building my JS based UI takes longer than it takes to compile my actual application.

There are a lot of things that are the same, of course. But probably the most important factor that I have to note is that sensitivity to pain.

What do I mean by that? For example, how fast can you go from hitting F5 to debugging your current issue? How much time does it take you to create a new thing and use it?

When using dependency injection, if you aren’t setting up automatic discovery, you have a recurring pitfall. Every time you add something new, you have to remember to register it. If you do have automatic discovery, you need to be clear about what the conventions are. It can seem like magic, and it is easy to lose that knowledge. Let’s take the command execution as a good example. Once you have a command in the system, debugging it means hitting F5 and stepping through the code. If you need to make a change, go ahead and do that, hit F5 again, and you are back in the same location. As an added bonus, this also ensures that your commands are idempotent, since you are re-running them all the time while debugging.

The key is that you need to be able to hit F5 and get there. We initially had a setup where you had to run the app from the command line, attach the debugger (manually!), go do something in the UI, and only then debug what you were doing. Not a big deal, if you are doing that once in a blue moon. But during active development? That is horrendous for productivity. I couldn’t stand it, and it was the very first thing that I tackled. It only shaved about 20 – 30 seconds from the launch time, but it had a big impact on the way I approached things.

Because I didn’t have to do any work to get back to the debugging mindset, I found myself working in a very different manner. I would make a change, run it, make a change, etc. When I had to do (a bit) more work, I had a much more careful process. And that slowed things down.

I forgot how much fun you can have when working with business level software, because the challenges you face are so very different.

time to read 3 min | 414 words

I was writing code, in the zone, slinging features around and in general having a great time. I was able to create the structure that I wanted, and things worked. So I started to do another pass on the code, to make sure that we got the error handling right.

I ended up writing code similar to the following:
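The snippet itself didn’t survive into this text, so here is a minimal sketch of the shape of the code, with made-up names, based on the description that follows:

    // try a bunch of options before giving up, so the usual suspects
    // are already accounted for
    foreach (var options in _spawnOptionsToTry)
    {
        var result = _park.TrySpawnDino(name, options);
        if (result.Succeeded)
        {
            RegisterDino(result.Dino);
        }
    }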

As you can see, spawning of dinos can fail. Maybe there aren’t enough resources, maybe I have reached the limits of this particular type of dino. Regardless of what exactly was failing, we would try a bunch of options before giving up. That way, we can be sure that the usual suspects have already been accounted for.

This code failed with a strange error: “Cannot change the horn VIN: ‘horn-18238a8as81’”. 

I’m not setting the horn VIN anywhere, I’m using the high level API to spawn dinos, so why am I seeing things in this manner? I debugged this, but I couldn’t figure out what was going on. I ran the code in a separate project, and it worked. I ruled out web app vs. console app, I decompiled the code and looked at things, I sniffed the network and looked into what was going on. I was completely lost.

At that point, I called another dev over to look at things and tried to explain what was going on. As I was going on and on about all the things that I tried and how much I hate the Jurassic period and how the Triassic has a much better API, he gently coughed and asked me if I’m not missing a break here.

I actually looked at the code and wanted to cry. My code did the exact thing that I asked it to. If I was able to successfully spawn a dino, I would go ahead and create another one, with the same name. It looks like deep in the guts of the system, there were assignments of ids to the various pieces, and trying to create an instance with the same name caused this error.

Cue head on keyboard a few times, a single statement added and everything worked.

I literally spent a few hours on this thing, which is really embarrassing. I would probably spend a lot more if I didn’t have a fresh set of eyes come and point it out.

RavenDB Cloud

time to read 4 min | 763 words


TLDR

RavenDB now offers cloud hosting for RavenDB clusters. Manage your data with this awesome solution built by the RavenDB team. Access through cloud.ravendb.net.

A free option is available.

When I wrote the first few lines of code for RavenDB over a decade ago, Amazon Web Services still had the beta label on it and deploying to production meant a server in the basement. The landscape for server-side software has changed considerably. Nowadays you have to justify not running on the cloud. Originally, RavenDB’s features were driven by the kind of systems and setup found in a typical corporate data center. Now a lot of our features are directly impacted by the operating environment in the various cloud platforms.

The same team that develops RavenDB itself now offers a database as a service solution (DBaaS) that can be found at cloud.ravendb.net. Our service offering is available on all Amazon Web Services and Microsoft Azure regions, with Google Cloud Platform soon to follow.

What do you get?

You get a fully managed service. We take care of all backend chores of your databases while you focus on building your applications to deliver even more value to your business.

We have done everything possible to make sure that the only task you’ll need to do is come up with the data to put into RavenDB Cloud. Tasks such as monitoring, updating or managing the system, and even creating a default backup task per database, are the responsibility of our team and are handled without a fuss behind the scenes.

The DBaaS package includes encryption over the wire as well as encryption at rest -- you can also deploy encrypted databases with your own encryption keys.

The idea of dynamic scaling in managed systems has been a core tenet of cloud architecture, and RavenDB Cloud fits right into this model. In just a few clicks you can provision a cluster, deploy it anywhere in the world, and start working. If you need more capacity, a simple click will provide you with more resources -- without your code or your customers even being aware.

If you have a Black Friday event or a special discount day coming, you can scale up your system ahead of time. If during that day you are pleasantly surprised with more than anticipated activity, you can power through the spike by scaling immediately and then reduce the capacity back to normal levels once the peak is traversed.

But RavenDB Cloud is not just about reducing the overhead of running databases -- we built RavenDB Cloud to save you money. As a highly tuned system, you can manage your load on fewer resources, which also translates into more savings down the line.

Pricing Model

Our cloud offering has an on-demand subscription as well as discounts for yearly contracts. A 10% introductory discount is now available, lasting through the rest of this year.

Some hosted database solutions charge you per request or per maximum utilized capacity. Such solutions are complex to understand when it comes to billing time. I intensely dislike complexity -- especially when it comes to bills! Price predictability is important to us. With RavenDB Cloud, you pay for your resources at a flat and known rate to make sure there won’t be any surprises at the end of the month.

The RavenDB instances can be configured a-la-carte, according to your needs. A tailored solution with geo distributed clusters, support for on-premise & cloud integration, widely distributed deployments and custom instance types is only an email away.

RavenDB Cloud has several tiers of clusters available. If you are running a small to medium sized application, you can go with our basic instances and enjoy a reduced cost. For more demanding workloads, you can use more performant instances that have full access to the cloud resources to get maximum performance.

RavenDB Cloud also has a completely free option. Go to cloud.ravendb.net and select the free option. You’ll have your own secured, managed and hassle-free instance of RavenDB in moments. Go ahead and try it out right now.

What about RavenHQ?

RavenHQ has been providing managed RavenDB instances since 2012. It will continue offering RavenDB hosting for version 3.5 and earlier. RavenDB Cloud will provide managed clusters for RavenDB 4.2 and up.

Migrating from a RavenDB 4.x instance hosted on RavenHQ to RavenDB Cloud is simple and can be done in just a few clicks.

Not only will migrating to RavenDB Cloud not cost you anything extra, in most cases your new managed database service will be cheaper than what you had before.

time to read 6 min | 1051 words

A process running on your system is typically a black box. You don’t have a lot of insight into what is going on inside it. Oh, there are all sorts of tools you can use to infer things (looking at system calls, memory consumption, network connections, etc.), but by default… it is a mystery.

RavenDB is a database. It is meant to run unattended for long durations and is designed to mostly run itself. That means that when you look at it, you want to be able to figure out exactly what is going on with the system as soon as possible. To that end, we have included a lot of features inside RavenDB that expose the internal state of the system. From tracing each I/O and its duration to providing detailed statistics about costs and amount of effort invested in various tasks.

These features are invaluable to figure out exactly what is going on in RavenDB at a particular point in time. Of course, nothing beats the ability to open a debugger and inspect the state of the system. But that is something that you can only really do in development. It is not something that can be done in production, obviously. Or can it?

Since RavenDB 3.0, we have actually had just this feature: the ability to ask RavenDB to capture and display its own state in a format that should be very familiar to developers. When we created RavenDB 4.0, we were able to carry this feature over on Windows (at some cost), but it was a complete non-starter on Linux.

On Windows, a process can debug another process if they belong to the same user (somewhat of an oversimplification, but good enough). On Linux, the situation is a lot more complex. A process can usually only debug another process if the debugger is running as root or is the parent process of the debuggee.

Another complication was that we are using ClrMD, a wonderful library that allows us to introspect live processes (among many other things). It did not have support for Linux until about a month ago… as soon as we had the most basic of support there, we jumped into action, seeing how we could bring this feature to Linux as well. A lot of our users are running production systems on Linux, and the ability to look at the system and go: “Hmm. I wonder what this is doing” and then being able to tell is something that we consider a major boost to RavenDB.

It took a lot of fighting and learning a lot more about how debugging permissions work on Linux than I ever wanted to know. But we got it working (details below). You can see what this looks like on a live Linux server:

[Screenshot: captured stack traces from a live RavenDB server running on Linux]

As you can see, there is an indexing thread here doing some work on spatial data. We are going to enhance this view further with the ability to see CPU times as well as job names. The idea is that this is something that you will look at and get enough insight to not need to check the logs or try to infer what is going on. You could just tell.

Now, for the gory details of how this works. We changed the implementation on both Windows and Linux to use passive attach to the process, which is much faster. The first thing we tried, once we moved to passive attach, was to debug ourselves.

This is a nice enough feature, and quite elegant. We debug ourselves, pull the stack traces and display the data. Unfortunately, this doesn’t work on Linux. A process cannot debug itself. All debugging on Linux is based on the ptrace() system call, and the permissions for it are as described above. I can’t imagine what the security implications of letting a process debug itself would be. After all, it can already do anything the process can do, because it is the process. But I guess that this is an esoteric enough scenario that no one noticed, and then the reaction was: use a workaround.

The usual workaround is to have a process that would spawn RavenDB and then it would be able to debug it. That is… possible, but it would be a major shift in how we deploy, not something that I wanted to do. There is also the ptrace_scope flag, which is supposed to control this behavior. In my tests, at least, disabling the security checks via this flag did absolutely nothing.

Running as root worked just fine, of course. And then the process crashed. On Linux, when trying to debug your own process, there seems to be an interesting interaction between the debugger and the debuggee if an exception is thrown. To the point where it will corrupt the CoreCLR state and kill the process. That was a fun bug to trace, sort of. Linux has an escape hatch in the form of the PR_SET_PTRACER option that can be used. However, you can’t designate your own process, unfortunately. That, combined with the hard crashes, made self-debugging a non-starter.

But I still want this feature, and without changing too much about how we are doing things.

Here is what we ended up doing. We have a separate process just to capture the stack trace. When you ask RavenDB to get its stack trace, it will spawn this process, but ask it to wait. It will then grant the new process the permissions necessary to debug RavenDB and signal it to continue. At this point, the debugger child process will capture the stack trace and send it back to RavenDB. RavenDB will reset the permission, enhance the stack trace with additional information that we can provide from inside the process and display it to the user.
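A rough sketch of that handshake from RavenDB’s side (this is not the actual implementation; the prctl P/Invoke and the PR_SET_PTRACER constant are standard Linux, everything else here is made up for illustration):

    using System.Diagnostics;
    using System.Runtime.InteropServices;

    static class StackTraceCapture
    {
        [DllImport("libc", SetLastError = true)]
        private static extern int prctl(int option, ulong arg2, ulong arg3, ulong arg4, ulong arg5);

        private const int PR_SET_PTRACER = 0x59616d61; // defined by the Yama security module

        public static string Capture()
        {
            // spawn the dedicated debugger process; it waits for our signal before attaching
            var debugger = Process.Start(new ProcessStartInfo("stack-capture", "--wait")
            {
                RedirectStandardInput = true,
                RedirectStandardOutput = true,
                UseShellExecute = false
            });

            // allow that specific child (and only it) to ptrace() us
            prctl(PR_SET_PTRACER, (ulong)debugger.Id, 0, 0, 0);

            // tell the child to go ahead, then read the captured stack traces back
            debugger.StandardInput.WriteLine("go");
            var stackTraces = debugger.StandardOutput.ReadToEnd();
            debugger.WaitForExit();

            // reset the permission once we are done
            prctl(PR_SET_PTRACER, 0, 0, 0, 0);
            return stackTraces;
        }
    }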

The actual debugger process is also marked with setcap to provide it the additional permissions it needs. This separation means that we isolate these permissions in a single purpose tool that can be invoked and closed, without increasing the attack surface of RavenDB.

The end result is that you can walk to a production RavenDB server, running on Windows or Linux, and get better information than if you just attached to it with the debugger.

time to read 1 min | 173 words

This hit my email, and given the number of questions that our support team fields on the topic, I wanted to make sure it is widely known. You can now use RavenDB 4.2 as the backing store for your NServiceBus systems.

That, in turn, means that you can now use RavenDB 4.2 in your systems in general, which is going to be a much nicer experience overall and put you back on the supported path.

A note about support:

  • RavenDB 3.0 is no longer supported.
  • RavenDB 3.5 is supported until the end of 2020.

Many of the users who use RavenDB with NServiceBus tend to be larger enterprises, where updates to the technology stack may take a while. So it is better to start these things early.

And as an added incentive, based on over 18 months in the field, just by moving to RavenDB 4.2 you are going to get a tenfold performance increase over RavenDB 3.5.
