Ayende @ Rahien

My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.

Analyzing the GitHub outage

time to read 8 min | 1516 words

A couple of weeks ago, GitHub had a major outage that lasted over 24 hours and resulted in widespread disruption of many operations for customers. A few days after everything was fixed, they posted their analysis of what happened, which makes for a really good read.

The pebble that started all of this was a connection disruption that lasted 43 seconds(!). A couple of months ago I talked about people who say that you can assume that distributed failures are no longer meaningful. The real world will keep serving up examples of weird / strange / nasty stuff to your production systems, and you need to handle that. Quoting from the original post:

Therefore: the question becomes: how much availability is lost when we guarantee consistency? In practice, the answer is very little. Systems that guarantee consistency only experience a necessary reduction in availability in the event of a network partition. As networks become more redundant, partitions become an increasingly rare event. And even if there is a partition, it is still possible for the majority partition to be available. Only the minority partition must become unavailable. Therefore, for the reduction in availability to be perceived, there must be both a network partition, and also clients that are able to communicate with the nodes in the minority partition (and not the majority partition). This combination of events is typically rarer than other causes of system unavailability.

So no, not really. There is a good point here about the fact that only the minority portion of the system must become unavailable, but given a typical production deployment, any disconnect between data centers will cause a minority portion to be visible to clients and become unavailable.

The actual GitHub issues that are discussed in the post are a lot more interesting. First, we have the obvious problem that most applications assume that their database access is fast, and they make multiple such calls during the processing of a single request (sometimes, many calls). This is just another example of the Fallacies of Distributed Computing in action. RavenDB has built-in detection for that and a host of features that allow you to go to the database server once, instead of multiple times. In such a case, even if you need to fail over to a remote server, you won’t pay the roundtrip costs multiple times.

However, this is such a common problem that I don’t think that it deserves much attention. There isn’t much that you can do about it without careful consideration and support from the whole stack. Usually, this happens on projects where you have a strong leader who institutes a performance budget and enforces it. This has costs of its own, and usually it is cheaper to just not fail over across data center boundaries.

The next part that I find really interesting is that the system that GitHub uses for managing topologies is not consistent but is required to be. The problem is that there is an inherent delay between their orchestrator re-organizing the cluster after a failure and when the failure actually occurs. That would have been fine, if they had a way to successfully merge histories, but that is not the case. In fact, looking at just the information that they have published (and ignoring that I have the benefit of hindsight) the issue is glaringly obvious.

A deep dive (and a fascinating read) into how GitHub handles high availability talks about the underlying details and exposes the root cause. You cannot layer distinct distributed architectures on top of one another and expect to come up with a good result. Here is what happens in a master crash scenario:

In a master crash scenario:

  • The orchestrator nodes detect failures.
  • The orchestrator/raft leader kicks off a recovery. A new master gets promoted.
  • orchestrator/raft advertises the master change to all raft cluster nodes.
  • Each orchestrator/raft member receives a leader change notification. They each update the local Consul’s KV store with the identity of the new master.
  • Each GLB/HAProxy has consul-template running, which observes the change in Consul’s KV store, and reconfigures and reloads HAProxy.
  • Client traffic gets redirected to the new master.

I read this and feel a bit queasy, because the master crash scenario is not the interesting bit. That is the easy part. The really hard part is how you manage things when you have a network disruption, with both sides still up and functioning. In fact, that is exactly what happened to GitHub. In this case, on the minority side, their orchestrator cannot get a majority (so it cannot make any forward progress). As a result, the process described above cannot proceed there; the whole thing stops at either the first or second stage.
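To make the majority requirement concrete, here is a minimal sketch of the check that any Raft-style group has to pass before it can act. The cluster size and the 2 / 1 split are hypothetical numbers chosen for illustration; they are not GitHub's actual topology, and this is not Orchestrator's code.

```python
def has_quorum(reachable_nodes: int, cluster_size: int) -> bool:
    """A Raft-style group can make progress only with a strict majority."""
    return reachable_nodes > cluster_size // 2


# Hypothetical 3-node orchestrator cluster, split 2 / 1 by a partition.
cluster_size = 3
majority_side_can_act = has_quorum(2, cluster_size)
minority_side_can_act = has_quorum(1, cluster_size)
```

The side that can still see two of the three nodes may promote a new master; the side that sees only itself has to sit still, no matter how healthy its local database looks.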

That means that the rest of the system will continue to write to the old master, resulting in a conflict. And this is where things get complicated. The issue here is that with MySQL (and most other systems that rely on log replication) you must have a single master at any given time. That is an absolute requirement. If you get to the point where you have two sets of writes with divergent histories, you are in for selecting which one you’ll accept (and what data you’ll discard) and trying to manually fix things after the fact.

The proper way to handle something like this would have been to use Raft to actually send the commands themselves to the server. This ensures a consistent set of statements that run in the same order on all servers. Rqlite is a great example of this, where you get a consistent and distributed system on top of individual components. That would be the proper way to do it, mind, not the way anyone would do it.

You wouldn’t be able to get any reasonable performance from the system using this kind of approach. Rqlite, for example, talks about being able to get 10 – 200 operations per second. I’m going to assume that GitHub has a need for something better than that. So the underlying distributed architecture looks like this:

  • MySQL write master with multiple read-only secondaries using binlog.
  • Orchestrator to provide health monitoring and selection of new write primary (consistent using Raft)
  • Underlying infrastructure that uses (different) Raft to store routing configuration.

If you break Orchestrator’s ability to make decisions (easy, just create a partition), you take away the ability to change the write master, and if the failure mode you are dealing with is not a failed master (for example, you have a partition) you are going to accept new writes to the old master. That completely breaks the whole idea of binlog replication, of course, so you are sort of stuck at that point. In short, I think that Orchestrator is something that was meant to solve an entirely different problem: it was meant to deal with the failure of a single node, not to handle a full data center partition.

When looking at such incidents, I always compare to what would have happened if RavenDB was used instead. This is not really fair in this case because RavenDB was designed upfront to be a distributed database. RavenDB doesn’t really have the concept of a write master. For simplicity’s sake, we usually try to direct all writes to a single node for each database, because this simplifies how you usually work. However, any node can accept writes and will distribute them to the rest of the nodes in the cluster. In a situation like the one GitHub faced, both sides of the partition would keep accepting writes (just like what happened in GitHub’s case with MySQL).

The difference is what will happen when the partition is healed. Both sides of the partition will update the other with the data that is missing on the other side. Any conflicting writes (by which I mean writes on both sides of the partition to the same document or documents) will be detected and resolved automatically. Automatic resolution is very important to keeping everything up and running. The resolution can follow a custom policy defined by the user or an arbitrary one chosen by RavenDB. Regardless of the conflict resolution policy, the administrator will be notified about the conflicts and can review the actions taken by RavenDB and decide what to do about them.
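One common way to detect that two sides wrote to the same document independently is a version vector: each copy of the document remembers how many writes it has seen from each node. The sketch below is a generic illustration of that technique, assuming hypothetical node names `A` and `B`; it is not RavenDB's actual implementation.

```python
from typing import Dict

Vector = Dict[str, int]  # node id -> number of writes seen from that node


def dominates(a: Vector, b: Vector) -> bool:
    """True if `a` has seen everything that `b` has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: Vector, b: Vector) -> str:
    """Classify two copies of the same document after a partition heals."""
    if dominates(a, b) and dominates(b, a):
        return "equal"
    if dominates(a, b):
        return "a is newer"
    if dominates(b, a):
        return "b is newer"
    return "conflict"  # each side has writes the other never saw


# Both sides modified the same document during the partition.
east = {"A": 3, "B": 1}
west = {"A": 2, "B": 2}
status = compare(east, west)
```

Only the last case needs a resolution policy. A document written on just one side of the partition is simply newer and gets copied over.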

In GitHub’s case, their busiest cluster had fewer than a thousand writes in the time period in question, most of which aren’t going to conflict. I would expect the timeline with RavenDB to be:

  • 2018 October 21 22:52 UTC – initial network partition, lasting 43 seconds
  • 2018 October 21 22:54 UTC – internal monitoring alert about any conflicts (but I consider this unlikely)
  • 2018 October 21 23:00 UTC – issue resolved, go home

The difference is mostly because RavenDB was designed to live in this kind of environment, deployed in multiple data centers and actually handling, in the real world and with very little assistance, the task of keeping applications up and running without blowing things up. It is quite literally one of the basic building blocks we have, so it shouldn’t be surprising that we are pretty good at it.

The fear of an empty source file

time to read 4 min | 606 words

I have been writing software at this point for over twenty years, and I want to believe that I have learned a few things during that timeframe.

And yet, probably the hardest thing for me is to start writing from scratch. If there is no code already there, it is all too easy to get lost in the details and not actually be able to get anywhere.

An empty source file is full of so many options, and any decision that I’ll make is going to have very long lasting impact. Sometimes I look at the keyboard and just freeze, unable to proceed because I know, with a 100% certainty, that whatever I’ll produce isn’t going to be up to my own standards. In fact, it is going to suck, for sure.

I think that about 90% of the things I have written so far are stuff that I couldn’t write today. Not because I lack the knowledge, but because I have far greater understanding of the problem space and I know that trying to solve it all is such a big task that it is not possible for me to do so. What I need reminding, sometimes, is that I have written those things, and eventually, those things were able to accomplish all that was required of them.

A painter doesn’t just start by throwing paint on canvas, and a building doesn’t grow up by people putting bricks where they feel like. In pretty much any profession, you need to iterate several times to get things actually done. A painter will typically do a drawing before actually putting paint on canvas. An architect will build a small scale model, etc.

For me, the hardest thing to do when I’m building something new is to actually allow myself to write it out as is. That means laying out the general structure of the code and ignoring all the other stuff that you must have in order to get to real production worthy code. This means flat out ignoring:

  • Error handling
  • Control of allocations and memory used
  • Selecting the underlying data structures and algorithms
  • Yes, that means that O(N^2) is just fine for now
  • Logging, monitoring and visibility
  • Commenting and refactoring the code for maintainability over time

All of these are important, but I literally can’t pay these taxes and build something new at the same time.

I like to think about the way I work as old style rendering passes. When I’m done with the overall structure, I’ll go back and add these details. Sometimes that can be a lot of work, but at that point, I actually have something to help me. At a minimum, I have tests that verify that things still work, and now I have a good understanding of the problem (and my solution), so I can approach things without having so many unknowns to deal with.

A large part of that is the fact that I didn’t pay any of the taxes for development. This usually means that the new thing is basically a ball of mud, but it is a small ball of mud, which means that if I need to change things around, I have to touch fewer moving parts. A lot fewer, actually. That allows me to explore and figure out what works and what doesn’t.

It is also going directly against all of my instincts and can be really annoying. I really want to do a certain piece of code properly, but focusing on perfecting a single door knob means that the whole structure will never see the light of day.

Graphs in RavenDB: What’s the role of the middle man?

time to read 2 min | 279 words

An interesting challenge with implementing graph queries is that you sometimes get into situations where the correct behavior is counterintuitive.

Consider the case of the graph on the right and the following query:


This will return:

  • Source: Arava, Destination: Oscar

But what would be the value of the Edge property? The answer to that is… complicated.  What we actually return is the edge itself. Let’s see what I mean by that.


And, indeed, the value of Edge in this query is going to be dogs/oscar.


This isn’t very helpful if we are talking about a simple edge like this. After all, we can deduce this from the Src –> Destination pair. This gets more interesting when the edge is more complex. Consider the following query:


What do you think should be the output here? In this case, the edge isn’t the Product property, it is the specific line that matches the filter on the edge. Here is what the result looks like:


As you can imagine, knowing exactly what edge led you from one document to another can be very useful when you look at the query results.

Memory management goop in Windows & Linux

time to read 2 min | 323 words

Regardless of the operating system you use, you are going to get roughly the same services from each of them. In particular, process and memory isolation, managing the hardware, etc. It can sometimes be really interesting to see the difference between the operating systems’ approaches to solving the same problem. Case in point, how both Windows and Linux manage memory. Both of them run on the same hardware and do roughly the same thing. But they have very different styles, and this ends up having profound implications for the applications using them.

Consider what appears to be a very simple question: what stuff do I have in my RAM? Linux keeps track of Resident Set Size on a per mapping basis, which means that we are able to figure out how much of a mmap file is actually in memory. Furthermore, we can figure out how much of the mmap data is clean, which means that it is easily discardable, and how much is dirty and needs to be written to disk. Linux exposes this information via the /proc/[pid]/smaps file.
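As a rough illustration of how that file can be consumed, the sketch below sums a few of the per-mapping counters that smaps reports in kB (`Rss`, `Shared_Clean`, `Private_Clean`, `Shared_Dirty`, `Private_Dirty`). The sample text, including the `/data/db.voron` path, is made up so the example runs anywhere; the function name is an assumption for this post, not something RavenDB ships.

```python
from collections import Counter

FIELDS = ("Rss", "Shared_Clean", "Private_Clean", "Shared_Dirty", "Private_Dirty")


def summarize_smaps(text: str) -> Counter:
    """Sum selected per-mapping counters (in kB) across all mappings."""
    totals = Counter()
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name in FIELDS:
            totals[name] += int(rest.split()[0])  # "512 kB" -> 512
    return totals


# Made-up excerpt in the smaps layout: one mapping header, then its counters.
sample = """\
7f0000000000-7f0000100000 r--s 00000000 08:01 42 /data/db.voron
Rss:                 512 kB
Shared_Clean:        384 kB
Shared_Dirty:        128 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
"""

totals = summarize_smaps(sample)
clean_kb = totals["Shared_Clean"] + totals["Private_Clean"]
dirty_kb = totals["Shared_Dirty"] + totals["Private_Dirty"]
```

On a Linux machine you could pass in the contents of `/proc/self/smaps` instead of the sample. The point is that a single read of one file gives the clean / dirty split for the whole process.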

On the other hand, Windows doesn’t seem to bother to do this tracking. You can get this information, but you need to ask for it for each page individually. This means that it isn’t feasible to check what percentage of the system memory is clean (mmap pages that haven’t been modified and can be cheaply discarded). Windows exposes this via the QueryWorkingSetEx function.

As a result, we have to be more conservative on Windows when the system reports high memory usage. We know that our usage pattern means that a high amount of memory in use (coming from clean mmap pages) is fine. It is a small detail, but it has caused us to jump through several hoops when we are running under load. I guess that Windows doesn’t need this information, so it isn’t exposed, while on Linux it seems to be used by plenty of callers.

System flexibility

time to read 3 min | 540 words

One of the absolutely most challenging things in designing software systems is that there is really no such thing as a perfect world. A business requirement that is set in stone turns out to be quite malleable. That can cause quite a big hassle for the development team, as they try to anticipate and address all aspects of change ahead of time.

A better alternative would be to not attempt to address all such issues in software, but in wetware.

I recently ordered lunch to go at a restaurant. I had already paid and was waiting to get my order when the clerk double checked with me that the order was to go. The order had been entered as if I was going to eat at the location, instead of taking the food away. After I confirmed that I wanted to take my order to go, I watched how the clerk fixed things up. She went to the kitchen window and shouted, “that last order, make it to go”. The kitchen staff double checked which order it was, then moved on with their tasks, eventually handing me a baggie of tasty food to go.

On the way back, I kept wondering in my head how a software system would handle something like this. You’ll need to shred the idea of “an order is immutable once it is paid”, for example. Or you’ll need to add a side channel of additional instructions to the kitchen, etc.

Or, you can ignore the whole thing completely and shout at the cook. In software, that might mean that we’ll keep ourselves agile and provide a “Manual” mode in which a user can enter free text / instructions for the next person on the line to process this.

There are some cases where this would be a bad idea, but mostly these involve not trusting your users to do their jobs. Sometimes, it is literally the software’s job to force the users to follow a specific path (usually because management decided that this must be so). However, a really important aspect of design is that it isn’t rigid: it allows the users to do their work, instead of working around the software. Part of that involves designing specific places where users can do stuff that you didn’t think that they would need.

For example, an order can have a “Notes” text field that is editable even after the order is placed, which can be used for further communication. The idea is that you spend just a little bit of time considering whatever scenarios you didn’t cover and try to give the user something (just the bare minimum, maybe even below that), just to allow them to get by. The idea isn’t to provide a solution, but to get something that gives the user a choice and that will raise enough feedback so you can plug this into the next iteration of your product.
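A minimal sketch of what such an escape hatch might look like in code, assuming a hypothetical `Order` type: the paid order itself stays locked, but anyone on the line can still append a free-text note, and the note records who wrote it.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple


@dataclass
class Order:
    items: Tuple[str, ...]
    paid: bool = False
    notes: List[str] = field(default_factory=list)  # free text, always appendable

    def replace_items(self, items: Iterable[str]) -> None:
        # The "set in stone" business rule still holds for the order itself.
        if self.paid:
            raise ValueError("order is immutable once it is paid")
        self.items = tuple(items)

    def add_note(self, author: str, text: str) -> None:
        # The side channel: the software's version of shouting at the cook.
        self.notes.append(f"{author}: {text}")


order = Order(items=("lunch special",), paid=True)
order.add_note("clerk", "that last order, make it to go")
```

The notes are not interpreted by the system at all. They exist so the next person can read them, and so you can later see what people actually needed.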

Not having anything may mean that the users will solve their own problem using something else (email, or even just talking directly to one another) and we can’t have that*, obviously.

* It may sound silly, but in some cases, you literally can’t have that. Certain actions need to be logged, authorized and tracked appropriately for many purposes.

The redux of the fallacies of distributed computing

time to read 4 min | 612 words

The fallacies of distributed computing is a topic that is very near and dear to my heart. These are a set of assertions describing false assumptions that distributed applications invariably make.

The first two are:

  • The network is reliable.
  • Latency is zero.

Whenever I talk about distributed computing, the fallacies come up. And they trip people up, over and over and over again. Even people who should know better.

Which is why I read this post with horror. That was mostly for the following quote:

As networks become more redundant, partitions become an increasingly rare event. And even if there is a partition, it is still possible for the majority partition to be available. Only the minority partition must become unavailable. Therefore, for the reduction in availability to be perceived, there must be both a network partition, and also clients that are able to communicate with the nodes in the minority partition (and not the majority partition).

Now, to be clear, Daniel literally has a PhD in CS and has published several papers on the topic. It is possible that he is speaking in very precise terms that don’t necessarily match the way I read this statement. But even so, I believe that this statement is absolutely and horribly wrong.

A network partition is rare, you say? This reading, from a 2014 paper for ACM Queue, shows that it is anything but. Oh, sure, in the grand scheme of things, a network partition is an extremely rare event in a properly maintained data center; let’s say that there is a 1 / 500,000 chance of that happening (rough numbers from the Google Chubby paper). That still gives you 61 outages(!) in a few weeks.

Go and read the ACM paper, it makes for fascinating reading, in the same way you can’t look away from a horror movie however much you want to.

And this is talking just about network partitions. The problem is that from the perspective of the individual nodes, that is not nearly the only reason why you might get a partition:

  • If running a server using a managed platform, you might hit a stop the world GC collection event. In some cases, this can be minutes.
  • In an unmanaged language, your malloc() may be doing maintenance tasks and causing an unexpected block in a bad location.
  • You may be swapping to disk.
  • The OS might have decided to randomly kill your process (Linux OOM killer).
  • Your workload has hit some critical point (see the Expires section), causing the server to wait a long time before it can reply.
  • Your server is on a VM that was moved between physical machines.
  • A certificate expired on one machine, but not on others, meaning that it can contact others, but cannot be contacted directly (except that already existing connections still work).

All of these are before we consider the fact that we are dealing with imperfect software and that there may be bugs, that humans are tinkering with the system (such as deploying a new version) and mess things up, etc.

So no, I utterly reject the idea that partitions are rare events in any meaningful manner. Sure, they are rare, but a million to one event? We can do a million packets per second. That means that something that is incredibly rare can still happen multiple times a day. In practice, you need to be aware that your software will be running in a partition, and that you will need a way to handle that.
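The arithmetic is worth spelling out. The sketch below assumes, purely for illustration, that every packet is an independent one-in-a-million trial and that the rate is a steady million packets per second; real traffic and real failures are not that tidy.

```python
def expected_events_per_day(events_per_second: float, probability: float) -> float:
    """Expected number of rare events per day at a steady event rate."""
    seconds_per_day = 24 * 60 * 60
    return events_per_second * seconds_per_day * probability


# 1,000,000 packets per second, each with a one-in-a-million chance.
per_day = expected_events_per_day(1_000_000, 1 / 1_000_000)
```

Under those assumptions the "rare" event shows up about once every second, which works out to roughly 86,400 times a day. Even if the real odds are several orders of magnitude better, "multiple times a day" is still a conservative reading.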

And go read the fallacies again, maybe print them and stick them on a wall somewhere near by. If you are working with a distributed system, it is important to remember these fallacies, because they will trip you up.

Transactional Patterns: Conversation vs. Batch

time to read 6 min | 1136 words

When I designed RavenDB, I had a very particular use case at the forefront of my mind. That scenario was a business application talking to a database, usually as a web application.

This kind of application has a particular style of communication with the user. As you can see below, there are two very distinct operations. Show the user the data, followed by some “think time” (seconds at minimum, but can be much longer) and then followed by an action.


This shouldn’t really be a surprise for anyone who has developed any kind of application in the last decade or two, so why do I mention this explicitly? I mention this because of the nature of communication between the application and the database.

Some databases have a conversation pattern with the application. In terms of API, this will look something like this:

  • BeginTransaction()
  • Update()
  • Insert()
  • Commit()

This is a very natural model and should be quite familiar for most developers. The other alternative to this method is to use batches:

  • SaveChanges( [Update, Insert] )

I want to use this post to talk about the difference between the two styles and how that impacts your work. Relational databases use the conversation style while RavenDB uses the batch style. On the surface, it looks like it would be more complex to use RavenDB to achieve the same task, but there is very little difference in the API as far as the user is concerned. In both cases, the code looks very much the same:

Behind the scenes, however, the RavenDB code will send just a single request to the server, while a relational database will need four separate commands to execute the transaction. In many cases, you can send all of these commands to the server in a single roundtrip, but that is an optimization that doesn’t always work and often isn’t applied even when it is possible.
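To see the difference in roundtrips, here is a toy model. `CountingServer` is a hypothetical stand-in that only counts requests; it is not a real database client, and the command names are the ones from the lists above.

```python
class CountingServer:
    """Stand-in for a database server that counts network roundtrips."""

    def __init__(self) -> None:
        self.roundtrips = 0

    def send(self, *commands: str) -> None:
        # One call to send() models one request/response over the network.
        self.roundtrips += 1


def conversation_style(server: CountingServer) -> None:
    # Each step of the transaction is its own trip to the server.
    for command in ("BeginTransaction", "Update", "Insert", "Commit"):
        server.send(command)


def batch_style(server: CountingServer) -> None:
    pending = ["Update", "Insert"]  # collected locally, no network traffic
    server.send(*pending)           # SaveChanges(): a single request


conversation_server = CountingServer()
conversation_style(conversation_server)

batch_server = CountingServer()
batch_style(batch_server)
```

After running both, `conversation_server.roundtrips` is 4 and `batch_server.roundtrips` is 1. Multiply the difference by the latency to a remote data center and it stops being a rounding error.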

Sidebar: Reducing server roundtrips

Why is the reduction in server roundtrips so important? Because it has a lot of implications on the overall performance of the system. In many cases the cost of making a remote query from the application to the database far outstrips the costs of actually executing the query. This ties closely to the Fallacies of Distributed Computing. Latency isn’t zero, even though when you develop locally it certainly seems like this is the case.

The primary goal of this design in RavenDB was to reduce the number of network roundtrips that your application must endure. Because in the vast majority of the cases, your application is going to follow the “show data” / “modify data” as two separate operations (often separated by a long idle time) there is a lot of value in having the database interaction model match what you will actually be doing.

As it turned out, there are some additional advantages (and disadvantages, which I’ll cover a bit later) to this approach, beyond just the obvious reduction in the number of server roundtrips.

When the server gets all the operations that need to be done in a single request, it can apply all of them at once. For that matter, it can choose how to apply them in the most optimal order. This gives the database server a lot more chances for optimization. It is similar to going to the supermarket with a list of items to purchase vs. a treasure hunt. When you have the full list, you can decide to pick things up based on how close they are on the shelves. If you only get the next instruction after you complete the previous one, you have no option for optimization.

When using the conversation style, durability and state management become more complex as well. Relational databases typically use some variation of ARIES for their journals. This is because they need to record information about ongoing transactions that haven’t yet been committed. This adds significant complexity to the work that is required from the database engine. Furthermore, when running in a distributed system, you need to share this transaction state (which hasn’t yet been committed!) across the nodes to allow failover of the transaction if the server fails. With the conversation style, you need to support concurrent transactions all operating at the same time and potentially reading and modifying the same data. This leads to a great deal of code that is required to properly manage locking and latching inside the database engine.

On the other hand, batch mode gives the server all the operations in the transaction in a single go. This means that failover can simply be sending the batch of operations to another node, without the need to share complex state between them. It means that the database server has all the required information and can make decisions based on it. For example, if there are no data dependencies, it can execute the operations in the transaction in whatever order it desires, leading to more optimal execution time. The database can also mix & match operations from different transactions into a single batch (as long as it keeps the externally visible behavior consistent, of course) to optimize things even further.

There are two major disadvantages to batch mode. The first is that there is usually a strict separation of reads from writes. That means that you usually can’t get a single consistent read/modify operation that stays in the same transaction. The second issue is similar: because you need to generate all the operations ahead of time, you can’t make decisions about what operations to execute based on the data you read, at least not in the same transaction. The typical solution for that is to send a script in the batch. This script can then read / modify data in the same context, apply logic, etc. The important thing here is that this script runs inside the server, already inside the transaction. This means that you don’t pay network roundtrip time to make such operations.

On the other hand, it means that you need to write potentially complex logic in the database’s scripting language, rather than your own platform, which you’ll likely prefer.

Luckily, for most scenarios, especially with web applications, you don’t need to execute complex logic on the server side. You can usually just send the commands you need in a single batch and be done with it. Often, just having optimistic concurrency is enough to get you the consistency you want, with scripting reserved for more exceptional cases.
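Here is a small sketch of how optimistic concurrency fits the batch model: each command carries the version the client saw when it showed the data, and the server rejects the whole batch if any of them is stale. The `Store` class, the integer versions and the document ids are all assumptions made for this illustration; this is not RavenDB's API.

```python
from typing import Dict, List, Optional, Tuple

# (document id, version the client read or None for "don't check", new value)
Command = Tuple[str, Optional[int], str]


class ConcurrencyError(Exception):
    pass


class Store:
    def __init__(self) -> None:
        self.docs: Dict[str, Tuple[int, str]] = {}  # id -> (version, value)

    def save_changes(self, batch: List[Command]) -> None:
        # Validate the whole batch first so it is applied all-or-nothing.
        for doc_id, expected, _ in batch:
            current = self.docs.get(doc_id, (0, ""))[0]
            if expected is not None and expected != current:
                raise ConcurrencyError(doc_id)
        for doc_id, _, value in batch:
            version = self.docs.get(doc_id, (0, ""))[0]
            self.docs[doc_id] = (version + 1, value)


store = Store()
store.save_changes([("users/1", 0, "first edit")])       # version 0 -> 1

try:
    # Still claims version 0, so someone else got there first.
    store.save_changes([("users/1", 0, "stale edit"), ("users/2", None, "new")])
    rejected = False
except ConcurrencyError:
    rejected = True
```

The think time between "show data" and "modify data" can be as long as the user likes. No transaction is held open; the check happens in the one request that carries the batch.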

RavenDB’s usage scenario was meant to make the common operations easy and the hard stuff possible. I think that we got it right and ended up with an API that is functional, highly performant and one that has withstood the test of time very well.

The iterative design process: Query parameters example

time to read 4 min | 660 words

When we start building a feature, we often have a pretty good idea of what we want to have and how to get there. And then we actually start building it and we often end up with something that is quite different (and usually much better). It has gotten to the point where we aren’t even trying to do hard specs and detailed design at anything beyond the exploratory levels. For example, in the design of RavenDB 4.0, there was not even a mention of RQL. That ended up being a very late addition to the codebase, but it improved RavenDB significantly. On the other hand, the low level mechanisms of zero copy documents from Voron all the way to the network were designed up front, but only at a fairly high level.

In this post, I want to talk about query parameters in RavenDB. Actually, let me be more specific: we have query parameters, but what we don’t have (or rather, didn’t have, because that will be merged in by the time you read this post) is the ability to run parameterized queries from the studio. We always meant to have that capability, but we ran out of time with the 4.0 release. As we are gearing up for the 4.1 release, we are clearing the table of the major-minor issues. (Major in terms of impact, minor in terms of the amount of work required.) Query parameters in the studio are one such example. Here is what this looks like:


My first thought was to just build something like this:


Give the user the ability to define arguments and be done with it. The task was assigned to one of our developers and I expected to get a PR in a short while.

This particular developer has a tendency to consider not just the task at hand but also other aspects of the problem. He didn’t want the user to have to manually specify each argument, since that has poor ergonomics. Instead, he wanted the studio to figure it out on its own and help the user. So the first thing he did was detect the arguments (regex: “\$\w+”) and present them in the grid. Then there was the issue of how to deal with edits, etc. Then he ran into another problem: types. Query parameters can be more than just strings, they can be any JSON data type.
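The detection step is simple enough to sketch. The studio itself runs in the browser, so the snippet below is only a Python illustration of the same regex, with a made-up query; the function name is hypothetical.

```python
import re
from typing import List

PARAMETER = re.compile(r"\$\w+")  # the regex quoted above


def detect_parameters(query: str) -> List[str]:
    """Return each distinct $parameter in order of first appearance."""
    seen: List[str] = []
    for name in PARAMETER.findall(query):
        if name not in seen:
            seen.append(name)
    return seen


query = "from Orders where Company = $company and Total > $min or ShipTo = $company"
parameters = detect_parameters(query)
```

Here `parameters` ends up as `["$company", "$min"]`. Note that a plain regex like this will also pick up a `$word` that appears inside a string literal, which is one of the reasons detection alone isn't the whole feature.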

Here is what he came up with:


Instead of having to define the query parameters in a separate location, just put them right in. Having the parameters grid involves pointing and clicking with the mouse, entering possibly complex values (such as long arrays) and in general much more work than just having them right above the query.

Note that this is a studio only feature, queries from the client API already have ways to specify arguments properly. So the next question is how we are going to handle passing the arguments to the server. Remember, this is only on the studio, so we can take quite a few shortcuts. In this case, we’ll simply snip the entire first section of the query text (which contains the query parameters). We can do that by going from the start of the query to the first from or declare keywords. We do a basic pre-processing to turn “$name = …“ into “results.$name = …“ and then just execute this code in the browser, giving us a JS object with all the parameters that we can then send to the servers.
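The text handling described here can be sketched as follows. This is a naive Python rendition of the two preprocessing steps only (snipping the parameter section and rewriting the assignments); the real studio code is JavaScript and then executes the rewritten section, which this sketch does not do. The function names and the sample query are hypothetical.

```python
import re
from typing import Tuple

# First occurrence of the `from` or `declare` keyword as a whole word.
QUERY_START = re.compile(r"\b(from|declare)\b")


def split_parameters(text: str) -> Tuple[str, str]:
    """Split the studio text into (parameter section, actual query)."""
    match = QUERY_START.search(text)
    if match is None:
        return text, ""
    return text[:match.start()], text[match.start():]


def rewrite_assignments(section: str) -> str:
    """Turn `$name = ...` at the start of a line into `results.$name = ...`."""
    return re.sub(r"^[ \t]*\$(\w+)[ \t]*=", r"results.$\1 =", section, flags=re.MULTILINE)


text = '$name = "Arava"\n$age = 3\nfrom Dogs where Name = $name and Age > $age'
section, query = split_parameters(text)
script = rewrite_assignments(section)
```

A shortcut like this breaks if a parameter value itself contains the word `from` or `declare`, which is acceptable for a studio-only convenience but would not be for the client API.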

The next stage is to make this discoverable, by detecting parameters whose value is not provided and giving the user a quick fix to add them.
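One possible shape for that detection, again as a hedged TypeScript sketch rather than the studio's code: findMissingParameters and applyQuickFix are hypothetical names, a parameter counts as "provided" when a line starts with "$name =", and the null placeholder inserted by the quick fix is an assumption about what a sensible stub would be.

```typescript
// Hypothetical helper: $parameters that are referenced but never assigned.
function findMissingParameters(fullText: string): string[] {
    // Names assigned at the start of a line, e.g. "$name = ...".
    const assignments: string[] = fullText.match(/^[ \t]*\$\w+(?=[ \t]*=)/gm) || [];
    const defined = assignments.map(a => a.trim());
    // Every $name that appears anywhere in the text.
    const referenced: string[] = fullText.match(/\$\w+/g) || [];
    return referenced.filter((name, index) =>
        referenced.indexOf(name) === index && defined.indexOf(name) === -1);
}

// Hypothetical quick fix: prepend a stub assignment for each missing parameter.
function applyQuickFix(fullText: string): string {
    const stubs = findMissingParameters(fullText).map(name => name + " = null");
    return stubs.length === 0 ? fullText : stubs.join("\n") + "\n" + fullText;
}

const draft = '$company = "companies/1-A"\n' +
    'from Orders where Company = $company and Freight > $minFreight';
console.log(findMissingParameters(draft).join(", ")); // $minFreight
console.log(applyQuickFix(draft));
```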

Dealing with massively distributed data flows

time to read 4 min | 610 words

Imagine that you are the owner of Gary’s Shoes, and that you want to get data from all of your multitudes of stores into a centralized location. You’ll use that data to make decisions, predict future trends, etc. Given that each store must operate independently, you have a server in each location that will push up its changes to (and get updates from) the HQ cluster. You can see an example of this kind of setup in this post.

This works quite well, but it does require the user to be aware of a potential issue. When you have a massively distributed data flow process set up, you also need to pay attention to the quiet in the noise. What do I mean by that?

One of our customers has RavenDB deployed to tens of thousands of locations worldwide. At any given time, you are going to have at least some of those locations unavailable. In some locations, part of closing down for the day means literally flipping the master switch on electricity for the entire building. In others, you might have someone tripping over the router or have some local or regional network outage.

Part of the strategy for dealing with such a data set, coming from so many separate locations, is the need to monitor when we aren’t getting data. The fact that in most of our locations we have near real time data is very powerful for the business. But you also need to see where you aren’t getting the data from and set up proper alerts and monitoring for the missing data. From a business perspective, it is also advisable to surface that kind of detail all the way to the user. If you are going to be ordering inventory for the stores in a particular state, but the two major stores in the area are down because of a network issue and have been down for two days now, you want to be aware of that and realize that you are working on out of date data.
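As a minimal sketch of what "monitoring for the missing data" can mean in practice, the TypeScript below flags locations that have been silent for longer than a threshold. Everything in it is an assumption made for illustration: the StoreStatus shape, the findSilentStores name, the 24 hour threshold, and the idea that HQ records the time it last received data from each store. It does not use any RavenDB API.

```typescript
// Hypothetical record HQ keeps per store: when data last arrived from it.
interface StoreStatus {
    storeId: string;
    lastReceivedAt: Date;
}

// Return the stores that have been silent longer than the allowed window,
// longest silence first, so the worst offenders surface at the top.
function findSilentStores(stores: StoreStatus[], now: Date, maxSilenceHours: number): StoreStatus[] {
    const cutoff = now.getTime() - maxSilenceHours * 60 * 60 * 1000;
    return stores
        .filter(s => s.lastReceivedAt.getTime() < cutoff)
        .sort((a, b) => a.lastReceivedAt.getTime() - b.lastReceivedAt.getTime());
}

const now = new Date("2018-11-10T12:00:00Z");
const silent = findSilentStores([
    { storeId: "stores/1", lastReceivedAt: new Date("2018-11-10T11:58:00Z") },
    { storeId: "stores/2", lastReceivedAt: new Date("2018-11-08T09:00:00Z") },
    { storeId: "stores/3", lastReceivedAt: new Date("2018-10-01T00:00:00Z") },
], now, 24);

console.log(silent.map(s => s.storeId).join(", ")); // stores/3, stores/2
```

The point of sorting by silence is that a store closed overnight and a store that has been dark for a month should not look the same on the report.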

To be honest, the issue isn’t so much about two days of lag in the case of a once in a blue moon type of error. In the scenario outlined above, in pretty much all business scenarios that I can think of, you won’t really see any impact on the decision making of the organization.

The killer is when you have some sort of a problem that goes on for a while. A DNS update that was missed because of a bad DNS cache policy, for example. Now your updates to HQ go into the void on a consistent basis. On the other hand, everything else continues to function properly both locally and for HQ. If this isn’t accounted for, it is easy to miss this for a long period of time. I have seen such a case that was only discovered when the year’s end numbers didn’t quite match up with what they were supposed to be. Given that this was the second year in a row this happened, the investigation found that some network issue had indeed caused a very long term topology failure. This was actually properly reported, in a log file that no one ever read.

Lesson learned: make sure that part of your data flow strategy accounts for such things and brings them to the users’ attention. Actually resolving the issue was a network configuration change that took minutes, and the entire dataset was synchronized within a few hours afterward. But finding out that there was even a problem took effectively forever.

Unexpected use cases for RavenDB in IoT

time to read 3 min | 558 words

We designed RavenDB to be a server side database, to be used to run large scale business applications. Surprisingly for us, there is a large group of users that have taken RavenDB and actually run it as part of their deployed systems. In other words, instead of having a single large RavenDB cluster, they will typically deploy many RavenDB instances (hundreds in the small cases, tens of thousands to millions in the large cases) across a wide variety of locations.

Part of that is the fact that RavenDB can be embedded inside an application quite easily. That means that we don’t need complex setup or administration. You can just use RavenDB from your application and everything Will Just Work. Another factor is the fact that you can run RavenDB on very low end machines, including 32-bit machines, ARM SoCs, etc.

One use case was a point of sale system that had to spec out its hardware a decade in advance and had to deal with existing installations that were still running hardware from 10 years ago (with little desire to upgrade). Another use case was deploying RavenDB as part of an industrial robot package, with RavenDB installed on a 32-bit ARM system on chip that controls the robot.

That kind of deployment pattern leads to interesting requests. For example, several of our customers need ad hoc replication in a location. So all the nodes in a particular physical location will join together into a full mesh of replicated nodes. This gives us high availability in a particular location, with any node in the network being able to service any request across the entire location. Boot up a new machine, wait a bit for the rest of the network to update it and you are good to go. This also helps when you consider your machines to be unreliable (because they are old, beaten down and generally minimally maintained).

Another scenario with the need for dynamic topologies is the deployment of RavenDB as a set of independent nodes that need to report to some sort of headquarters. This is easy to do by defining external replication or ETL on the node and having it send all the relevant data to a central location for processing. This way, you get a cheap “always available” local node but can still have a global view of your data. I posted about something similar in the past, if you care for the details.

We are now looking for additional features to serve this kind of deployment. In particular, we are interested in making it easy to share data and generate analytics across a widely distributed and separated set of instances. One of the things that we are currently considering is some form of integration with the cloud. For example, consider Amazon Athena, which allows you to run analytics queries on files residing in S3. We can define ETL processes that would upload the data from RavenDB as it is changed on each individual node. This way, you have each node pushing data to the cloud and a central location that can run live analytics on the data.

What are your thoughts on this? And what other features do you think will serve this kind of scenario?

