Ayende @ Rahien

Oren Eini aka Ayende Rahien CEO of Hibernating Rhinos LTD, which develops RavenDB, a NoSQL Open Source Document Database.

time to read 4 min | 796 words

There have been a couple of cases where I was working on a feature, usually a big and complex one, that made me realize that I’m just not smart enough to actually build it.

A long time ago (five or six years), I had to tackle free space handling inside of Voron. When an operation causes a page to be released, we need to register that page somewhere, so we’ll be able to reuse it later on. This doesn’t seem like too complex a feature, right? It is just a free list, what is the issue?

The problem was that the free list itself was managed as pages, and changes to the free list might cause pages to be allocated or de-allocated. This can happen while the free list operation is running. Basically, I had to make the free list code re-entrant. That was something that I tried to do for a long while, before just giving up. I wasn’t smart enough to handle that scenario properly. I could plug the leaks, but it was too complex, and it was obvious that I was going to run into additional cases where this would cause issues.

I had to use another strategy to handle this. Instead of allowing the free list to do dynamic allocations, I allocated the free list once, and then used a bitmap mode to store the data itself. That means that modifications to the free list couldn’t cause us to mutate the structure of the free list itself. The crisis was averted and the feature was saved.
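
To make the idea concrete, here is a minimal sketch in Python. It is not the actual Voron code; the class and method names are made up for illustration. The point is that the bitmap is allocated once, so marking a page free or busy only flips a bit and can never cause the free list itself to allocate or release pages.

```python
class BitmapFreeList:
    """Fixed-size free-page tracker: one bit per page, allocated once up front."""

    def __init__(self, page_count):
        self.page_count = page_count
        # The storage never grows or shrinks, so updating it cannot
        # trigger a page allocation or release of its own.
        self._bits = bytearray((page_count + 7) // 8)

    def release(self, page):
        """Mark a page as free so it can be reused later."""
        if not 0 <= page < self.page_count:
            raise IndexError(f"page {page} is out of range")
        self._bits[page // 8] |= 1 << (page % 8)

    def try_allocate(self):
        """Return the lowest free page number and mark it busy, or None."""
        for byte_index, byte in enumerate(self._bits):
            if byte:
                bit = (byte & -byte).bit_length() - 1  # lowest set bit
                self._bits[byte_index] &= ~(1 << bit) & 0xFF
                return byte_index * 8 + bit
        return None


free_list = BitmapFreeList(page_count=20)
free_list.release(13)
free_list.release(5)
print(free_list.try_allocate(), free_list.try_allocate(), free_list.try_allocate())
```

The trade-off is that the capacity is fixed ahead of time, which is exactly what removes the re-entrancy problem.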

I just checked, and the last meaningful change that happened for this piece of code was in Oct 2016, so it has been really stable for us.

The other feature where I realized I’m not smart enough to handle came up recently. I was working on the documents compression feature. One of the more important aspects of this feature is the ability to retrain, which means that when compressing a document, we’ll use a dictionary to reduce the size, and if the dictionary isn’t well suited, we’ll create a new one. The first implementation used a dictionary per file range. So all the documents that are placed on a particular location will use the same dictionary. That has high correlation to the documents written at the same time and had the benefit of ensuring that new updates to the same document will use its original dictionary. That is likely to result in good compression rate over time.

The problem was that during this process, we may find out that the dictionary isn’t suited for our needs and that we need to retrain. At this point, however, we had already computed the required space. But… the new dictionary may compress the data differently (both higher and lower than what was expected). The fact that I could no longer rely on the size of the data during the save process led to a lot of heartache. The issue was worse because we first try to compress the value using a specific dictionary, find that we can’t place it in the right location and need to put it in another place.

However, to find the new place, we need to know the size that we need to allocate. And the new location may have a different dictionary, and there the size of the compressed value is different…

Data may move inside RavenDB for a host of reasons: compaction, defrag, etc. Whenever we would move the data, we would need to recompress it, which led to a lot of complexity. I tried fighting that for a while, but it was obvious that I couldn’t manage the complexity.

I changed things around. Instead of having a dictionary per file range, I tagged the compressed value with a dictionary id. That way, each document could store the dictionary that it was using. That simplified the problem space greatly, because I only need to compress the data once, and afterward the size of the data remains the same. It meant that I had to keep a registry of the dictionaries, instead of placing a dictionary at the end of the specified file range, and it somewhat complicates recovery, but the overall system is ever so much simpler and it took a lot less effort and complexity to stabilize.
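
Here is a rough sketch of the idea in Python. It is not RavenDB’s implementation: zlib’s preset dictionary support stands in for the real compression library, and the registry class, the 4 byte id prefix and the sample dictionaries are all assumptions made for illustration.

```python
import struct
import zlib


class DictionaryRegistry:
    """Hypothetical registry mapping a numeric id to a compression dictionary."""

    def __init__(self):
        self._dictionaries = {}

    def register(self, dictionary):
        dict_id = len(self._dictionaries) + 1
        self._dictionaries[dict_id] = dictionary
        return dict_id

    def compress(self, dict_id, document):
        compressor = zlib.compressobj(zdict=self._dictionaries[dict_id])
        payload = compressor.compress(document) + compressor.flush()
        # The id travels with the value, wherever the value ends up on disk.
        return struct.pack("<I", dict_id) + payload

    def decompress(self, stored):
        (dict_id,) = struct.unpack_from("<I", stored)
        decompressor = zlib.decompressobj(zdict=self._dictionaries[dict_id])
        return decompressor.decompress(stored[4:]) + decompressor.flush()


registry = DictionaryRegistry()
users = registry.register(b'{"name": "", "email": ""}')
document = b'{"name": "Oren", "email": "oren@example.com"}'
stored = registry.compress(users, document)

# Retraining adds a new dictionary; values that were already stored are untouched.
registry.register(b'{"sku": "", "price": 0}')
assert registry.decompress(stored) == document
```

Because the stored bytes carry their own dictionary id, moving a value to another location is a plain copy. Its size never changes after the first compression.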

I’m not sure why just these problems have shown themselves to be beyond my capabilities. But it does bring to mind a very important quote:

When I Wrote It, Only God and I Knew the Meaning; Now God Alone Knows

I also have to admit that I had no idea that this quote predates code and computing. The earliest known reference for it is from 1826.

time to read 4 min | 774 words

Jeremy Miller has an interesting blog post about using advisory locks in Postgres to handle leader elections. This is a topic I spend a lot of time on, so I went over the post in detail. I don’t like this approach, because it has several subtle issues that are going to bite you down the road. All of them are relatively obscure, and all of them are going to happen in production in short order.

Go read the blog post, it explains the reasoning well. The core of the leader election is this:

The idea is that you have a process instance that has State() and Start() methods. On multiple nodes, you are running this method, and it will coordinate using Postgres to ensure that there is only a single process that owns the lock at any given point in time. At least, that is the idea. In practice, there are issues.

Let’s assume that we are protecting a shared resource, such as a printer. We want to serialize access to the printer so two print jobs won’t get their pages mixed together. For simplicity, we’ll assume just two such nodes that compete on the lock.

On startup, one of the nodes will successfully get the lock, and the other will not, resulting in retries. So far, this is as expected.

I’m ignoring for now the lack of error handling: if we cannot start the connection, the whole thing is going to fail. This is sample code, but I’m pointing this out because real code must be resilient to such issues. We may bring up a node before the database is ready, and in this case, you’ll need to retry accessing the database.

A much more serious problem here is that we have a way for the process to signal that it is broken, but there is no way for the service to tell the process that it is no longer the leader. Let’s assume that a network issue has caused the connection to drop. The code, as written now, has no way of identifying this issue. It is actually worse than expected, because the connection isn’t actually being used. So even if the connection has dropped, the service is not aware of this. Even this, though, is something that can be fixed in a straightforward manner. You can add a cancellation token that the process will listen to.

You also need to keep verifying against the database server that you are still the owner of the lock and that the connection didn’t drop / fail and released it behind your back. And of course, there may be a delay between losing the lock and finding out about that.
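
A minimal sketch of that loop, in Python. The in-memory lock below is a hypothetical stand-in for the database-held lock (it is not a Postgres API), and the function names are made up. It only shows the shape: re-check ownership before each unit of work, and signal the process through a cancellation token when the lock is gone.

```python
import threading


class InMemoryLock:
    """Hypothetical stand-in for the lock held on the database server."""

    def __init__(self):
        self._owner = None
        self._guard = threading.Lock()

    def try_acquire(self, node):
        with self._guard:
            if self._owner is None:
                self._owner = node
            return self._owner == node

    def is_held_by(self, node):
        with self._guard:
            return self._owner == node

    def drop(self):
        # Simulates the server releasing the lock after a connection reset.
        with self._guard:
            self._owner = None


def run_leader(lock, node, do_work, max_steps):
    """Run work step by step, verifying lock ownership before every step."""
    cancelled = threading.Event()  # the cancellation token the process listens to
    steps = 0
    if not lock.try_acquire(node):
        return steps, cancelled
    while steps < max_steps and not cancelled.is_set():
        if not lock.is_held_by(node):
            cancelled.set()  # tell the process it is no longer the leader
            break
        do_work(steps)
        steps += 1
    return steps, cancelled
```

Note that the step during which the lock is lost still runs to completion before the next check notices. That window is the delay between losing the lock and finding out about it.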

That leads us to the most serious problem: Race conditions. In this case, even if the code handled all such scenarios nicely, we have to take into account the fact that we are dealing with separate resources here. In our example, we have Postgres for the leader election and the printer as the protected resource. The two nodes are competing on the lock, and then one of them starts printing. The lock is lost because of a network reset. At this point, Postgres frees the lock and the other node is able to lock it. It starts to run its own printing jobs.

Let’s say that the first node has a way to detect that it lost the lock. There is still the issue of how fast that can happen. It is very likely that at a certain point, you’ll have two nodes that believe that they are the leader. That is a Bad Thing.

A couple of years ago, GitHub was down for more than a day because of exactly this kind of a scenario. I analyzed the issue at the time in detail.

In this case, using the system above, you are pretty much guaranteed to have a messed up printing job, with pages from multiple jobs mixed together.

If you really care about consistency in the leader operations, you can’t just run things using a leader election. You have to run everything through the same mechanism. In GitHub’s example, they used Raft (a distributed consensus algorithm), but they used that to make decisions on a separate system, so there was a guarantee for inconsistency in that system.

In other words, you are either all in on distributed consensus or you should be out. Note that being out is fine, if you don’t care about short periods of multiple leaders. But if you need to ensure that there is only ever a single leader, you cannot make it work without building it properly from the ground up.

time to read 3 min | 414 words

The very first product of Hibernating Rhinos was a profiler for NHibernate, to allow you to figure out exactly what is going on between your database and application. Now I’m proud to present our latest product: the Cosmos DB Profiler.

If you are using Azure, you are likely familiar with Cosmos DB. Cosmos DB is not a traditional relational database. It is marketed by Microsoft as a multi model database and it is widely known in the world of distributed databases. The first part is important enough to bear repeating. Cosmos DB is not a relational database, even if there is a tendency to treat it as such.

We have gathered everything we know about optimal database usage, mixed in all the experience we gained from seeing users bump into issues when working with distributed systems and then looked into all the best practices published about successful Cosmos DB applications. After we had all of that, we looked into patterns, things that we can do for you, automatically, that would prevent you from messing up. Thus, the Cosmos DB profiler was born.

Here is what it looks like, profiling an application locally:


As you can see, it gives you context for the interaction between your application and the database. It allows you to see exactly what is going on behind the scenes. This is important, since most Cosmos DB applications aren’t trivial. We are usually talking about big applications with a lot of data and moving pieces. It can be hard to understand what is actually going on when you run a particular action.

Furthermore, the profiler is able to give you concrete suggestions that will improve your performance and reduce your cloud bills.


The pricing model for Cosmos DB is based on provisioned capacity, and it is very easy to get into a state where you need to provision a lot more than what you expected to need. The profiler is able to detect such issues, provide you with concrete recommendations on how to fix them and show you the savings, immediately.

I’m doing a webinar on the Cosmos DB profiler on Tuesday and I would love to see you there.

time to read 2 min | 340 words

When building a system with API Keys, you need to consider a seemingly trivial design question. Consider the two class diagram options on the right. We have a user’s account and we need to allow access to a certain resource. We do that using an API Key. Fairly simple and standard, to be honest.

We have a deceptively simple choice. We can have a single API key per user or multiple keys. If we go with multiple keys, we have to manage a list of them (track their use separately from the account, etc). If we have a single key, we can just treat the API Key as a synonym for the account. If you are using a relational database, it is also the difference between having a simple column and having a separate table that you’ll need to join to.

A single API Key is simpler for the developer building the service. It is not an uncommon choice, and it is also quite wrong for you to go that way.

Consider the case of key rotation. That is something that is generally recommended to do on a regular basis. If you have a single API Key for an account, how are you going to make the switch? Updating the single field will cause all the requests that use the old key to fail. But there is going to be some delay between setting the API Key and being able to update all the services that use it.

A model that allows you to have multiple API Keys is much easier. Add the new API Key to the service, update your systems to reflect the new API key (which you can do at your leisure, without the rush to update failing systems) and then remove the old API Key.
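
The rotation steps above can be sketched in a few lines of Python. The Account class and its method names are hypothetical, just enough to show that the old and new keys are both valid during the switch.

```python
import secrets


class Account:
    """Hypothetical account model that holds several API keys at once."""

    def __init__(self, name):
        self.name = name
        self._keys = set()

    def add_key(self):
        key = secrets.token_hex(16)
        self._keys.add(key)
        return key

    def remove_key(self, key):
        self._keys.discard(key)

    def is_authorized(self, key):
        # Constant-time comparison against every key the account holds.
        return any(secrets.compare_digest(key, known) for known in self._keys)


account = Account("orders-service")
old_key = account.add_key()

new_key = account.add_key()             # 1. add the new key
assert account.is_authorized(old_key)   # 2. clients still work while they are updated
assert account.is_authorized(new_key)
account.remove_key(old_key)             # 3. retire the old key once nothing uses it
assert not account.is_authorized(old_key)
```

With a single key per account, step 2 doesn’t exist. The moment the field is updated, every client that has not been reconfigured yet starts failing.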

There are other reasons why you would want multiple API Keys, but the operational flexibility of being able to rotate them without breaking the world is key.

time to read 2 min | 287 words

When you have an error code model for returning errors, you are going to be fairly limited in how you can report actual issues.

Here is a good example, taken from the ZStd source code:


You can see that the resulting error is the same, but we have three very different errors. Well, to be fair, we have two types of errors.

The total size is wrong and the number of samples is either too big or too small. There is no way to express that to the calling code, which may be far higher in the stack. There is just: “The source size is wrong” error.

There is actually an attempt at proper error reporting. The DISPLAYLEVEL is a way to report more information about the error, but like any C library, we are talking about creating custom error reporting. The DISPLAYLEVEL macro will write to the standard output if a flag is set. That flag cannot be set from outside the compilation unit, as far as I can see. So consuming this from managed code means that I have to just guess what these things are.

You can say a lot about the dangers and complexities of exceptions. But having a good way to report complex errors to the caller is very important. Note that in this case, complex means an arbitrary string generated at error time, not a complex object. An error code is great if you need to handle the error. But if you need to show it to the user, log it or handle it after the fact, a good clear error message is the key.
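
To illustrate the difference, here is a small Python sketch. It is not the ZStd code: the function, the error class, the numeric code and all three limits are assumptions invented for the example. Every failure shares one code, as in the error code model, but each carries a message generated at error time.

```python
ERROR_SRC_SIZE_WRONG = 72  # hypothetical numeric code, chosen only for this sketch


class TrainingInputError(ValueError):
    """Keeps the code for programmatic handling and adds a readable message."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code


def check_training_input(total_size, sample_count,
                         min_size=1024, min_samples=5, max_samples=100_000):
    # All three limits are illustrative assumptions, not real library values.
    if total_size < min_size:
        raise TrainingInputError(
            ERROR_SRC_SIZE_WRONG,
            f"total size is {total_size} bytes, need at least {min_size}")
    if sample_count < min_samples:
        raise TrainingInputError(
            ERROR_SRC_SIZE_WRONG,
            f"got {sample_count} samples, need at least {min_samples}")
    if sample_count > max_samples:
        raise TrainingInputError(
            ERROR_SRC_SIZE_WRONG,
            f"got {sample_count} samples, at most {max_samples} are supported")


try:
    check_training_input(total_size=4096, sample_count=2)
except TrainingInputError as error:
    print(error.code, error)
```

Code that only needs to handle the error can switch on the code. Anything that has to show it to a user or write it to a log gets the string.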

time to read 2 min | 303 words

Recently the time.gov site had a complete makeover, which I love. I don’t really have much to do with time in the US in the normal course of things, but this site has a really interesting feature that I love.

Here is what this shows on my machine:


I love this feature because it showcases a real world problem very easily. Time is hard. The concept we have in our head about time is completely wrong in many cases. And that leads to interesting bugs. In this case, the second machine will be adjusted at midnight from the network and the clock drift will be fixed (hopefully).

What will happen to any code that runs when this happens? As far as it is concerned, time will move back.

RavenDB has a feature, document expiration. You can set a time for a document to go away. We had a bug which caused us to read the entries to be deleted at time T and then delete the documents that are older than T. Except that in this case, the T wasn’t the same. We travelled back in time (and the log was confusing) and got an earlier result. That meant that we removed the expiration entries but not their related documents. When the time moved forward enough again to have those documents expire, the expiration record was already gone.

As far as RavenDB was concerned, the documents were updated to expire in the future, so the expiration records were no longer relevant. And the documents never expired, ouch.

We fixed that by remembering the original time we read the expiration records. I’m comforted by knowing that we aren’t the only ones having to deal with it.
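
Here is a toy reproduction of the bug and the fix in Python. The data layout (a dictionary of expiration records and a dictionary of documents) and the function name are invented for illustration and have nothing to do with how RavenDB stores things. The reread_clock flag switches between the buggy behavior and the fixed one.

```python
def expire_documents(expirations, documents, read_clock, reread_clock=False):
    """expirations maps expiry time -> document ids, documents maps id -> expiry time."""
    now = read_clock()
    due = sorted(t for t in expirations if t <= now)
    for expiry in due:
        for doc_id in expirations.pop(expiry):
            # Buggy variant: ask the clock again, which may have moved backwards.
            # Fixed variant: reuse the time at which the records were read.
            cutoff = read_clock() if reread_clock else now
            if doc_id in documents and documents[doc_id] <= cutoff:
                del documents[doc_id]


def clock_that_jumps_back():
    return iter([100, 90]).__next__  # second reading is earlier than the first


expirations, documents = {95: ["doc/1"]}, {"doc/1": 95}
expire_documents(expirations, documents, clock_that_jumps_back(), reread_clock=True)
print(expirations, documents)  # record is gone, document is still there

expirations, documents = {95: ["doc/1"]}, {"doc/1": 95}
expire_documents(expirations, documents, clock_that_jumps_back())
print(expirations, documents)  # both are gone
```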

time to read 6 min | 1077 words

I ran across this article, which talks about unit testing. There isn’t anything there that would be groundbreaking, but I ran across this quote, and I felt that I had to write a post to answer it.

The goal of unit testing is to segregate each part of the program and test that the individual parts are working correctly. It isolates the smallest piece of testable software from the remainder of the code and determines whether it behaves exactly as you expect.

This is a fairly common talking point when people discuss unit testing. Note that this isn’t the goal. The goal is what you want to achieve; this is a method of applying unit testing. Some of the benefits of unit tests are:

Makes the Process Agile and Facilitates Changes and Simplifies Integration

There are other items in the list on the article, but you can just read it there. I want to focus right now on the items above, because they are directly contradicted by separating each part of the program and testing it individually, as is usually applied in software projects.

Here are a few examples from posts I wrote over the years. The common pattern is that you’ll have interfaces, and repositories and services and abstractions galore. That will allow you to test just a small piece of your code, separate from everything else that you have.

This is great for unit testing. But unit testing isn’t a goal in itself. The point is to enable change down the line, to ensure that we aren’t breaking things that used to work, etc.

An interesting thing happens when you have this kind of architecture (and especially if you have this specifically so you can unit test it): it becomes very hard to make changes to the system. That is because the number of times you repeated yourself has grown. You have something once in the code and a second time in the tests.

Let’s consider something rather trivial. We have the following operation in our system, sending money:


A business rule says that we can’t send money if we don’t have enough in our account. Let’s see how we may implement it:

This seems reasonable at first glance. We have a lot of rules around money transfer, and we expect to have more of these in the future, so we created the IMoneyTransferValidationRules abstraction to model that and we can easily add new rules as time goes by. Nothing objectionable about that, right? And this is important, so we’ll have unit tests for each one of those rules.

During the last stages of the system, we realize that each one of those rules generates a bunch of queries to the database and that when we have load on the system, the transfer operation will create too much pain as it currently stands. There are a few options that we have available at this point:

  • Instead of running individual operations that will each load their data, we’ll do it once for all of them. Here is what this will look like:

As you can see, we now have a way to use Lazy queries to reduce the number of remote calls this will generate.

  • Instead of taking the data from the database and checking it, we’ll send the check script to the database and do the validation there.

And here we moved pretty much the same overall architecture directly into the database itself. So we’ll not have to pay the cost of remote calls when we need to access more information.

The common thing for both approaches is that they are perfectly in line with the old way of doing things. We aren’t talking about a major conceptual change. We just changed things so that it is easier to work with properly.

What about the tests?

If we tested each one of the rules independently, we now have a problem. All of those tests will now require non trivial modification. That means that instead of allowing change, the tests now serve as a barrier for change. They have set our architecture and code in concrete and make it harder to make changes. If those changes were bugs, that would be great. But in this case, we don’t want to modify the system behavior, only how it achieves its end result.

The key issue with unit testing the system as a set of individually separated components is the concept that there is value in each component independently. There isn’t. The whole is greater than the sum of its parts is very much in play here.

If we had tests that looked at the system as a whole, those wouldn’t break. They would continue to serve us properly and validate that this big change we made didn’t break anything. Furthermore, at the edges of the system, changing the way things are happening usually is a problem. We might have external clients or additional APIs that rely on us, after all. So changing the exterior is something that I want to enforce with tests.
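
As a small illustration, here is what such a test might look like in Python. The Bank class is a made up stand-in for the money transfer system discussed above. The tests only call the public operation and look at observable results, so they keep passing whether the rules run as separate objects, as one batched query or as a script inside the database.

```python
class InsufficientFunds(Exception):
    pass


class Bank:
    def __init__(self, balances):
        self._balances = dict(balances)

    def balance(self, account):
        return self._balances[account]

    def send_money(self, source, destination, amount):
        # How the business rules are evaluated stays hidden behind this method.
        if amount <= 0:
            raise ValueError("amount must be positive")
        if self._balances[source] < amount:
            raise InsufficientFunds(
                f"{source} has {self._balances[source]}, needs {amount}")
        self._balances[source] -= amount
        self._balances[destination] += amount


def test_cannot_send_more_than_the_balance():
    bank = Bank({"alice": 50, "bob": 0})
    try:
        bank.send_money("alice", "bob", 80)
    except InsufficientFunds:
        pass
    else:
        raise AssertionError("transfer should have been rejected")
    # Observable state only: nobody's balance moved.
    assert (bank.balance("alice"), bank.balance("bob")) == (50, 0)


def test_successful_transfer_moves_the_money():
    bank = Bank({"alice": 50, "bob": 0})
    bank.send_money("alice", "bob", 20)
    assert (bank.balance("alice"), bank.balance("bob")) == (30, 20)
```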

That said, when you build your testing strategy, you may have to make allowances. It is very important for the tests to run as quickly as possible. Slow feedback cycles can be incredibly annoying and will kill productivity. If there are specific components in your system that are slow, it makes sense to insert seams to replace them. For example, if you have a certificate generation bit in your system (which can take a long time), in the tests you might want to return a certificate that was prepared ahead of time. Or if you are working with a remote database, you may want to use an in memory version of that. An external API you’ll want to mock, etc.

The key here isn’t that you are trying to look at things in isolation, the key is that you are trying to isolate things that are preventing you from getting quick feedback on the state of the system.

In short, unless there is uncertainty about a particular component (implementing new algorithm or data structure, exploring unfamiliar library, using 3rd party code, etc), I wouldn’t worry about testing that in isolation. Test it from outside, as a user would (note that this may take some work to enable that as an option) and you’ll end up with a far more robust testing infrastructure.

time to read 6 min | 1149 words

I recently got into an interesting discussion about one of the most interesting features of RavenDB, the ability to automatically deduce and create indexes on the fly, based on actual queries made to the server. This is a feature that RavenDB has had for a very long time, over a decade, and one that I’m quite proud of. The discussion was about whether such a feature is useful or not in real world scenarios. I obviously lean toward this being incredibly useful, but I found the discussion good enough to post it here.

The gist of the argument against automatic indexes is that the developers should be in control of what is going on in the database and create the appropriate indexes of their own accord. The database should not be in the business of creating indexes on the fly, which is scary to do in production.

I don’t like the line of thinking that says that it is the responsibility of the developers / operators / database admins to make sure that all queries use the optimal query plans. To be rather more exact, I absolutely think that they should do that, I just don’t believe that they can / will / are able to.

In many respects, I consider the notion of automatic index creation to be similar to garbage collection in managed languages. There is currently one popular language that still does manual memory management, and that is C. Pretty much all other languages have switched to some other mode that means that the developer doesn’t need to track things manually. Managed languages have a GC, Rust has its ownership model, C++ has RAII and smart pointers, etc. We have decades and decades of experience telling us that no, developers actually can’t be expected to keep track of memory properly. There is a direct and immediate need for systematic help for that purpose.

Manual memory management can be boiled down to: “for every malloc(), call free()”. And yet it doesn’t work.

For database optimizations, you need to have a lot of knowledge. You need to understand the system, the actual queries being generated, how the data is laid out on disk and many other factors. The SQL Server Query Performance Tuning book is close to a thousand pages in length. So that is decidedly not a trivial topic.

It is entirely possible to expect experts to know the material and have a checkpoint to deployment that would ensure that you have done the Right Thing before deploying to production. Except that this is specialized knowledge, so now you have gate keepers, and going back to manual memory management woes, we know that this doesn’t always work.

There is a cost / benefit calculation here. If we make it too hard for developers to deploy, the pace of work would slow down. On the other hand, if a bad query goes to production, it may take the entire system down.

In some companies, I have seen weekly meetings for all changes to the database. You submit your changes (schema or queries), they get reviewed in the next available meeting and are deployed to production within two weeks of that. The system was considered to be highly efficient in ensuring nothing bad happened to the database. It also ensured that developers would cut corners. In a memorable case, a developer needed to show some related data on a page. Doing a join to get the data would take three weeks. Issuing database calls over the approved API, on the other hand, could be done immediately. You can guess how that ended up, can’t you?

RavenDB has automatic indexes because they are useful. As you build your application, RavenDB learns from the actual production behavior. The more you use a particular aspect, the more RavenDB is able to optimize it. When changes happen, RavenDB is able to adjust, automatically. That is key, because it removes a very tedious and time consuming chore from the developers. Instead of having to spend a couple of weeks before each deployment verifying that the current database structure still serves the current set of queries, they can rest assured that the database will handle that.

In fact, RavenDB has a mode where you can run your system on a test database and take the information gathered from the test run and apply it on your production system. That way, you can avoid having to learn the new behavior on the fly. You can introduce the new changes to the system model at an idle point in time and let RavenDB adjust to it without anything much going on.

I believe that much of the objection to automatic indexes comes from the usual pain involved in creating indexes in other databases. Creating an index is often seen as a huge risk. It may lock tables and pages, it may consume a lot of system resources and even if the system has an online index creation mode (and not all do), it is something that you Just Don’t do.

RavenDB, in contrast, has been running with this feature for a decade. We have had a lot of time to improve the heuristics and behavior of the system under this condition. New indexes being introduced are going to have bounded resources allocated to them, no locks are involved and other indexes are able to serve requests with no interruption in service. RavenDB is also able to go the other way: it will recognize which automatic indexes are superfluous and remove them. And automatic indexes that see no use will be expired by the query optimizer for you. The whole idea is that there is an automated DBA running inside the RavenDB Query Optimizer that will constantly monitor what is going on, reducing the need for manual maintenance cycles.
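
To give a feel for the shape of the idea, here is a toy sketch in Python. It bears no resemblance to the real RavenDB query optimizer, and the class and its methods are invented for illustration. An index appears the first time a field is queried, and indexes that served no queries since the last cleanup are dropped.

```python
from collections import defaultdict


class AutoIndexingCollection:
    """Toy collection that builds an index for a field on its first query."""

    def __init__(self, documents):
        self._documents = list(documents)
        self._indexes = {}              # field -> {value: [documents]}
        self._hits = defaultdict(int)   # field -> queries since last cleanup

    def query(self, field, value):
        if field not in self._indexes:
            index = defaultdict(list)
            for doc in self._documents:
                if field in doc:
                    index[doc[field]].append(doc)
            self._indexes[field] = index
        self._hits[field] += 1
        return list(self._indexes[field].get(value, []))

    def drop_unused_indexes(self):
        """Remove indexes that served no queries since the previous call."""
        unused = [field for field in self._indexes if self._hits[field] == 0]
        for field in unused:
            del self._indexes[field]
        self._hits.clear()
        return unused

    def indexed_fields(self):
        return sorted(self._indexes)


users = AutoIndexingCollection([
    {"name": "Oren", "city": "Hadera"},
    {"name": "Arava", "city": "Hadera"},
])
print(users.query("city", "Hadera"))
print(users.indexed_fields())
```

The real thing has to deal with bounded resources, merging overlapping indexes and indexing in the background, which this sketch ignores entirely.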

As you can imagine, this is a pretty important feature and has been through a lot of optimization and work over the years. RavenDB is now usually good enough in this task that in many cases, you don’t ever need to create indexes yourself. That has enormous impact on the ability to rapidly evolve your product, because you are able to do that instead of going over a thousand-page book telling you how to optimize your queries. Write the code, issue your queries, and the database will adjust.

With all the praise that I heap upon automatic index creation, I want to note that it is at most a copper bullet, not a silver one. Just like with garbage collection, you are free from the minutia and tedium of manual memory management, but you still need to understand some of the system behavior. The good thing about this is that you are free()-ed from having to deal with that all the time. You just need to pay attention in rare cases, usually at the hotspots of your application. That is a much better way to invest your time.

time to read 7 min | 1291 words

A system that runs on a single machine is an order of magnitude simpler than one that resides on multiple machines. The complexity involved in maintaining consistency across multiple machines is huge. I have been dealing with this for the past 15 years and I can confidently tell you that no sane person would choose a multi machine setup over a single machine if they can get away with it. So what was the root cause for the push toward multiple machines and distributed architecture across the board for the past 20 years? And why are we seeing a backlash against that today?

You’ll hear people talking about the need for high availability and the desire to avoid a single point of failure. And that is true, to a degree. But there are other ways to handle that (primary / secondary model) rather than the full blown multi node setup.

Some users simply have too much data to go around and have to make use of a distributed architecture. If you are gathering a TB / day of data, no single system is going to help you, for example. However, most users aren’t there. A growth rate of GB / day (fully retained) is quite respectable and will take over a decade to start becoming a problem on a single machine.

What I think people don’t remember so well is that the landscape has changed quite significantly in the past 10 – 15 years. I’m not talking about Moore’s law, I’m talking about something that is actually quite a bit more significant. The dramatic boost in speed that we saw in storage.

Here are some numbers. At the start of the last decade, a top of the line 32GB SCSI drive with 15K RPM could hit 161 IOPS. Something more modern, a disk with 14 TB, will have 938 IOPS. That is nearly six times faster, which is amazing, but not enough to matter. These two numbers are from hard disks. But we have had a major disruption in storage at the start of the millennium: the advent of SSD drives.

It turns out that SSDs aren’t nearly as new as one would expect. They were just horribly expensive. Here are the specs for such a drive around 2003. The cost would be tens of thousands (USD) per drive. To be fair, this was meant to be used in rugged environments (think military tech, missiles and such), but there wasn’t much else in the market. In 2003 the first new commodity SSDs started to appear, with sizes that topped out at 512MB.

All of this is to say, in the early 2000s, if you wanted to store a non trivial amount of data, you had to face the fact that you had to deal with hard disks. And you could expect some pretty harsh limitations on the number of IOPS available. And that, in turn, meant that the deciding factor for scale out wasn’t really the processing speed. Remember that the C10K problem was still a challenge, but a reasonable one, in 1999. That is, handling 10K concurrent connections on a single server (to compare, millions of connections per server isn’t out of the ordinary today).

Given 10K connections per server, with each one of them needing a single IO per 5 seconds, what would happen? That means that we need to handle 2,000 IOPS. That is over ten times what you can get from a top of the line disk at that time. So even if you had a RAID0 with ten drives and were able to get perfect distribution of IO to drive, you would still be about 20% short. And I don’t think you’ll want to get 10 drives at RAID0 in production. Given the I/O limits, you could reasonably expect to serve 100 – 300 operations per second per server. And that is assuming that you were able to cache some portion of the data in memory and avoid disk hits. The only way to handle this kind of limitation was to scale out, to have more servers (and more disks) to handle the load.
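
The arithmetic is simple enough to spell out. The small Python sketch below just replays the numbers from the paragraph above (10K connections, one IO per 5 seconds, 161 IOPS per drive, ten drives). The function name is made up.

```python
def io_budget(connections, seconds_between_ios, disk_iops, drives):
    required = connections / seconds_between_ios   # IOPS the workload needs
    per_disk_ratio = required / disk_iops          # how many single disks that is
    raid_capacity = disk_iops * drives             # perfect RAID0 striping
    shortfall = 1 - raid_capacity / required       # fraction still missing
    return required, per_disk_ratio, raid_capacity, shortfall


required, ratio, raid, shortfall = io_budget(
    connections=10_000, seconds_between_ios=5, disk_iops=161, drives=10)

print(f"required: {required:.0f} IOPS")              # required: 2000 IOPS
print(f"vs. one disk: {ratio:.1f}x")                 # vs. one disk: 12.4x
print(f"ten drives in RAID0: {raid} IOPS")           # ten drives in RAID0: 1610 IOPS
print(f"still short by: {shortfall:.1%}")            # still short by: 19.5%
```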

The rise of commodity SSDs changed the picture significantly and NVMe drives are the icing on the cake. SSD can do tens of thousands of IOPS and NVMe can do hundreds of thousands IOPS (and some exceed the million IOPS with comfortable margin).

Going back to the C10K issue? A 49.99$ drive with 256GB has specs that exceed 90,000 IOPS. Those 2,000 IOPS we had to get 10 machines for? That isn’t really noticeable at all today. In fact, let’s go higher. Let’s say we have 50,000 concurrent connections, each one issuing an operation once a second. That is 50,000 operations per second, 25 times more work than the previous example. But the environment in which we are running is very different.

Given an operating budget of 150$, I will use the hardware from this article, which is basically a Raspberry PI with an SSD drive (and fully 50$ of the budget goes for the adapter to plug the SSD into the PI). That gives me the ability to handle 4,412 requests per second using Apache, which isn’t the best server in the world. Given that the disk used in the article can handle more than 250,000 IOPS, we can run a lot on a “server” that fits into a lunch box and costs significantly less than the monthly cost of a similar machine on the cloud. This factor totally changes the way you would architect your systems.

The much better hardware means that you can select a simpler architecture and avoid a lot of complexities along the way. Although… we need to talk about the cloud, where the old costs are still very much a factor.

Using AWS as the baseline, I can get a 250GB gp2 SSD drive for 25$ / month. That would give me 750 IOPS for 25$. That is nice, I guess, but it puts me at less than what I can get from a modern HDD today. There is the burst capability on the cloud, which can smooth out some spikes, but I’ll ignore that for now. Let’s say that I wanted to get higher speed. I can increase the disk size (and hence the IOPS) at a linear rate. The max I can get from gp2 is 16,000 IOPS at a cost of 533$. Moving to io1 SSD, we can get a 500GB drive with 3,000 IOPS for 257$ per month, and getting to 20,000 IOPS on a 1TB drive would cost 1,425$.

In contrast, 242$ / month will get us a r5ad.2xlarge machine with 8 cores, 64 GB of RAM and a 300 GB NVMe drive. 1,453$ will get us a r5ad.12xlarge with 48 cores, 384 GB of RAM and a 1.8TB NVMe drive. You are better off upgrading the machine entirely, running on top of the local NVMe drive and handling the persistence yourself than paying the storage costs associated with having it out as block storage.
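The block storage prices quoted above can be reproduced with a small calculation. This is only a sketch: the per-GB and per-IOPS rates below are assumptions back-derived from the numbers in the text (they are not read from an official price list), and burst credits and minimum baselines are ignored.

```python
# Assumed rates, chosen so that they reproduce the figures quoted in the text.
GP2_USD_PER_GB = 0.10      # gp2: pay for size only
GP2_IOPS_PER_GB = 3        # gp2: baseline IOPS scale linearly with size
GP2_MAX_IOPS = 16_000      # gp2: cap on baseline IOPS
IO1_USD_PER_GB = 0.125     # io1: pay for size...
IO1_USD_PER_IOPS = 0.065   # ...and separately for provisioned IOPS


def gp2_monthly(size_gb):
    """Return (baseline IOPS, monthly USD) for a gp2 volume of this size."""
    iops = min(size_gb * GP2_IOPS_PER_GB, GP2_MAX_IOPS)
    return iops, size_gb * GP2_USD_PER_GB


def io1_monthly(size_gb, iops):
    """Return the monthly USD cost of an io1 volume with provisioned IOPS."""
    return size_gb * IO1_USD_PER_GB + iops * IO1_USD_PER_IOPS


for size in (250, 5_334):
    iops, usd = gp2_monthly(size)
    print(f"gp2 {size:>5} GB: {iops:>6} IOPS for {usd:,.0f}$ / month")

for size, iops in ((500, 3_000), (1_000, 20_000)):
    print(f"io1 {size:>5} GB: {iops:>6} IOPS for {io1_monthly(size, iops):,.1f}$ / month")
```

With these rates, almost all of the io1 bill at 20,000 IOPS is the IOPS charge itself, which is why a whole machine with a local NVMe drive ends up in the same price range.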

This tyranny of I/O costs and performance has had a huge impact on the overall architecture of many systems. Scale out was not, as usually discussed, a reaction to the limits of handling the number of users. It was a reaction to the limit on how fast the I/O systems could handle concurrent load. With SSD and NVMe drives, we are in a totally different field and need to consider how that affects our systems.

In many cases, you’ll want to have just enough data distribution to ensure high availability. Otherwise, it makes more sense to get larger but fewer machines. The reduction in management overhead alone is usually worth it, but the key aspect is reducing the number of moving pieces in your architecture.

time to read 3 min | 462 words

We are exposing an API to external users at the moment. This API is currently exposed only via a web app. The internal architecture is that we have a controller using ASP.Net MVC that will accept the request from the browser, validate the data and then put the relevant commands in a queue for backend processing.

The initial version of the public API is going to be 1:1 identical to the version we have in the web application. That leads to an obvious question. How do we avoid code duplication between the two?

Practically speaking, there are some differences between the two versions:

  • The authentication is different (session cookie vs. api key)
  • The base class for the controller is different

Aside from those changes, they are remarkably the same. So how should you share the code?

My answer, in this case, was that the proper strategy was Ctrl+C and Ctrl+V. I do not want to share code between those two locations. In this case, code duplication is the appropriate strategy for ensuring maintainable code.

Here is my reasoning:

  • This is the entry point to our system, not where the real work is done. The controller is in charge of validation, creating the command to run and sending it to the backend. The real heavy lifting happens elsewhere and will be common regardless.
  • That doesn’t mean that the controller doesn’t have work to do, mind. However, that work is highly dependent on the interaction model. In other words, it depends on what we show to the user in the web application. The web application and the existing controller are deployed in tandem and have no versioning boundary between them. Changes to the UI (adding a new field, for example, or additional behaviors) will be reflected in the controller.
  • The API, on the other hand, is release once, maintain for a long time. We don’t want any unforeseen changes here. Changes to the web controller code should not have an impact on the API behavior.

In other words, the fact that we have value equality doesn’t mean that we have identity equality. The code is the same right now, sure. And it would be fairly easy to abstract so we’ll only have it in a single location. But that would create a maintenance burden. Whenever we make a modification in the web portion, we’ll have to remember to keep the old behavior for the API. Or we won’t remember, and create backward compatibility issues.

Creating separate codebases for each use case results in a decision we make once, and then the most straightforward approach for writing the code will give us the behavior we want.
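The shape of this decision can be sketched in a few lines. This is an illustration in Python rather than ASP.Net MVC, and every name in it (`CreateOrder`, `web_create_order`, `api_create_order`, `VALID_API_KEYS`, `backend_queue`) is hypothetical, not taken from the actual system. The two entry points deliberately repeat the validation, while the command and the backend queue stay shared.

```python
from dataclasses import dataclass
from queue import Queue


@dataclass(frozen=True)
class CreateOrder:
    """Hypothetical backend command; the real work happens behind the queue."""
    customer_id: str
    quantity: int


backend_queue: Queue = Queue()      # stand-in for the real backend queue
VALID_API_KEYS = {"demo-key"}       # stand-in for real API key storage


def web_create_order(session: dict, form: dict) -> None:
    """Web entry point: session auth, validation that follows the current UI."""
    if "user" not in session:
        raise PermissionError("not logged in")
    quantity = int(form["quantity"])
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    backend_queue.put(CreateOrder(form["customer_id"], quantity))


def api_create_order(api_key: str, payload: dict) -> None:
    """Public API entry point: API key auth, an intentional copy of the validation."""
    if api_key not in VALID_API_KEYS:
        raise PermissionError("unknown api key")
    quantity = int(payload["quantity"])
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    backend_queue.put(CreateOrder(payload["customer_id"], quantity))
```

If the web form later gains a field or a stricter rule, only `web_create_order` changes; the API keeps its released behavior without anyone having to remember to preserve it.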

You might also want to read about the JFHCI approach.

