Ayende @ Rahien

Oren Eini aka Ayende Rahien CEO of Hibernating Rhinos LTD, which develops RavenDB, a NoSQL Open Source Document Database.

You can reach me by:


+972 52-548-6969

Posts: 6,992 | Comments: 49,637

filter by tags archive
time to read 2 min | 303 words

Recently the time.gov site had a complete makeover, which I love. I don’t really have much to do with time in the US in the normal course of things, but this site has a really interesting feature that I love.

Here is what this shows on my machine:


I love this feature because it showcase a real world problem very easily. Time is hard. The concept we have in our head about time is completely wrong in many cases. And that leads to interesting bugs. In this case, the second machine will be adjusted on midnight from the network and the clock drift will be fixed (hopefully).

What will happen to any code that runs when this happens? As far as it is concerned, time will move back.

RavenDB has a feature, document expiration. You can set a time for a document to go away. We had a bug which caused us to read the entries to be deleted at time T and then delete the documents that are older than T. Expect that in this case, the T wasn’t the same. We travelled back in time (and the log was confusing) and go an earlier result. That meant that we removed the expiration entries but not their related documents. When the time moved forward enough again to have those documents expire, the expiration record was already gone.

As far as RavenDB was concerned, the documents were updated to expire in the future, so the expiration records were no longer relevant. And the documents never expired, ouch.

We fixed that by remembering the original time we read the expiration records. I’m comforted with knowing that we aren’t the only one having to deal with it.

time to read 7 min | 1291 words

A system that runs on a single machine is an order of magnitude simpler than one that reside on multiple machines. The complexity involved in maintaining consistency across multiple machines is huge. I have been dealing with this for the past 15 years and I can confidently tell you that no sane person would go for multi machine setup in favor of a single machine if they can get away with it. So what was the root cause for the push toward multiple machines and distributed architecture across the board for the past 20 years? And why are we see a backlash against that today?

You’ll hear people talking about the need for high availability and the desire to avoid a single point of failure. And that is true, to a degree. But there are other ways to handle that (primary / secondary model) rather than the full blown multi node setup.

Some users simply have too much data to go around and have to make use of a distributed architecture. If you are gathering a TB / day of data, no single system is going to help you, for example. However, most users aren’t there. A growth rate of GB / day (fully retained) is quite respectable and will take over a decade to start becoming a problem on a single machine.

What I think people don’t remember so well is that the landscape has changed quite significantly in the past 10 – 15 years. I’m not talking about Moore’s law, I’m talking about something that is actually quite a bit more significant. The dramatic boost in speed that we saw in storage.

Here are some numbers from the start of the last decade a top of the line 32GB SCSI drive with 15K RPM could hit 161 IOPS. Looking at something more modern disk with 14 TB will have 938 IOPS. That is a speed increase of over 500%, which is amazing, but not enough to matter. These two numbers are from hard disks. But we have had a major disruption in storage at the start of the millennium. The advent of SSD drives.

It turns out that SSDs aren’t nearly as new as one would expect them. They were just horribly expensive. Here are the specs for such a drive around 2003. The cost would be tens of thousands (USD) per drive. To be fair, this was meant to be used in rugged environment (think military tech, missiles and such), but there wasn’t much else in the market. In 2003 the first new commodity SSD started to appear, with sizes that topped at 512MB.

All of this is to say, in the early 2000s, if you wanted to store non trivial amount of data, you had to face the fact that you had to deal with hard disks. And you could expect some pretty harsh limitations on the number of IOPS available. And that, in turn, meant that the deciding factor for scale out wasn’t really the processing speed. Remember that the C10K problem was still a challenge, but reasonable one, in 1999. That is, handling 10K concurrent connections on a single server (to compare, millions of connections per server isn’t out of the ordinary).

Given 10K connections per server, with each one of them needing a single IO per 5 seconds, what would happen? That means that we need to handle 2,000 IOPS. That is over ten times what you can get from a top of the line disk at that time. So even if you had a RAID0 with ten drives and was able to get perfect distribution of IO to drive, you would still be about 20% short. And I don’t think you’ll want to get 10 drives at RAID0 in production. Given the I/O limits, you could reasonably expect to serve 100 – 300 operations per second per server. And that is assuming that you were able to cache some portion of the data in memory and avoid disk hits. The only way to handle this kind of limitation was to scale out, to have more servers (and more disks) to handle the load.

The rise of commodity SSDs changed the picture significantly and NVMe drives are the icing on the cake. SSD can do tens of thousands of IOPS and NVMe can do hundreds of thousands IOPS (and some exceed the million IOPS with comfortable margin).

Going back to the C10K issue? A 49.99$ drive with 256GB has specs that exceed 90,000 IOPS. Those 2000 IOPS we had to get 10 machines for? That isn’t really noticeable at all today. In fact, let’s go higher. Let’s say we have 50,000 concurrent connections each one issuing an operation once a second. This is a hundred times more work than the previous example. But the environment in which we are running is very different.

Given an operating budget of 150$, I will use the hardware from this article, which is basically a Raspberry PI with SSD drive (and fully 50$ of the budget go for the adapter to plug the SSD to the PI). That gives me the ability to handle 4,412 requests per second using Apache, which isn’t the best server in the world. Given that the disk used in the article can handle more than 250,000 IOPS, we can run a lot on a “server” that fits into a lunch box and cost significantly less than the monthly cost of a similar machine on the cloud. This factor totally change the way you would architect your systems.

The much better hardware means that you can select a simpler architecture and avoid a lot of complexities along the way. Although… we need to talk about the cloud, where the old costs are still very much a factor.

Using AWS as the baseline, I can get a 250GB gp2 SSD drive for 25$ / month. That would give me 750 IOPS for 25$. That is nice, I guess, but it puts me at less than what I can get from a modern HDD today. There is the burst capability on the cloud, which can smooth out some spikes, but I’ll ignore that for now. Let’s say that I wanted to get higher speed, I can increase the disk size (and hence the IOPS) at linear rate. The max I can get from gp2 is 16,000 IOPS at a cost of 533$.  Moving to io1 SSD, we can get 500GB drive with 3,000 IOPS for 257$ per month, and exceeding 20,000 IOPS on a 1TB drive would cost 1,425$.

In contrast, 242$ / month will get us a r5ad.2xlarge machine with 8 cores, 64 GB and 300 GB NVMe drive. A 1,453$ will get us a r5ad.12xlarge with 48 cores, 384 GB and 1.8TB NVMe drive. You are better off upgrading the machine entirely and running on top of the local NVMe drive and handling the persistency yourself than paying the storage costs associated with having it out as a block storage.

This tyranny of I/O costs and performance has had a huge impact on the overall system architecture of many systems. Scale out was not, as usually discussed, a reaction to the limits of handling the number of users. It was a limit on how fast the I/O systems could handle concurrent load. With SSD and NVMe drives, we are in a totally different field and need to consider how that affect our systems.

In many cases, you’ll want to have just enough data distribution to ensure high availability. Otherwise, it make more sense to get larger but fewer machines. The reduction in management overhead alone is usually worth it, but the key aspect is reducing the number of moving pieces in your architecture.

time to read 2 min | 281 words

Subscriptions in RavenDB gives you a great way to handle backend business processing. You can register a query and get notified whenever a document that matches your query is changed. This works if the document actually exists, but what happens if you want to handle a business process relating to document’s deletion ?

I want to explicitly call out that I’m generally against deletion. There are very few business cases for it. But sometimes you got to (GDPR comes to mind) or you have an actual business reason for this.

A key property of deletion is that the data is gone, so how can you process deletions? A subscription will let you know when a document changes, but not when it is gone. Luckily, there is a nice way to handle this. First, you need to enable revisions on the collection in question, like so:


At this point, RavenDB will create revisions for all changed documents, and a revision is created for deletions as well. You can see deleted documents in the Revisions Bin in the Studio, to track deleted documents.


But how does this work with Subscriptions? If you’ll try to run a subscription query at this point, you’ll not find this employee. For that, you have to use versioned subscription, like so:


And now you can subscribe to get notified whenever an employee is deleted.

time to read 2 min | 388 words

I recently had what amounted to a drive by code review. I was looking into code that wasn’t committed or PR. Code that might not have been even saved to disk at the time that I saw it. I saw that while working with the developer on something completely different. And yet even a glace was enough to cause me to pause and make sure that this code will be significantly changed before it ever move forward. The code in question is here:

What is bad about this code? No, it isn’t the missing ConfigureAwait(false), in that scenario we don’t need it. The problem is in the very first line of code.

This is meant to be public API. It will have consumers from outside our team. That means that the very first thing that we need to ensure is that we don’t expose our own domain model to the outside world.

There are multiple reasons for this. To start with, versioning is a concern. Sure, we have the /v1/  in the route, but there is nothing here that would make break if we changed our domain model in a way that a third party client relies on. We have a compiler, we really want to be able to use it.

The second issue, which I consider more important, is that this leaks information that I may not really want to share. By exposing my full domain model to the outside world, I risk quite a bit. For example, I may have internal notes on the support ticket which I don’t want to expose to the public. Any field that I expose to the outside world is a compatibility concern, but any field that I add is a problem as well. This is especially true if I assume that those fields are private.

The fix is something like this:

Note that I have class that explicitly define the shape that I’m giving to the outside world. I also manually map between the internal and external fields. Doing something like auto mapper is not something that I want, because I want all of those decisions to be made explicitly. In particular, I want to be sure that every single field that I share with the outside world is done in such a way that it is visible during PR reviews.

time to read 2 min | 260 words

These are not the droids you are looking for! – Obi-Wan Kenobi

Sometimes you need to find a set of documents not because of their own properties, but based on a related document. A good example may be needing to find all employees that blue Nissan car. Here is the actual model:


In SQL, we’ll want a query that goes like this:

This is something that you cannot express directly in RavenDB or RQL. Luckily, you aren’t going to be stuck, RavenDB has a couple of options for this. The first, and the most closely related to the SQL option is to use a graph query. That is how you will typically query over relationships in RavenDB. Here is what this looks like:

Of course, if you have a lot of matches here, you will probably want to do things in a more efficient manner. RavenDB allows you to do so using indexes. Here is what the index looks like:

The advantage here is that you can now query on the index in a very simple manner:

RavenDB will ensure that you get the right results, and changing the Car’s color will automatically update the index’s value.

The choice between these two comes down to frequency of change and how large the work is expected to be. The index favors more upfront work for faster query times while the graph query option is more flexible but requires RavenDB to do more on each query.

time to read 3 min | 568 words

Compression is a nice way to trade off time for space. Sometimes, this is something desirable, especially as you get to the higher tiers of data storage. If your data is mostly archived, you can get significant savings in storage in trade for a bit more CPU. This perfectly reasonable desire create somewhat of a problem for RavenDB, we have competing needs here. On the one hand, you want to compress a lot of documents together, to benefit for duplications between documents. On the other hand, we absolutely must be able to load a single document as fast as possible. That means that just taking 100MB of documents and compressing them in a naïve manner is not going  to work, even if this is going to result in great compression ratio. I have been looking at zstd recently to help solve this issue. 

The key feature for zstd is the ability to train the model on some of the data, and then reuse the resulting dictionary to greatly increase the compression ratio.

Here is the overall idea. Given a set of documents (10MB or so) that we want to compress, we’ll train zstd on the first 16 documents and then reuse the dictionary to compress each of the documents individually. I have used a set of 52MB of JSON documents as the test data. They represent restaurants critics, I think, but I intentionally don’t really care about the data.

Raw data: 52.3 MB. Compressing it all with 7z gives us 1.08 MB. But that means that there is no way to access a single document without decompressing the whole thing.

Using zstd with the compression level of 3, I was able to compress the data to 1.8MB in 118 milliseconds. Choosing compression level 100 reduced the size to 1.02MB but took over 6 seconds to run.

Using zstd on each document independently, where each document is under 1.5 KB in size gave me a total reducing from to 6.8 MB. This is without the dictionary. And the compression took 97 milliseconds.

With a dictionary whose size was set to 64 KB, computed from the first 128 documents gave me a total size of 4.9 MB and took 115 milliseconds.

I should note that the runtime of the compression is variable enough that I’m pretty much going to call all of them the same.

I decided to use this on a different dataset and run this over the current senators dataset. Total data size is 563KB and compressing it as a single unit would give us 54kb. Compressing as individual values, on the other hand, gave us a 324 kb.

When training zstd on the first 16 documents with 4 KB of dictionary to generate we got things down to 105 kb.

I still need to mull over the results, but I find them really quite interesting. Using a dictionary will complicate things, because the time to build the dictionary is non trivial. It can take twice as long to build the dictionary as it would be to compress the data. For example, 16 documents with 4 kb dictionary take 233 milliseconds to build, but only take 138 milliseconds to compress 52 MB. It is also possible for the dictionary to make the compression rate worse, so that is fun.

Any other idea on how we can get both the space savings and the random access option would be greatly appreciated.

time to read 2 min | 311 words

RavenDB always had optimistic concurrency, I consider this to be an important feature for building correct distributed and concurrent systems. However, RavenDB doesn’t implement pessimistic locking. At least, not explicitly. It turns out that we have all the components in place to support it. If you want to read more about what pessimistic locking actually is, this Stack Overflow answer has good coverage of the topic.

There are two types of pessimistic locking. Offline and online locking. In the online mode, the database server will take an actual lock when modifying a record. That model works for a conversation pattern with the database. Where you open a transaction and hold it open while you mutate the data. In today’s world, where most processing is handled using request / response  (REST, RPC, etc), that kind of interaction is rare. Instead, you’ll typically want to use offline pessimistic lock. That is, a lock that can live longer than a single transaction. With RavenDB, we build this feature on top of the usual optimistic concurrency as well as the document expiration feature.

Let’s take the classic example of pessimistic locking. Reserving seats for a show. Once you have selected a seat, you have 15 minutes to complete the order, otherwise the seats will automatically be released. Here is the code to do this:

The key here is that we rely on the @expires feature to remove the seatLock document automatically. We use a well known document id to coordinate concurrent requests that try to get the same seat. The rest is just the usual RavenDB’s optimistic concurrency behavior.

You have 15 minutes before the expiration and then it goes poof. From the point of view of implementing this feature, you’ll spend most of your time writing the edge cases, because from the point of view of RavenDB, there is really not much here at all.

time to read 3 min | 568 words

We are now working on proper modeling scenarios for RavenDB’s time series as part of our release cycle. We are trying to consider as many possible scenarios and see that we have good answer to them. As part of this, we looked at applying timeseries in RavenDB to problems that were raised by customers in the past.

The scenario in question is storing data from a traffic camera. The idea is that we have a traffic camera that will report [Time, Car License Number, Speed] for each car that it capture. The camera will report all cars, not just those that are speeding. Obviously, we don’t want to store a document for each and every car registered by the camera. At the same time, we are interested in knowing the speed on the roads over time.

There for, we are going to handle this in the following manner:

This allows us to handle both the ticket issuance and recording the traffic on the road over time. This works great, but it does leave one thing undone. How do I correlate the measurement to the ticket?

In this case, let’s assume that I have some additional information about the measurement that I record in the time series (for example, the confidence level of the camera in its speed report) and that I need to be able to go from the ticket to the actual measurement and vice versa.

The question is how to do this? The whole point of time series is that we are able to compress the data we record significantly. We use about 4 bits per entry, and that is before we apply actual compression here. That means that if we want to be able to use the minimal amount of disk space, we need to consider how to do this.

One way of handling this is to first create the ticket and attach the Ticket’s Id to the measurement. That is where the tag on the entry comes into play. This works, but it isn’t ideal. The idea about the tag on the entry is that we expect there to be a lot of common values. For example, if we have a camera that uses two separate sensors, we’ll use the tag to denote which sensor took the measurement. Or maybe it will use the make & model of the sensor, etc. The set of values for the tag is expected to be small and to highly repeat itself. If the number of tickets issued is very small, of course, we probably wouldn’t mind. But let’s assume that we can’t make that determination.

So we need to correlate the measurement to the ticket, and the simplest way to handle that is to record the time of the measurement in the ticket, as well as which camera generated the report. With this information, you can load the relevant measurement easily enough. But there is one thing to consider. RavenDB’s timestamps use millisecond accuracy, while .NET’s DateTime has 100 nanosecond accuracy. You’ll need to account for that when you store the value.

With that in place, you can do all sort of interesting things. For example, consider the following query.

This will allow us to show the ticket as well as the road conditions around the time of the ticket. You can use it to say “but everyone does it”, which I am assured is a valid legal defense strategy.

time to read 7 min | 1291 words

I talked about finding a major issue with ThreadLocal and the impact that it had on long lived and large scale production environments. I’m not sure why ThreadLocal<T> is implemented the way it does, but it seems to me that it was never meant to be used with tens of thousands of instances and thousands of threads. Even then, it seems like the GC pauses issue is something that you wouldn’t expect to see by just reading the code. So we had to do better, and this gives me a chance to do something relatively rare. To talk about a complete feature implementation in detail. I don’t usually get to do this, features are usually far too big for me to talk about in real detail.

I’m also interested in feedback on this post. I usually break them into multiple posts in a series, but I wanted to try putting it all in one location. The downside is that it may be too long / detailed for someone to read in one seating. Please let me know your thinking in the matter, it would be very helpful.

Before we get started, let’s talk about the operating environment and what we are trying to achieve:

  1. Running on .NET core.
  2. Need to support tens of thousands of instances (I don’t like it, but fixing that issue is going to take a lot longer).
  3. No shared state between instances.
  4. Cost of the ThreadLocal is related to the number of thread values it has, nothing else.
  5. Should automatically clean up after itself when a thread is closed.
  6. Should automatically clean up after itself when a ThreadLocal instance is disposed.
  7. Can access all the values across all threads.
  8. Play nicely with the GC.

That is quite a list, I have to admit. There are a lot of separate concerns that we have to take into account, but the implementation turned out to be relatively small. First, let’s show the code, and then we can discuss how it answer the requirements.

This shows the LightThreadLocal<T> class, but it is missing the CurrentThreadState, which we’ll discuss in a bit. In terms of the data model, we have a concurrent dictionary, which is indexed by a CurrentThreadState instance which is held in a thread static variable. The code also allows you to define a generator and will create a default value on first access to the thread.

The first design decision is the key for the dictionary, I thought about using Thread.CurrentThread and the thread id.Using the thread id as the key is dangerous, because thread ids may be reused. And that is a case of a nasty^nasty bug. Yes, that is a nasty bug raised to the power of nasty. I can just imagine trying to debug something like that, it would be a nightmare.  As for using Thread.CurrentThread, we’ll not have reused instances, so that is fine, but we do need to keep track of additional information for our purposes, so we can’t just reuse the thread instance. Therefor, we created our own class to keep track of the state.

All instances of a LightThreadLocal are going to share the same thread static value. However, that value is going to be kept as small as possible, it’s only purpose is to allow us to index into the shared dictionary. This means that except for the shared thread static state, we have no interaction between different instances of the LightThreadLocal. That means that if we have a lot of such instances, we use a lot less space and won’t degrade performance over time.

I also implemented an explicit disposal of the values if needed, as well as a finalizer. There is some song and dance around the disposal to make sure it plays nicely with concurrent disposal from a thread (see later), but that is pretty much it.

There really isn’t much to do here, right? Except that the real magic happens in the CurrentThreadState.

Not that much magic, huh? Smile

We keep a list of the LightThreadLocal instance that has registered a value for this thread. And we have a finalizer that will be called once the thread is killed. That will go to all the LightThreadLocal instances that used this thread and remove the values registered for this thread. Note that this may run concurrently with the LightThreadLocal.Dispose, so we have to be a bit careful (the careful bit happens in the LightThreadLocal.Dispose).

There is one thing here that deserve attention, though. The WeakReferenceToLightThreadLocal class, here it is with all its glory:

This is basically wrapper to WeakReference that allow us to get a stable hash value even if the reference has been collected. The reason we use that is that we need to reference the LightThreadLocal from the CurrentThreadState. And if we hold a strong reference, that would prevent the LightThreadLocal instance from being collected. It also means that in terms of the complexity of the object graph, we have only forward references with no cycles, cross references, etc. That should be a fairly simple object graph for the GC to walk through, which is the whole point of what I’m trying to do here.

Oh, we also need to support accessing all the values, but that is so trivial I don’t think I need to talk about it. Each LightThreadLocal has its own concurrent dictionary, and we can just access that Values property and we get the right result.

We aren’t done yet, though. There are still certain things that I didn’t do. For example, if we have a lot of LightThreadLocalinstances, they would gather up in the thread static instances, leading to large memory usage. We want to be able to automatically clean these up when the LightThreadLocalinstance goes away. That turn out to be somewhat of a challenge. There are a few issues here:

  • We can’t do that from the LightThreadLocal.Dispose / finalizer. That would mean that we have to guard against concurrent data access, and that would impact the common path.
  • We don’t want to create a reference from the LightThreadLocal to the CurrentThreadState, that would lead to more complex data structure and may lead to slow GC.

Instead of holding references to the real objects, we introduce two new ones. A local state and a global state:

The global state exists at the level of the LightThreadLocal instance while the local state exists at the level of each thread. The local state is just a number, indicating whatever there are any disposed parents. The global state holds the local state of all the threads that interacted with the given LightThreadLocal instance. By introducing these classes, we break apart the object references. The LightThreadLocal isn’t holding (directly or indirectly) any reference to the CurrentThreadState and the CurrentThreadState only holds a weak reference for the LightThreadLocal.

Finally, we need to actually make use of this state and we do that by calling GlobalState.Dispose() when the LightThreadLocal is disposed. That would mark all the threads that interacted with it as having a disposed parents. Crucially, we don’t need to do anything else there. All the rest happens in the CurrentThreadState, in its own native thread. Here is what this looks like:

Whenever the Register method is called (which happens whenever we use the LightThreadLocal.Value property), we’ll register our own thread with the global state of the LightThreadLocal instance and then check whatever we have been notified of a disposal. If so, we’ll clean our own state in RemoveDisposedParents.

This close down all the loopholes in the usage that I can think about, at least for now.

This is currently going through our testing infrastructure, but I thought it is an interesting feature. Small enough to talk about, but complex enough that there are multiple competing requirements that you have to consider and non trivial aspects to work with.

time to read 4 min | 736 words

Image result for hacker clipartThe 4th fallacy of distributed computing is that the network is secured. It is a fallacy because sooner or later, you’ll realize that the network isn’t secured.

Case in point, Microsoft managed to put 250 million support tickets on the public internet. The underlying issue is actually pretty simple. Microsoft had five Elastic Search instances with no security or authentication.

From the emails that were sent, it seems that they were intend to be secured by separating them from the external networks using firewall rules. A configuration error meant that the firewall rule was no long applicable and they were exposed to the public internet. In this case, at least, I can give better marks than “did you really put a publicly addressable database on the internet in the days of Shodan?”

It isn’t a matter of if you’ll be attacked, it is an issue of when. And according to recent reports, the time it takes from being network accessible to being attacked is under a minute. At worst, it took less than a couple of hours for attacks to start.  If it is accessible, it will be attacked.

So it is was good from Microsoft to make sure that it wasn’t accessible, right? Except that it then became accessible. How much are you willing to bet that there was no monitoring on “these machine is not accessible from the internet”? For that matter, I’m not sure how you can write a monitoring system that check for this. The security assumptions changed, and the systems wasn’t robust to handle that. What is worse, it didn’t fail close. It failed wide open.

The underlying cause of this mess is that the assumption that you can trust the network. It is closed, secured and safe. So there was no additional line of defense.

When we designed RavenDB’s security, we started from the assumption that any RavenDB node is publicly accessible and will be attacked. As such, we don’t allow you to run RavenDB on anything but the loopback device without setting up security. Even when you are running inside locked network, you’ll still have mutual authentication between client and server, you’ll still have all communications between client and server encrypted.

Defense in depth is the only thing that make sense. Yes, it is belt and suspenders, but it means that if you have a failure, your privates aren’t hanging in the wind, waiting to be sold on the Dark Web.

When designing a system that listen to the network, you have to start from assuming you’ll be attacked. And any additional steps to reduce the attack surface are just that. They’ll reduce it, not eliminate it. Because a firewall may fail or be misconfigured, and it may not happen to you. But if a completely separate machine on your closed network has been compromised, you best hope that it won’t be able to be a bridgehead for the rest of your system.

This attack expose 250,000,000 support records(!) and it was observed because it was obvious. This is the equivalent of a big pile of money landing at your feet. It gets noticed. But let’s assume that the elastic node was an empty one, so it wouldn’t be interesting. It takes very little from having access to an unsecured server to being able to execute code on it. And then you have a bridgehead. You can then access other servers, which may be accessible from the opened server, but not for the whole wide world. If they aren’t secured, well, it doesn’t matter what your firewall rules say anymore…

The network is always hostile. You can’t assume who is on the other side, or that you aren’t being eavesdropped on. Luckily, we have fairly routine counter measures. Use TLS for everything and make sure that you authenticate. How you do it doesn’t matter that much, to be honest. User / pass over HTTPS or X509 certificate are just different options. And while I can debate which ones are the best, anything is going to better than nothing at all. This applies for in house software as well. You microservices should authenticate, even if they are running in the isolated backend.

Yes, it can be a PITA to configure and deploy, but it isn’t really something that you can give up on. Because the network is always hostile.


No future posts left, oh my!


  1. Production postmortem (29):
    23 Mar 2020 - high CPU when there is little work to be done
  2. RavenDB 5.0 (3):
    20 Mar 2020 - Optimizing date range queries
  3. Webinar (2):
    15 Jan 2020 - RavenDB’s unique features
  4. Challenges (2):
    03 Jan 2020 - Spot the bug in the stream–answer
  5. Challenge (55):
    02 Jan 2020 - Spot the bug in the stream
View all series


Main feed Feed Stats
Comments feed   Comments Feed Stats