Ayende @ Rahien

Oren Eini, aka Ayende Rahien, CEO of Hibernating Rhinos LTD, which develops RavenDB, a NoSQL open source document database.

You can reach me by:

oren@ravendb.net

+972 52-548-6969


time to read 3 min | 541 words

One of the distinguishing features of RavenDB is its ability to process large aggregations very quickly. You can ask questions over very large data sets and get the results in milliseconds. This is interesting, because RavenDB isn’t an OLAP database and the kinds of questions that we ask can be quite complex.

For example, we have the Products/Recommendations index, which allows us to ask:

For any particular product, find me how many times it was sold, what other products were sold with it and in what frequency.

The index to manage this is here:
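The code block didn’t survive the page transfer, so here is a sketch of what such an index could look like, assuming Northwind-style Order documents (with a Lines array of products), which is what the ids above suggest. The class and property names are my reconstruction, not the original:

```csharp
using System.Linq;
using Raven.Client.Documents.Indexes;

public class Order
{
    public OrderLine[] Lines { get; set; }
}

public class OrderLine
{
    public string Product { get; set; }
}

public class Products_Recommendations
    : AbstractIndexCreationTask<Order, Products_Recommendations.Result>
{
    public class Result
    {
        public string Product;
        public long Count;
        public SoldWith[] Products;
    }

    public class SoldWith
    {
        public string Product;
        public long Count;
    }

    public Products_Recommendations()
    {
        // Map: one projection per product in the order, carrying the
        // other products that were sold alongside it
        Map = orders =>
            from order in orders
            from line in order.Lines
            select new Result
            {
                Product = line.Product,
                Count = 1,
                Products = order.Lines
                    .Where(l => l.Product != line.Product)
                    .Select(l => new SoldWith { Product = l.Product, Count = 1 })
                    .ToArray()
            };

        // Reduce: group by product, summing the total sales and the
        // per-related-product frequencies
        Reduce = results =>
            from result in results
            group result by result.Product into g
            select new Result
            {
                Product = g.Key,
                Count = g.Sum(x => x.Count),
                Products = g.SelectMany(x => x.Products)
                    .GroupBy(p => p.Product)
                    .Select(pg => new SoldWith
                    {
                        Product = pg.Key,
                        Count = pg.Sum(p => p.Count)
                    })
                    .ToArray()
            };
    }
}
```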

The way it works, we map the orders and emit a projection for each product, adding the other products that were sold with the current one. In the reduce, we group by the product and aggregate the related products together.

But I’m not here to talk about the recommendation engine. I wanted to explain how RavenDB processes such indexes. All the information that I’m talking about can be seen in the Map/Reduce visualizer in the RavenDB Studio.

Here is a single entry for this index. You can see that products/11-A was sold 544 times and 108 times with products/69-A.


Because of the way RavenDB processes Map/Reduce indexes, when we query, we run over the already precomputed results, and there is very little computation cost at query time.

Let’s see how RavenDB builds the index. Here is a single order, where three products were sold. You can see that each of them has a very interesting tree structure.


Here is how it looks when we zoom into a particular product. You can see how RavenDB aggregates the data, first in the bottommost page on the right (#596). We aggregate that with the other 367 pages and get intermediate results at page #1410. We then aggregate that again with the intermediate results in page #105127 to get the final tally. In this case, you can see that products/11-A was sold 217,638 times, mostly with products/16-A (30,603 times) and products/72-A (20,603 times).


When we have a new order, all we need to do is update the bottommost page and then recurse upward in the tree. In the case we have here, there is a pretty big reduce value and we are dealing with tens of millions of orders. We have three levels to the tree, which means that we’ll need to do three update operations to account for new or updated data. That is cheap, because it means that we have to do very little work to maintain the index.
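To make that concrete, here is a toy model of the update path (my own illustration in C#, not RavenDB’s actual code): each inner page caches the reduced result of its children, so changing a leaf re-reduces only the pages between that leaf and the root, while sibling pages keep their cached results.

```csharp
using System.Collections.Generic;
using System.Linq;

class ReducePage
{
    public ReducePage Parent;
    public List<ReducePage> Children = new();        // empty for a leaf
    public Dictionary<string, long> Counts = new();  // raw data or cached reduce

    // Called on a leaf page when a new order adds a sale
    public void Add(string product, long count)
    {
        Counts.TryGetValue(product, out var current);
        Counts[product] = current + count;
        Parent?.Rereduce(); // a three level tree => three cheap updates
    }

    private void Rereduce()
    {
        // Recompute this page from its children's cached results only;
        // nothing outside the leaf-to-root path is touched
        Counts = Children
            .SelectMany(child => child.Counts)
            .GroupBy(kv => kv.Key)
            .ToDictionary(g => g.Key, g => g.Sum(kv => kv.Value));
        Parent?.Rereduce();
    }
}
```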

At query time, of course, we don’t really have to do much; all the hard work has already been done.

I like this example because it showcases a non-trivial scenario and how RavenDB handles it with ease. This kind of non-trivial work tends to be very hard to get working properly, and with RavenDB it is part of my default “let’s do this on the fly” demo.

time to read 3 min | 452 words

It should come as no surprise that our entire internal infrastructure is running on RavenDB. I wholly believe in the concept of dogfooding, and it has served us very well over the years.

I was speaking to a colleague just now and it occurred to me that it may be surprising that we intentionally do certain things wrong. It is fair to say that we know what the best practices for using RavenDB are, the things that you can do to get the most out of it.

In some of our internal systems, we are doing things in exactly the wrong way. We are doing things that are inefficient in RavenDB. We take the expedient route to implement things. A good example is a set of documents that can grow to be multiple MB in size. They are also some of the most commonly changed documents in the system. Proper design would call for breaking them apart to make things easier for RavenDB.

We intentionally modeled things this way. Well, I gave the modeling task to an intern with no knowledge of RavenDB, and then I made things worse for RavenDB in a few cases where he didn’t get things out of shape enough for my needs.

Huh?! I can hear you thinking. Why on earth would we do something like that?

We do this because it serves as an excellent proving ground for misuse of RavenDB. It shows us how the system behaves in non-ideal situations: not just when the user is able to match everything to the way RavenDB would like things to be, but how they are likely to build their system, unaware of what is going on behind the scenes and what the optimal solution would be. We want RavenDB to be able to handle that scenario well.

An example that pops to mind was having all the uploads in the system be attachments on a single document. That surfaced an O(N^2) algorithm very deep in the bowels of RavenDB for placing a new attachment. It would be completely invisible in the normal case, because it was fast enough under any normal or abnormal situation that we could think of. But when we started getting high latency from uploads, we realized that adding the 100,002nd attachment to a document required us to scan through the whole list… it was obvious that we needed a fix. (And please, don’t put hundreds of thousands of attachments on a document. It will work, and it is fast now, but it isn’t nice.)

Doing the wrong thing on purpose means that we can be sure that when users are doing the wrong thing accidentally, they get good behavior.

time to read 3 min | 577 words

We got a feature request that we don’t intend to implement, but I thought the reasoning is interesting enough for a blog post. The feature request:

If there is a critical error or major issue with the current state of the database, for instance when the data is not replicated from Node C to Node A due to some errors in the database or network it should send out mail to the administrator to investigate on  the issue. Another example is, if the database not active due to some errors then it should send out mail as well.

On its face, the request is very reasonable. If there is an error, we want to let the administrator know about it, not hide it in some log file. Indeed, RavenDB has the concept of alerts just for that reason, to surface any issues directly to the admin ahead of time. We also have a mechanism in place to allow for alerts for the admin without checking in with the RavenDB Studio manually: SNMP. The Simple Network Management Protocol is designed specifically to enable this kind of monitoring, and RavenDB exposes a lot of state through it that you can act upon in your monitoring system.

Inside your monitoring system, you can define rules that will alert you. Send an SMS if the disk space is low, or email on an alert from RavenDB, etc. The idea of actively alerting the administrator is something that you absolutely want to have.

Having RavenDB send those emails, not so much. RavenDB exposes monitoring endpoints and alerts; it doesn’t act or report on them. That is the role of your actual monitoring system. You can set up Zabbix or talk to your Ops team, which likely already has one installed.

Let’s talk about the reason that RavenDB isn’t a monitoring system.

Sending email is actually really hard. What sort of email provider do you use? What options are required to set up a connection? Do you need an X509 certificate or a user/pass combo? What happens if we can’t send the email? That is leaving aside the fact that actually getting the email delivered is hard enough; spam, SPF, DKIM and DMARC are just where things start. In short, that is a lot of complications that we’ll have to deal with.

For that matter, what about SMS integration? Surely that would also help. But no one uses SMS today; we want WhatsApp integration, and Telegram, and… You get the point.

Then there are the social issues. How will we decide whether we need to send an email or not? There should be some policy, and ways to configure it. Without that, we’ll end up sending either too many emails (which will get flagged / ignored) or too few (why aren’t you telling me about the XYZ issue?).

A monitoring system is built to handle those sorts of issues. It is able to aggregate reports, give you a single email with the current status, open issues for you to fix, and do a whole lot more that is simply outside the purview of RavenDB. There is also the most critical alert of all: if RavenDB is down, it will not be able to report that it is down, because it is down.

The proper way to handle this is to set up integration with a monitoring system, so we’ll not be implementing this feature request.

time to read 6 min | 1084 words

In my previous post, I wrote about the case of a medical provider that has a cloud database to store its data, as well as a whole bunch of doctors making house calls. The doctors need to have (some) information on their machines, as well as push updates they make locally back to the cloud.


However, given that their machines are in the field, and that we may encounter a malicious doctor, we aren’t going to fully trust these systems. We still want the system to function, though. The question is how will we do it?

Let’s try to state the problem in more technical terms:

  • The doctor needs to pull data from the cloud (list of patients to visit, patient records, pharmacies and drugs available, etc.).
  • The doctor needs to be able to create patient records (exams made, checkup results, prescriptions, recommendations, etc.).
  • The doctor’s records need to be pushed to the cloud.
  • The doctor should not be able to see any record that is not explicitly made available to them.
  • The same applies to documents, attachments, time series, counters, revisions, etc.

Enforcing Distributed Data Integrity

The requirements are quite clear, but they do bring up a bit of a bother. How are we going to enforce them?

One way to do that would be to add some metadata rule for the document, deciding if a doctor should or should not see that document. Something like that:

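The screenshot showing this is gone from the page. As a purely hypothetical sketch, such a rule could be attached via the document metadata; GetMetadataFor is real RavenDB client API, but the “@access-tags” key and this whole convention are made up for the discussion (and, as we’ll see, rejected):

```csharp
using Raven.Client.Documents;

public class Patient
{
    public string Name;
}

public static class AccessTagsExample
{
    public static void Tag(IDocumentStore store)
    {
        using var session = store.OpenSession();
        var patient = session.Load<Patient>("patients/oren");
        var metadata = session.Advanced.GetMetadataFor(patient);
        // Invented convention for this discussion, not a RavenDB feature:
        metadata["@access-tags"] = "doctors/abc,clinics/33-conventry-rd";
        session.SaveChanges();
    }
}
```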

In this model, a doctor will be able to get this document if they have any of the tags associated with it.

This can work, but it has a bunch of non-trivial problems and a huge problem that may not be obvious. Let’s start with the non-trivial issues:

  • How do you handle non-document data? Based on the owner document, probably. But that means that we have to have a parent document, and that isn’t always the case.
  • The owner document may also have been deleted, or be in a conflicted state.
  • What do you do with revisions if the access tags have changed? Which set of tags do you follow?

There are other issues, but as you can imagine, they all revolve around the fact that this model allows you to change the tags on a document and expects us to handle that properly.

The huge problem, however, is what should happen when a tag is removed. Let’s assume that we have the following sequence of events:

  • patients/oren is created, with access tag of “doctors/abc”
  • That access tag is then removed
  • Doctor ABC’s machine then connects to the cloud and sets up replication.
  • We need to remove patients/oren from the machine, so we send a tombstone.

So far, so good. However, what about Doctor XYZ’s machine? At this point, we don’t know what the old tags were, and that machine may or may not have that document. It shouldn’t have it now, so should we send a tombstone there as well? That leaks information, by revealing document ids that the machine isn’t authorized for.

We need a better option.

Using the Document ID as the Basis for Data Replication

We can define that once created, the access tags are immutable, and that would help considerably. But that is still fairly complex to manage and opens up issues regarding conflicts, deletion and re-creation of a document, etc.

Instead, we are going to use the document’s id as the basis for deciding whether to replicate the document or not. In other words, when we register the doctor’s machine, we set it up so it will allow:

Incoming paths:

  • doctors/abc/visits/*
  • tasks/doctors/abc/*

Outgoing paths:

  • patients/clinics/33-conventry-rd/*
  • pharmacies/*
  • tasks/doctors/abc/*
  • doctors/abc
  • laboratories/*

In this case, incoming and outgoing are defined from the point of view of the cloud cluster. So this setup allows the doctor’s machine to push updates to any document whose id starts with “doctors/abc/visits/” or “tasks/doctors/abc/”. The cloud will send all the pharmacies and laboratories data, all the patients for the doctor’s clinic and the tasks for this doctor and, finally, the doctor’s own record. Everything else will be filtered.
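There was no public API for this when the post was written, so the following is only a sketch of how registering the doctor’s machine against a replication hub might look; the operation and property names here are my assumptions, loosely based on the filtered replication API that shipped in later releases:

```csharp
using Raven.Client.Documents;
using Raven.Client.Documents.Operations.Replication;

public static class DoctorAccessExample
{
    public static void Register(IDocumentStore cloudStore, string doctorCertBase64)
    {
        // Paths are from the cloud cluster's point of view
        cloudStore.Maintenance.Send(new RegisterReplicationHubAccessOperation(
            "DoctorsHub",
            new ReplicationHubAccess
            {
                Name = "doctors/abc",
                CertificateBase64 = doctorCertBase64, // the machine's client cert
                AllowedSinkToHubPaths = new[]         // what the doctor may push
                {
                    "doctors/abc/visits/*",
                    "tasks/doctors/abc/*"
                },
                AllowedHubToSinkPaths = new[]         // what the cloud will send
                {
                    "patients/clinics/33-conventry-rd/*",
                    "pharmacies/*",
                    "tasks/doctors/abc/*",
                    "doctors/abc",
                    "laboratories/*"
                }
            }));
    }
}
```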

This Model is Simple

This model is simple: it provides a list of outgoing and incoming paths for the data that will be replicated. It is also surprisingly powerful. Consider the implications of the configuration above.

The doctor’s machine will have a list of laboratories and pharmacies (public information) locally. It will have the doctor’s own document, as well as records of the patients in the clinic. The doctor is able to create and push patient visit records. Most interestingly, the tasks for the doctor are defined to allow both push and pull. The doctor will receive updates from the office about new tasks (home visits) to make, can mark them complete, and have that show up in the cloud.

The doctor’s machine (and the doctor as well) is not trusted. So we limit the exposure of the data that they can see on a Need To Know basis. On the other hand, they are limited in what they can push back to the cloud. Even with these limitations, there is a lot of freedom in the system, because once you have this defined, you can write your application on the cloud side and on the laptop and just let RavenDB handle the synchronization between them. The doctor doesn’t need access to a network to be able to work, since they have a RavenDB instance running locally and the cloud instance will sync up once there is any connectivity.

We are left with one issue, though. Note that the doctor can get the patients’ files, but is unable to push updates to them. How is that going to work?

The reason that the doctor is unable to write to the patients’ files is that they are not trusted. Instead, they will send a visit record, which contains their findings and recommendations. On the cloud side, we’ll validate the data, merge it with the actual patient’s record, apply any business rules and then update the record. Once that is done, it will show up on the doctor’s machine magically. In other words, this setup is meant for untrusted input.

There are more details that we could get into, but I hope that this outlines the concepts clearly. This is not a RavenDB 5.0 feature, but it will be part of the next RavenDB release, due around September.

time to read 4 min | 753 words

RavenDB is typically deployed as a set of trusted servers. The network is considered to be hostile, which is why we encrypt everything over the wire and use X509 certificates for mutual authentication, but once the connection is established, we trust the other side to follow the same rules as we do.

To clarify, I’m talking here about trust between nodes, not clients connected to RavenDB. Clients are also authenticated using X509 certificates, but they are limited to the access permissions assigned to them. Nodes in a cluster fully trust one another and need to do things like forward commands accepted by one node to another. That requires the second node to trust that the first node properly authenticated the client and won’t pass along operations that the client has no authority for.

Use Case 1: A Database System for Multiple Medical Clinics

I think that a real use case might make things more concrete. Let’s assume that we have a set of clinics, with the following distribution of data.


We have two clinics, one in Boston and one in Chicago, as well as a cloud system. The rules of the system are as follows:

  • Data from each clinic is replicated to the cloud.
  • Data from the cloud is replicated to the clinics.
  • Data from a clinic may only be at the clinic or in the cloud.
  • A clinic cannot get (or modify) data that didn’t come from the clinic.

In this model, we have three distinct locations, and we presumably trust all of them (otherwise, would we put patient data on them?). There is a need to ensure that we don’t expose patient data from one clinic to another, but that is about it. Note that in terms of RavenDB topology, we don’t have a single cluster here. That wouldn’t make sense. To start with, we need to be able to operate the clinic when there is no internet connectivity. And we don’t want to pay any avoidable latency cost even when everything is working fine. So in this case, we have three separate clusters, one in each location, connected to one another using RavenDB’s multi-master replication.

Use Case 2: A Database System Sharing with Outside Edge Points

Let’s look at another model. In this case, we are still dealing with medical data, but instead of a clinic, we have to deal with a doctor making house calls:


In this case, we are still talking about private data, but we are no longer trusting the end device. The doctor may lose the laptop, they may have malware running on the machine, or they may be trying to do Bad Things directly. We want to be able to push data to the doctor’s machine and receive updates from the field.

RavenDB has some measures at the moment to handle this scenario. You only need to get some data from the cloud to the doctor’s laptop, and you want to push only certain things back to the cloud. You can use pull replication and ETL to handle this scenario, and it will work, as long as you are willing to trust the end machine. Given the stringent requirements for medical data, that is not out of bounds, actually. Full volume encryption, forbidding the use of unknown software and a few other protections ensure that if the laptop is lost, the only thing you can do with it is repurpose the hardware. If we can go with that assumption, this is great.
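For instance, a minimal sketch of filtering what leaves the cloud with a RavenDB ETL task might look like the following; the task name, the Tasks collection and the AssignedTo property are illustrative, not from the post:

```csharp
using Raven.Client.Documents;
using Raven.Client.Documents.Operations.ETL;

public static class EtlExample
{
    public static void Define(IDocumentStore cloudStore)
    {
        var config = new RavenEtlConfiguration
        {
            Name = "PushToDoctorLaptop",            // illustrative name
            ConnectionStringName = "doctor-laptop", // must be defined separately
            Transforms =
            {
                new Transformation
                {
                    Name = "TasksForDoctor",
                    Collections = { "Tasks" },
                    // Only documents the script loads leave the cloud
                    Script = @"if (this.AssignedTo === 'doctors/abc')
                                   loadToTasks(this);"
                }
            }
        };
        cloudStore.Maintenance.Send(new AddEtlOperation<RavenConnectionString>(config));
    }
}
```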

However… we need to consider the case that our doctor is actually malicious.


When the Edge Point isn’t as Healthy as the Doctor Using It

So we need something in the middle, between all our data and what can reside on that doctor’s machine. As it currently stands, in order to create the appropriate barrier between the doctor’s machine and the cloud, you’ll have to write your own sync code and apply any logic / authorization at that level.

Sync code is non-trivial, mostly because of the number of edge cases you have to deal with and the potential for conflicts. This has already been solved by RavenDB, so having to write it again is not ideal as far as we are concerned.

What would you do?

time to read 2 min | 380 words

A sadly commonplace “attack” on applications is called “Web Parameter Tampering”. This is the case where you have a URL such as this:

https://secret.medical.info/?userid=823

And your users “hack” you using:

https://secret.medical.info/?userid=999

And get access to another user’s records.

As an aside, that might actually be considered hacking, legally speaking. Which makes me want to smash my head on the keyboard a few times.

Obviously, you need to run your security validation on parameters, but there are other reasons to avoid exposing raw identifiers to the user. If you are using an incrementing counter of some kind, creating two values might cause you to leak the rate at which your data changes. For example, a competitor might create an order once a week and track the order numbers. That will give them a good indication of how many orders there have been in that time frame.

Finally, there are other data leakage issues that you might want to take into account. For example, “users/321” means that you are likely using RavenDB, while “users/4383-B” means that you are using RavenDB 4.0 or higher, and “607d1f85bcf86cd799439011” means that you are using MongoDB.

A common reaction to this is to switch your ids to use GUIDs. I hate that option; it means that you are entering very unfriendly territory for the application. GUIDs convey no information to the developers working with the system, and they are hard to work with from a human point of view. They are also less nice for the database system to work with.

A better alternative is to simply mask the information when it leaves your system. Here is the code to do so:
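The original snippet was embedded in the post and is gone from this page; below is a sketch of the same idea, AES encryption followed by Base58 (the alphabet Bitcoin uses). Key management and the exact output format of the original are assumptions:

```csharp
using System;
using System.Numerics;
using System.Security.Cryptography;
using System.Text;

public static class IdMasker
{
    // The Base58 alphabet used by Bitcoin (no 0, O, I or l)
    private const string Alphabet =
        "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

    public static string Mask(string id, byte[] key)
    {
        using var aes = Aes.Create();
        aes.Key = key;
        aes.GenerateIV();
        using var encryptor = aes.CreateEncryptor();
        var plain = Encoding.UTF8.GetBytes(id);
        var cipher = encryptor.TransformFinalBlock(plain, 0, plain.Length);

        // Prepend the IV so the value can be unmasked later; unmasking
        // reverses these steps, and a tampered value will typically fail
        // the AES padding validation and throw
        var payload = new byte[aes.IV.Length + cipher.Length];
        aes.IV.CopyTo(payload, 0);
        cipher.CopyTo(payload, aes.IV.Length);
        return ToBase58(payload);
    }

    private static string ToBase58(byte[] data)
    {
        var num = new BigInteger(data, isUnsigned: true, isBigEndian: true);
        var sb = new StringBuilder();
        while (num > 0)
        {
            num = BigInteger.DivRem(num, 58, out var remainder);
            sb.Insert(0, Alphabet[(int)remainder]);
        }
        foreach (var b in data)
        {
            if (b != 0)
                break;
            sb.Insert(0, Alphabet[0]); // preserve leading zero bytes
        }
        return sb.ToString();
    }
}
```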

You can see that I’m actually using AES encryption to hide the data, and then encoding it in the Bitcoin (Base58) format.

That means that an identifier such as "users/1123" will result in output such as this:

bPSPEZii22y5JwUibkQgUuXR3VHBDCbUhC343HBTnd1XMDFZMuok

The length of the identifier is larger, but not overly so, and the id is even URL safe. In addition to hiding the identifier itself, we also ensure that users cannot muck about in the value. Any change to the value will result in an error when unmasking it.

time to read 1 min | 154 words

The most famous example of the use of transactions is the money transfer scenario. As money is shifted from one account to the other, we want to ensure that no money goes poof or is made up out of whole cloth.

I just logged in to the bank to pay taxes. It is a boring process that mostly consists of checking a box on a transfer filled in by the accounting department. Today there was much excitement. The transfer failed.

That was strange.

I got an error that the money transfer failed and that I should process the order again later. I checked my balance, and the money had been deducted from my account.

I’m trying to decide if I should shrug it off and just make sure that the money was sent in a couple of days or if I should call someone at the bank and offer them consulting services about how to build transactional systems.

time to read 4 min | 762 words

The underlying assumption for any distributed system is that the network is hostile. That assumption is pervasive. If you open a socket, you have to be aware of malicious people on the other side (and in the middle). If you accept input from external sources, you have to be careful, because it may be carefully crafted to do Bad Things. In short, the network is hostile, and you need to protect yourself from harm at all levels.

In many cases, you already have multiple layers of protection. For example, when building web applications, the HTTP server already validates that the incoming streams follow the HTTP protocol. Even ignoring maliciousness, you will get services that connect to you using the wrong protocols and create havoc. For reference, look at 1347703880, 1213486160 and 1195725856 and the issues they cause. As it turns out, these are relatively benign issues, because they are caught almost immediately. In the real world, the network isn’t only hostile, it is also smart.

The problem was originally posed by Lamport in the Byzantine Generals paper. You have a group of generals that need to agree on a particular time to attack a city. They can only communicate by (unreliable) messenger, and one or more of them are traitors. The paper itself is interesting to read, and the problem is pervasive enough that we now divide distributed systems into Byzantine and non-Byzantine systems. We now have pervasive cryptography deployed, to the point where you are reading this post over an encrypted channel, verified using public key infrastructure to validate that it indeed came from me. You can solve the Byzantine generals problem easily now.

Today, the terminology has changed. We now refer to Byzantine networks as systems where some of the nodes are malicious, and non-Byzantine as systems where we trust that the other nodes will do their task. For example, Raft and Paxos are both distributed consensus algorithms that assume a non-Byzantine system. Oh, the network communication goes through a hostile environment, but that is what we have TLS for. Authentication and encryption over the wire are mostly a solved problem at this point. It isn’t a simple problem, but it is a solved one.

So where would you run into Byzantine systems today? The obvious examples are cryptocurrencies and BitTorrent. In both cases, you have a distributed environment with incentives for the other side to cheat. In the case of cryptocurrencies, this is handled by proof of work / proof of stake, as well as the cost of getting to a 51% majority on the network. In the case of BitTorrent, it is an attempt to get peers to both download and upload. These examples are the first ones that pop to mind, but in reality, the most common tool that works with a Byzantine network is the browser.

The browser has to assume that every site is malicious, and the web server has to assume that each client is malicious. For that matter, every server has to assume that every client is malicious as well. You only have to read through the OWASP listings to understand that.

And how does this relate to databases? Distributed databases are composed of independent nodes, which cooperate to store and process your data. Most of the generally available database systems assume a non-Byzantine model. In other words, they authenticate the other nodes, but once past authentication, the other node is trusted to operate as expected.

For the most part, that is a reasonable assumption to make. You run your database on machines that you tend to trust, after all. And assuming a non-Byzantine system allows for a drastically simpler system design and much higher performance.

However, we are starting to see more and more systems deployed on the edge. And that raises an interesting question: who controls the edge? Let’s assume that we have a traffic monitoring system based on software that runs on your phone. While it may all be part of a single system, you have to take into account that you are now running on a system that is controlled by someone else, who may modify / change it at will.

That leads to interesting issues with regards to the design of such a system. On the one hand, you want to get data from the nodes in the fields, but on the other hand, you need to be careful about trusting those nodes.

How would you approach such a system? Keep in mind that you want to reduce, as much as possible, the complexity of the system while not breaching its security.

time to read 3 min | 596 words

An interesting question was raised in the mailing list: how do we handle non-trivial transactions within a RavenDB cluster? The scenario in the question is money transfer between accounts, which is almost the default example for transactions.

The catch here is that now we have a set of business rules to apply.

  • Neither account may be blocked
  • There should be sufficient funds in the source account
  • The funds involved are not tainted

Let’s see how this can look in code, shall we?
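The embedded snippet is lost from this page; here is a sketch of what it plausibly looked like, simplified to a plain balance (the post’s actual model tracked individual coins) and with the document and property names assumed:

```csharp
using System;
using Raven.Client.Documents;

public class Account
{
    public decimal Balance;
    public bool Blocked;
    public bool Tainted;
}

public static class TransferExample
{
    public static void Transfer(IDocumentStore store, string from, string to, decimal amount)
    {
        using var session = store.OpenSession();

        var source = session.Load<Account>(from);
        var destination = session.Load<Account>(to);

        // The business rules from the list above
        if (source.Blocked || destination.Blocked)
            throw new InvalidOperationException("Account is blocked");
        if (source.Balance < amount)
            throw new InvalidOperationException("Insufficient funds");
        if (source.Tainted)
            throw new InvalidOperationException("The funds are tainted");

        source.Balance -= amount;
        destination.Balance += amount;

        session.SaveChanges();
    }
}
```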

I’m assuming a model similar to bitcoin, where you can break coins apart, but not merge them, by the way. What we can see is that we have non-trivial logic going on here, especially as we are hiding all the actual work involved behind the scenes. The code, as written, will work great, as long as we have a single thread of execution. We can change it easily to support concurrent work using:

session.Advanced.UseOptimisticConcurrency = true;

With this call, we tell RavenDB that it should do an optimistic concurrency check on the documents when we save them. If any of the documents has changed while the code was running, a concurrency exception will be raised and the whole transaction will be aborted.

This works, until you are running on a distributed cluster and have to support transactions running on different nodes. Luckily, RavenDB has an answer for that as well, with cluster wide transactions.

It is important to understand that with cluster wide transactions, we have two separate layers here. The documents are part of the overall transaction, but they do not impact whether it will be accepted. We need to use compare exchange values to allow optimistic concurrency on the whole operation. Here is the code. It is a bit dense, but the key is that we added a compare exchange value that mirrors every account. Whenever we modify the account, we also modify the compare exchange value. Then we can use a cluster wide transaction like so:
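The snippet is again lost from the page; the sketch below (reusing the Account class from the previous sketch) shows the shape of such a cluster wide transaction. The guard key naming is assumed:

```csharp
using Raven.Client.Documents;
using Raven.Client.Documents.Session;
using Raven.Client.Exceptions;

public static class ClusterTransferExample
{
    public static void Transfer(IDocumentStore store, string from, string to, decimal amount)
    {
        using var session = store.OpenSession(new SessionOptions
        {
            TransactionMode = TransactionMode.ClusterWide
        });

        var source = session.Load<Account>(from);
        var destination = session.Load<Account>(to);

        // A compare exchange value mirrors each account document
        var sourceGuard = session.Advanced.ClusterTransaction
            .GetCompareExchangeValue<decimal>("guards/" + from);
        var destinationGuard = session.Advanced.ClusterTransaction
            .GetCompareExchangeValue<decimal>("guards/" + to);

        // Documents and compare exchange values replicate via different
        // mechanisms (gossip vs. consensus), so verify they agree
        if (sourceGuard.Value != source.Balance ||
            destinationGuard.Value != destination.Balance)
            throw new ConcurrencyException("Guards out of sync with documents, retry");

        source.Balance -= amount;
        destination.Balance += amount;
        sourceGuard.Value = source.Balance;            // 5.0 change-tracks this
        destinationGuard.Value = destination.Balance;

        session.SaveChanges(); // rejected if the guards changed concurrently
    }
}
```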

With this in place, RavenDB will ensure that this transaction will only be accepted if both compare exchange values are the same as they were when we evaluated them. Because any change touches both the document and the compare exchange value, and because they are modified in a cluster wide transaction, we are safe even in a clustered environment. If there is a change to the data, we’ll get a concurrency exception and have to re-compute.

Important: You’ll note that we are comparing the documents and the compare exchange values to make sure that they have the same funds. Why do we need to do that? Compare exchange values and documents live in separate stores and are replicated across the cluster using different mechanisms: compare exchange values use consensus, while documents use gossip replication. It is possible that they won’t be in sync when we read them, which is why we do the extra check to ensure that we start from a level playing field.

Note that I’m using the 5.0 API here. With 4.2, you need to explicitly call UpdateCompareExchangeValue, but in 5.0, we have fixed things so that we do change tracking on the values. That means that it is important that the values actually change, of course.

And this is how you write a distributed transaction with RavenDB to transfer funds in a clustered environment without needing to really worry about any of the details. Handle & retry a concurrency exception and you are pretty much done.

time to read 4 min | 796 words

There have been a couple of cases where I was working on a feature, usually a big and complex one, that made me realize that I’m just not smart enough to actually build it.

A long time ago (five or six years), I had to tackle free space handling inside of Voron. When an operation causes a page to be released, we need to register that page somewhere, so we’ll be able to reuse it later on. This doesn’t seem like too complex a feature, right? It is just a free list; what is the issue?

The problem was that the free list itself was managed as pages, and changes to the free list might cause pages to be allocated or de-allocated. This can happen while the free list operation is running. Basically, I had to make the free list code re-entrant. That was something that I tried to do for a long while, before just giving up. I wasn’t smart enough to handle that scenario properly. I could plug the leaks, but it was too complex, and it was obvious that I was going to run into additional cases where this would cause issues.

I had to use another strategy to handle this. Instead of allowing the free list to do dynamic allocations, I allocated the free list once, and then used a bitmap mode to store the data itself. That meant that modifications to the free list couldn’t cause us to mutate the structure of the free list itself. The crisis was averted and the feature was saved.
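A toy sketch of that bitmap approach (my illustration, not Voron’s actual code): the free list is one fixed allocation with one bit per page, so marking pages free or in use can never trigger further page allocations:

```csharp
using System.Numerics;

// Toy bitmap free list: allocated once, never resized, so updating
// it cannot recursively allocate or free pages.
class BitmapFreeList
{
    private readonly ulong[] _bits; // one bit per page in the data file

    public BitmapFreeList(long numberOfPages)
        => _bits = new ulong[(numberOfPages + 63) / 64];

    public void MarkFree(long page) => _bits[page / 64] |= 1UL << (int)(page % 64);

    public void MarkUsed(long page) => _bits[page / 64] &= ~(1UL << (int)(page % 64));

    public long TryAllocate() // returns a previously freed page, or -1
    {
        for (var i = 0; i < _bits.Length; i++)
        {
            if (_bits[i] == 0)
                continue;
            var bit = BitOperations.TrailingZeroCount(_bits[i]);
            var page = i * 64L + bit;
            MarkUsed(page);
            return page;
        }
        return -1;
    }
}
```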

I just checked, and the last meaningful change that happened for this piece of code was in Oct 2016, so it has been really stable for us.

The other feature where I realized I’m not smart enough came up recently. I was working on the documents compression feature. One of the more important aspects of this feature is the ability to retrain, which means that when compressing a document, we’ll use a dictionary to reduce the size, and if the dictionary isn’t well suited, we’ll create a new one. The first implementation used a dictionary per file range, so all the documents placed in a particular location would use the same dictionary. That correlates highly with the documents written at the same time, and had the benefit of ensuring that new updates to the same document would use its original dictionary. That is likely to result in a good compression rate over time.

The problem was that during this process, we might find out that the dictionary isn’t suited to our needs and that we need to retrain. At this point, however, we had already computed the required space. But… the new dictionary may compress the data differently (both better and worse than what was expected). The fact that I could no longer rely on the size of the data during the save process led to a lot of heartache. The issue was made worse when we would first compress the value using a specific dictionary, find that we couldn’t place it in the right location, and need to put it in another place.

However, to find the new place, we need to know what size we need to allocate. And the new location may have a different dictionary, and there the compressed value is different…

Data may move inside RavenDB for a host of reasons: compaction, defrag, etc. Whenever we moved the data, we would need to recompress it, which led to a lot of complexity. I tried fighting that for a while, but it was obvious that I couldn’t manage the complexity.

I changed things around. Instead of having a dictionary per file range, I tagged the compressed value with a dictionary id. That way, each document could store the dictionary that it was using. That simplified the problem space greatly, because I only need to compress the data once, and afterward the size of the data remains the same. It meant that I had to keep a registry of the dictionaries, instead of placing a dictionary at the end of the specified file range, and it somewhat complicates recovery, but the overall system is ever so much simpler and it took a lot less effort and complexity to stabilize.

I’m not sure why just these problems showed themselves to be beyond my capabilities. But it does bring to mind a very important quote:

When I Wrote It, Only God and I Knew the Meaning; Now God Alone Knows

I also have to admit that I had no idea that this quote predates code and computing. The earliest known reference to it is from 1826.
