Ayende @ Rahien

Oren Eini aka Ayende Rahien CEO of Hibernating Rhinos LTD, which develops RavenDB, a NoSQL Open Source Document Database.

Get in touch with me:

oren@ravendb.net

+972 52-548-6969

Posts: 7,326 | Comments: 50,669

Privacy Policy Terms
filter by tags archive
time to read 2 min | 261 words

I’m teaching a college class about Cloud Computing and part of that is giving various assignments to build stuff on the cloud. That part is pretty routine.

One of my requests for the tasks is to identify failure mode in the system, and one of the students that I’m currently grading had this doozy scenario:

If you’ll run this code you may have to deal with this problem. Just nuke the system and try again, it only fails because of this once in a while.

The underlying issue is that he is setting up a Redis instance that is publicly accessible to the Internet with no password. On a regular basis, automated hacking tools will scan, find and ransom the relevant system. To the point where the student included a note on that in the exercise.

A great reminder that the Network is Hostile. And yes, I’m aware of Redis security model, but I don’t agree with it.

I’m honestly not sure how I should grade such an assignment. On the one hand, I don’t think that a “properly” secured system is reasonable to ask from a student. On the other hand, they actually got hacked during their development process.

I tried setting up a Redis honeypot to see how long it would take to get hacked, but no one bit during the ~10 minutes or so that I waited.

I do wonder if the fact that such attacks are so prevalent, immediate and destructive means that through the process of evolution, you’ll end up with a secured system (since unsecured isn’t going to be working).

time to read 3 min | 543 words

We run into a strange situation deep in the guts of RavenDB. A cluster command (the backbone of how RavenDB is coordinating action in a distributed cluster) failed because of an allocation failure. That is something that we are ready for, since RavenDB is a robust system that handles such memory allocation failures. The problem was that this was a persistent allocation failure. Looking at the actual error explained what was going on. We allocate memory in units that are powers of two, and we had an allocation request that would overflow a 32 bits integer.

Let me reiterate that, we have a single cluster command that would need more memory than can fit in 32 bits. A cluster command isn’t strictly limited, but a 1MB cluster command is huge, as far as we are concerned. Seeing something that exceeds the GB mark was horrifying. The actual issue here was somewhere completely different, there was a bug that caused quadratic growth in the size of a database record. This post isn’t about that problem, it is about the fix.

We believe in defense in depth for such issues. So aside from fixing the actual cause for this problem, the issue was how we can prevent similar issues in the future. We decided that we’ll place a reasonable size limit on the cluster commands, and we chose 128MB as the limit (this is far higher than any expected value, mind). We chose that value since it is both big enough to be outside anyone's actual usage, but at the same time, it is small enough that we can increase this if we need to. That means that this needs to be a configuration value, so the user can modify that in place if needed. The idea is that we’ll stop the generation of a command of this size, before it hits the actual cluster and poison it.

Which brings me to this piece of code, which was the reason for this blog post:

This is where we are actually throwing the error if we found a command that is too big (the check is done by the caller, not important here).

Looking at the code, it does what is needed, but it is missing a couple of really important features:

  • We mention the size of the command, but not the actual size limit.
  • We don’t mention that this isn’t a hard coded limit.

The fix here would be to include both those details in the message. The idea is that the user will not only be informed about what the problem is, but also be made aware of how they can fix it themselves. No need to contact support (and if support is called, we can tell right away what is going on).

This idea, the notion that we should be quite explicit about not only what the problem is but also how to fix it, is very important to the overall design of RavenDB. It allows us to produce software that is self supporting, instead of ErrorCode: 413, you get not only the full details, but how you can fix it.

Admittedly, I fully expect to never ever hear about this issue again in my lifetime. But in case I’m wrong, we’ll be in a much better position to respond to it.

time to read 5 min | 810 words

image

I got an interesting question from a customer recently and thought that it would make for a fun blog post. The issue the customer is facing is that they are aggregating data from many sources, and they need to make sense of all the data in a nice manner. For some of the sources, they have some idea about the data, but in many cases, the format of the data they get is pretty arbitrary.

Consider the image on the right, we have four different documents, from separate sources:

  • titles/123-45-678/2022-01-28 – The car title
  • tickets/0000000000000000009-A – A parking ticket that was issued for a car
  • orders/0000000000000000010-A – An order from a garage about fixes made for a customer (which includes some cars)
  • claims/0000000000000000011-A – Details of a rejected insurance claim for a car

We need to make sense of all of this information and provide some information to the user about a car from all those sources. The system in question is primarily interested in cars, so what I would like to do is show a “car file”. All the information at hand that we have for a particular car. The problem is that this is not trivial to do. In some cases, we have a field with the car’s license plate, but each data source named it differently. In the case of the Order document, the details about the specific service for the car are deep inside the document, in a free form text field.

I can, of course, just index the whole thing and try to do a full text search on the data. It would work, but can we do better than that?

A license plate in the system has the following format: 123-45-768. Can we take advantage of that?

If you said regex, you now have two problems :-).

Let’s see what we can do about this…

One way to handle this is to create a multi map-reduce index inside of RavenDB, mapping the relevant items from each collection and then aggregating the values by the car’s license plate from all sources. The problem with this approach is that you’ll need to specialize for each and every data source you have. Sometimes, you know what the data is going to look like and can get valuable insight from that, but in other cases, we are dealing with whatever the data providers will give us…

For that reason, I created the following index, which uses a couple of neat techniques all at once to give me insight into the data that I have in the system, without taking too much time or complexity.

This looks like a lot of code, I know, but the most complex part is in the scanLicensePlates() portion. There we define a regex for the license plate and scan the documents recursively trying to find a proper match.

The idea is we’ll find a license plate in either the field directly (such as Title.LicensePlate) or part of the field contents (such as Orders.Lines.Task field). Regardless of where we find the data, in the map phase we’ll emit a separate value for each detected license plate in the document. We’ll then aggregate by the license plate in the reduce phase. Some part of the complexity here is because we are building a smart summary, here is the output of this index:

As you can see, the map-reduce index results will give us the following data items:

  • The license plate obviously (which is how we’ll typically search this index)
  • The summary for all the data items that we have for this particular license plate. That will likely be something that we’ll want to show to the user.
  • The ids of all the documents related to this license plate, which we’ll typically want to show to the user.

The nice thing about this approach is that we are able to extract actionable information from the system with very little overhead. If we have new types of data sources that we get in the future, they’ll seamlessly merge into the final output for this index.

Of course, if you know more about the data you are working with, you can probably extract more interesting information. For example, we may want to show the current owner of the car, which we can extract from the latest title document we have. Or we may want to compute how many previous owners a particular vehicle has, etc.

As the first step to aggregate information from dynamic data sources, that gives us a lot of power. You can apply this technique in a wide variety of scenarios. If you are finding yourself doing coarse grained searches and trying to regex your way to the right data, this sort of approach can drastically improve your performance and make it far easier to build a system that can be maintained over the long run.

time to read 5 min | 830 words

imageI recently had a conversation about ACID, I don’t think it would surprise anyone that I’m a big proponent of ACID. After all, RavenDB was an ACID database from the first release.

When working with distributed systems, on the other hand, it is far harder to get ACID guarantees at a reasonable cost. Pretty much all the 1st generation NoSQL databases left ACID on the sidelines, because it is a hard problem. That was one of the primary reasons why RavenDB even exists. I couldn’t imagine living without transactions. This is a post from 2011, talking about just that topic.

Consistency in a distributed system is a hard problem, mostly because it has an impact on the design and performance of the system. It is also common to think about ACID as a binary property, which is sort of true (A for Atomic Smile). However, it turns out that the real world is a lot more nuanced than that.

I want to discuss the consistency model for RavenDB as it applies to running in a distributed cluster. It is ACID with eventual consistency, which doesn’t sound like it makes sense, right?

I found a good example to explain the importance of ACID operations from your database even in the presence of eventual consistency.

Consider the following scenario, we have a married couple with a shared bank account. Both husband and wife have a checkbook for the account and primarily use checks to pay for things in their day to day life.

Checks are anachronistic for some people, who are used to instant payments and wire transfers. The great thing about checks is that they are (by definition) a way to work in a distributed system. You hand someone a check and at some future point in time they will deposit that and get the money from your account.

One of the most important aspects of using checks was managing that delay. The amount of money you had in the account didn’t necessarily represent how much money you had available. If your rent check wasn’t deposited yet, you still had to consider the rest money “gone”, even if you could still see it in the bank statement.

Because of checks’ eventual consistency, a really important part of using checks was to keep track of all the outstanding checks that weren’t deposited yet. You did that by filling in the stub of the check in the checkbook whenever you wrote a check. In other words, you never gave a check before you properly filled the stub for that.

That brings us back to ACID. The act of filling the stub and writing the check is a transaction, composed of two separate actions. That action isn’t a global transaction. The husband and wife in our example do not have to coordinate with one another whenever they write a check. But they do need to ensure that no check would be handed off without a proper stub (and vice versa, if we want to be exact). If the act of writing a check and filling the stub isn’t atomic, you may have a check unexpectedly hit your account, which is… exciting (in the Chinese proverb  manner).

On the other side, the entity that you handed the check to also needs a transaction. They need to fill out an invoice for the check (even though it hasn’t been deposited yet). Having a check with no invoice or an invoice with no check is… bad (as in, IRS agents having shots and high fives during an audit).

The idea is that at the local level, you have to use transactions, otherwise, you cannot be sure about the consistency of your own data. If you don’t have transactions at the persistence layer, you’ll have to build it on top of that, which is… not ideal, really hard and usually not going to work in all cases.

With local transactions, you can then start pushing consistent data out and resolve all the distributed states you have.

Going back to our husband and wife example, for the most part, they can act completely independently of one another, and they’ll reconcile their account status with each other at a later date (weekly budget meeting). At the same time, there are certain transactions (pun intended) where they won’t act independently. A great example is buying a car, that sort of amount requires that both will be consulted on the purchase. That is a high value operation, so it is worth the additional cost of distributed consistency.

With RavenDB, we have the notion of local node transactions, which are then sent out to the rest of the nodes in the cluster in the background (async replication) as well as support for cluster wide transactions, which requires the consent of a majority of the nodes in the cluster. You can choose for each scenario exactly what level of transactions and consistency you want to have, local or global.

time to read 1 min | 105 words

We are looking to hire another Developer Advocate for RavenDB. The position involves talking to customers and users, help them build and design RavenDB based solutions.

It requires excellent written and oral communications with good presentation skills in English, good familiarity with software architecture, DevOps and of course, RavenDB itself.

Responsibilities:

  • Developing and growing long-term relationships with technology leaders within prospect organizations
  • Understanding the customer’s requirements and assessing the product fit
  • Making technical presentations and demonstrating how a product meets clients needs
  • Involvement in the completion of technical questions on RFPs
  • Attending conferences, meetups and trade shows

If you are interested, or know someone who is, please ping us at: jobs@ravendb.net

time to read 6 min | 1134 words

When you have a distributed system, one of the key issues that you have to deal with is the notion of data ownership. The problem is that it can be a pretty hard issue to explain properly, given the required background material. I recently came up with an explanation for the issue that I think is both clear and shows the problem in a way that doesn’t require a lot of prerequisites.

First, let’s consider two types of distributed systems:

  • A distributed system that is strongly consistent – such a system requires coordination between at least a majority of the nodes in the system to do anything meaningful. Such systems are inherently limited in their ability to scale out, since the number of nodes that you need for a consensus will become unrealistic quite quickly.
  • A distributed system that is eventually consistent – such a system allows individual components to perform operations on their own, which will be replicated to the other nodes in due time. Such systems are easier to scale, but there is no true global state here.

A strongly consistent system with ten nodes requires each operation to reach at least 6 members before it can proceed. With 100 nodes, you’ll need 51 members to act, etc. There is a limit to how many nodes you can add to such a system before it is untenable. The advantage here, of course, is that you have a globally consistent state. An eventually consistent system has no such limit and can grow without bound. The downside is that it is possible for different parts of the system to make decisions that each make sense individually, but are not taken together. The classic example is the notion of a unique username, a new username that is added in two distinct portions of the system can be stored, but we’ll later find out that we have a duplicate.

A strongly consistent system will prevent that, of course, but has its limititations. A common way to handle that is to split the strongly consistent system in some manner. For example, we may have 100 servers, but we split it into 20 groups, of 5 servers each. Now each username belongs to one of those groups. We can now have our cake and eat it too, we have 100 servers in the system, but we can make strongly consistent operations with a majority of 3 nodes out of 5 for each username. That is great, unless you need to do a strongly consistent operation on two usernames that belong in different groups.

I mentioned that distributed system can be tough, right? And that you may need some background to understand how to solve that.

Instead of trying to treat all the data in the same manner, we can define data ownership rules. Let’s consider a real world example, we have a company that has three branches, in London, New York City and Brisbane. The company needs to issue invoices to customers and it has a requirement that the invoice numbers will be consecutive numbers. I used World Clock Planner to pull the intersection of availability of those offices, which you can see below:

image

Given the requirement for consecutive numbers, what do we know?

Each time that we need to generate a new invoice number, each office will need to coordinate with at least another office (2 out of 3 majority).  For London, that is easy, there are swaths of times where both London and New York business hours are overlapping.

For Brisbane, not so much. Maybe if someone is staying late in the New York office, but Brisbane will not be able to issue any invoices on Friday past 11 AM. I think you’ll agree that being able to issue an invoice on Friday’s noon is not an unreasonable requirement.

The problem here is that we are dealing with a requirement that we cannot fulfill. We cannot issue globally consecutive numbers for invoices with this sort of setup.

I’m using business hours for availability here, but the exact same problem occurs if we are using servers located around the world. If we have to have a consensus, then the cost of getting it will escalate significantly as the system becomes more distributed.

What can we do, then? We can change the requirement. There are two ways to do so. The first is to assign a range of numbers to each office, which they are able to allocate without needing to coordinate with anyone else. The second is to declare that the invoice numbers are local to their office and use the following scheme:

  • LDN-2921
  • NYC-1023
  • BNE-3483

This is making the notion of data ownership explicit. Each office owns its set of invoice numbers and can generate them completely independently. Branch offices may get an invoice from another office, but it is clear that it is not something that they can generate.

In a distributed system, defining the data ownership rules can drastically simplify the overall architecture and complexity that you have to deal with.

As a simple example, assume that I need a particular shirt from a store. The branch that I’m currently at doesn’t have the particular shirt I need. They are able to lookup inventory in other stores and direct me to them. However, they aren’t able to reserve that shirt for me.

The ownership on the shirt is in another branch, changing the data in the local database (even if it is quickly reflected in the other store) isn’t sufficient. Consider the following sequence of events:

  1. Branch A is “reserving” the shirt for me on Branch B’s inventory
  2. At Branch B, the shirt is being sold at the same time

What do you think will be the outcome of that? And how much time and headache do you think you’ll need to resolve this sort of distributed race condition.

On the other hand, a phone call to the other store and a request to hold the shirt until I arrive is a perfect solution to the issue, isn’t it? If we are talking in terms of data ownership, we aren’t trying to modify the other store’s inventory directly. Instead we are calling them and asking them to hold that. The data ownership is respected (and if I can’t get a hold of them, it is clear that there was no reservation).

Note that in the real world it is often easier to just ignore such race conditions since they are rare and “sorry” is usually good enough, but if we are talking about building a distributed system architecture, race conditions are something that happens yesterday, today and tomorrow, but not necessarily in that order.

Dealing with them properly can be a huge hassle, or negligible cost, depending on how you setup your system. I find that proper data ownership rules can be a huge help here.

time to read 1 min | 102 words

The internet is a hostile place. Publicly accessible machines will be attacked within minutes of being connected, and any unencrypted data in transit is likely to be intercepted and modified. Every day there are successful attacks on applications and databases that are insufficiently protected, resulting in data leaks and ransom demands. In this webinar, RavenDB CEO Oren Eini will go over the threat model for RavenDB, the security aspects involved in deploying in production, and the assumptions involved in working with trusted parties and byzantine partners.


time to read 2 min | 367 words

Yesterday I gave a webinar about Database Security in a Hostile World and I got a really interesting question at the end:

If your RavenDB and app are on the same server, would you recommend using certificates?

The premise of the talk was that you should always run in a secured mode, even if you are running on a supposedly secured network. That is still very much relevant if you are running on a single server.

Consider the following deployment model:

image
(Icons from https://www.flaticon.com/)

As you can see, both the application and the server are running on the same machine. We can usually assume that there is no possibility for an attacker not running on the same machine to eavesdrop on the communication. Can we skip encryption and authentication in this case?

The answer is no, even for this scenario. That may be a viable model if you are running on your development machine, with nothing really valuable in terms of data, but it isn’t a good model everywhere else.

Why is that? The answer is quite simple, there are two issues that you have to deal with:

  • At some future time, the firewall rules will be relaxed (by an admin debugging something, not realizing what they are doing) and the “local” server may be publicly exposed.
  • Even if you are listening only to port 127.0.0.1, without authentication, you are exposed to anything that is local that can be tricked to contact you. That is a real attack.

In short, for proper security, assume that even if you are running on the local network, with no outside access, you are still in a hostile environment. The key reason for that is that things change. In two years, as the system grows, you’ll want to split the database & application to separate servers. How certain are you that the person doing this split (assume that you are no longer involved) will do the Right Thing versus the minimum configuration changes needed to make it “work”?

Given that the whole design of RavenDB’s security was to make it easy to do the right thing, we should apply it globally.

FUTURE POSTS

No future posts left, oh my!

RECENT SERIES

  1. Challenge (66):
    06 May 2022 - Spot the optimization–solution
  2. Production postmortem (37):
    29 Apr 2022 - Deduplicating replication speed
  3. Recording (3):
    11 Apr 2022 - Clean Architecture with RavenDB
  4. Answer (10):
    07 Apr 2022 - Why is this code broken?
  5. Request for comments (2):
    10 Mar 2022 - Removing graph queries from RavenDB
View all series

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats