Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

Get in touch with me:

oren@ravendb.net +972 52-548-6969

Posts: 7,546
|
Comments: 51,161
Privacy Policy · Terms
filter by tags archive
time to read 3 min | 539 words

The Cloud team at RavenDB has been working quite hard recently. The company at large is gearing up for the upcoming 6.2 release, but I can’t ignore the number of goodies that have dropped for RavenDB Cloud Customers.

Large Clusters & Sharding

RavenDB Cloud runs your production cluster with 3 nodes by default. Each one of them operates in a separate availability zone for maximum survivability. The new feature allows you to add additional nodes to your cluster. In the RavenDB Cloud Portal, you can see the “Add node” button and its impact:

Clicking this button allows you to add additional nodes to your cluster. The nodes will be deployed and attached to your cluster within a minute or two. The new nodes will be deployed in the same region (but not necessarily the same availability zone) where your cluster is already deployed.

There are plans in place to add support for deploying nodes in other regions and even in a multi-cloud environment. I would love to hear your feedback on this proposed feature.

You can see the new instances in the RavenDB Studio as well:

The key reason for adding additional nodes to a cluster is when you have very large datasets and you want to shard the data. Here is what this can look like:

In this case, we have sharded the data across 5 nodes, with a replication factor of 2.

Feature selection

There are certain Enterprise features that are only available in the higher-end instances in RavenDB Cloud (typically P30 or higher). We now allow you to selectively enable these features even on lower-tier instances.

This feature allows you to easily pick & choose (on an a-la-carte basis) the specific features you want, without having to upgrade to the more expensive tiers.

Metrics & monitoring

This feature isn’t actually new, but it absolutely deserves your attention. The RavenDB Cloud Portal has a metrics button that you should get familiar with:

Clicking it will provide a wealth of information about your cluster and its behavior. That can be really useful if you want to understand the system’s behavior. Take a peek:

Alerts & Warnings

In addition to just looking at the metrics, the RavenDB Cloud backend will give you some indication about things that you should pay attention to. For example, let’s assume that we had a node failure. You’ll typically not notice that since the RavenDB Cluster & client will work to ensure high availability.

You’ll be able to see that in the metrics, and the RavenDB Cloud Portal will bring it to your attention:

Summary

The major point we strive for in RavenDB and RavenDB Cloud is the notion that the entire experience will be seamless. From deployment and routine management to ensuring that you don’t have to concern yourself with the minutiae of data management, so you can focus on your application.

Being able to develop both the software and its execution environment greatly helps in providing solutions that Just Work. I’m really proud of what we have accomplished and I would love to get your feedback on it.

time to read 6 min | 1075 words

I’m really happy to announce that RavenDB Cloud is now offering NVMe based instances on the Performance tier. In short, that means that you can deploy RavenDB Cloud clusters to handle some truly high workloads.

You can learn more about what is actually going on in our documentation. For performance numbers and costs, feel free to skip to the bottom of this post.

I’m assuming that you may not be familiar with everything that a database needs to run fast, and this feature deserves a full explanation of what is on offer. So here are the full details of what you can now do.

RavenDB is a transactional database that often processes far more data than the memory available on the machine. Consequently, it needs to read from and write to  the disk. In fact, as a database, you can say that it is its primary role. This means that one of the most important factors for database performance is the speed of your disk. I have written about the topic before in more depth, if you are interested in exploring the topic.

When running on-premises, it’s easy to get the best disks you can. We recommend at least good SSDs and prefer NVMe drives for best results. When running on the cloud, the situation is quite different. Machines in the cloud are assumed to be transient, they come and go. Disks, on the other hand, are required to be persistent. So a typical disk on the cloud is actually a remote storage device (typically replicated). That means that disk I/O on the cloud is… slow. To the point where you could get better performance from off-the-shelf hardware from 20 years ago.

There are higher tiers of high-performance disks available in the cloud, of course. If you need them, you are paying quite a lot for that additional performance. There is another option, however. You can use NVMe disks on the cloud as well. Well, you could, but do you want to?

The reason you’d want to use an NVMe disk in the cloud is performance, of course. But the problem with achieving this performance on the cloud is that you lose the safety of “this disk is persistent beyond the machine”. In other words, this is literally a disk that is attached to the physical server hosting your VM. If something goes wrong with that machine, you lose the disk. Traditionally, that means that you can only use that for transient data, not as the backend store for a database.

However, RavenDB has some interesting options to deal with this. RavenDB Cloud runs RavenDB clusters with 3 copies of the data by default, operating in a full multi-master configuration. Given that we already have multiple copies of the data, what would happen if we lost a machine?

The underlying watchdog would realize that something happened and initiate recovery, which will effectively spawn the instance on another node. That works, but what about the data? All of that data is now lost. The design of RavenDB treats that as an acceptable scenario, the cluster would detect such an issue, move the affected node to rehabilitation mode, and start pumping all the data from the other nodes in the cluster to it.

In short, now we’ve shifted from a node failure being catastrophic to having a small bump in the data traffic bill at the end of the month. Thanks to its multi-master setup, RavenDB can recover even if two nodes go down at the same time, as we’ll recover from the third one. RavenDB Cloud runs the nodes in the cluster in separate availability zones specifically to handle such failure scenarios.

We have run into this scenario multiple times, both as part of our testing and as actual production events. I am happy to say that everything works as expected, the failed node comes up empty, is filled by the rest of the cluster, and then seamlessly resumes its work. The users were not even aware that something happened.

Of course, there is always the possibility that the entire region could go down, or that three separate instances in three separate availability zones would fail at the same time. What happens then? That is expected to be a rare scenario, but we are all about covering our edge cases.

In such a scenario, you would need to recover from backup. Clusters using NVMe disks are configured to run using Snapshot backups, which consume slightly more disk space than normal but can be restored more quickly.

RavenDB Cloud also blocks the user's ability to scale up or down such clusters from the portal and requires a support ticket to perform them. This is because special care is needed when performing such operations on NVMe machines. Even with those limitations, we are able to perform such actions with zero downtime for the users.

And after all this story, let’s talk numbers. Take a look at the following table illustrating the costs vs. performance on AWS (us-east-1):

Type# of coresMemoryDiskCost ($ / hour)
P40 (Premium disk)1664 GB2 TB, 10,000 IOPS, 360 MB/s8.790
PN30 (NVMe)864 GB2 TB, 110,000 IOPS, 440 MB/s6.395
PN40 (NVMe)16128 GB4 TB, 220,000 IOPS, 880 MB/s12.782

The situation is even more blatant when looking at the numbers on Azure (eastus):

Type# of coresMemoryDiskCost ($ / hour)
P40 (Premium disk)1664 GB2 TB, 7,500 IOPS, 250 MB/s7.089
PN30 (NVMe)864 GB2 TB, 400,000 IOPS, 2 GB/s6.485
PN40 (NVMe)16128 GB4 TB, 800,000 IOPS, 4 GB/s12.956

In other words, you can upgrade to the NVMe cluster and actually reduce the spend if you are stalled on I/O. We expect most workloads to see both higher performance and lower cost from a move from P40 with premium disk to PN30 (same amount of memory, fewer cores). For most workloads, we have found that IO rate matters even more than core count or CPU horsepower.

I’m really excited about this new feature, not only because it can give you a big performance boost but also because it leverages the same behavior that RavenDB already exhibits for handling issues in the cluster and recovering from unexpected failures.

In short, you now have some new capabilities at your fingertips, being able to use RavenDB Cloud for even more demanding workloads. Give it a try, I hear it goes vrooom 🙂.

time to read 14 min | 2727 words

Fungible is a funny word, mostly because you are most likely familiar with the term from NFT (non-fungible tokens) and other similar scams. At its core, it is the idea that for certain things, the instance doesn’t matter, just the amount.

The classic example is that if I lend you a 50$ bill, and you give me back two 20$ bills and a 10$ bill, you’ve still given me back my money. That is even though you very clearly didn’t. I didn’t get the same physical 50$ paper bill back, I got bills for that same amount. On the other hand, if I give you my dog for the weekend, I would be quite upset if I got back three different dogs, even if the total weight is the same.

This is actually a lot more than I want to know about fungibility, to be honest. But it turns out that if you are running a cloud business or just use the cloud in general, you have to be well-versed in the matter. Because in the cloud, money isn’t fungible. In fact, it doesn’t behave a lot like money at all.

Let’s assume that we are a cloud company called cloud.example.com, offering VPS for ourr users. You are in charge of writing the billing code, and it is pretty simple, right? Here is some code that can compute the charges:


function compute_charges(custId, start, end) {
  let total = 0;
  let predicate = instance =>
    (instance.custId === custId  && instance.started < end) &&
    (instance.ended > start      || instance.ended == null);


  for (let instance of query_instances(predicate)) {
    total += instance.hours_running(start, end) *
             instance.price_per_hour;
  }


  return total;
}

As you can see, there isn’t much there. We find all the instances that were running in the billing period and then calculate the total hours they ran during that period. Please note, this is a simplified model as we aren’t dealing with stopping & starting instances, etc.

The output of the compute_charges() function is a number, which will presumably be handed over to be charged over a credit card. There are other things that we need to do as well (generate an invoice, have a usage report, etc), but I want to focus on the money issue here.

The simplest model is that at the end of the billing period, we charge the customer (using a credit card, for example) and receive our payment. Everyone is happy and we can go home, hopefully richer.

The challenge arises when we want to offer additional options to the customer. For example, we may be willing to give the customer a discount if they are going to commit to a minimum amount of money they’ll spend each month. We may want to offer them upfront payment options or give monetary incentives to a particular aspect of the business (run on ARM instances instead of X64, for example).

Each time that we make such an offer, we are going to be turning around and (significantly) complicating the way we bill the customer. Let’s talk about something as simple as committing to run an instance for a whole year. No upfront payment, just a commitment to pay for a particular server for a year. In AWS or Azure, that would be Reserved Instances, so you are likely very familiar with the idea.

How is that going to be expressed in code? Probably something like this:


function compute_charges(custId, start, end) {
    let total = 0;
    let predicate = instance => /*..redacted.*/;
   
    var hrsPerIns = {};
    for (let i of this.instances(predicate)) {
        let hours = i.hours_running(start, end);
        hrsPerIns[i.type] = hours + (hrsPerIns[i.type] || 0);
        total += hours * i.price_per_hour;
    }


    for (let c of this.commitmentsFor(custId, start, end)) {
        let hours = c.committed_time(start, end);
        let hoursUsed = hrsPerIns[c.type] || 0;
        let unusedCommittedHours = Math.max(0, hours - hoursUsed);
        total += unusedCommittedHours *
                this.instance(c.type).price_per_hour;
    }
 
    return total;
  }

To be clear, the code above is not a good way to handle such a task, but it does show in a pretty succinct way the hidden complexities. In this case, if you didn’t meet your commitment, we’ll charge you for the unused commitment as well.

A more complex system would have to account for discounted rates while using the committed values, for example. And in that case, the priority of applying such rates between different matching commitments.

Other aspects may be giving the user a discount for a particular level of usage. So the first 100GB are priced differently from the rest, applying a free tier and… you get the point, I think. It gets complex.

Note that at this point, we aren’t even talking about money yet, we are discussing computing the charges. The situation is more interesting when we move to the next stage. On the face of it, this seems pretty simple, all you need to do is charge the credit card, no?

Okay, maybe you need to send an invoice, but that is about it, right?

Well… what happens if the customer made an upfront payment for one of those commitments? Or just accidentally paid twice last month and now has credit on your system.

I’m going to leave aside the whole complexity around payments bouncing (which is a whole other interesting topic) and how to deal with the actual charging. Right now I want to focus on the nature of money itself.

Imagine you have a commitment with a customer for an 8-core / 64 GB VPS server for a whole year. And they paid upfront, getting a nice discount along the way. How would you record that in your system?

The easiest is to create the notion of credit for the user, which you deduct whenever you need to charge them. So we’ll first compute the charges, then deduct the existing credits, and debit the customer if anything remains. This is simple, easy to work with, and wrong.

Remember that discount the user received? They paid for that particular VPS type, and if you now need to charge them for anything else (such as storage charges), that money cannot come from the funds paid for the VPS.

In other words, the money the customer paid is not fungible. It isn’t applicable for any charge, it is colored. It is dedicated to a particular purpose. And managing that turns out to be pretty complex. Mostly because we are trying to fit everything into the debits and credits on the account.

A better model is to avoid using money, in the same way that if you mix inches and centimeters you’ll eventually end up in a bad place on Mars. The solution is to treat each individual charge as its own “currency”.

In other words, when computing the charges, we aren’t trying to find the cost of running a particular instance for the billing period. We are trying to find how many “cost units” we have for that time period.

Instead of getting a single number that we’ll charge the customer, we’ll obtain a detailed set of the changes in question. Not as money, but as cost units. Think about those in a similar way to currency.  Note that all the units are multiples of 730 hours (number of hours per month, on average).


compute_charges(custId, start, end) => {
    custId: 'customers/3291-B',
    start: '2024-01-01', end: '2024-01-31',
    costs: [
      {type: '8Cores-64GB-hours',  qty: 2190},
      {type: '4Cores-32GB-hours',  qty:  730},
      {type: 'disk-5000-iops',     qty: 2920},
    ],
}

The next step after that is to get your allocated budget for the same billing period, which will look something like this:


compute_budget(custId, start, end) => {
    custId: 'customers/3291-B',
    start: '2024-01-01', end: '2024-01-31',
    commitments: [
      {type: '8Cores-64GB-hours',  qty: 2190},
      {type: '4Cores-32GB-hours',  qty: 1460},
      {type: 'disk-5000-iops',     qty: 730},
    ],
}

In other words, just as we compute the charges based on the actual usage for that billing period, we apply the same approach on the commitments we have. The next stage is to just add all of those together. In this case, we’ll end up with the following:

  • 8Cores-64GB-hours ⇒ 0 (we used as much as we committed to)
  • 4Cores-32GB-hours ⇒ -730 (we committed to more than we used)
  • Disk-5000-iops ⇒ 2190 (remaining use after applying commitment, priced as you go)

We aren’t done yet, after commitments, there are other plans that we may need to run. For example, we’ll provide you with some global discounts for VM rental (which doesn’t apply to disks, however). Working at the level of cost units (or colors, or currency, whatever term you like) allows us to apply those things in a very fine-grained manner. More importantly, the end result and all its intermediate steps are very clear. That is quite important when you look at a six-figure bill with hundreds of line items and you want to see whether the billing matches your contract or not.

As you can imagine, given the inherent complexity of the system, being able to clearly “show your work” is quite important. Especially when there is a misunderstanding or questions are being raised (and there will be).

What we have done now is compute the actual charges based on their type, but we need to convert that to real money. There are several steps along this process:

  1. We need to charge all the active commitments. Those may have been pre-paid (in which case there is no current charge), but they may have a (fixed) monthly cost that we need to add to the current invoice.
  2. We need to perform a “currency conversion” between the units we have and actual money. In the example above, we have a negative number of units (for 4Cores-32GB-hours), as we committed to more hours than we actually used. We are still being charged for this by applying the rate from the commitment.
  3. On the other hand, when we examine the disk costs, we used more than we committed to. Here we need to make a decision about what price we’ll charge the user. It can be the commitment price or the pay-as-you-go price. So even for the same currency we may have different rules.

After all of this is done, we are now left with a final number. The actual amount of money that we need to charge the customer. This is the point at which we check if the customer has any credit already paid in the system or if we need to make an actual charge. That aspect is complicated by whether you are charging a credit card (same for any other automatic billing option) or issuing an invoice to be paid manually.

For a manual invoice, you now have a whole other process. For example, you may offer discounts for the customer if they pay within 14 days versus the usual 30, or charge a fee for paying within 60 days, etc.

I’m not touching on collections or what to do when you fail to charge the customer. It is shockingly common to encounter payment failures. To the point where we never had a single payment run that didn’t include at least several such cases. The reasons range from deal size too big to (temporary) lack of funds to suspicious-seeming activity. You need to be able to handle that as well. But those are topics for another post.

In this post, my aim was to discuss just the issue of the complexity of money in the cloud business. I find the model of treating the charges as separate “currencies” to be a nice one overall, but I would love to hear about other people’s experiences in this matter.

time to read 3 min | 534 words

Originally posted at 12/6/2010

That is actually an inaccurate statement, for the most part. A more accurate way of saying this is that for the most part, you don’t need to think about scaling a multi tenant environment. Most multi tenant environments are using some form of a user based licensing method, which means that for the vast majority of tenants, there aren’t going to be a lot of users. That means that we have a very natural way of isolating things. Handling a single tenant is very easy, then, because we are actually handling small amount of users and (relatively) small amount of information.

The challenge begins when we need to consider how to work with all the tenants. My general approach to that is actually to think upfront about the sort of things that would prevent me from scaling things, and avoid those, but not do anything else about it. Let us consider what I mean…

The easiest way to handle multi tenant application is when you create total separation between each tenant. It is fairly easy to do, all you have to ensure is a separate data store (note that you really need to ensure separate caches, as well), and everything else more or less flows from there. In most multi tenant applications, there are four things that may vary between tenants:

  • Schema
  • Data
  • Behavior
  • User Interface

The first two are handled at the data store level (and is usually very easy), while the last two are just application code (be it separate dll, tenant scripts, configuration, etc).

Given all of that, scaling multi tenant applications is usually a process that is handled by the load balancer. What you need is to specify which tenants are served by which servers, and to ensure that you don’t overload each server capacity. If you want to be really smart about it, you can do load based load balancing, but for the most part, even static routing that gives each server X number of tenants would work.

This helps a lot because it means that for the most part, even though you are writing a distributed app that may have a very large number of total users, your actual application can behave as if it was a small application. That means that you can do thinks like memory based caching, instead of distributed caching, because all the tenant users are going to be served from the same server.

And what happen if you have a tenant with thousands of users?

At that point, doing this sort of trick won’t work, but presumably you are charging them enough to just allocate a single server for this tenant. In fact, assuming standard user based licensing, you actually got to the point where you have, as we say in Israel, a Rich Man’s Problem. Even if they are big enough to require more than a single server, you can usually just split them to several, that is where the pre-planning for the scaling stoppers starts to pay. But in all honestly, it doesn’t happen that often.

FUTURE POSTS

  1. Partial writes, IO_Uring and safety - about one day from now
  2. Configuration values & Escape hatches - 5 days from now
  3. What happens when a sparse file allocation fails? - 7 days from now
  4. NTFS has an emergency stash of disk space - 9 days from now
  5. Challenge: Giving file system developer ulcer - 12 days from now

And 4 more posts are pending...

There are posts all the way to Feb 17, 2025

RECENT SERIES

  1. Challenge (77):
    20 Jan 2025 - What does this code do?
  2. Answer (13):
    22 Jan 2025 - What does this code do?
  3. Production post-mortem (2):
    17 Jan 2025 - Inspecting ourselves to death
  4. Performance discovery (2):
    10 Jan 2025 - IOPS vs. IOPS
View all series

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats
}