Ayende @ Rahien

Oren Eini, aka Ayende Rahien, is the CEO of Hibernating Rhinos LTD, which develops RavenDB, a NoSQL open source document database.

You can reach me by:

oren@ravendb.net

+972 52-548-6969

time to read 2 min | 256 words

RavenDB is a highly concurrent, distributed database. That means we take the idea of race conditions, multiply it by network hiccups, and then raise it to the power of hair pulling. Now, we have architectural structures to help with a lot of that, but sometimes you need to write and verify what happens when a particular sequence of events occurs in a five node cluster. For fun, you may need to orchestrate a particular order of operations across multiple disparate processes (sometimes on different machines). As you can imagine, that is… challenging.

I wanted to give you a hint of some of the techniques that we use to handle this. We have code that looks like this, sprinkled throughout our code base (Rachis is the name of our Raft cluster implementation):

This is where a leader connects to a follower to setup their relationship:

[image: the leader-to-follower connection code]

This is called during leader election:

[image: the leader election code]

These methods are implemented in the following manner:

[image: the test hook implementation]

In other words, they will set a ManualResetEvent that we set up as part of our testing infrastructure. The code isn’t even run in production releases, but it allows us to very carefully structure the exact sequence of events that we need to expose specific behaviors in the system.
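A minimal sketch of what such a hook can look like (the names here are hypothetical, not the actual Rachis code from the screenshots above):

```csharp
using System.Threading;

public static class TestingHooks
{
    // Assigned by the test harness; left null in production, so the hooks are no-ops.
    public static ManualResetEventSlim LeaderConnectedToFollower;
    public static ManualResetEventSlim HoldElection;

    // Called from the leader/follower handshake code path.
    public static void OnLeaderConnected() => LeaderConnectedToFollower?.Set();

    // Called during leader election; the test can hold the node here
    // until the rest of the cluster is in exactly the state it wants.
    public static void WaitForElectionSignal() => HoldElection?.Wait();
}
```

A test creates the events, kicks off the cluster operations, and then signals each node only when the scenario calls for the next step, which is how we force a specific interleaving across a five node cluster.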

time to read 5 min | 968 words

I ran across the following on Twitter:

And that resonated very strongly with me, but from the other side. I have actually talked about this quite a lot in the past: you should design your system so it can adapt more easily to changes in the business process.

About 15 years ago I was tasked with building a system for scheduling and managing at-home nursing aid. One of the key reasons for the system I was building was that management wanted to enforce their vision of how things should be on the organization, and they had chosen to do that by ensuring the new system would only allow things to happen That Way.

To my knowledge, they spent over three years speccing the system, but after looking at what we delivered (according to the spec, mind you), the people who were supposed to be using it revolted. I ended up working closely with a couple of employees who were supervisors in a local branch and would actually be using the system day in and day out. That project and others like it have taught me a lot about how to design a system that enables rather than limits what you can do.

Let’s take a paper based system. A client comes in and wants to order something. They have no last name (Cher, Madonna, etc). The person handling the order leaves that field empty and processes the order. However, in a computerized system, that is going to fail validation and require you to reject the order. The person handling the order? Nothing they can do about it: “the system won’t let me”. Admittedly, this is the simplest case that I can think of, but there are many cases where you have to jump through illogical hoops to satisfy inflexible rules in the system (leave a comment with the most egregious example you have run into, please).

If you approach this properly, there is a relatively simple solution: implement human-level decision making in the process. You can do that by developing a human-level AI or by grabbing a human. Let’s take a simple example from the past few months.

We just had the launch of RavenDB Cloud, during which we focused primarily on the backend operations, getting everything set up properly, monitoring, etc. At the same time, I have the sales team talking to customers. One of the things that kept popping up is that they wanted and needed a pretty flexible pricing system. For example, right now we have a pricing model for: On Demand, Yearly Contract and Upfront Yearly Contract. We are talking to customers that want 3 and 5 year contracts, but that isn’t implemented in the system. I have customers that want to pay by Credit Card (the main supported way), but others can only pay via a Purchase Order and some (still!) want to cut me a physical check.

There are a few ways you can handle something like this:

  • Tell customers that it is a credit card or the highway. Simplest to implement, kinda harsh on the bottom line.
  • Tell customers that of course we can do that, and then go to the dev team and press the Big Red Sales Button. I was the guy who responded to the Big Red Sales Button, and I hated that.
  • Tell customers that we can handle their payment needs, go to the system and enable this:
    Manual intervention before charge

When this is checked, the usual payment processing will run, but before we actually charge the user, we insert a human in the loop. This allows us to apply any specific customizations to the payment. For example, one scenario would be to zero the amount remaining to be charged and send a Purchase Order to the finance department for that particular customer.
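Conceptually, the change to the billing pipeline is tiny. A sketch of the idea, with made-up names rather than our actual billing code:

```csharp
using System.Threading.Tasks;

public class Account
{
    public string Id;
    public bool ManualInterventionBeforeCharge; // the checkbox shown above
}

public class BillingPipeline
{
    public async Task ProcessChargeAsync(Account account, decimal amount)
    {
        if (account.ManualInterventionBeforeCharge)
        {
            // Park the charge for a human to review, adjust or zero out
            // (e.g. turn it into a purchase order) before anything is billed.
            await EnqueueForApprovalAsync(account.Id, amount);
            return;
        }

        await ChargeCreditCardAsync(account.Id, amount);
    }

    // Stubs standing in for the real approval queue and payment gateway.
    private Task EnqueueForApprovalAsync(string accountId, decimal amount) => Task.CompletedTask;
    private Task ChargeCreditCardAsync(string accountId, decimal amount) => Task.CompletedTask;
}
```

The human then resolves the parked charge however the business needs, and the rest of the pipeline never has to know about 3 year contracts, purchase orders or physical checks.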

The key here is to take Jimmy’s notion of looking into the actual usage of the system, and not consider this to be a failing of the system. The system, when created, was probably pretty good, but it changed over time. And even if you re-create it perfectly today, it is going to change again tomorrow, necessitating the same workarounds. If you embrace this from the get go, you end up in a different place. Because now you can employ smarts in the system without having to deploy a new version.

In another case, we had a way for the user to say: “I want to circumvent these checks and proceed anyway”. We accepted the change, even if it violated some rules of the system. We would raise a flag and have another person review and authorize those changes. If the original user wasn’t supposed to use this, there was a… discussion on that and changes were implemented in the policy. No need to go through the software for such things, and it is infinitely more flexible. We had a scenario where one of the integration points failed (it was down for a couple of weeks, IIRC), and it was a required step for processing an order.

The users could skip this step, proceed normally and then come back later, when the 3rd party integration was available again, and do any fixes required. Compare that to an emergency change in production and then frantic fixes afterward when the system is up again. Instead, we have a baked-in process to handle outliers.

In other words, don’t try to make your humans into computers. As any techie will tell you, computers will do what you told them to. Humans will do what you meant*.

* Well, usually.

time to read 4 min | 749 words

Consider a business that needs to manage leasing apartments to tenants. One of the more important aspects of the business is tracking how much money is due. Because of the highly regulated nature of leasing, there are several interesting requirements that pop up.

The current issue is how to tackle the baseline for eviction. Let’s say that the region the business is operating in has the following minimum requirements for eviction:

  • Total unpaid debt (30 days from invoice) that is greater than $2,000.
  • Total overdue debt (30 – 60 days from invoice) that is greater than $1,000.
  • Total overdue debt (greater than 60 days from invoice) that is greater than $500.

I’m using the leasing concept here because it is easy to understand that the date ranges themselves are dynamic. We don’t want to wait for the next first of the month to see the changes.

The idea is that we want to be able to show a grid like this:

[image: the eviction status grid]

The property manager can then take action based on this data. And here is the raw data that we are working on:

[image: the customer’s invoices and payments]

It’s easy to see that this customer still has a balance of $175. Note that this balance is as of July 9th, because we apply payments to the oldest invoice we have. The question now becomes, how can we turn this raw data into the table above?

This turns out to be a bit hard for RavenDB, because it is optimized to answer your queries fast, which means that having to do recalculation on each query (based on the current date) is not easy. I have already shown how to do this kind of task easily enough when we are looking at a single customer. The problem is that we want to have an overall view of the system, not just of a single customer. And ideally without it costing too much.

The key observation to handling this efficiently in RavenDB is to understand that we don’t need to actually generate the table above directly. We just need to get the data to a point where producing it is trivial. After some thinking, I came up with the following desired output:

[image: the desired per-customer output]

The idea here is that we are going to give both an overall view of the customer’s account as well as details about its outstanding debts. The important detail to understand is that this customer status is unlikely to grow too big. We aren’t likely to see customers with debts that span many years, so the size of this document is naturally bounded. The cost of going from this output to the table above is negligible and the process of doing so is obvious. So the only question now is how do we do this?
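Roughly, the desired per-customer output is a small document along these lines (the field names are my guesses based on the description, not the exact shape in the screenshot):

```csharp
using System;
using System.Collections.Generic;

public class CustomerAccountSummary
{
    public string Customer;
    public decimal CreditBalance;     // payments received, per the reduce below
    public decimal RemainingBalance;  // total still owed across all invoices
    public List<Debt> Debts = new List<Debt>();
}

public class Debt
{
    public DateTime Date;      // invoice date, bucketed into the 30 / 60+ day ranges at query time
    public decimal Remaining;  // what is left to pay on this invoice
}
```

Turning a handful of such documents into the eviction grid is then just a matter of bucketing each outstanding debt by its age relative to today.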

We are going to utilize RavenDB’s multi-map/reduce to the fullest here. Let’s first look at the maps:

There isn’t really anything interesting here. We are just outputting the data that we need for the second, more interesting stage, the reduce:

There is a whole bunch of stuff going on here, but leaving aside how much JavaScript scares me, let’s dig into this.

The important parameters we have here are:

  • Debt
  • CreditBalance
  • RemainingBalance

We compute the CreditBalance by summing all the outstanding payments for the customer. We then gather up all the debts for the customer and sort them by date ascending. The next stage is to apply the outstanding credits toward each of the debts, erasing them from the list if they have been completely paid off. Along the way, we compute the overall remaining balance as well.
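The heart of the reduce is that credit application step. Stripped of the index plumbing and shown as plain C# (the actual index uses JavaScript), using the shapes sketched above, it is roughly:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AccountMath
{
    // Applies outstanding credits to debts, oldest first, and keeps only what is still owed.
    public static CustomerAccountSummary Reduce(string customer, IEnumerable<decimal> payments, IEnumerable<Debt> debts)
    {
        var summary = new CustomerAccountSummary
        {
            Customer = customer,
            CreditBalance = payments.Sum(),
            Debts = debts.OrderBy(d => d.Date).ToList()
        };

        var credit = summary.CreditBalance;
        foreach (var debt in summary.Debts.ToList())
        {
            var applied = Math.Min(credit, debt.Remaining);
            debt.Remaining -= applied;
            credit -= applied;
            if (debt.Remaining == 0)
                summary.Debts.Remove(debt); // fully paid off, no longer outstanding
        }

        summary.RemainingBalance = summary.Debts.Sum(d => d.Remaining);
        return summary;
    }
}
```

Because map/reduce re-applies the reduce over partial results, the real index has to emit its output in the same shape it consumes, but the arithmetic is the same.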

And that is pretty much it. It is important to understand that this code is recursive. In other words, if we have a customer that has a lot of invoices and receipts, we aren’t going to be computing this in one go over everything. Instead, we’ll compute this incrementally, over subsets of the data, applying the reduce function as we go.

Queries on this index are going to be fast, and applying new invoices and receipts is going to require very little effort. You can now also do the usual things you do with indexes. For example, sorting the customers by their outstanding balance or total lifetime value.

time to read 2 min | 333 words

RavenDB is really great at aggregations. Even if you have a stupendous amount of information, it will do really well in crunching through the data and summarizing it for you. This is due to the way RavenDB implements map/reduce operations, which allows us to instantly give you aggregation results, regardless of data size. However, this approach requires that you tell RavenDB up front how you want to do the aggregation. This allows RavenDB to do the work ahead of time, as you modify the data, instead of each time you query for it.

A question was raised in the mailing list: how can you use RavenDB to do arbitrary ranges during aggregation? For example, let’s say that we want to be able to look at the total charges for a customer, but slice it so we’ll have:

  • Active – 7 days back
  • Recent  – 7 – 21 days back
  • History – 21 – 60 days back

And each user can define their own time periods for the Active / Recent / History ranges. That complicates things somewhat, but luckily, RavenDB has multiple ways to solve this issue. First, let’s create the appropriate index for this.
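The index itself was shown as a screenshot; a sketch of the kind of daily pre-aggregation it describes, assuming orders that carry a company, a date and order lines:

```csharp
using System;
using System.Linq;
using Raven.Client.Documents.Indexes;

public class Order
{
    public string Company;
    public DateTime OrderedAt;
    public OrderLine[] Lines;
}

public class OrderLine
{
    public int Quantity;
    public decimal PricePerUnit;
}

public class Orders_TotalByCompanyDaily : AbstractIndexCreationTask<Order, Orders_TotalByCompanyDaily.Result>
{
    public class Result
    {
        public string Company;
        public DateTime Date;   // truncated to the day
        public decimal Total;
    }

    public Orders_TotalByCompanyDaily()
    {
        Map = orders => from o in orders
                        select new Result
                        {
                            Company = o.Company,
                            Date = o.OrderedAt.Date,
                            Total = o.Lines.Sum(l => l.Quantity * l.PricePerUnit)
                        };

        Reduce = results => from r in results
                            group r by new { r.Company, r.Date } into g
                            select new Result
                            {
                                Company = g.Key.Company,
                                Date = g.Key.Date,
                                Total = g.Sum(x => x.Total)
                            };
    }
}
```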

This isn’t really anything special. It will simply aggregate the data by company and by date. That isn’t enough for what we want to query. For that, we need to reach for another tool in our belt, facets.

Here is the relevant query:

[image: the facet query]

And here is what the output looks like:

[image: the query output]

The idea is that we have two stages for the process. First, we use the map/reduce index to pre-aggregate the data at a daily level. Then we use facets to roll up the information to the level we desire. Instead of having to go through the raw data, we can operate on the partially aggregated data and dramatically reduce the overall cost.

time to read 1 min | 180 words

We were asked about best practices for managing the RavenDB session (unit of work) in a .NET Core MVC application. I thought it was interesting enough to warrant its own post.

RavenDB’s client API is divided into the Document Store, which holds the overall configuration required to access a RavenDB cluster, and the Document Session, which is a short-lived object implementing the Unit of Work pattern and typically only used for a single request.

We’ll start by adding the RavenDB configuration to the appsettings.json file, like so:

We bind it to the following strongly typed configuration class:
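The original snippets were images; a plausible version of the configuration section and the class it binds to (the key names are assumptions for the example):

```csharp
// Matches an appsettings.json section such as:
// "RavenSettings": {
//   "Urls": [ "https://a.free.company.ravendb.cloud" ],
//   "DatabaseName": "Orders",
//   "CertificatePath": "client.pfx"
// }
public class RavenSettings
{
    public string[] Urls { get; set; }
    public string DatabaseName { get; set; }
    public string CertificatePath { get; set; }
}
```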

The last thing to do is to register this with the container for dependency injection purposes:
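A sketch of that registration, assuming the RavenSettings class above; the singleton store / scoped session split is the important part:

```csharp
using System.Security.Cryptography.X509Certificates;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Raven.Client.Documents;

public class Startup
{
    public Startup(IConfiguration configuration) => Configuration = configuration;
    public IConfiguration Configuration { get; }

    public void ConfigureServices(IServiceCollection services)
    {
        var settings = Configuration.GetSection("RavenSettings").Get<RavenSettings>();

        // One document store per application - expensive to create, thread safe.
        services.AddSingleton<IDocumentStore>(_ =>
        {
            var store = new DocumentStore
            {
                Urls = settings.Urls,
                Database = settings.DatabaseName,
                Certificate = new X509Certificate2(settings.CertificatePath)
            };
            return store.Initialize();
        });

        // One session per request - cheap, short lived, implements Unit of Work.
        services.AddScoped(sp => sp.GetRequiredService<IDocumentStore>().OpenAsyncSession());
    }
}
```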

We register both the Document Store and the Document Session in the container, but note that the session is registered in scoped mode, so each request will get a new session.

Finally, let’s make actual use of the session in a controller:
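A minimal controller using that scoped session might look like this (the entity and routes are made up for the example):

```csharp
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;
using Raven.Client.Documents.Session;

public class Customer
{
    public string Id { get; set; }
    public string Name { get; set; }
}

[Route("api/customers")]
public class CustomersController : Controller
{
    private readonly IAsyncDocumentSession _session;

    // Injected per request thanks to the scoped registration above.
    public CustomersController(IAsyncDocumentSession session) => _session = session;

    [HttpGet("{id}")]
    public async Task<IActionResult> Get(string id)
    {
        var customer = await _session.LoadAsync<Customer>("customers/" + id);
        return customer == null ? (IActionResult)NotFound() : Ok(customer);
    }

    [HttpPost]
    public async Task<IActionResult> Create([FromBody] Customer customer)
    {
        await _session.StoreAsync(customer);
        await _session.SaveChangesAsync(); // explicit, see the note below
        return Ok(customer.Id);
    }
}
```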

Note that we used to recommend having SaveChangesAsync run for you automatically, but at this time, I think it is probably better to do this explicitly.

time to read 6 min | 1051 words

A process running on your system is typically a black box. You don’t have a lot of insight into what is going on inside it. Oh, there are all sorts of tools you can use to infer what is going on (looking at system calls, memory consumption, network connections, etc), but by default… it is a mystery.

RavenDB is a database. It is meant to run unattended for long durations and is designed to mostly run itself. That means that when you look at it, you want to be able to figure out exactly what is going on with the system as soon as possible. To that end, we have included a lot of features inside RavenDB that expose the internal state of the system. From tracing each I/O and its duration to providing detailed statistics about costs and amount of effort invested in various tasks.

These features are invaluable for figuring out exactly what is going on in RavenDB at a particular point in time. Of course, nothing beats the ability to open a debugger and inspect the state of the system. But that is something that you can only really do in development. It is not something that can be done in production, obviously. Or can it?

Since RavenDB 3.0, we have actually had just this feature: the ability to ask RavenDB to capture and display its own state in a format that should be very familiar to developers. When we created RavenDB 4.0, we were able to carry this feature over on Windows (at some cost), but it was a complete non-starter on Linux.

On Windows, a process can debug another process if they belong to the same user (somewhat of an oversimplification, but good enough). On Linux, the situation is a lot more complex. A process can usually only debug another process if the debugger is running as root or is the parent process of the debuggee process.

Another complication was that we are using ClrMD, a wonderful library that allows us to introspect live processes (among many other things). It did not have support for Linux until about a month ago… as soon as we had the most basic of support there, we jumped into action, seeing how we could bring this feature to Linux as well. A lot of our users are running production systems on Linux, and the ability to look at the system and go: “Hmm. I wonder what this is doing” and then being able to tell is something that we consider a major boost to RavenDB.

It took a lot of fighting and learning a lot more about how debugging permissions work on Linux than I ever wanted to know. But we got it working (details below). You can see what this looks like on a live Linux server:

[image: live stack trace view in the studio]

As you can see, there is an indexing thread here doing some work on spatial data. We are going to enhance this view further with the ability to see CPU times as well as job names. The idea is that this is something that you will look at and get enough insight to not need to check the logs or try to infer what is going on. You could just tell.

Now, for the gory details of how this works. We changed the implementation on both Windows and Linux to use passive attach to the process, which is much faster. The first thing we tried, once we moved to passive attach, was to debug ourselves.
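In ClrMD terms, the self-inspection attempt boils down to something like this (a sketch against the ClrMD 1.x API; the exact calls vary between versions):

```csharp
using System;
using Microsoft.Diagnostics.Runtime;

public static class StackTraceCapture
{
    public static void Dump(int pid)
    {
        // Passive attach: don't suspend the target, just read its memory.
        using (var target = DataTarget.AttachToProcess(pid, 5000, AttachFlag.Passive))
        {
            var runtime = target.ClrVersions[0].CreateRuntime();
            foreach (var thread in runtime.Threads)
            {
                Console.WriteLine($"Thread {thread.OSThreadId}:");
                foreach (var frame in thread.StackTrace)
                    Console.WriteLine("    " + frame);
            }
        }
    }
}
```

Calling Dump with our own process id is exactly the self-debugging attempt described next, and that is the part that falls apart on Linux.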

This is a nice enough feature, and quite elegant. We debug ourselves, pull the stack traces and display the data. Unfortunately, this doesn’t work on Linux. A process cannot debug itself. All debugging in Linux is based on the ptrace() system call, and the permissions for that are as described above. I can’t imagine the security implications of letting a process debug itself. After all, it can already do anything the process can do, because it is the process. But I guess that this is an esoteric enough scenario that no one noticed, and then the reaction was: use a workaround.

The usual workaround is to have a process that would spawn RavenDB and then it would be able to debug it. That is… possible, but it would be a major shift in how we deploy, not something that I wanted to do. There is also the ptrace_scope flag, which is supposed to control this behavior. In my tests, at least, disabling the security checks via this flag did absolutely nothing.

Running as root worked just fine, of course. And then the process crashed. On Linux, when trying to debug your own process, there seems to be an interesting interaction between the debugger and debuggee if an exception is thrown, to the point where it will corrupt the CoreCLR state and kill the process. That was a fun bug to trace, sort of. Linux has an escape hatch in the form of the PR_SET_PTRACER option that can be used. However, you can’t designate your own process, unfortunately. That, combined with the hard crashes, made self-debugging a non-starter.

But I still want this feature, and without changing too much about how we are doing things.

Here is what we ended up doing. We have a separate process just to capture the stack trace. When you ask RavenDB to get its stack trace, it will spawn this process, but ask it to wait. It will then grant the new process the permissions necessary to debug RavenDB and signal it to continue. At this point, the debugger child process will capture the stack trace and send it back to RavenDB. RavenDB will reset the permission, enhance the stack trace with additional information that we can provide from inside the process and display it to the user.
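The permission grant itself is a single prctl() call from inside RavenDB, naming the child as the allowed tracer. A rough sketch of the hand-off (the P/Invoke and the tool name are mine, not the actual implementation):

```csharp
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

public static class StackTraceRunner
{
    // From linux/prctl.h: PR_SET_PTRACER = 0x59616d61 ("Yama")
    private const int PR_SET_PTRACER = 0x59616d61;

    [DllImport("libc", SetLastError = true)]
    private static extern int prctl(int option, ulong arg2, ulong arg3, ulong arg4, ulong arg5);

    public static void CaptureStackTrace()
    {
        // Spawn the dedicated debugger child and tell it to wait for a signal.
        var myPid = Process.GetCurrentProcess().Id;
        var debugger = Process.Start("raven-debug", $"--wait --attach {myPid}");

        // Allow only that child to ptrace() us, then let it do its work.
        prctl(PR_SET_PTRACER, (ulong)debugger.Id, 0, 0, 0);
        // ... signal the child, then read the captured stack traces back from it ...

        // Revoke the permission once we have the result.
        prctl(PR_SET_PTRACER, 0, 0, 0, 0);
    }
}
```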

The actual debugger process is also marked with setcap to provide it additional permissions it needs. This separation means that we isolate these permissions to a single purpose tool that can be invoked and closed, without increasing the attack surface of RavenDB.

The end result is that you can walk to a production RavenDB server, running on Windows or Linux, and get better information than if you just attached to it with the debugger.

time to read 8 min | 1502 words

One of the quieter effects of people moving to the cloud is that it makes the relation between performance and $$$ costs evident. In your own data center, it is easy to lose track of the connection between the yearly DC budget and the application performance. But on the cloud, you can track it quite easily.

This immediately leads people to try to optimize their costs and use the minimal amount of resources that they can get away with. This is great for people who believe in efficient software.

One of the results of the focus on price and performance has been the introduction of burstable cloud instances. These instances allow you to host workloads that do not need the full machine resources to run effectively. There are many systems whose normal runtime cost is minimal, with only occasional spikes. In AWS, you have the T series and Azure has the B series. Given how often RavenDB is deployed on the cloud, it shouldn’t surprise you that we are frequently being run on such instances. The cost savings can be quite attractive, around 15% in most cases. And in the case of small instances, that can be even more impressive.

RavenDB can run quite nicely on a Raspberry PI, or a resource starved container, and for many workloads, that makes sense. Looking at AWS in this case, consider the t3a.medium instance with 2 cores and 4 GB at 27.4$ / month vs. a1.large (the smallest non burstable instance) with the same spec (but an ARM machine) at 37.2$ per month. For that matter, a t3a.small with 2 cores and 2 GB of memory is 13.7$. As you can see, the cost savings add up, and it makes a lot of sense to want to use the most cost effective solution.

Enter the problem with burstable instances. They are bursty. That means that you have two options when you need more resources: either you end up with throttling (reducing the amount of CPU that you can use) or you are charged for the additional CPU power you used.

Our own production systems, running all of our sites and backend systems, are running on a cluster of 3 t3.large instances. As I mentioned, the cost savings are significant. But what happens when you go above the limit? In our production systems, for the past 6 months, we have never had an instance where RavenDB used more than the provided burstable performance. It helps that the burstable machines allow us to accrue CPU time when we aren’t using it, but overall it means that we are doing a good job of handling requests efficiently. Here are some metrics from one of the nodes in the cluster during a somewhat slow period (the range is usually 20 – 200 requests / sec).

[image: request latency and resource metrics from one of the nodes]

So we are pretty good in terms of latency and resource utilization; that’s great.

However, the immediate response to seeing that we aren’t hitting the system limits is… to reduce the machine size again, to pay even less. There is a lot of material on cost optimization in the cloud that you can read, but that isn’t the point of this post. One of the more interesting choices you can make with burstable instances is to ask not to go over the limit. If your system is using too much CPU, just take it away until it is done. Some workloads are just fine with this, because there is no urgency. Some workloads, such as servicing requests, take less kindly to this sort of setup. Regardless, this is a common enough deployment model that we have to take it into account.

Let’s consider the worst case scenario from our perspective. We have a user that runs the cheapest possible instance, a t3a.nano with 2 cores and 512 MB of RAM, costing just 3.4$ a month. The caveat with this instance is that you have just 6 CPU credits / hour to your name. A CPU credit is basically 100% CPU for 1 minute. Another way of looking at this is that a t3a.nano instance has a budget of 360 CPU seconds per hour. If it uses more than that, it is charged a hefty fee (roughly ten times more per hour than the machine itself costs). So we have users that disable the ability to go over the budget.

Now, let’s consider what is going to happen to such an instance when it hits an unexpected load. In the context of RavenDB, it can be that you created a new index on a big collection. But something seemingly simple, such as generating a client certificate, can also have a devastating impact on such an instance. RavenDB generates client certificates with 4096-bit keys. On my machine (a nice powerful dev machine), that can take anything from 300 – 900 ms of CPU time and cause a noticeable spike in one of the cores:

[image: CPU spike during certificate generation]

On the nano machine, we have measured key creation time of over six minutes.

The answer isn’t that my machine is 800 times more powerful than the cloud machine. The problem is that this takes enough CPU time to consume all the available credits, which causes throttling. At this point, we are in a sad state. Any additional CPU credits (and time) that we earn go directly to the already existing computation. That means that any other CPU time is pushed back. Way back. This happens at a level below the operating system (at the hypervisor), so the OS isn’t even aware of it. What is happening from the point of view of the OS is that the CPU is suddenly much slower.

All of this makes sense and is expected given the burstable nature of the system. But what is the observed behavior?

Well, the instance appears to be frozen. All the cloud metrics will tell you that everything is green, but you can’t access the system (no CPU to accept new connections), you can’t SSH into it (same), and if you have an existing SSH connection, it appears to have frozen. Measuring performance from inside the machine shows that everything is cool. One CPU is busy (expected: generating the certificate, doing indexing, etc). Another is idle. But the system behaves as if it has no CPU available, which is exactly what is going on, except in a way that we can’t tell.

RavenDB goes to a lot of trouble to be a nice citizen and a good neighbor. This includes scheduling operations so the underlying OS will always have the required resources to handle additional load. For example, background tasks are run with lowered priority, etc. But when running on a burstable machine that is being throttled… well, from the outside it looked like certain trivial operations would just take the entire machine down, and it wouldn’t be recoverable short of a hard reboot.

Even when you know what is going on, it doesn’t really help you. Because from inside the machine, there is no way to access the cloud metrics with good enough precision to take action.

We have a pretty strong desire not to get into these kinds of situations, so we implemented what I can only refer to as countermeasures. When you are running on a burstable instance, you can let RavenDB know what your CPU credits situation is, at which point RavenDB will actively monitor the machine and compute its own tally of the outstanding CPU credits. When we notice that the CPU credits are running short, we’ll start proactively halting internal processes to give the machine more space to recover from the CPU credits shortage.
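Conceptually, the tally is just a token bucket fed by wall clock time and drained by measured CPU time. A toy version of the bookkeeping (the numbers, the thresholds and the fact that it only looks at its own process are illustrative, not RavenDB’s actual implementation):

```csharp
using System;
using System.Diagnostics;

public class CpuCreditsTally
{
    private readonly double _creditsPerHour;   // e.g. 6 for a t3a.nano
    private readonly double _maxAccrued;       // burst balance the instance can bank
    private double _credits;
    private TimeSpan _lastCpu;
    private DateTime _lastCheck;

    public CpuCreditsTally(double creditsPerHour, double maxAccrued, double startingCredits)
    {
        _creditsPerHour = creditsPerHour;
        _maxAccrued = maxAccrued;
        _credits = startingCredits;
        _lastCpu = Process.GetCurrentProcess().TotalProcessorTime;
        _lastCheck = DateTime.UtcNow;
    }

    // Called periodically by a background timer.
    public void Update()
    {
        var now = DateTime.UtcNow;
        var cpu = Process.GetCurrentProcess().TotalProcessorTime;

        // One credit == one minute of a full core, so credits consumed = CPU minutes used.
        var consumed = (cpu - _lastCpu).TotalMinutes;
        var earned = (now - _lastCheck).TotalHours * _creditsPerHour;

        _credits = Math.Min(_maxAccrued, _credits + earned - consumed);
        _lastCpu = cpu;
        _lastCheck = now;
    }

    // Low balance: shed background work; exhausted: start answering with 503s.
    public bool RunningLow => _credits < 1.0;
    public bool Exhausted => _credits <= 0.1;
}
```

When RunningLow trips we can shed background work, and when Exhausted trips we can start rejecting requests, as described below.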

We’ll move ETL processes and ongoing backups to other nodes in the cluster and even pause indexing in order to recover the CPU time we need. Here is an example of what you’ll see in such a case:

[image]

One of the nice things about the kind of promises RavenDB makes about its indexes is that it is able to make these kinds of decisions without impacting the guarantees that we provide.

If the situation is still bad, we’ll start proactively rejecting requests. The idea is that instead of getting to the point where we are being throttled (and effectively down, as far as the outside world is concerned), we’ll process a request by simply sending a 503 Service Unavailable response back. This is going to be very cheap to do, so it won’t put additional strain on our limited budget. At the same time, the RavenDB client treats this kind of error as a trigger for failover and will use another node to service the request.

This was a pretty long post to explain that RavenDB is going to give you a much nicer experience on burstable machines even if you don’t have bursting capabilities.

time to read 5 min | 960 words

What happens when you want to page through the result sets of a query while the underlying data set is being constantly modified?

This seems like a tough problem, and one that you wouldn’t expect to encounter very often, right? But a really good example of just this issue is the notion of a feed in a social network. To take Twitter for simplicity, you have many people generating twits, and other users are browsing through their timeline.

What ends up happening is that the user browsing through their timeline is actually trying to page through the results, but at the same time, you are going to get more updates to the timeline while you are reading it. One of the key requirements that we have here, then, is that we want to be sure that we aren’t actually creating a lot of jitter for the user as they scroll through the timeline. Luckily, because this is a hard problem, the users are already quite familiar with some of the side effects. It would surprise no one to see the same twit multiple times in the timeline. It can be because of a retweet or a like by a user you follow, or it can be a result of the way paging is done.

Now that we understand what we want to achieve, let’s see how you can try getting there. The simplest way to handle this is to ask the database for some sort of a stable reference for the query. So instead of executing the query and being done with it, you’ll have the server maintain it for a period of time and send you the pages as you need them. This is simple, easy to implement and costly in terms of system resources. You’ll need to keep the query results in memory for each one of your users, and that can be quite a lot of memory to keep around just in case. Especially given the difference between human interaction times and the speed of modern systems.

Another way to do that is to ask the database engine to generate a way to re-create the query as it was at that time. This is sometimes called a continuation token or some such. That works great, usually, but comes with its own complications. For example, imagine that I’m doing this on the following query:

from Users order by LastName limit 5

Which gives us the following result:

[image: the first five users sorted by last name]

And I got the first five users, and now I want to get the next five. Between the first and second query, a user whose last name is “Aardvark” was inserted in the system. At this point, what would you expect to get from the query? We have two choices here, as you can see below:

[image: the two possible second pages]

The problem is that from my perspective, both of those have serious issues. To start with, to compute the results shown in orange, you’ll need to jump through some serious hoops on the backend, and the result looks strange. To get the results in green is quite easy, but it will mean that you missed out on seeing Aardvark.

You might have noticed that the key issue here isn’t so much with the way we build the query, but the order in which we need to traverse it. We ask to sort it by last name, but we also want to get results as they come by. As it turns out, the process becomes a whole lot simpler if we unify these concepts. If we issue the following query, for example, our complexity threshold drops by a lot.

from Messages order by CreatedAt desc limit 5

This is because we now get the following results:

[image: the messages ordered by CreatedAt]

And paging through this is now pretty easy. If we want to page down, we can now issue the query to get the next page of data:

from Messages where CreatedAt < 217 order by CreatedAt desc limit 5

By making sure that we are paging and filtering on the same property, we can easily scroll through the results without having to do too much work, either in the application or in our database backend. We can also check whether there is new stuff we missed by querying for CreatedAt > 222, of course.
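From the client API, that sort of paging might look something like this (a sketch, assuming a Message class with a numeric CreatedAt as in the example):

```csharp
using System.Linq;
using Raven.Client.Documents;
using Raven.Client.Documents.Session;

public class Message
{
    public string Id { get; set; }
    public string Text { get; set; }
    public long CreatedAt { get; set; } // numeric timestamp, as in the example above
}

public static class TimelinePaging
{
    public static Message[] NextPage(IDocumentSession session, long lastSeenCreatedAt, int pageSize = 5)
    {
        return session.Query<Message>()
            .Where(m => m.CreatedAt < lastSeenCreatedAt)   // page and filter on the same property
            .OrderByDescending(m => m.CreatedAt)
            .Take(pageSize)
            .ToArray();
    }
}
```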

But there is one wrinkle here. I intentionally used the CreatedAt field but put numeric values there. Did you notice that there was no 220 value? That one was created on an isolated node and hasn’t arrived yet. When it shows up in the local database, we’ll need to decide if we’ll give it a new value (making sure it will show up in the timeline) or store it as is, meaning that it might get lost.

These types of questions are probably more relevant at the business level. You might want to apply different behaviors based on how many likes a twit has, for example.

Another option is to have an UpdatedAt field as well, which can allow you to quickly answer the question: “What items in the range I scanned have changed?”. This method also allows for a simpler model for handling updates, but much of that depends on the kind of behavior you want to get. It handles updates, including updates to the parts seen and unseen, in a reasonable way and at a predictable cost.

time to read 7 min | 1301 words

Last week I posted about some timeseries work that we have been doing with RavenDB. But I haven’t actually talked about the feature in this space before, so I thought that this would be a good time to present what we want to build.

The basic idea with timeseries is that this is a set of data points taken over time. We usually don’t care that much about an individual data point but care a lot about their aggregation. Common usages for time series include:

  • Heart beats per minute
  • CPU utilization
  • Central bank interest rate
  • Disk I/O rate
  • Height of ocean tide
  • Location tracking for a vehicle
  • USD / Bitcoin closing price

As you can see, the list of stuff that you might want to apply this to is quite diverse. In a world that keeps getting more and more IoT devices, timeseries storing sensor data are becoming increasingly common. When we set out to design and build timeseries support for RavenDB, we looked into quite a few timeseries databases to figure out what needs they serve.

RavenDB is a document database, and we envision timeseries support as something that you use at the document boundary. A good example of that would be the heartrate example. Each person has their own timeseries that records their own heartrate over time. In RavenDB, you would model this as a document for each person, and a heartrate timeseries on each document.

Here is how you would add a data point to my Heartrate’s timeseries:

[image: the client API call to append a heartrate value]
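Based on the points below, the call in the screenshot is along these lines (the exact method names may differ from what finally ships, since the feature was still being designed at the time):

```csharp
using System;
using Raven.Client.Documents;

public static class Program
{
    public static void Main()
    {
        using (var store = new DocumentStore { Urls = new[] { "http://localhost:8080" }, Database = "Fitness" }.Initialize())
        using (var session = store.OpenSession())
        {
            // Append a single heartrate value; the tag points at a document describing the source.
            session.TimeSeriesFor("users/ayende", "Heartrate")
                .Append(DateTime.UtcNow, new[] { 78d }, "watches/fitbit");

            // A timestamp can also carry multiple values, e.g. a GPS fix.
            session.TimeSeriesFor("users/ayende", "Location")
                .Append(DateTime.UtcNow, new[] { 32.0853, 34.7818 }, "devices/phone");

            session.SaveChanges();
        }
    }
}
```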

I intentionally started from the Client API, because it allows me to show off several things at once.

  1. Appending a value to a timeseries doesn’t require us to create it upfront. It will be created automatically on first use.
  2. We use UTC date times for consistency and the timestamps have millisecond precision.
  3. We are able to record a tag (the source for this measurement) on a particular timestamp.
  4. The timeseries will accept an array of values for a single timestamp.

Each one of those items is quite important to the design of RavenDB timeseries, so let’s address them in order.

The first thing to address is that we don’t need to create timeseries ahead of time. Doing so would introduce a level of schema to the database, which is something that we want to avoid. We want to allow the user complete freedom and a minimum of fuss when they are building features on top of timeseries. That does lead to some complications on our end. We need to be able to support timeseries merging: allowing you to append values on multiple machines and merging them together into a coherent whole.

Given the nature of timeseries, we don’t expect to see conflicting values. While you might see the same values come in multiple times, we assume that in that case you’ll likely just get the same values for the same timestamps (duplicate writes). In the case of different writes on different machines with different values for the same timestamp, we’ll arbitrarily select the largest of those values and proceed.

Another implication of this behavior is that we need to handle out of band updates. Typically in timeseries, you’ll record values in increasing date order. We need to be able to accept values out of order. This turns out to be pretty useful in general, not just for being able to handle values from multiple sources, but also because it is possible that you’ll need to load archived data to already existing timeseries.  The rule that guided us here was that we wanted to allow the user as much flexibility as possible and we’ll handle any resulting complexity.

The second topic to deal with is the time zone and precision. Given the overall complexity of time zones in general, we decided that we don’t want to deal with any of that and will store the times in UTC only. That allows you to work properly with timestamps taken from different locations, for example. Given the expected usage scenarios for this feature, we also decided to support millisecond precision. We looked at supporting only second-level precision, but that was far too limiting. At the same time, supporting precision finer than a millisecond would result in much lower storage density for most situations and is very rarely useful.

Using DateTime.UtcNow, for example, we get a resolution of 0.5 – 15 ms, so trying to represent time at a finer resolution isn’t really going to give us anything. Other platforms have similar constraints, which added to the consideration of only capturing the time to millisecond granularity.

The third item on the list may be the most surprising one. RavenDB allows you to tag individual timestamps in the timeseries with a value. This gives you the ability to record metadata about the value. For example, you may want to use this to record the type of instrument that supplied the value. In the code above, you can see that this is a value that I got from a FitBit watch. I’m going to assign it a lower confidence value than a value that I got from an actual medical device, even if both of those values are going to go on the same timeseries.

We expect that the number of unique tags for values in a given time period is going to be small, and optimize accordingly. Because of the number of weasel words in the last sentence, I feel that I must clarify. A given time period is usually in the order of an hour to a few days, depending on the number of values and their frequencies. And what matters isn’t so much the number of values with a tag, but the number of unique tags. We can very efficiently store tags that we have already seen, but having each value tagged with a different tag is not something that we designed the system for.

You can also see that the tag that we have provided looks like a document id. This is not accidental. We expect you to store a document id there, and use the document itself to store details about the value. For example, whether the device that captured the value is medical grade or just a hobbyist gadget. You’ll be able to filter by the tag as well as filter by the related tag document’s properties. But I’ll show that when I post about queries, in a different post.

The final item on the list that I want to discuss in this post is the fact that a timestamp may contain multiple values. There are actually quite a few use cases for recording multiple values for a single timestamp:

  • Longitude and latitude GPS coordinates
  • Bitcoin value against USD, EUR, YEN
  • Systolic and diastolic reading for blood pressure

In each case, we have multiple values to store for a single measurement. You can make the case that the Bitcoin vs. currencies rates may be stored as standalone timeseries, but GPS coordinates and blood pressure both produce two values that are not meaningful on their own. RavenDB handles this scenario by allowing you to store multiple values per timestamp, including support for each timestamp coming with a different number of values. Again, we are trying to make it as easy as possible to use this feature.

The number of values per timestamp is going to be limited to 16 or 32; we haven’t made a final decision here. Regardless of the actual maximum size, we don’t expect to have more than a few of those values per timestamp in a single timeseries.

Then again, the point of this post is to get you to consider this feature in your own scenarios and provide feedback about the kind of usage you want to have for this feature. So please, let us know what you think.

time to read 1 min | 81 words

Yesterday I talked about the design of the security system of RavenDB. Today I re-read one of my favorite papers ever about the topic.

This World of Ours by James Mickens

This is both one of the most hilarious papers I have ever read (I had someone check up on me when I was reading it, because of suspicious noises coming from my office) and a great insight into threat modeling and the kind of operating environment that your system will run in.
