Ayende @ Rahien

Oren Eini aka Ayende Rahien CEO of Hibernating Rhinos LTD, which develops RavenDB, a NoSQL Open Source Document Database.

You can reach me by:

oren@ravendb.net

+972 52-548-6969

Posts: 6,965 | Comments: 49,570

filter by tags archive
time to read 2 min | 388 words

I recently had what amounted to a drive by code review. I was looking into code that wasn’t committed or PR. Code that might not have been even saved to disk at the time that I saw it. I saw that while working with the developer on something completely different. And yet even a glace was enough to cause me to pause and make sure that this code will be significantly changed before it ever move forward. The code in question is here:

What is bad about this code? No, it isn’t the missing ConfigureAwait(false), in that scenario we don’t need it. The problem is in the very first line of code.

This is meant to be public API. It will have consumers from outside our team. That means that the very first thing that we need to ensure is that we don’t expose our own domain model to the outside world.

There are multiple reasons for this. To start with, versioning is a concern. Sure, we have the /v1/  in the route, but there is nothing here that would make break if we changed our domain model in a way that a third party client relies on. We have a compiler, we really want to be able to use it.

The second issue, which I consider more important, is that this leaks information that I may not really want to share. By exposing my full domain model to the outside world, I risk quite a bit. For example, I may have internal notes on the support ticket which I don’t want to expose to the public. Any field that I expose to the outside world is a compatibility concern, but any field that I add is a problem as well. This is especially true if I assume that those fields are private.

The fix is something like this:

Note that I have class that explicitly define the shape that I’m giving to the outside world. I also manually map between the internal and external fields. Doing something like auto mapper is not something that I want, because I want all of those decisions to be made explicitly. In particular, I want to be sure that every single field that I share with the outside world is done in such a way that it is visible during PR reviews.

time to read 2 min | 260 words

These are not the droids you are looking for! – Obi-Wan Kenobi

Sometimes you need to find a set of documents not because of their own properties, but based on a related document. A good example may be needing to find all employees that blue Nissan car. Here is the actual model:

image

In SQL, we’ll want a query that goes like this:

This is something that you cannot express directly in RavenDB or RQL. Luckily, you aren’t going to be stuck, RavenDB has a couple of options for this. The first, and the most closely related to the SQL option is to use a graph query. That is how you will typically query over relationships in RavenDB. Here is what this looks like:

Of course, if you have a lot of matches here, you will probably want to do things in a more efficient manner. RavenDB allows you to do so using indexes. Here is what the index looks like:

The advantage here is that you can now query on the index in a very simple manner:

RavenDB will ensure that you get the right results, and changing the Car’s color will automatically update the index’s value.

The choice between these two comes down to frequency of change and how large the work is expected to be. The index favors more upfront work for faster query times while the graph query option is more flexible but requires RavenDB to do more on each query.

time to read 3 min | 568 words

Compression is a nice way to trade off time for space. Sometimes, this is something desirable, especially as you get to the higher tiers of data storage. If your data is mostly archived, you can get significant savings in storage in trade for a bit more CPU. This perfectly reasonable desire create somewhat of a problem for RavenDB, we have competing needs here. On the one hand, you want to compress a lot of documents together, to benefit for duplications between documents. On the other hand, we absolutely must be able to load a single document as fast as possible. That means that just taking 100MB of documents and compressing them in a naïve manner is not going  to work, even if this is going to result in great compression ratio. I have been looking at zstd recently to help solve this issue. 

The key feature for zstd is the ability to train the model on some of the data, and then reuse the resulting dictionary to greatly increase the compression ratio.

Here is the overall idea. Given a set of documents (10MB or so) that we want to compress, we’ll train zstd on the first 16 documents and then reuse the dictionary to compress each of the documents individually. I have used a set of 52MB of JSON documents as the test data. They represent restaurants critics, I think, but I intentionally don’t really care about the data.

Raw data: 52.3 MB. Compressing it all with 7z gives us 1.08 MB. But that means that there is no way to access a single document without decompressing the whole thing.

Using zstd with the compression level of 3, I was able to compress the data to 1.8MB in 118 milliseconds. Choosing compression level 100 reduced the size to 1.02MB but took over 6 seconds to run.

Using zstd on each document independently, where each document is under 1.5 KB in size gave me a total reducing from to 6.8 MB. This is without the dictionary. And the compression took 97 milliseconds.

With a dictionary whose size was set to 64 KB, computed from the first 128 documents gave me a total size of 4.9 MB and took 115 milliseconds.

I should note that the runtime of the compression is variable enough that I’m pretty much going to call all of them the same.

I decided to use this on a different dataset and run this over the current senators dataset. Total data size is 563KB and compressing it as a single unit would give us 54kb. Compressing as individual values, on the other hand, gave us a 324 kb.

When training zstd on the first 16 documents with 4 KB of dictionary to generate we got things down to 105 kb.

I still need to mull over the results, but I find them really quite interesting. Using a dictionary will complicate things, because the time to build the dictionary is non trivial. It can take twice as long to build the dictionary as it would be to compress the data. For example, 16 documents with 4 kb dictionary take 233 milliseconds to build, but only take 138 milliseconds to compress 52 MB. It is also possible for the dictionary to make the compression rate worse, so that is fun.

Any other idea on how we can get both the space savings and the random access option would be greatly appreciated.

time to read 2 min | 383 words

I did a code review recently and pretty much the most frequent suggestion was something along the line of: “This needs to be pushed to the infrastructure”. I was asked to be clearer about this, so I decided to write a blog post about it.

In general, whenever you see a repeating code pattern, you don’t need to start extracting it. What you need to do is to check whatever this code pattern serves a purpose. If it doesn’t serve a purpose, only then it is time to see if we can abstract that to remove duplication. I phrase things in this manner because all too often we see a tendency to immediately want to abstract things out. The truth is that in many cases, trying to abstract things is going to cause things to be less clear down the line. That is why I wanted to call it out first, even when I want to explain how to do the exact thing that I caution you about it.  Resource cleanup in performance sensitive code is a good example of one scenario where you don’t want to put things to the infrastructure, you want everything to be just there. There are other reasons, too.

After all the disclaimers, let’s talk about a concrete example in which we should do something about it.

Error handling is a great case for moving to infrastructure. This code is running inside an MVC Controller, and we can move our error handling from inside each action to the infrastructure, you can read about it here. I’m not sure if this is the most up to date reference for error handling, but that isn’t the point. The exact mechanism that you do it doesn’t matter. The whole idea is that you don’t want to see it. You push it to the infrastructure and then it is handled.

In the same manner, if you need to do logging or auditing, push them down the stack if they are in the form of: “User X accessed Y”. On the other hand, if you need something like: “Manager X authorized N vacations days for Y”, that is a business audit which should be recorded in the business logic, not in the infrastructure.  I wrote about this a lot in the past.

time to read 3 min | 557 words

One of the first features that RavenDB had, from the very first release, was multi document ACID transactions. With RavenDB you could modify multiple documents at the same time and then save them all, knowing that they would be be saved as a single atomic unit. Other NoSQL databases had you jump through fire if you wanted transactional behavior (building your own two phase commit protocol at the application level, not fun). I consider this to be a pretty important feature, obviously. But why?

In Inside RavenDB book, there is the following advice about modeling considerations for documents:

  • Independent, meaning a document should have its own separate existence from any other documents.
  • Isolated, meaning a document can change independently from other documents.
  • Coherent, meaning a document should be legible on its own, without referencing other documents.

In other words, with proper modeling, you shouldn’t need to have multi document transactions. Any transaction should only contain a single document, so that should be enough, no? Why spend all the time and effort on building multi document transactions?

Well, to start with “proper modeling” is a very loaded term. It is great if you can get it, but there are many cases where you need to deviate from it for various reasons. For example, in this blog, we represent a blog post as two documents. One holds the text of the post, the other holds the comments for the post. The reason for this separation is that there are many cases where you want only the blog post, and not the (potentially very many) comments.

In this case, the layout of the document is subject to the physical realities. It is better to split the document to multiple documents based on their purpose. However, if I add a comment to a post, I want to both add it to the Comments document and to increase the NumberOfComments property on the Post document. Doing this in a single transaction means that I don’t have to worry about consistency.

Another good example of wanting to work with multiple documents at the same time is when my documents aren’t using just holding data. Consider the case of accepting a new employee to the company. I need to:

  • Create the new Employee document.
  • Create initial workflow requirements (orientation, employee handbook, tax papers, setup machine / user / vpn access, etc).

In other words, each document here is independent, but they are created together. I want to create the new Employee’s document and at the same time setup a task for the IT to create a new user, allocate a machine, etc. I want to have the employee go through orientation, do all the require paper work, etc.

Each one of those is modeled as a separate document, because they are. See the definition above and consider how they match. But I absolutely don’t want to have a partial state. That I have a new employee, but I didn’t setup payroll for them is a big problem. Having multi document transaction make things a lot simpler.

You can argue that it would be better to model things as a set of related services with independent databases. I’m not sure that I disagree, but this is a much more complex architecture. Not having to go there on day one, while having clear and easy to use consistency guarantees is a major plus in my eyes.

time to read 7 min | 1291 words

I talked about finding a major issue with ThreadLocal and the impact that it had on long lived and large scale production environments. I’m not sure why ThreadLocal<T> is implemented the way it does, but it seems to me that it was never meant to be used with tens of thousands of instances and thousands of threads. Even then, it seems like the GC pauses issue is something that you wouldn’t expect to see by just reading the code. So we had to do better, and this gives me a chance to do something relatively rare. To talk about a complete feature implementation in detail. I don’t usually get to do this, features are usually far too big for me to talk about in real detail.

I’m also interested in feedback on this post. I usually break them into multiple posts in a series, but I wanted to try putting it all in one location. The downside is that it may be too long / detailed for someone to read in one seating. Please let me know your thinking in the matter, it would be very helpful.

Before we get started, let’s talk about the operating environment and what we are trying to achieve:

  1. Running on .NET core.
  2. Need to support tens of thousands of instances (I don’t like it, but fixing that issue is going to take a lot longer).
  3. No shared state between instances.
  4. Cost of the ThreadLocal is related to the number of thread values it has, nothing else.
  5. Should automatically clean up after itself when a thread is closed.
  6. Should automatically clean up after itself when a ThreadLocal instance is disposed.
  7. Can access all the values across all threads.
  8. Play nicely with the GC.

That is quite a list, I have to admit. There are a lot of separate concerns that we have to take into account, but the implementation turned out to be relatively small. First, let’s show the code, and then we can discuss how it answer the requirements.

This shows the LightThreadLocal<T> class, but it is missing the CurrentThreadState, which we’ll discuss in a bit. In terms of the data model, we have a concurrent dictionary, which is indexed by a CurrentThreadState instance which is held in a thread static variable. The code also allows you to define a generator and will create a default value on first access to the thread.

The first design decision is the key for the dictionary, I thought about using Thread.CurrentThread and the thread id.Using the thread id as the key is dangerous, because thread ids may be reused. And that is a case of a nasty^nasty bug. Yes, that is a nasty bug raised to the power of nasty. I can just imagine trying to debug something like that, it would be a nightmare.  As for using Thread.CurrentThread, we’ll not have reused instances, so that is fine, but we do need to keep track of additional information for our purposes, so we can’t just reuse the thread instance. Therefor, we created our own class to keep track of the state.

All instances of a LightThreadLocal are going to share the same thread static value. However, that value is going to be kept as small as possible, it’s only purpose is to allow us to index into the shared dictionary. This means that except for the shared thread static state, we have no interaction between different instances of the LightThreadLocal. That means that if we have a lot of such instances, we use a lot less space and won’t degrade performance over time.

I also implemented an explicit disposal of the values if needed, as well as a finalizer. There is some song and dance around the disposal to make sure it plays nicely with concurrent disposal from a thread (see later), but that is pretty much it.

There really isn’t much to do here, right? Except that the real magic happens in the CurrentThreadState.

Not that much magic, huh? Smile

We keep a list of the LightThreadLocal instance that has registered a value for this thread. And we have a finalizer that will be called once the thread is killed. That will go to all the LightThreadLocal instances that used this thread and remove the values registered for this thread. Note that this may run concurrently with the LightThreadLocal.Dispose, so we have to be a bit careful (the careful bit happens in the LightThreadLocal.Dispose).

There is one thing here that deserve attention, though. The WeakReferenceToLightThreadLocal class, here it is with all its glory:

This is basically wrapper to WeakReference that allow us to get a stable hash value even if the reference has been collected. The reason we use that is that we need to reference the LightThreadLocal from the CurrentThreadState. And if we hold a strong reference, that would prevent the LightThreadLocal instance from being collected. It also means that in terms of the complexity of the object graph, we have only forward references with no cycles, cross references, etc. That should be a fairly simple object graph for the GC to walk through, which is the whole point of what I’m trying to do here.

Oh, we also need to support accessing all the values, but that is so trivial I don’t think I need to talk about it. Each LightThreadLocal has its own concurrent dictionary, and we can just access that Values property and we get the right result.

We aren’t done yet, though. There are still certain things that I didn’t do. For example, if we have a lot of LightThreadLocalinstances, they would gather up in the thread static instances, leading to large memory usage. We want to be able to automatically clean these up when the LightThreadLocalinstance goes away. That turn out to be somewhat of a challenge. There are a few issues here:

  • We can’t do that from the LightThreadLocal.Dispose / finalizer. That would mean that we have to guard against concurrent data access, and that would impact the common path.
  • We don’t want to create a reference from the LightThreadLocal to the CurrentThreadState, that would lead to more complex data structure and may lead to slow GC.

Instead of holding references to the real objects, we introduce two new ones. A local state and a global state:

The global state exists at the level of the LightThreadLocal instance while the local state exists at the level of each thread. The local state is just a number, indicating whatever there are any disposed parents. The global state holds the local state of all the threads that interacted with the given LightThreadLocal instance. By introducing these classes, we break apart the object references. The LightThreadLocal isn’t holding (directly or indirectly) any reference to the CurrentThreadState and the CurrentThreadState only holds a weak reference for the LightThreadLocal.

Finally, we need to actually make use of this state and we do that by calling GlobalState.Dispose() when the LightThreadLocal is disposed. That would mark all the threads that interacted with it as having a disposed parents. Crucially, we don’t need to do anything else there. All the rest happens in the CurrentThreadState, in its own native thread. Here is what this looks like:

Whenever the Register method is called (which happens whenever we use the LightThreadLocal.Value property), we’ll register our own thread with the global state of the LightThreadLocal instance and then check whatever we have been notified of a disposal. If so, we’ll clean our own state in RemoveDisposedParents.

This close down all the loopholes in the usage that I can think about, at least for now.

This is currently going through our testing infrastructure, but I thought it is an interesting feature. Small enough to talk about, but complex enough that there are multiple competing requirements that you have to consider and non trivial aspects to work with.

time to read 4 min | 736 words

Image result for hacker clipartThe 4th fallacy of distributed computing is that the network is secured. It is a fallacy because sooner or later, you’ll realize that the network isn’t secured.

Case in point, Microsoft managed to put 250 million support tickets on the public internet. The underlying issue is actually pretty simple. Microsoft had five Elastic Search instances with no security or authentication.

From the emails that were sent, it seems that they were intend to be secured by separating them from the external networks using firewall rules. A configuration error meant that the firewall rule was no long applicable and they were exposed to the public internet. In this case, at least, I can give better marks than “did you really put a publicly addressable database on the internet in the days of Shodan?”

It isn’t a matter of if you’ll be attacked, it is an issue of when. And according to recent reports, the time it takes from being network accessible to being attacked is under a minute. At worst, it took less than a couple of hours for attacks to start.  If it is accessible, it will be attacked.

So it is was good from Microsoft to make sure that it wasn’t accessible, right? Except that it then became accessible. How much are you willing to bet that there was no monitoring on “these machine is not accessible from the internet”? For that matter, I’m not sure how you can write a monitoring system that check for this. The security assumptions changed, and the systems wasn’t robust to handle that. What is worse, it didn’t fail close. It failed wide open.

The underlying cause of this mess is that the assumption that you can trust the network. It is closed, secured and safe. So there was no additional line of defense.

When we designed RavenDB’s security, we started from the assumption that any RavenDB node is publicly accessible and will be attacked. As such, we don’t allow you to run RavenDB on anything but the loopback device without setting up security. Even when you are running inside locked network, you’ll still have mutual authentication between client and server, you’ll still have all communications between client and server encrypted.

Defense in depth is the only thing that make sense. Yes, it is belt and suspenders, but it means that if you have a failure, your privates aren’t hanging in the wind, waiting to be sold on the Dark Web.

When designing a system that listen to the network, you have to start from assuming you’ll be attacked. And any additional steps to reduce the attack surface are just that. They’ll reduce it, not eliminate it. Because a firewall may fail or be misconfigured, and it may not happen to you. But if a completely separate machine on your closed network has been compromised, you best hope that it won’t be able to be a bridgehead for the rest of your system.

This attack expose 250,000,000 support records(!) and it was observed because it was obvious. This is the equivalent of a big pile of money landing at your feet. It gets noticed. But let’s assume that the elastic node was an empty one, so it wouldn’t be interesting. It takes very little from having access to an unsecured server to being able to execute code on it. And then you have a bridgehead. You can then access other servers, which may be accessible from the opened server, but not for the whole wide world. If they aren’t secured, well, it doesn’t matter what your firewall rules say anymore…

The network is always hostile. You can’t assume who is on the other side, or that you aren’t being eavesdropped on. Luckily, we have fairly routine counter measures. Use TLS for everything and make sure that you authenticate. How you do it doesn’t matter that much, to be honest. User / pass over HTTPS or X509 certificate are just different options. And while I can debate which ones are the best, anything is going to better than nothing at all. This applies for in house software as well. You microservices should authenticate, even if they are running in the isolated backend.

Yes, it can be a PITA to configure and deploy, but it isn’t really something that you can give up on. Because the network is always hostile.

time to read 10 min | 1962 words

imageI run into a really interesting discussion on Twitter, I suggest you go over the whole thread, it is fascinating reading.

I have written DI / IoC business applications for a decade and I was heavily involved at a popular IoC container for about five years, including implementing some core features (open generic binding, which was a PITA to do). Given the scope of the topic, I didn’t want to try to squeeze my thoughts on the subject into a Twitter soundbite, hence, this post.

A couple of weeks ago I posted about how I would start a new project today. With just enough architecture to get things started, and not much more. Almost implicit in my design is the fact that the system is composable. You add functionality to the system not by modifying existing code but by adding code. That isn’t new by any means. A quick search of my blog shows a series of posts from 2012 and a system architecture from 2008. No new ground trodden here, then. So why bother writing this post?

RavenDB doesn’t use a container. This is a pretty big and non trivial project that has no container involved. In fact, I don’t usually pull in containers any longer. For a long while, I tried to push as much complexity as possible into the container. It helped that I was part of the team building the container, so I could actually go ahead and add features to the container. That allowed me to create a system that was driven by convention. As long as you followed the convention, things magically worked and everyone was productive. If you didn’t follow the convention, well, I would need to debug that. Other people on the team could figure things out, but it generally fell on me (not that I minded).

The backend for RavenDB Cloud is the first time in a while that I took part in what you can consider as a business application rather than an infrastructure component. And that backend uses a container, IoC, interfaces, multiple dispatch, etc. It makes for a codebase that can adapt quickly, but also adds complexity. In the case of the cloud backend, just to name a few core features, we have: storage, machine allocations, recovery from failure, billing and monitoring. Each one of those may have multiple implementations (each cloud does storage and deployment differently, different accounts have different plans, etc).  Much of this is handled via implementing the relevant interfaces and dispatching to the right location based on the context of the operation.

In many ways, it works like magic. And it allows us to iterate quick and deploy to three separate cloud providers in a short amount of time. It is also magic. Much of our team is actually infrastructure developers. That has a totally different mindset than business app development. When I saw how these developers, with the infrastructure background, worked with the cloud backend, it was very instructive. To them, it was magic, and impenetrable at first. Interestingly enough, they didn’t need to understand all that was going on to get things done. We made sure that they did, after a while, but the IoC allowed us to ignore such concerns until later (gimme a cluster, don’t worry about how it is wired to the rest of the system).

The auto-wiring is one part of what you’ll typically get from a container. There are other, equally important parts, that don’t generally get as much attention: Using IoC usually means a decomposed systems, which is easier to test independently. And in addition to satisfying dependencies, the container is also in charge of managing the lifetime (or scope) of instances.

Let’s talk about the decomposed system and isolated testing first, because this tend to be a high priority for many people. I’m against such systems. Not because it makes testing easier (although keep that in mind, I’ll have something to say about it shortly) but because it is generally a very short slippery slope toward interface explosion. You end up with a lot of interfaces that have a single implementation. You now have composition issue, it is hard to figure out what is the flow of the code because everything is dynamically composed. That lead to a bunch of problems when you read the code (you have to jump around to understand what is going on) as well as performance issues (you can’t inline methods, you have to do interface calls, etc). Out of those, the first issue is far more important, mind.

Surprisingly, given that we have decomposed to small pieces to be able to work with each item independently, we are now in a much worst position if we want to change something. Because the code is scattered in many different locations and is composed on the fly, if I want to make a significant change, I have to make it in many places. To give a concrete example, let’s say that I need to pass a correlation token through my system, to do distributed tracing. I have to modify pretty much all the interfaces involved to pass this token through. And that lead us to the issue I promised with the tests.

A system that is composed of independent interfaces / implementations is easy to test in isolation. Because each implementation is independent and isolated from other areas of the system. The issue with such a system is that each individual component isn’t really doing much on its own. The benefit of the system is from multiple such components are assembled and working together. So the critical functionality that you have is the composed bundle, as well as the container configuration. But to test that, you need a system test. So you might as well structure you system so that system tests are easy, fast and obvious. Here is another way to do just that.

Finally, we get to the issue of lifetime management. It is easy to ignore just how important this feature is. Usually, you have three lifetimes in your application:

  • Singleton – for the entire application.
  • Transient – get a new instance each time.
  • Scoped – get the same instance in the same scope (typically a single requests).

Being able to rely on the container to manage lifetime is huge, because it is easy to mess things up. A good container will also match dependencies by their lifetimes. So if you have a singleton component it cannot accept a transient component since the lifetimes don’t match (but the other way around is obviously fine). There is an issue here as well. If you are injecting the dependencies, it is easy to lose track of the lifetime of your dependencies. It is easy to get into a situation where you (inadvertently, even) use a dependency to manage state between invocations and not realize that you have now relying on the lifetime of a dependency (or a dependency of dependency).

You might have noticed a theme in this post. I’m outlining a lot of problems, but no solutions. I’ll get to that in a bit, but I wanted to explain something important. Writing non trivial software is complex. This is the nature of the beast. We can re-arrange the complexity or we can sweep it under the rug. There are good use cases for either option, but I would rather that people make this choice explicitly. What you can’t do is eliminate the complexity entirely, at best, you have tamed it.

Earlier, I said that RavenDB doesn’t use a container, which is true (somewhat). But it is using inversion of control. A lot of the core classes are using constructor injection, for example. Let’s take what is probably the most important class we have, DocumentDatabase. That is the class that represent a database inside a RavenDB process. It accept its dependencies (the configuration, the server it is running on, etc) and then is constructed. We don’t use a container here because the setup process of a database in RavenDB is complex. We first create the DocumentDatabase instance, then we have to initialize it. Initializing a database may mean running recovering, loading a lot of data from disk, etc. So we do that in an async manner. When a request comes in for a particular database, we get it, or wait until it is loaded. We will also dispose the database if it has been idle for enough time. So in this case, we have complex (async) initialization, in which we have to deal with a lot of failure modes. We also have a lifetime scope that is based on idle time, which doesn’t fit the usual modes for a container.

Because we manually control how we create the database instance, it is explicit what its dependencies, lifetime and behavior are. We have quite a few example of such classes. For example, the database instance holds DocumentStorage, AttachmentStorage, etc. It is important to note that the number if finite and relatively small. It allow us to reason about the interaction in the database in a static and predictable manner.

Remember when I said that we don’t use a container? That is almost true. There is one location where I wrote our own mini container. One thing that RavenDB has a lot of is Endpoints. An endpoint is the method that handles a particular HTTP request. At last count we had over 300 of them. I don’t have the time / willingness to wire all of these manually. That would put undue burden on developing a new endpoint. And that is the key observation. For stuff that doesn’t change very often (the structure of the database), we do things manually. For the things that we add a lot of (endpoints), we make it as smooth as possible. Adding a new endpoint is adding a class that inherit from a known base class, and that is pretty much it.

Our routing infrastructure will gather all of the implementation, wire up the routing and when a request come in will create an instance of the class in question, inject it the relevant context (what database it is running on, the current request, etc) and then execute it. Just like a container would, in fact, because for all intents and purposes, it is one. What we have done is optimize one aspect, which we deal with often, while manually dealing with the stuff that is rarely changing. That means that if I do need to make a change there, the level of magic involved is greatly reduced. And in RavenDB in particular, we can and have measured the difference in performance between running things through any abstraction layer and doing things directly. To the point where in certain parts of our codebase, an interface method invocation is forbidden because the cost would be too high.

There is another aspect of this architecture, it means that the easiest thing in our code would be to add a new endpoint. That being the easiest thing, it is usually what will happen. This means that we’re far more likely to follow the open/closed principal. It also lead to most of our code looking fairly similar in shape. That make maintenance, code reviews and the act of writing new code a lot simpler. I don’t have to make decisions about structure, I just have to let the code flow.

time to read 4 min | 797 words

In my last post, I talked about how to store and query time series data in RavenDB. You can query over the time series data directly, as shown here:

You’ll note that we project a query over a time range for a particular document. We could also query over all documents that match a particular query, of course. One thing to note, however, is that time series queries are done on a per time series basis and each time series belong to a particular document.

In other words, if I want to ask a question about time series data across documents, I can’t just query for it, I need to do some prep work first. This is done to ensure that when you query, we’ll be able to give you the right results, fast.

As a reminder, we have a bunch of nodes that we record metrics of. The metrics so far are:

  • Storage – [ Number of objects, Total size used, Total storage size].
  • Network – [Total bytes in, Total bytes out]

We record these metrics for each node at regular intervals. The query above can give us space utilization over time in a particular node, but there are other questions that we would like to ask. For example, given an upload request, we want to find the node with the most free space. Note that we record the total size used and the total storage available only as time series metrics. So how are we going to be able to query on it? The answer is that we’ll use indexes. In particular, a map/reduce index, like the following:

This deserve some explanation, I think. Usually in RavenDB, the source of an index is a docs.[Collection], such as docs.Users. In this case, we are using a timeseries index, so the source is timeseries.[Collection].[TimeSeries]. In this case, we operate over the Storage timeseries on the Nodes collection.

When we create an index over a timeseries, we are exposed to some internal structural details. Each timestamp in a timeseries isn’t stored independently. That would be incredibly wasteful to do. Instead, we store timeseries together in segments. The details about how and why we do that don’t really matter, but what does matter is that when you create an index over timeseries, you’ll be indexing the segment as a whole. You can see how the map access the Entries collection on the segment, getting the last one (the most recent) and output it.

The other thing that is worth noticing in the map portion of the index is that we operate on the values of the time stamp. In this case, Values[2] is the total amount of storage available and Values[1] is the size used. The reduce portion of the index, on the other hand, is identical to any other map/reduce index in RavenDB.

What this index does, essentially, is tell us what is the most up to date free space that we have for each particular node. As for querying it, let’s see how that works, shall we?

image

Here we are asking for the node with the least disk space that can contain the data we want to write. This can be reduce fragmentation in the system as a whole, by ensuring that we use the best fit method.

Let’s look at a more complex example of indexing time series data, computing the total network usage for each node on a monthly basis. This is not trivial because we record network utilization on a regular basis, but need to aggregate that over whole months.

Here is the index definition:

As you can see, the very first thing we do is to aggregate the entries based on their year and month. This is done because a single segment may contain data from multiple months. We then sum up the values for each month and compute the total in the reduce.

image

The nice thing about this feature is that we are able to aggregate large amount of data and benefit from the usual advantages of RavenDB map/reduce indexes. We have already massaged the data to the right shape, so queries on it are fast.

Time series indexes in RavenDB allows us to merge time series data from multiple documents, I could have aggregated the computation above across multiple nodes to get the total per customer, so I’ll know how much to charge them at the end of the month, for example.

I would be happy to know hear about any other scenarios that you can think of for using timeseries in RavenDB, and in particular, what kind of queries you’ll want to do on the data.

time to read 4 min | 633 words

RavenDB 5.0 is coming soon and the big new there is time series support. We have gotten to the point where we can actually show off what we can do, which makes me very happy. You can use the nightlies builds to explore time series support in RavenDB 5.0. Client side packages for 5.0 are also available.

image

I went ahead and created a new database and created some documents:

image

Time series are often used for monitoring, so I decided to go with the flow and see what kind of information we would want to store there. Here is how we can add some time series data to the documents:

I want to focus on this for a bit, because it is important. A time series in RavenDB has the following details:

  • The timestamp to associate to the values – in the code above, this is the current time (UTC)
  • The tag associated with the timestamp – in the code above, we record what devices and interfaces these measurements belong to.
  • The measurements themselves – RavenDB allows you to record multiple values for a single timestamp. We threat them as an array of values, and you can chose to put them in a single time series or to split them.

Let’s assume that we have quite a few measurements like this and that we want to look at the data. You can explore things in the Studio, like so:

image

We have another tab in the Studio that you can look at which will give you some high level details about the timeseries for a particular document. We can dig deeper, too, and see the actual values:

image

You can also query the data to see the patterns and not just the individual values:

The output will look like this:

image

And you can click on the eye to get more details in chart form. You can see a little bit of this here, but it is hard to do it justice with a small screen shot:

image

Here is what the data you get back from this query:

The ability to store and process time series data is very important for monitoring, IoT and healthcare systems. RavenDB is able to do quite well in these areas. For example, to aggregate over 11.7 million heartrate details over 6 years at a weekly resolution takes less than 50 ms.

We have tested timeseries that contained over 150 million entries and we can aggregate results back over the entire data set in under three seconds. That is a nice number, but it doesn’t match what dedicated time series databases can do. It represents a rate of about 65 million rows / second. ScyllaDB recently published a benchmark in which they talk about billion rows / sec. But they did that on 83 nodes, so they did just 12 million / sec per node. Less than a fifth of RavenDB’s speed.

But that is being unfair, to be honest. While timeseries queries are really interesting, we don’t really expect users to query very large amount of data using raw queries. That is what we have indexes for, after all. I’m going to talk about this in depth in my next post.

FUTURE POSTS

No future posts left, oh my!

RECENT SERIES

  1. Production postmortem (28):
    21 Feb 2020 - The self signed certificate that couldn’t
  2. RavenDB 5.0 (2):
    21 Jan 2020 - Exploring Time Series–Part II
  3. Webinar (2):
    15 Jan 2020 - RavenDB’s unique features
  4. Challenges (2):
    03 Jan 2020 - Spot the bug in the stream–answer
  5. Challenge (55):
    02 Jan 2020 - Spot the bug in the stream
View all series

RECENT COMMENTS

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats