Ayende @ Rahien

Oren Eini aka Ayende Rahien CEO of Hibernating Rhinos LTD, which develops RavenDB, a NoSQL Open Source Document Database.

You can reach me by:

oren@ravendb.net

+972 52-548-6969

Posts: 6,950 | Comments: 49,488

filter by tags archive
time to read 10 min | 1962 words

imageI run into a really interesting discussion on Twitter, I suggest you go over the whole thread, it is fascinating reading.

I have written DI / IoC business applications for a decade and I was heavily involved at a popular IoC container for about five years, including implementing some core features (open generic binding, which was a PITA to do). Given the scope of the topic, I didn’t want to try to squeeze my thoughts on the subject into a Twitter soundbite, hence, this post.

A couple of weeks ago I posted about how I would start a new project today. With just enough architecture to get things started, and not much more. Almost implicit in my design is the fact that the system is composable. You add functionality to the system not by modifying existing code but by adding code. That isn’t new by any means. A quick search of my blog shows a series of posts from 2012 and a system architecture from 2008. No new ground trodden here, then. So why bother writing this post?

RavenDB doesn’t use a container. This is a pretty big and non trivial project that has no container involved. In fact, I don’t usually pull in containers any longer. For a long while, I tried to push as much complexity as possible into the container. It helped that I was part of the team building the container, so I could actually go ahead and add features to the container. That allowed me to create a system that was driven by convention. As long as you followed the convention, things magically worked and everyone was productive. If you didn’t follow the convention, well, I would need to debug that. Other people on the team could figure things out, but it generally fell on me (not that I minded).

The backend for RavenDB Cloud is the first time in a while that I took part in what you can consider as a business application rather than an infrastructure component. And that backend uses a container, IoC, interfaces, multiple dispatch, etc. It makes for a codebase that can adapt quickly, but also adds complexity. In the case of the cloud backend, just to name a few core features, we have: storage, machine allocations, recovery from failure, billing and monitoring. Each one of those may have multiple implementations (each cloud does storage and deployment differently, different accounts have different plans, etc).  Much of this is handled via implementing the relevant interfaces and dispatching to the right location based on the context of the operation.

In many ways, it works like magic. And it allows us to iterate quick and deploy to three separate cloud providers in a short amount of time. It is also magic. Much of our team is actually infrastructure developers. That has a totally different mindset than business app development. When I saw how these developers, with the infrastructure background, worked with the cloud backend, it was very instructive. To them, it was magic, and impenetrable at first. Interestingly enough, they didn’t need to understand all that was going on to get things done. We made sure that they did, after a while, but the IoC allowed us to ignore such concerns until later (gimme a cluster, don’t worry about how it is wired to the rest of the system).

The auto-wiring is one part of what you’ll typically get from a container. There are other, equally important parts, that don’t generally get as much attention: Using IoC usually means a decomposed systems, which is easier to test independently. And in addition to satisfying dependencies, the container is also in charge of managing the lifetime (or scope) of instances.

Let’s talk about the decomposed system and isolated testing first, because this tend to be a high priority for many people. I’m against such systems. Not because it makes testing easier (although keep that in mind, I’ll have something to say about it shortly) but because it is generally a very short slippery slope toward interface explosion. You end up with a lot of interfaces that have a single implementation. You now have composition issue, it is hard to figure out what is the flow of the code because everything is dynamically composed. That lead to a bunch of problems when you read the code (you have to jump around to understand what is going on) as well as performance issues (you can’t inline methods, you have to do interface calls, etc). Out of those, the first issue is far more important, mind.

Surprisingly, given that we have decomposed to small pieces to be able to work with each item independently, we are now in a much worst position if we want to change something. Because the code is scattered in many different locations and is composed on the fly, if I want to make a significant change, I have to make it in many places. To give a concrete example, let’s say that I need to pass a correlation token through my system, to do distributed tracing. I have to modify pretty much all the interfaces involved to pass this token through. And that lead us to the issue I promised with the tests.

A system that is composed of independent interfaces / implementations is easy to test in isolation. Because each implementation is independent and isolated from other areas of the system. The issue with such a system is that each individual component isn’t really doing much on its own. The benefit of the system is from multiple such components are assembled and working together. So the critical functionality that you have is the composed bundle, as well as the container configuration. But to test that, you need a system test. So you might as well structure you system so that system tests are easy, fast and obvious. Here is another way to do just that.

Finally, we get to the issue of lifetime management. It is easy to ignore just how important this feature is. Usually, you have three lifetimes in your application:

  • Singleton – for the entire application.
  • Transient – get a new instance each time.
  • Scoped – get the same instance in the same scope (typically a single requests).

Being able to rely on the container to manage lifetime is huge, because it is easy to mess things up. A good container will also match dependencies by their lifetimes. So if you have a singleton component it cannot accept a transient component since the lifetimes don’t match (but the other way around is obviously fine). There is an issue here as well. If you are injecting the dependencies, it is easy to lose track of the lifetime of your dependencies. It is easy to get into a situation where you (inadvertently, even) use a dependency to manage state between invocations and not realize that you have now relying on the lifetime of a dependency (or a dependency of dependency).

You might have noticed a theme in this post. I’m outlining a lot of problems, but no solutions. I’ll get to that in a bit, but I wanted to explain something important. Writing non trivial software is complex. This is the nature of the beast. We can re-arrange the complexity or we can sweep it under the rug. There are good use cases for either option, but I would rather that people make this choice explicitly. What you can’t do is eliminate the complexity entirely, at best, you have tamed it.

Earlier, I said that RavenDB doesn’t use a container, which is true (somewhat). But it is using inversion of control. A lot of the core classes are using constructor injection, for example. Let’s take what is probably the most important class we have, DocumentDatabase. That is the class that represent a database inside a RavenDB process. It accept its dependencies (the configuration, the server it is running on, etc) and then is constructed. We don’t use a container here because the setup process of a database in RavenDB is complex. We first create the DocumentDatabase instance, then we have to initialize it. Initializing a database may mean running recovering, loading a lot of data from disk, etc. So we do that in an async manner. When a request comes in for a particular database, we get it, or wait until it is loaded. We will also dispose the database if it has been idle for enough time. So in this case, we have complex (async) initialization, in which we have to deal with a lot of failure modes. We also have a lifetime scope that is based on idle time, which doesn’t fit the usual modes for a container.

Because we manually control how we create the database instance, it is explicit what its dependencies, lifetime and behavior are. We have quite a few example of such classes. For example, the database instance holds DocumentStorage, AttachmentStorage, etc. It is important to note that the number if finite and relatively small. It allow us to reason about the interaction in the database in a static and predictable manner.

Remember when I said that we don’t use a container? That is almost true. There is one location where I wrote our own mini container. One thing that RavenDB has a lot of is Endpoints. An endpoint is the method that handles a particular HTTP request. At last count we had over 300 of them. I don’t have the time / willingness to wire all of these manually. That would put undue burden on developing a new endpoint. And that is the key observation. For stuff that doesn’t change very often (the structure of the database), we do things manually. For the things that we add a lot of (endpoints), we make it as smooth as possible. Adding a new endpoint is adding a class that inherit from a known base class, and that is pretty much it.

Our routing infrastructure will gather all of the implementation, wire up the routing and when a request come in will create an instance of the class in question, inject it the relevant context (what database it is running on, the current request, etc) and then execute it. Just like a container would, in fact, because for all intents and purposes, it is one. What we have done is optimize one aspect, which we deal with often, while manually dealing with the stuff that is rarely changing. That means that if I do need to make a change there, the level of magic involved is greatly reduced. And in RavenDB in particular, we can and have measured the difference in performance between running things through any abstraction layer and doing things directly. To the point where in certain parts of our codebase, an interface method invocation is forbidden because the cost would be too high.

There is another aspect of this architecture, it means that the easiest thing in our code would be to add a new endpoint. That being the easiest thing, it is usually what will happen. This means that we’re far more likely to follow the open/closed principal. It also lead to most of our code looking fairly similar in shape. That make maintenance, code reviews and the act of writing new code a lot simpler. I don’t have to make decisions about structure, I just have to let the code flow.

time to read 4 min | 797 words

In my last post, I talked about how to store and query time series data in RavenDB. You can query over the time series data directly, as shown here:

You’ll note that we project a query over a time range for a particular document. We could also query over all documents that match a particular query, of course. One thing to note, however, is that time series queries are done on a per time series basis and each time series belong to a particular document.

In other words, if I want to ask a question about time series data across documents, I can’t just query for it, I need to do some prep work first. This is done to ensure that when you query, we’ll be able to give you the right results, fast.

As a reminder, we have a bunch of nodes that we record metrics of. The metrics so far are:

  • Storage – [ Number of objects, Total size used, Total storage size].
  • Network – [Total bytes in, Total bytes out]

We record these metrics for each node at regular intervals. The query above can give us space utilization over time in a particular node, but there are other questions that we would like to ask. For example, given an upload request, we want to find the node with the most free space. Note that we record the total size used and the total storage available only as time series metrics. So how are we going to be able to query on it? The answer is that we’ll use indexes. In particular, a map/reduce index, like the following:

This deserve some explanation, I think. Usually in RavenDB, the source of an index is a docs.[Collection], such as docs.Users. In this case, we are using a timeseries index, so the source is timeseries.[Collection].[TimeSeries]. In this case, we operate over the Storage timeseries on the Nodes collection.

When we create an index over a timeseries, we are exposed to some internal structural details. Each timestamp in a timeseries isn’t stored independently. That would be incredibly wasteful to do. Instead, we store timeseries together in segments. The details about how and why we do that don’t really matter, but what does matter is that when you create an index over timeseries, you’ll be indexing the segment as a whole. You can see how the map access the Entries collection on the segment, getting the last one (the most recent) and output it.

The other thing that is worth noticing in the map portion of the index is that we operate on the values of the time stamp. In this case, Values[2] is the total amount of storage available and Values[1] is the size used. The reduce portion of the index, on the other hand, is identical to any other map/reduce index in RavenDB.

What this index does, essentially, is tell us what is the most up to date free space that we have for each particular node. As for querying it, let’s see how that works, shall we?

image

Here we are asking for the node with the least disk space that can contain the data we want to write. This can be reduce fragmentation in the system as a whole, by ensuring that we use the best fit method.

Let’s look at a more complex example of indexing time series data, computing the total network usage for each node on a monthly basis. This is not trivial because we record network utilization on a regular basis, but need to aggregate that over whole months.

Here is the index definition:

As you can see, the very first thing we do is to aggregate the entries based on their year and month. This is done because a single segment may contain data from multiple months. We then sum up the values for each month and compute the total in the reduce.

image

The nice thing about this feature is that we are able to aggregate large amount of data and benefit from the usual advantages of RavenDB map/reduce indexes. We have already massaged the data to the right shape, so queries on it are fast.

Time series indexes in RavenDB allows us to merge time series data from multiple documents, I could have aggregated the computation above across multiple nodes to get the total per customer, so I’ll know how much to charge them at the end of the month, for example.

I would be happy to know hear about any other scenarios that you can think of for using timeseries in RavenDB, and in particular, what kind of queries you’ll want to do on the data.

time to read 4 min | 633 words

RavenDB 5.0 is coming soon and the big new there is time series support. We have gotten to the point where we can actually show off what we can do, which makes me very happy. You can use the nightlies builds to explore time series support in RavenDB 5.0. Client side packages for 5.0 are also available.

image

I went ahead and created a new database and created some documents:

image

Time series are often used for monitoring, so I decided to go with the flow and see what kind of information we would want to store there. Here is how we can add some time series data to the documents:

I want to focus on this for a bit, because it is important. A time series in RavenDB has the following details:

  • The timestamp to associate to the values – in the code above, this is the current time (UTC)
  • The tag associated with the timestamp – in the code above, we record what devices and interfaces these measurements belong to.
  • The measurements themselves – RavenDB allows you to record multiple values for a single timestamp. We threat them as an array of values, and you can chose to put them in a single time series or to split them.

Let’s assume that we have quite a few measurements like this and that we want to look at the data. You can explore things in the Studio, like so:

image

We have another tab in the Studio that you can look at which will give you some high level details about the timeseries for a particular document. We can dig deeper, too, and see the actual values:

image

You can also query the data to see the patterns and not just the individual values:

The output will look like this:

image

And you can click on the eye to get more details in chart form. You can see a little bit of this here, but it is hard to do it justice with a small screen shot:

image

Here is what the data you get back from this query:

The ability to store and process time series data is very important for monitoring, IoT and healthcare systems. RavenDB is able to do quite well in these areas. For example, to aggregate over 11.7 million heartrate details over 6 years at a weekly resolution takes less than 50 ms.

We have tested timeseries that contained over 150 million entries and we can aggregate results back over the entire data set in under three seconds. That is a nice number, but it doesn’t match what dedicated time series databases can do. It represents a rate of about 65 million rows / second. ScyllaDB recently published a benchmark in which they talk about billion rows / sec. But they did that on 83 nodes, so they did just 12 million / sec per node. Less than a fifth of RavenDB’s speed.

But that is being unfair, to be honest. While timeseries queries are really interesting, we don’t really expect users to query very large amount of data using raw queries. That is what we have indexes for, after all. I’m going to talk about this in depth in my next post.

time to read 6 min | 1100 words

When it comes to security, the typical question isn’t whatever they are after you but how much. I love this paper on threat modeling, and I highly recommend it. But sometimes, you have information that you just don’t want to have. In other words, you want to store information inside of the database, but without the database or application being able to read said information without a key supplied by the user.

For example, let’s assume that we need to store the credit card information of a customer. We need to persist this information, but we don’t want to know it. We need something more from the user in order to actually use it.

The point of this post isn’t actually to talk about how to store credit card information in your database, instead it is meant to walk you through an approach in which you can keep data about a user that you can only access in the context of the user.

In terms of privacy, that is a very important factor. You don’t need to worry about a rogue DBA trawling through sensitive records or be concerned about a data leak because of an unpatched hole in your defenses. Furthermore, if you are carrying sensitive information that a third party may be interested in, you cannot be compelled to give them access to that information. You literally can’t, unless the user steps up and provide the keys.

Note that this is distinctly different (and weaker) than end to end encryption. With end to end encryption the server only ever sees encrypted blobs. With this approach, the server is able to access the encryption key with the assistance of the user. That means that if you don’t trust the server, you shouldn’t be using this method. Going back to the proper threat model, this is a good way to ensure privacy for your users if you need to worry about getting a warrant for their data. Basically, consider this as one of the problems this is meant to solve.

When the user logs in, they have to use a password. Given that we aren’t storing the password, that means that we don’t know it. This means that we can use that as the user’s personal key for encrypting and decrypting the user’s information. I’m going to use Sodium as the underlying cryptographic library because that is well known, respected and audited. I’m using the Sodium.Core NuGet package for my code samples. Our task is to be able to store sensitive data about the user (in this case, the credit card information, but can really be anything) without being able to access it unless the user is there.

A user is identified using a password, and we use Argon2id to create the password hash. This ensures that you can’t brute force the password. So far, this is fairly standard. However, instead of asking Argon2 to give us a 16 bytes key, we are going to ask it to give us a 48 bytes key. There isn’t really any additional security in getting more bytes. Indeed, we are going to consider only the first 16 bytes that were returned to us as important for verifying the password. We are going to use the remaining 32 bytes as a secret key. Let’s see how this looks like in code:

Here is what we are doing here. We are getting 48 bytes from Argon2id using the password. We keep the first 16 bytes to authenticate the user next time. Then we generate a random 256 bits key and encrypt that using the last part of the output of the Argon2id call. The function returns the generated config and the encryption key. You can now encrypt data using this key as much as you want. But while we assume that the CryptoConfig is written to a persistent storage, we are not keeping the encryption key anywhere but memory. In fact, this code is pretty cavalier about its usage. You’ll typically store encryption keys in locked memory only, wipe them after use, etc. I’m skipping these steps here in order to get to the gist of things.

Once we forget about the encryption key, all the data we have about the user is effectively random noise. If we want to do something with it, we have to get the user to give us the password again. Here is what the other side looks like:

We authenticate using the first 16 bytes, then use the other 32 to decrypt the actual encryption key and return that. Without the user’s password, we are blocked from using their data, great!

You’ll also notice that the actual key we use is random. We encrypt it using the key derived from the user’s password but we are using a random key. Why is that? This is to enable us to change passwords. If the user want to change the password, they’ll need to provide the old password as well as the new. That allows us to decrypt the actual encryption key using the key from the old password and encrypt it again with the new one.

Conversely, resetting a user’s password will mean that you can no longer access the encrypted data. That is actually a feature. Leaving aside the issue of warrants for data seizure, consider the case that we use this system to encrypt credit card information. If the user reset their password, they will need to re-enter their credit card. That is great, because that means that even if you managed to reset the password (for example, by gaining access to their email), you don’t get access tot he sensitive information.

With this kind of system in place, there is one thing that you have to be aware of. Your code needs to (gracefully) handle the scenario of the data not being decryptable. So trying to get the credit card information and getting an error should be handled and not crash the payment processing system Smile. It is a different mindset, because it may violate invariants in the system. Only users with a credit card may have a pro plan, but after a password reset, they “have” a credit card, in the sense that there is data there, but it isn’t useful data. And you can’t check, unless you had the user provide you with the password to get the encryption key.

It means that you need to pay more attention to the data model you have. I would suggest not trying to hide the fact that the data is encrypted behind a lazily decryption façade but deal with it explicitly.

time to read 5 min | 829 words

In RavenDB Cloud, we routinely monitor the usage of the RavenDB Cluster that our customers run. We noticed something strange in one of them, the system utilization didn’t match the expected load given the number of requests the cluster was handling. We talked to the customer to try to figure out what was going and we had the following findings (all details are masked, naturally, but the gist is the same).

  • The customer stores millions of documents in RavenDB.
  • The document sizes range from 20KB – 10 MB.

Let’s say that the documents in questions were BlogPosts which had a Comments array in them. We analyzed the document data and found the following details:

  • 99.89% of the documents had less than 100 comments in them.
  • 0.11% of the documents (a few thousands) had more than 100 comments in them.
  • 99.98% of the documents had less than 200 comments in them.
  • 0.02% of the documents had more than 200 comments in them!

Let’s look at this 0.02%, shall we?

image

The biggest document exceeded 10 MB and had over 24,000 comments in it.

So far, that didn’t seem like a suboptimal modeling decision, which can lead to some inefficiencies for those 0.02% of the cases. Not ideal, but no big deal. However, the customer also defined the following index:

from post in docs.Posts from comment in post.Comments select new { ... }

Take a minute to look at this index. Note the parts that I marked? This index is doing something that look innocent. It index all the comments in the post, but it does this using a fanout. The index will contain as many index entries as the document has comments. We also need to store all of those comments in the index as well as in the document itself.

Let’s consider what is the cost of this index as far as RavenDB is concerned. Here is the cost per indexing run for different sized documents, from 0 to 25 comments.

image

This looks like a nice linear cost, right? O(N) cost as expected. But we only consider the cost for a single operation. Let’s say that we have a blog post that we add 25 comments to, one at a time. What would be the total amount of work we’ll need to do? Here is what this looks like:

image

Looks familiar? This is O(N^2) is all its glory.

Let’s look at the actual work done for different size documents, shall we?

image

I had to use log scale here, because the numbers are so high.

The cost of indexing a document with 200 comments is 20,100 operations. For a thousand comments, the cost is 500,500 operations.  It’s over 50 millions for a document with 10,000 comments.

Given the fact that the popular documents are more likely to change, that means that we have quite a lot of load on the system, disproportional to the number of actual requests we have, because we have to do so much work during indexing.

So now we know what was the cause of the higher than expected utilization. The question here, what can we do about this? There are two separate issues that we need to deal with here. The first, the actual data modeling, is something that I have talked about before. Instead of putting all the comments in a single location, break it up based on size / date / etc. The book also has some advice on the matter. I consider this to be the less urgent issue.

The second problem is the cost of indexing, which is quite high because of the fanout. Here we need to understand why we have a fanout. The user may want to be able to run a query like so:

This will give us the comments of a particular user, projecting them directly from their store in the index. We can change the way we structure the index and then project the relevant results directly. Let’s see the code:

As you can see, instead of having a fanout, we’ll have a single index entry per document. We’ll still need to do more work for larger documents, but we reduced it significantly. More importantly, we don’t need to store the data. Instead, at query time, we use the where clause to find documents that match, then project just the comments that we want back to the user.

Simple, effective and won’t cause your database’s workload to increase significantly.

time to read 4 min | 798 words

RavenDB has two separate APIs that allow you to get push notifications from the database. The first one is the Subscriptions API, which allows you to define a query such as:

And then subscribe to it like so:

RavenDB will now push batches of orders that match your query to the client. This is done in a reliable manner. If the client fails for any reason, it can reconnect and resume from where it left off. If the server failed, the cluster will automatically reassign the work to another node and the client will pick up from where it left off. The subscription is also persistent, that means that whenever you connect to it, you don’t start from the beginning. After the subscription has caught up with all the documents that match the query, it isn’t over. Instead, the client will wait for new or updated documents to come in so the server can push them immediately. The typical latency between a document change and the subscription processing it is about twice the ping time between the client and server (so in the order of milliseconds). Only a single client at a time can have a particular subscription open, but multiple clients can contend on the subscription. One of them will win and the others will wait for the subscription to become available (when the first client stop / fail / crash, etc).

This make subscriptions highly suitable for business processing. It is reliable, you already have high availability on the server side and you can easily add that on the client side. You can use complex queries and do quite a bit of work on the database side, before it ever reaches your code. Subscriptions also allow you to run queries over revisions, so instead of getting the current state of the document, you’ll be called with the (prev, current) tuple on any document change. That gives you even more power to work with.

On the other hand, subscriptions requires RavenDB to manage quite a bit of (distributed) state and as such consume resources at the cluster level.

The Changes API, on the other hand, has a very different model, let’s look at the code first, and then discuss this in details:

As you can see, we can subscribe to changes on a document or a collection. We actually have quite a bit of events that we can respond to. A document change (by id, prefix or collection), an index (created / removed, indexing batch completed, etc), an operation (created / status changed / completed), a counter (created / modified), etc.

Some things that can be seen even from just this little bit of code. The Changes API is not persistent. That means, if you’ll restart the client and reconnect, you’ll not get anything that already happened. This is intended for ongoing usage, not for critical processing. You also cannot do any complex queries with changes. You have the filters that are available and that is it. Another important distinction is that with the Subscription API, you are getting the document (and can also include additional ones), but with the Changes API, you’re getting the document id only.

The most common scenario for the Changes API is to implement this:

image

Whenever a user is editing a particular document, you’ll subscribe to the document and if it changed behind the scenes, you can notify the user about this so they won’t continue to edit the document and get an optimistic concurrency error on save.

The Changes API is also used internally by RavenDB to implement a lot of features in the Studio and for tracking long running operations from the client. It is lightweight and requires very little resources from the server (and none from the cluster). On the other hand, it is meant to be a best effort feature. If the Changes connection has failed, the client will transparently reconnect to the server and re-subscribe to all the pending subscriptions. However, any changes that happened while the client was not connected are lost.

The two APIs are very similar on the surface, both of them allow you to get push notifications from RavenDB but their usage scenarios and features are very different. The Changes API is literally that, it is meant to allow you to watch for changes. Probably because you have a human sitting there and looking at things. It is meant to be an additional feature, not a guarantee. The Subscriptions API, on the other hand, is a reliable system and can ensure that you’ll not miss out of notifications that matter to you.

You can read more about Subscriptions in RavenDB in the book, I decided a whole chapter to it.

time to read 3 min | 469 words

I was talking with a developer about their system architecture and they mentioned that they are going through some complexity at the moment. They are changing their architecture to support higher scaling needs. Their current architecture is fairly simple (single app talking to a database), but in order to handle future growth, they are moving to a distributed micro service architecture. After talking with the dev for a while, I realized that they were in a particular industry that had a hard barrier for scale.

I’m not sure how much I can say, so let’s say that they are providing a platform to setup parties for newborns in a particular country. I went ahead and checked how many babies you had in that country, and the number has been pretty stable for the past decade, sitting on around 60,000 babies per year.

Remember, this company provide a specific service for newborns. And that service is only applicable for that country. And there are about 60,000 babies per year in that country. In this case, this is the time to do some math:

  • We’ll assume that all those births happen on a single month
  • We’ll assume that 100% of the babies will use this service
  • We’ll assume that we need to handle them within business hours only
  • 4 weeks x 5 business days x 8 business hours = 160 hours to handle 60,000 babies
  • 375 babies to handle per hour
  • Let’s assume that each baby requires 50 requests to handle
  • 18,750 requests / hour
  • 312 requests / minute
  • 5 requests / second

In other words, given the natural limit of their scaling (number of babies per year), and using very pessimistic accounting for the load distribution, we get to a number of requests to process that is utterly ridiculous.

It would be hard to not handle this properly on any server you care to name. In fact, you can get a machine under 150$ / month that has 8 cores. That gives you a core per requests per second, with 3 to spare.

Even if we have to deal with spikes of 50 requests / second. Any reasonable server ( the < 150% / month I mentioned) should be able to easily handle this.

About the only way for this system to get additional load is if there is a population explosion, at which point I assume that the developers will be busy handling nappies, not watching the CPU utilization.

For certain type of applications, there is a hard cap of what load you can be expected to handle. And you should absolutely take advantage of this. The more stuff you can not do, the better you are. And if you can make reasonable assumptions about your load, you don’t need to go crazy.

Simpler architecture means faster time to market, meaning that you can actually deliver value, rather than trying to prepare for the Babies’ Apocalypse.

time to read 4 min | 671 words

image

I got an email recently asking about my advice on how to approach the architecture on new projects. In particular, looking at typical architectural patterns, they are full of things like repositories, interfaces, components and multiple moving pieces. Usually they are marketed as items that will aid future extensibility or promote separation of concerns.

In fact, if you’ll look at the image on the right, you’ll see a typical timeline for a non trivial project. The amount of time that will be spent on the project architecture before we actually start writing any real code is obscene, in my eyes. This is the stage where we know the least about the project, and we are already spending so much time on it to get it moving. By the time we actually start getting real work done, the inertia and the amount of investment that we put into the infrastructure means that you usually can’t change it.

My approach for this is different. My goal at the beginning of the project is to get, as soon as possible, to the point where we have something that we can show to the user. That means that we don’t have time to do complex setups. Instead, I tend to follow the following architecture rule:

image

When I start a project, I write the minimal amount of infrastructure that I think that I can get away with. It is important to note that I’m writing this infrastructure with the explicit intent to throw it away. This is scaffolding, not permanent structure. Do you see the orange wheel in the picture?

That is meant to represent the abstraction layer between the infrastructure and the rest of the code. Once I have something that I can get away with, I can go ahead and solve real problems. If and when I’ll realize that I can’t really go on the way I did, I can change the infrastructure and not touch any of the application code.

For example, let’s say that I’m building a backend service. There are a lot of decisions that I need to make when building these:

  • What is the transport mechanism? REST? gRPC? SOAP?
  • How do you handle authentication?
  • How do you handle auditing?
  • How do you… ?

I can spend weeks and months on making these decisions. And they are important, but they aren’t useful at the beginning of the project. What is worse, they are likely to change. Six months down the road, we might realize that we need to meet the requirements of a particular regulation, so we can’t use a particular transport. If we are wedded to it, we are done.

So my infrastructure would be the minimal amount of code required to get something done. Here is an example of how it can work:

I’m now able to write code that inherit from AbstractMessageHandler and does stuff, using strongly typed code, without really caring how I go the messages or how I send them back. This is probably now what we’ll end up using, but it is small, easily handled / replaced and I can change it without touching the application code.

I have also ensured that I’m able to lean on the compiler and use the proper types, avoiding dealing with strings and implicitly typed values.

It took me about 15 minutes to create the above code, and my limit for setup time for a new project is one to three days. Afterward, we have to be able to write actual code, not infrastructure.

I’ll admit that it does take some thinking upfront. In the code above, I’ve decided to use a message handler pattern. In some cases, that wouldn’t be appropriate and I would need something else (for example, having an explicit channel to communicate between actors). Usually, however, these theme of the project is easier to decide on than the actual infrastructure.

time to read 3 min | 532 words

Once you put a document inside RavenDB, this is pretty much it, as far as RavenDB is concerned. It will keep your data safe, allow to query it, etc. But it doesn’t generally act upon it. There are a few exceptions, however.

RavenDB supports the @expires metadata attribute. This attribute allows you to specify a specific time in which RavenDB will automatically delete the document. This is very useful for expiring documents. The classic example being a password reset token, which should be valid for a period of time and then removed.

Here is what this looks like:

image

And you can configure the frequency in which we’ll check for expired documents in the studio.

image

Expiring documents, however, isn’t all that RavenDB can do. RavenDB also has an additional feature, refreshing documents. You can mark a document to be refreshed by specifying the @refresh metadata attribute, like so:

image

It is easy to understand what @expires do. At a given time, it will delete the document, because it expired. But what does refresh do? Well, at the specified time, a document with the @refresh metadata attribute will be updated by RavenDB to remove the @refresh metadata attribute from the document.

Yep, that is all. In other words, the document above would turn into:

image

That is all. Surely this is the most useless feature ever. You set a property that will be removed at a future time, but the only thing that the property can say is when to remove itself. What kind of feature is this?

Well, this is a case where by itself, this would be a pretty useless feature. But the point of this feature is that this will cause the document to be updated. At that point, it is a normal update, which means that:

  • The document will be re-indexed.
  • The document will be sent over ETL.
  • The document will be sent to the relevant subscriptions.

The last point is the most important one. Here is an example of a typical subscription:

As you can see, this is a pretty trivial subscription, but it filters out commands that are set to refresh. What does this mean? It means that if the @refresh attribute is set, we’ll ignore the document. But since RavenDB will automatically clear the attribute when the refresh timer is hit, we gain a powerful ability.

We now have the ability to process delayed commands. In other words, you can save a document with a refresh and have it processed by a subscription at a given time.

Expanding on this, you can do the same using ETL. So you have a document that will be sent over to the ETL destination at a given time. You can also do the same for indexing as well.

And now this seemingly trivial / useless feature become a pivot for a whole new set of capabilities that you get with RavenDB.

time to read 4 min | 721 words

An extendible hash is composed of a directory section, which point to leaf pages, and the leaf pages, where the actual data resides.

My current implementation is incredibly naïve, though. The leaf page has an array of records (two int64 values, for key & value) that we scan through to find a match. That works, but it works badly. It means that you have to scan through a lot of records (up to 254, if the value isn’t found). At the same time, we also waste quite a bit of space here. We need to store int64 keys and values, but the vast majority of them are going to be much smaller. 

Voron uses 8KB pages, with 64 bytes page header, leaving us with 8,128 bytes for actual data. I want to kill two birds in one design decision here. Handling both the lookup costs as well as the space costs. Another factor that I have to keep in mind is that complexity kills.

The RavenDB project had a number of dedicated hash tables over the years, for specific purposes. Some of them were quite fancy and optimized to the hilt. They were also complex. In a particular case, a particular set of writes and deletes meant that we lost an item in the hash table. It was there, but because we used linear probing on collision, and there was an issue with deleting a value under some cases where we would mark the first colliding key as removed, but didn’t move the second colliding key to its rightful place. If this make sense to you, you can appreciate the kind of bug that this caused. It only happened in very particular cases, very hard to track down and caused nasty issues. If you don’t follow the issue description, just assume that it was complex and hard to figure out. I don’t like complexity, this is part of why I enjoy extendible hashing so much, it is such a brilliant idea, and so simple in concept.

So whatever method we are talking about, it has to allow fast lookups, be space efficient and not be complex.

The idea is simple, we’ll divide the space available in our page to 127 pieces, each with 64 bytes in size. The first byte on each piece will be used to hold the number of entries used in this piece, and the rest of the data will hold varint encoded pairs of keys and values. It might be easier to understand if we’ll look at the code:

I’m showing here the read side of things, because that is pretty simple. For writes, you need to do a bit more house keeping, but not a lot more of it. Some key observations about this piece of code:

  • We need to scan only a single 64 bytes buffer. This fits into a CPU cache line, so it is safe to assume that the cost of actually scanning it for a match is actually much lower than fetching from main memory.
  • We discard all the common prefix of the page, using its depth value. The rest is used as a modulus index directly into the specified location.
  • There is no complexity around linear probing, closed / open addressing, etc. This is because our system can’t have collisions.

Actually, the last part is a lie. You are going to get two values that end up in the same piece, obviously. That is a collision, but that require no other special handling. The fun part here is that when we fill a piece completely, we’ll need to split the whole page. That will automatically spread the data across two pages and we get our load factor for free Smile.

Analyzing the cost of lookup in such a scheme, we have:

  • Bit shifts to index into the right bucket (I’m assuming that the directory itself is fully resident in memory).
  • We also need the header of the bucket, so we’ll need to read it as well (Here we may have disk read).
  • Modulus constant and then direct addressing to the relevant piece (covered by the previous disk read).
  • Scan 64 bytes to find a particular key using varint.

So in all, we read about 192 bytes (counting values in cache lines) and a single disk read. I expect this to be a pretty efficient result, in both time and space.

FUTURE POSTS

No future posts left, oh my!

RECENT SERIES

  1. RavenDB 5.0 (2):
    21 Jan 2020 - Exploring Time Series–Part II
  2. Webinar (2):
    15 Jan 2020 - RavenDB’s unique features
  3. Challenges (2):
    03 Jan 2020 - Spot the bug in the stream–answer
  4. Challenge (55):
    02 Jan 2020 - Spot the bug in the stream
  5. re (26):
    27 Dec 2019 - Writing a very fast cache service with millions of entries
View all series

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats