Ayende @ Rahien

My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:


+972 52-548-6969

, @ Q c

Posts: 6,125 | Comments: 45,493

filter by tags archive

The role of logs

time to read 2 min | 344 words

You are probably familiar with logging, and log levels. In particular, the following scheme is very popular.

I have been using this scheme for pretty much every software project I had in over a decade. And it works. Which is good.

But I found that there are actually not really useful for our scenarios. In particular, we found that there are many cases where a component that logged something as a warn or error would generate a support ticket by eagle eyed admin, while it was just a valid part of the software behavior. The other side of that is that when we need to use the logs to find a problem, we never use fine grained logs, we just need it all.

This lead me to believe that we only actually need two and a half levels.

  • Informative – we are reporting so one of our support engineers or a team member can look at that and understand what the software is doing.
  • Operational – something deserves attention from the customer’s operations team.

There isn’t really anything in between those. Either this is something that we fully expect an operation team at the customer to look at, or this is something that the development and/or support engineers need to see.

But I mentioned two and a half, what is that half? Well, above operational, there is another half a level, which I like to call Pay Attention Now. This isn’t actually a log level, it is a way to attract someone’s attention to the log, or to some sort of operational alert. This is an SMS sent, or an email, etc.

But beyond the “something need attention now”, there is nothing else that is needed. The production logs are either for routine operations monitoring (in which case they should be relatively succinct and written with an eye to a reader who isn’t familiar with the internal working of RavenDB) or “what is going on here” mode, which is typically done by a team member who needs the most information possible.

Optimal compression rate

time to read 3 min | 459 words

In the new blittable format we use in RavenDB, we make heavy use of compression to reduce the size of documents.

Compression has an unfortunate problem, though. It doesn't work for all inputs. The proof of that is a very neat one:

  • We have a set of compression / decompression function: compress(plainText) –> compressedText and decompress(compressedText) –> plainText.
  • Those are lossless functions, that is, for any input decompress( compress(X) ) == X
  • Let us assume that for any input, size(plainText) > size(compressedText)

But that causes a problem. Let us assume that our plain text size is N, and that the compression algorithm reduce that size by just one bit, so the size of the compressedText is N-1.

We'll compress all possible permutations of N bits using this algorithm. Given that the compression results in at least N-1  bits, there must now be two different values of the plainText that result in the same compressedText. That breaks the ability to decompress them successfully. Because there isn't a one to one mapping between them. Common compression algorithms rely on the source data to either have repetitions (LZ derivatives) or are based on shared dictionaries that match a particular set of data.

In practice, this has real world implications when you are designing a data format. For example, the blittable format compress strings using two different algorithms. For large strings, we use LZ4, because it has much higher compression rate and doesn't require any special knowledge of the data. For small strings, we use a Smaz variants, which is a shared dictionary of common terms. Because the dictionary is small, we can't put a lot of data into it, so we concentrated on common Latin character sequences.

That means that if you are trying to compress a Unicode string like:

רוח צפונית

You are going to use up more bytes than the original plain text. This is easy to experiment with using Smaz variant, because it is very simple. But it also happens using LZ4 for certain inputs.

That causes a problem for the blittable format, because we want to compress the data, but for certain inputs, that means that we are going to get more data.

We solved that by doing conditional compression. We designate a buffer that is smaller than the plain text that we are compressing (this reflect the maximum amount of compression that is valuable for us), and compress to that buffer. If the compression routine was unable to compress to that buffer (because it needed more space), we fail the compression, and just store the plain text.

Now we have an optimal compression rate, this is going to always be equal to or (hopefully usually) smaller than the original text.

Dependencies management in a crisis

time to read 3 min | 480 words

Typically when people talk about dependencies they talk about how easy it is to version them, deploy them, change & replace them, etc. There seems to be a very deep focus on the costs of dependencies during development.

Today I want to talk about another aspect of that. The cost of dependencies when you have a crisis. In particular, consider the case of having a 2 AM support call that is rooted to one of your dependencies. What do you do then?

The customer see a problem in your software, so they call you, and you are asked to resolve it. After you narrowed the problem down to a particular dependency, you now need to check whatever this is your usage of the dependency that is broken, or whatever there is a genuine issue with the dependency.

Let us take a case in point with a recent support call we had. When running RavenDB on a Windows Cluster with both nodes sharing the same virtual IP, authentication doesn’t work. It took us a while to narrow it down to Windows authentication doesn’t work, and that is where we got stuck. Windows authentication is a wonderfully convenient  tool, but if there is an error, just finding out about it require specialized knowledge and skills. After verifying that our usage of the code looked correct, we ended up writing a minimal reproduction with about 20 lines of code, which also reproduced the issue.

At that point, we were able to escalate to Microsoft with the help of the customer. Apparently this is a Kerberos issue and you need to use NTLM and there was a workaround with some network configuration (check our docs if you really care about the details). But the key point here is that we would really have absolutely no way to figure it out on our own. Our usage of Windows authentication was according to the published best practices, but in this scenario you had to do something different to get it to work.

The point here is that if we weren’t able to escalate that to Microsoft, we would be in a pretty serious issue with the customer “we can’t fix this issue” is something that no one wants to hear.

As much as possible, we try to make sure that any dependencies that we take are either:

  • Stuff that we wrote and understand.
  • Open source* components that are well understood.
  • Have a support contract that we can fall back on, with the SLA we require.
  • Non essential / able to be disabled without major loss of functionality.

* Just taking OSS component from some GitHub repo is a bad idea. You need to be able to trust them, which means that you need to be sure that you can go into the code and either fix things or understand why they are broken.

Trie based routing

time to read 4 min | 764 words

In my previous post, I discussed the cost of routing in MVC vs. our own implementation. The MVC routing / infrastructure consumed about 90% of the time, while the code that actually does stuff is left with less than 10%. In our implementation, the actual request handling code gets about 85% of the time, while all the infrastructure is handled in the other 15%.

Of that , a large portion of that time is actually spent on routing. In our tests, we saw something like 40% of the time spent in selecting the right controller action to execute, this is when there is just one such action that can be run. Now, to be fair, this is a scenario where we don’t actually have any meaningful logic. We are just saying “hello world” over and over again, so the infrastructure costs are really noticeable in this benchmark. But that is the point, we want to see what are the fixed costs of the infrastructure we use.

Now, in RavenDB we have several hundreds routes (last count was something upward of 400), and we noticed that routing take a lot of time to select the right action to run. We optimized the hell out of that to the point by specializing that to our own situation and based on our own knowledge. But when we started to look at moving our code to Kestrel, we took the time to move to a system that is a better fit for us.

The major reason routing is such an expensive issue in MVC is that it does quite a lot. It needs to handle route selection, capturing values, coercing data to the right types and a lot more. All of that takes time, memory and cpu cycles. But as it turn out, we don’t really need all of that. We have relatively simple routing needs.

Here are a few example of RavenDB routes:

  • /admin/stats
  • /databases/{databaseName}/docs
  • /databases/{databaseName}/indexes/{*id}

So we have the fixed routes, like /admin/stats. We have routes that contain a database name in the middle, and we have routes with a suffix. We can take advantage of that to create an optimal solution.

In our case, we used a trie:

In computer science, a trie, also called digital tree and sometimes radix tree or prefix tree (as they can be searched by prefixes), is an ordered tree data structure that is used to store a dynamic set or associative array where the keys are usually strings.

The cost of querying a trie is O(L), where L is the length of the key that we are querying. Here is how a very small trie looks in our tests.


We start from the route, matching each character against the trie node key. Once we have consumed the full key, we check if it has any children starting with the next character in the url. Note that the Children array here is of size 127. This allows us to index directly into the array (at cost of O(1)) to find the next trie node. We didn’t want to use a dictionary here because of its size and its cost relative to an array access.

This has some interesting implications:

  • The route definition is limited to ASCII characters (but the actual urls are not, mind).
  • Regardless of the number of routes, we are ensured of reliable performance.

But that still doesn’t let us handle routes like the ones above. In order to do that, we had to extend the syntax a bit.

/admin/stats would match exactly. That is easiest.

/databases/*/docs is a route that has a capture. The * symbol says “matches anything but /”, so we can get the database name captured without complex matching. Note that by capture, I mean just record the start position and length, we don’t want to pull a substring out of that and pay for an extra allocation.

/databases/*/indexes/$ is another route that has special meaning. The $ symbol says “the match is done, even if the url has additional characters”. That way we can capture embedded ids easily (again, no allocations).

The end result?


Or roughly a route match in 0.9 μs. Pretty good, even if I say so myself.

Playing with key generators, redux

time to read 9 min | 1629 words

In my previous post, I have discussed how using Smaz-like compression, I can achieve a smaller (and more professional looking) license key.

Here is the original license:

  "id": "cd6fff02-2aff-4fae-bc76-8abf1e673d3b",
  "expiration": "2017-01-17T00:00:00.0000000",
  "type": "Subscription",
  "version": "3.0",
  "maxRamUtilization": "12884901888",
  "maxParallelism": "6",
  "allowWindowsClustering": "false",
  "OEM": "false",
  "numberOfDatabases": "unlimited",
  "fips": "false",
  "periodicBackup": "true",
  "quotas": "false",
  "authorization": "true",
  "documentExpiration": "true",
  "replication": "true",
  "versioning": "true",
  "maxSizeInMb": "unlimited",
  "ravenfs": "true",
  "encryption": "true",
  "compression": "false",
  "updatesExpiration": "2017-Jan-17",
  "name": "Hibernating Rhinos"

And here is the resulting output:

0172020007 93050A0B28 0D0D0D682C 24080D0D2C
260D080C2C 090A29282C 2A08090D23 0C2829250B
2509060C1F 171019081B 1016150587 2C672C7795
5422220322 2203222295 2E22222222 2222220692
058F064407 4A0537064B 0528064C8C 4D8C4E0549
065A8C4F8D 528C538D54 8D558D568D 5705490659
8D508D518C 5805872C5B 2C77069105 954810090C
1915081B10 150E962052 0F1015161A 069553100E
15081B1C19 0C05954511 740C1C984D 4B4641270B
95560F1608 209841492F 4108962B58 0B9556120C
95590C954C 2195441695 5623954C29 2695482009
1D1C97514D 2B1117974E 492B150E96 3D3D070157
A9E859DD06 1EE0EC6210 674ED4CA88 C78FC61D20
B1650BF992 978871264B 57994E0CF3 EA99BFE9G1

It occurred to me that I was approaching it in completely the wrong direction. Instead of trying to specialize a generic solution, why not actually create a dedicated solution.

I started by defining the license terms:

public static string[] Terms =

Then I observed that I really only had five date types: boolean, int, date, guid and string. And that I only had (right now) 21 terms to work with. String is problematic, and we only ever use it for either the name on the license or for enum values (such as the type of the license). We can switch to using numeric values for enums, and we'll ignore the name for now. The dates we have are just that, dates, we don't need timing information. We also have a relatively small range of valid dates. From now to (lets be generous) 100 years from now. Finally, our integers are mostly very small. Version numbers and the like. All except the maxRamUtilization, which is in bytes. I switch that one to GB, which gives us small numbers again. We also have values that are numeric, but can have the string "unlimited", we'll use value 0 as the magic value to say unlimited.

What is the point of all of that the data we need to store is:

  • Booleans
  • Integers
  • Dates

Since we want to conserve space, we'll limit the integers to byte size (pun intentional Smile) and we'll store the dates in DOS format:


This means that we can pack a date (but not time, mind) into two bytes.

So the format is going to be pretty simple:

[index into the terms, type of value, value]

However, since we have so few value types, we can do better and actually store them as bits.

So the first 5 bits in the token would be the terms index (which limits us to 32 terms, since we currently only have 21, I'm fine with that) and the last 3 bits are used for the type of the value. I'm using the following types:


Note that we have two definitions of boolean value, true & false. The idea is that we can use a single byte to both index into the terms and specify what the actual value is. Since a lot of our properties are boolean, that saves quite a lot of space.

The rest of the format is pretty easy, and the whole thing can be done with the following code:

 var ms = new MemoryStream();
 var bw = new BinaryWriter(ms);
 foreach (var attribute in attributes)
     if (attribute.Value == null)
         throw new InvalidOperationException("Cannot write a null value");

     var index = Array.IndexOf(Terms, attribute.Key);
     if (index == -1)
         throw new InvalidOperationException("Unknown term " + attribute.Key);

     if (attribute.Value is bool)
         var type = (byte)((bool)attribute.Value ? ValueType.True : ValueType.False) << TypeBitsToShift;
         bw.Write((byte)((byte)index | type));
     if (attribute.Value is DateTime)
         bw.Write((byte)((byte)index | ((byte)ValueType.Date << TypeBitsToShift)));
         var dt = (DateTime)(attribute.Value);
     if (attribute.Value is int || attribute.Value is long)
         var val = Convert.ToByte(attribute.Value);
         bw.Write((byte)((byte)index | ((byte)ValueType.Int << TypeBitsToShift)));

     throw new InvalidOperationException("Cannot understand type of " + attribute.Key + " because it is " + attribute.Value.GetType().FullName);

Nothing truly interesting, I'll admit. But it means that I can pack this:

  "expiration": "2017-01-17T00:00:00",
  "type": 1,
  "version": 3,
  "maxRamUtilization": 12,
  "maxParallelism": 6,
  "allowWindowsClustering": false,
  "OEM": false,
  "numberOfDatabases": 0,
  "fips": false,
  "periodicBackup": true,
  "quotas": false,
  "authorization": true,
  "documentExpiration": true,
  "replication": true,
  "versioning": true,
  "maxSizeInGb": 0,
  "ravenfs": true,
  "encryption": true,
  "compression": false,
  "updatesExpiration": "2017-01-17T00:00:00"

Into 30 bytes.

Those of you with sharp eyes might have noticed that we dropped two fairly important fields. The license id and the name for whom the license is issued to. Let us deal with the name first.

Instead of encoding the name inside the license, we can send it separately. The user will have two fields to enter, the name and the actual license key.  But the 30 bytes we compacted the license attributes into aren't really useful. Anyone can generate them, after all. What we need it to sign them, and we do that using DSA public key.

Basically, we take the license attributes that we built, concat them with the name's bytes, and then generate a digital signature for that. Then we just add that to the license. Since DSA signature is 40 bytes in size, it means that our license has ballooned into whooping 70 bytes.

Using base 64, we get the following license key:

Hibernating Rhinos


And now that looks much more reasonable. This also explains why we dropped the license id. We don't need it anymore. The license itself (short & compact) gives us as good a way to refer to the license as the license id used to be, and it isn't much longer.

For clarity's sake it might be clearer to understand if we split this into separate fields:

  "Name": "Hibernating Rhinos",
  "Signature": "LAfgqs3MPzfRERCY+DWjZoso95lh+AzmOdt2+fC+p2TgC16hWKDESw=="

I'll admit that I went a bit overboard here and started doing all sort of crazy things here. For example, here is another representation of the same license scheme:


For mobile developers, I think that this would be an excellent way to enter the product registration Smile.

And here is the code, if you want to go deep: https://gist.github.com/ayende/6ecb9e2f4efb95dd98a0

Playing with key generators

time to read 5 min | 867 words

For the past 6 years or so, Hibernating Rhinos is using a licensing component that I wrote after being burned by trying to use a commercial licensing component. This licensing scheme is based around signed XML files that we send to customers.

It works, but it has a few issues. Sending files to customers turn out to be incredibly tricky. A lot of companies out there will flat out reject any email that has an attachment. So we ended up uploading the files to S3, and sending them a link to it. Dealing with files in this manner also means that there is a lot of relatively manual steps (take the file, rename it, place it in this dir, etc). With all the attendant issues that they entail.

So we want something simpler.

Here is a sample of the kind of information that I need to pass in the license:


Note that this is a pure property bag, no structure beyond that.

An obvious solution for that is to throw it in JSON, and add a public key signature. This would look like this:

  "id": "cd6fff02-2aff-4fae-bc76-8abf1e673d3b",
  "expiration": "2017-01-17T00:00:00.0000000",
  "type": "Subscription",
  "version": "3.0",
  "maxRamUtilization": "12884901888",
  "maxParallelism": "6",
  "allowWindowsClustering": "false",
  "OEM": "false",
  "numberOfDatabases": "unlimited",
  "fips": "false",
  "periodicBackup": "true",
  "quotas": "false",
  "authorization": "true",
  "documentExpiration": "true",
  "replication": "true",
  "versioning": "true",
  "maxSizeInMb": "unlimited",
  "ravenfs": "true",
  "encryption": "true",
  "compression": "false",
  "updatesExpiration": "2017-Jan-17",
  "name": "Hibernating Rhinos",
  "Signature": "XorRcQEekjOOARwJwjK5Oi7QedKmXKdQWmO++DvrqBSEUMPqVX4Ahg=="

This has the advantage of being simple, text friendly, easy to work send over email, paste, etc. It is also human readable (which means that it is easy for a customer to look at the license and see what is specified.

But it also means that a customer might try to change it (you'll be surprised how often that happens), and be surprised when this fails. Other common issues include only pasting the "relevant" parts of the license, or pasting invalid JSON definition because of email client issues. And that doesn't include the journey some licenses go through. (They get pasted into Word, which adds smart quotes, into various systems, get sent around on various chat programs, etc). If you have a customer name that include Unicode, it is easy to lose that information, which will result in invalid license (won't match the signature).

There is also another issue, of professionalism. Not so much how you act, but how you appear to act. And something like the license above doesn't feel right to certain people. It is a deviation from the common practice in the industry, and can send a message about not caring too much about this. Which lead to wondering whatever you care about the rest of the application…

Anyway, I'm not sure that I like it, for all of those reasons. A simple way to avoid this is to just BASE64 the whole string.

What I want to have is something like this:


Note that this is just the same thing as above, with Base64 encoding. This is a bit big, and it gives me a chance to play with the Smaz library I made. The nice thing about the refactoring I made there is that we can provide our own custom term table.

In this case, the term table looks like this:


And with that, still using Base64, we get:


This is much smaller (about 30%). And it has the property that it won't be easily decoded by just throwing it into a Base64 decoder.

I decided to use plain hex encoding, and pretty format it, which gave me:

0172020007 93050A0B28 0D0D0D682C 24080D0D2C
260D080C2C 090A29282C 2A08090D23 0C2829250B
2509060C1F 171019081B 1016150587 2C672C7795
5422220322 2203222295 2E22222222 2222220692
058F064407 4A0537064B 0528064C8C 4D8C4E0549
065A8C4F8D 528C538D54 8D558D568D 5705490659
8D508D518C 5805872C5B 2C77069105 954810090C
1915081B10 150E962052 0F1015161A 069553100E
15081B1C19 0C05954511 740C1C984D 4B4641270B
95560F1608 209841492F 4108962B58 0B9556120C
95590C954C 2195441695 5623954C29 2695482009
1D1C97514D 2B1117974E 492B150E96 3D3D070157
A9E859DD06 1EE0EC6210 674ED4CA88 C78FC61D20
B1650BF992 978871264B 57994E0CF3 EA99BFE9G1

This make it look much better, I think. Even if it almost double the amount of space we take (we are still smaller than the original JSON, though). Mostly this is me playing around on Friday's evening, to be honest. I needed something small to play with that had immediate feedback, so I started playing with this.

What do you think?

Good, fast, pretty code: How to choose?

time to read 3 min | 506 words

I have a piece of code that does something on types. It is a whole complex thing that does a lot of stuff. And the code is really ugly, here is a small part from ~180 lines method.


The problem would have been much simpler if we could only switch on types, which is effectively what I want to do here. As it stands, however, the JIT is actually going to replace all those if statements with pointer comparisons to the method table, so this is pretty fast. Unpleasant to read, but really fast.

I decided to see what it would take to create a more readable version:

public class TypeCounter
    public int Long;

    public Dictionary<Type, Action<object>> Actions;

    public TypeCounter()
        Actions = new Dictionary<Type, Action<object>>
            [typeof(long)] = o => Long++,
            [typeof(int)] = o => Long++,
    public void Dic(object instance)
        if (instance == null)
        Action<object> value;
        if (Actions.TryGetValue(instance.GetType(), out value))

    public void Explicit(object instance)
        if (instance is int)
        else if (instance is long)

This is just a simpler case of the code above, focused on just the two options. The dictionary version is much more readable, especially once you get to more than a couple of types. The problem, I have tested this on both this two types option as well as 10 types, and in all cases, the explicit version is about twice as fast.

So this rules it out, and we are left with the sad looking code that can run.

The importance of a data formatPart VII–Final benchmarks

time to read 4 min | 800 words

After quite a bit of work, we have finished (at least for now) the Blittable format. As you can probably guess from the previous posts, we have decided to also implement small string compression in the Blittable format, based on the Smaz algorithm.

We also noticed that there are two distinct use cases for the Blittable format. The first is when we actually want to end up with the document on disk. In this case, we would like to trade CPU time for I/O time (especially since usually we write once, but read a lot, so we'll keep paying that I/O cost). But there are also many cases in which we read JSON from the network jus to do something with it. For example, if we are going to read an index definition (which is sent as a JSON document), there is no need to compress it, we'll just have to immediately uncompress it again.

So we added the option to choose what mode you'll use the data. Here are the benchmarks for speed and size for Json, Blittable (for disk) and Blittable (for memory)

image image

As expected, we see a major difference in performance between Blit to disk and Blit to mem. Blit to mem is comparable or better from Json in nearly all scenarios, while Blit to disk is significantly smaller than Json while having comparable or better performance at parsing time, even though it is compressing a lot of the data.

For large documents, this is even more impressive:

image image

Blit to mem is 40% – 75% of the speed of Json at parsing, while Blit to disk can be much more expensive (us to x2.5 times more than Json), depending on how compressible the data is. On the other hand, Blit to mem is already smaller than Json in 33% – 20% for most datasets, but Blit to disk improves on that significantly, often by as much as 60% of the original JSON. In the companies.json case, the original file size is 67.25 MB, using Blit to mem we have a 45.29 MB file, and using Blit to disk gives us a 40.97MB.

This is a huge reduction in size (compared to appreciable fraction of a second on most I/O systems).

So far, we look at the big boys, but what happens if we try this on a large number of small documents? For example, as would commonly happen during indexing?

In this case, we see the following:


Note that Blit to disk is very expensive in this mode, this is because we actually do a lot. See the next chart:


Blit to disk is actually compressing the data by close to 20MB, and it is 5 MB smaller than Blit to memory.

The question on what exactly is the difference between Blit to memory and Blit to disk came up in our internal mailing list. The actual format is identical, there are currently only two differences:

  • Blit to disk will try to compress all strings as much as possible, to reduce I/O. This is often at the cost of parsing time. Blit to mem just store the strings as is, which is much faster, obviously. It takes more memory, but we generally have a lot of that available, and when using Blit to memory, we typically work with small documents, anyway.
  • Blit to disk also does additional validations (for example, that a double value is a valid double), while Blit to memory skip those. The assumption is that if the double value isn’t valid, it will fail when you actually access the double’s value, whereas we want to have such issues happen when we persist the data if we are actually going to access it later when using Blit to disk.

Overall, a pretty good job all around.

Static dictionary vs. dynamic dictionary tryouts

time to read 2 min | 291 words

In the previous post, I discussed the way the Smaz library works. Smaz uses a static shared dictionary, that was produced once from a known training set. FemtoZip, on the other hand, also uses a shared dictionary, but it is using a much more sophisticated system. To start with, it allows you to create the shared dictionary from your own data, not just the fixed data such as Smaz. However, that does require you to carefully pick your training set.

And of course, once you have picked a training set, it is very hard if not impossible to change without complex versioning rules. I would have expected the FemtoZip version to compress much better, but it depend heavily on the training set it has, and selecting the appropriate one is relatively hard in a general purpose manner.

In my tests, I wasn't able to find a training set that would do better than Smaz, I tried a bunch, including using Smaz's own dictionary, but it seems like Smaz has done a pretty good job in selecting the relevant outputs. Or perhaps the strings I was testing were optimal for Smaz, I'm not sure.

I also looked at Shoco, but it took very little time to rule it out. It only supports ASCII strings, which is about as far from helpful as you can get, in this day and age. Smaz isn't doing too great on non ASCII characters, but it has a fixed growth per length of unfamiliar terms, which is much better than doubling the size as in Shoco.

For our needs, I believe that we'll need to have a static version of the dictionary anyway, so there isn't a major benefit to being able to train them.

UX Tricks perceived responsiveness for high latency operations / failure modes

time to read 4 min | 725 words

We recently had to deal with an interesting problem in an unreliable application. To make it simple, I'll use this blog as an example, since that is much easier.

Let us assume that you are reading this post, and have a sudden desire to post a comment. Now, you can do so, we'll refresh the page, and you can see your new comment. So far, pretty simple and easy. That is the way we have been writing web applications since the nineties.

However, let us say that for whatever reason, we decided to put the actual content of the page inside a CDN, like Amazon CloudFront. The minimum amount of time we can store a page in a CDN is 5 minutes or so. The reason we want to use a CDN is so users' won't be impacted when the site is down for whatever reason.

Now, there are two types of user interactions in the blog. There is me, writing articles, and posting them. And then there is the reader, who read the post, and then send a scathing comment about the post. The software in question suffers from occasional outages, so we needed to have a way to cover that without the users' noticing. In this case, we divided the operations into two categories. Rare, which require the software to be fully operational (in the example above, posting a new blog post) and common, such as a user posting a comment, which should work even in reduced availability mode.

The solution was to utilize the CDN both for caching and reliability. The blog post itself is going to be hosted on a CDN, so if the core software fail, the user doesn't see that. But what about the actual interactions? As it turns out, while we want the user to think that everything is fine, we are actually okay with delaying some things. In this case, the perceived operation is actually more important than the actual operation.

Let us consider the standard system behavior, when there is no failure. The user goes to a particular page, and get the cached results from CDN. At the same time, we contact the live site, to try to get anything that is happening after the cached version. In the blog post example, the Ajax call would be something like "give me all the comments after 123". This takes us half way there, even though we use a CDN, we now have all the updated data when the user loads the page.

Except that we can take it further. Instead of doing an Ajax call, we can open a web socket, and ask the server "stream me everything after comment 123". As new comments come in, they will show up on the page automatically. If there is an outage, the user is going to see the latest version cached in the CDN, and won't be aware that they aren't getting the new comments. When the site is up again, we'll have all the new comments show up, just like magic.

This takes care of the user's reading habits. But what happens if the user is actually commenting? In the good case, the system is up and running, accept the new comment, distribute it to all the web sockets listening to the page's comment, and everything si good. But what happens in the failure case?

In this case, we are going to accept the user's comment, and then write it to local storage. And from local storage, we will display it to the user so as far as the user is concerned, the comment was successfully posted. Naturally we need to retry on the background, and eventually we'll send the comment to the server successfully.

And yes, this is only relevant for cases where we can actually afford to have the user believe that an operation has been successful when it hasn't been. But for stuff like comments, this is a really good system to make sure that everyone has a good user experience.

Oh, and this also has the benefit of having an easy way to handle abuse. You can not send the comment to the server, but do show it to the user. This way the spammer think that they have successfully posted, but they are just spamming themselves.


  1. RavenDB 3.5 whirl wind tour: I'll have the 3+1 goodies to go, please - 3 days from now
  2. The design of RavenDB 4.0: Voron has a one track mind - 4 days from now
  3. RavenDB 3.5 whirl wind tour: Digging deep into the internals - 5 days from now
  4. The design of RavenDB 4.0: Separation of indexes and documents - 6 days from now
  5. RavenDB 3.5 whirl wind tour: Deeper insights to indexing - 7 days from now

And 10 more posts are pending...

There are posts all the way to May 30, 2016


  1. The design of RavenDB 4.0 (14):
    05 May 2016 - Physically segregating collections
  2. RavenDB 3.5 whirl wind tour (14):
    04 May 2016 - I’ll find who is taking my I/O bandwidth and they SHALL pay
  3. Tasks for the new comer (2):
    15 Apr 2016 - Quartz.NET with RavenDB
  4. Code through the looking glass (5):
    18 Mar 2016 - And a linear search to rule them
  5. Find the bug (8):
    29 Feb 2016 - When you can't rely on your own identity
View all series


Main feed Feed Stats
Comments feed   Comments Feed Stats