Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

Get in touch with me:

oren@ravendb.net +972 52-548-6969

time to read 4 min | 635 words

We got an error report from a customer about a migration issue from 2.5 to 3.0. A particular document appeared to have been corrupted, and caused issues.

We have an explicit endpoint that exposes the raw database bytes to the client, so we can troubleshoot exactly these kinds of errors. For fun, this is a compressed database, so the error was hiding beneath two levels of indirection, but that is beside the point.

When looking at the raw bytes of the document, we saw:


Which was… suspicious. It took a while to track down, but it turned out that this error would only occur when you had:

  • A large string (over 1KB) that is the first large string in the document.
  • At byte position 1023 of that string, a value that is both multi-byte (in UTF-8) and multi-character (a UTF-16 surrogate pair).

In those cases, we wouldn’t be able to read the document.
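To make the two conditions above concrete, here is a hypothetical way to build such an input string for a test, assuming "position 1023" means a zero-based byte offset. BuildBoundaryString and its parameters are illustrative names, not part of RavenDB, and whether a real document triggers the bug also depends on where the string sits in the serialized document; this sketch only arranges the string's own bytes.

```csharp
using System;
using System.Text;

// Hypothetical test-input builder: 'offset' ASCII chars, then U+1F4A9, then a tail.
// ASCII chars take one UTF-8 byte each, so the 4-byte UTF-8 sequence for U+1F4A9
// starts at byte 'offset' and, with the default of 1023, spans the 1KB (1024-byte) mark.
static string BuildBoundaryString(int offset = 1023, int tail = 100) =>
    new string('a', offset) + "\U0001F4A9" + new string('b', tail);

string s = BuildBoundaryString();
byte[] utf8 = Encoding.UTF8.GetBytes(s);

Console.WriteLine(utf8.Length);        // 1023 + 4 + 100 = 1127 bytes
Console.WriteLine(s.Length);           // 1023 + 2 + 100 = 1125 chars (U+1F4A9 is a surrogate pair)
Console.WriteLine(utf8[1023] == 0xF0); // True: lead byte of the 4-byte sequence
```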

The underlying reason was an optimization we made in 3.0 to reduce buffer allocations during deserialization of documents. To do that, we used an Encoding Decoder directly, without any intermediate buffers. This works great, except in this scenario, because of the way JSON.Net calls us.

When JSON.Net finds a large string, it repeatedly reads characters from the stream until it reaches the end of the string, and only then processes it. If the string is larger than the buffer, it grows the buffer.
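The reading pattern described above can be sketched roughly as follows. This is a hypothetical simplification: ReadAllGrowing and its initialSize parameter are made up for illustration, it reads to the end of the input rather than to a closing quote, and it is not JSON.Net's actual code. The detail that matters is that each Read call asks for exactly the space left in the buffer, which can be a single char.

```csharp
using System;
using System.IO;

// Hypothetical simplification of the reading pattern described above; not JSON.Net's code.
// Keeps asking the reader to fill whatever space is left, and doubles the buffer when full.
static string ReadAllGrowing(TextReader reader, int initialSize)
{
    var buffer = new char[Math.Max(1, initialSize)];
    int used = 0;
    while (true)
    {
        if (used == buffer.Length)
            Array.Resize(ref buffer, buffer.Length * 2); // out of room: grow the buffer

        // Ask for exactly the remaining space; near the end of the buffer this can be one char.
        int read = reader.Read(buffer, used, buffer.Length - used);
        if (read == 0)
            break; // end of input
        used += read;
    }
    return new string(buffer, 0, used);
}

Console.WriteLine(ReadAllGrowing(new StringReader("a large JSON string value"), 4));
```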

Let us imagine the following string, the letters ABC followed by U+1F4A9 (PILE OF POO):

ABC💩

When we serialize it as UTF-8, it looks like this:

```csharp
// "ABC" followed by the 4-byte UTF-8 encoding of U+1F4A9
var bytes = new byte[] { 65, 66, 67, 0xF0, 0x9F, 0x92, 0xA9 };
```

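And let us say that we want to read that into a buffer of 4 characters. We’ll use the decoder like so:

```csharp
using System;
using System.Text;

var decoder = Encoding.UTF8.GetDecoder();
int bytesPos = 0;
int charsPos = 0;
var chars = new char[4];
while (bytesPos < bytes.Length) // read whole buffer
{
    // process chars in chunks, simulating JSON.Net asking us to fill the rest of its buffer
    while (charsPos < chars.Length && bytesPos < bytes.Length)
    {
        int bytesUsed;
        int charsUsed;
        bool completed;
        // Decode directly into the caller's char buffer, with no intermediate buffer.
        decoder.Convert(bytes, bytesPos, bytes.Length - bytesPos, chars, charsPos, chars.Length - charsPos, false,
            out bytesUsed, out charsUsed, out completed);
        bytesPos += bytesUsed;
        charsPos += charsUsed;
    }
    Console.WriteLine(new string(chars, 0, charsPos));
    charsPos = 0; // start filling the buffer again
}
```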

On the first call, Convert decodes the first three bytes into three characters and stops. The JSON.Net code will then ask it to fill the rest of the buffer (simulated by the inner loop), but at that point the Convert method throws: there is just one character available in the buffer to write to, and it can’t write that character.

Why is that? Look at the poo string above. How many characters does it take?

If you answered four, you are visually correct, but wrong in the buffer sense. This string actually takes five chars to represent, because U+1F4A9 lies outside the Basic Multilingual Plane and is stored in UTF-16 as a surrogate pair (two chars). As I mentioned, to hit this error we needed a particular set of things to align just right (or wrong). Even a single space difference would shift things so that no multi-byte character spanned the 1KB boundary.
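The counting can be checked directly. This sketch (the variable names are illustrative) measures the same string three ways: UTF-8 bytes, UTF-16 chars (what a char[] buffer actually holds), and the user-perceived characters that StringInfo reports.

```csharp
using System;
using System.Globalization;
using System.Text;

// "ABC" followed by U+1F4A9 (PILE OF POO), written with a C# \U escape.
string poo = "ABC\U0001F4A9";

int utf8Bytes = Encoding.UTF8.GetByteCount(poo);              // 3 + 4 = 7
int utf16Chars = poo.Length;                                  // 3 + 2 = 5
int textElements = new StringInfo(poo).LengthInTextElements;  // what a human sees: 4

Console.WriteLine($"{utf8Bytes} bytes, {utf16Chars} chars, {textElements} visible characters");
// The last two chars are the high and low halves of a single surrogate pair.
Console.WriteLine(char.IsSurrogatePair(poo[3], poo[4]));      // True
```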

The solution, sadly, was to drop the optimization. We’ll probably revisit this at a later time, but now we have a way to confirm that this scenario is also covered.
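A minimal sketch of the non-optimized approach, assuming that dropping the optimization means going back to decoding through an intermediate char buffer. Here StreamReader plays that role: it decodes into its own internal buffer and hands out only as many chars as the caller asks for, so a surrogate pair can be split across two Read calls. This is illustrative, not the actual RavenDB code.

```csharp
using System;
using System.IO;
using System.Text;

// The same bytes as in the post: "ABC" followed by the 4-byte UTF-8 encoding of U+1F4A9.
var bytes = new byte[] { 65, 66, 67, 0xF0, 0x9F, 0x92, 0xA9 };

// StreamReader decodes into its own internal char buffer and copies out only
// as many chars as requested, so the caller's buffer size no longer matters.
using var reader = new StreamReader(new MemoryStream(bytes), Encoding.UTF8);

var chars = new char[4];
var result = new StringBuilder();
int charsPos = 0;
while (true)
{
    // Fill whatever space is left, just like the inner loop in the decoder example.
    int read = reader.Read(chars, charsPos, chars.Length - charsPos);
    if (read == 0)
        break; // end of input
    charsPos += read;
    if (charsPos == chars.Length)
    {
        result.Append(chars, 0, charsPos); // buffer full: flush it and start over
        charsPos = 0;
    }
}
result.Append(chars, 0, charsPos); // flush any leftover chars

string decoded = result.ToString();
Console.WriteLine(decoded.Length); // 5 UTF-16 chars
```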

time to read 2 min | 244 words

Dave had an interesting comment about the previous post on this topic.

Actually I would have kept the original terms. Clarity is way more important than 'protecting' an click and point administrator. If an administrator is so incredible stupid to experiment with an production cluster, than it is his right!

To protect against accidental hits on the very big 'leave cluster' button, you can ask the admin to enter 3 digit number that is displayed to confirm the action. But leaving and joining an cluster are defacto industry terms which makes it easier for admins coming from other data storage solutions to get an handle on RavenDB.

I think that there is some confusion regarding the actual terms. Here is the current UI, after the changes I discussed in the previous post:


As you can see, we have “Add another server to cluster” and “Leave cluster”, which are standard, common operations; they are what you’ll use in pretty much all cases.

The advanced cluster operations are unsafe; they are there to enable the operator to recover from a disaster that took down the majority of the cluster. Those aren’t standard operations: they are hidden by default under “Advanced”, and even then we want to make sure that users think carefully before using them.

time to read 2 min | 327 words

One of the features we are working on has the notion of a consensus cluster, as well as the ability to force a new cluster if a majority of the nodes in the cluster are down. The details aren’t important, but the first iteration of the UI went something like this:


Initialize new cluster is an unsafe operation: it makes the current node into a single-node cluster (which obviously has its own majority), and Take over a node forces a node that is part of an existing cluster to join the current cluster, bypassing the usual safety measures. The Leave cluster command is for normal operation, when you want to safely remove a node from the cluster.
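The majority reasoning above can be illustrated with a little quorum arithmetic. This is a sketch under the usual strict-majority assumption (more than half of the nodes), not RavenDB's actual implementation; MajorityOf and HasMajority are hypothetical names. It shows why surviving nodes cannot proceed once most of the cluster is down, and why a single-node cluster trivially has its own majority.

```csharp
using System;

// Illustrative only: strict-majority quorum arithmetic for a consensus cluster.
static int MajorityOf(int clusterSize) => clusterSize / 2 + 1;

static bool HasMajority(int reachableNodes, int clusterSize) =>
    reachableNodes >= MajorityOf(clusterSize);

Console.WriteLine(HasMajority(2, 5)); // False: 3 of 5 nodes are down, so no majority
Console.WriteLine(HasMajority(1, 1)); // True: a single-node cluster is its own majority
```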

We had a few problems with this UI (note that it was there simply to make it easy to test the behavior of the system, so don’t get too hung up on the first draft).

One problem we had is that this is shown front and center. It isn’t an operation that we want the admin to run accidentally (maybe just by exploring the interface).

That is easy, just drop it into an “Advanced” section, right? But I also had an issue with the terminology. It is too... bland.

Instead, we are going to rename the buttons as follows:

  • Go AWOL from cluster – step down into a single node cluster.
  • Kidnap node into cluster – force a node into the current cluster.

The idea with this terminology is that it is obvious (hopefully) that those aren’t standard operations, and that you should consider them carefully.

I’m not sure about Go AWOL, because that might be a very US-centric term. Other options we are considering are:

  • Abrogate cluster
  • Repudiate cluster

Both follow the same logic.


time to read 1 min | 84 words

My wife complained that her laptop was running slow. We had a discussion that went something like this:

  • Me: Okay, I think I know what is going on. This laptop has a 5,400 RPM hard drive.
  • Wife: …
  • Me:  …
  • Wife: …
  • Me: Okay, RPM is how fast the drive spins. The faster it spins, the faster it is. This is a 5,400 RPM drive, and usually you want a 7,200 RPM drive.
  • Wife: So go to the store and buy me another 2,000 RPM.

time to read 4 min | 654 words

We got an error in the following code, in production. We are trying hard to make sure that we have good errors, which allow us to troubleshoot things easily.

In this case, the code… wasn’t very helpful about it. Why? Take a look at the code, I’ll explain why below…

```csharp
public CodecIndexInput(FileInfo file, Func<Stream, Stream> applyCodecs)
{
    this.file = file;
    this.applyCodecs = applyCodecs;

    try
    {
        fileHandle = Win32NativeFileMethods.CreateFile(file.FullName,
            Win32NativeFileShare.Read | Win32NativeFileShare.Write | Win32NativeFileShare.Delete,
            /* … */);

        if (fileHandle.IsInvalid)
        {
            const int ERROR_FILE_NOT_FOUND = 2;
            if (Marshal.GetLastWin32Error() == ERROR_FILE_NOT_FOUND)
                throw new FileNotFoundException(file.FullName);
            throw new Win32Exception();
        }

        mmf = Win32MemoryMapNativeMethods.CreateFileMapping(fileHandle.DangerousGetHandle(), IntPtr.Zero, Win32MemoryMapNativeMethods.FileMapProtection.PageReadonly,
            0, 0, null);
        if (mmf == IntPtr.Zero)
            throw new Win32Exception();

        basePtr = Win32MemoryMapNativeMethods.MapViewOfFileEx(mmf,
            0, 0, UIntPtr.Zero, null);
        if (basePtr == null)
            throw new Win32Exception();

        stream = applyCodecs(new MmapStream(basePtr, file.Length));
    }
    catch (Exception)
    {
        // …
    }
}
```

Did you see it?

This code has multiple locations in which it can throw a Win32Exception. The problem is that a Win32Exception created this way (with no arguments) is pretty much just an error code, taken from the last Win32 error, and there are multiple places inside this method that can throw it.

When that happens, if we don’t have the PDB files deployed, we have no way of knowing just from the stack trace (without line numbers) which of the method calls caused the error. That is going to lead to some confusion after the fact.

We solved this by adding a description to each of the throw sites, including additional information that tells us what is going on. In particular, we included not only the operation that failed but, even more importantly, the file it failed on.
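A minimal sketch of the kind of fix described above, assuming the P/Invoke declarations use SetLastError = true (which Marshal.GetLastWin32Error requires). DescribeWin32Failure is a hypothetical helper, not the actual RavenDB code; it uses the Win32Exception(int, string) constructor so the error code is kept alongside a message naming the failed operation and the file.

```csharp
using System;
using System.ComponentModel;
using System.IO;
using System.Runtime.InteropServices;

// Hypothetical helper: records which operation failed and on which file, plus the error code.
// Call it immediately after the failing native call, before any other P/Invoke
// can overwrite the thread's last-error value.
static Win32Exception DescribeWin32Failure(string operation, FileInfo file)
{
    int error = Marshal.GetLastWin32Error();
    return new Win32Exception(error,
        $"{operation} failed for '{file.FullName}' (Win32 error {error})");
}

// In the constructor above, a check would then read (mmf and file come from that code):
// if (mmf == IntPtr.Zero)
//     throw DescribeWin32Failure("CreateFileMapping", file);

var example = DescribeWin32Failure("MapViewOfFileEx", new FileInfo("index.dat"));
Console.WriteLine(example.Message);
```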

time to read 2 min | 256 words

I was listening to the Programming Throwdown podcast, and in Episode 43, around the 13:00 mark, there was the following quote:

I can count on one, maybe two hands, the number of times in my entire career where I need to use… like the algorithm I used made an impactful difference. Most of the time, it doesn’t matter, architecture can matter, sometimes. Like, true, textbook algorithms.

This is roughly what it felt like to hear this…

I mean, seriously! Okay, I write database engines for a living. Part of my job is to look at academic literature to see if there are good ways to improve what we are doing. But you know what, let us say that building database engines is something special, and that your regular web developer doesn’t need any of it.

Let us see how right that is, shall we?

I’m going to use The Art of Computer Programming by Donald E. Knuth as an example here, because it certainly matches the definition of a textbook. Reading just the table of contents:

  • Data structures: stacks, queues, linked lists, arrays, binary trees.
  • Concepts: coroutines (async in C#, promises in JS, etc.), random numbers, how floating-point numbers actually work.
  • Sorting algorithms, searching sorted and unsorted data.

Saying that algorithms don’t matter amounts to taking decades of research and throwing it down the tubes, just because the machine is powerful enough for me not to notice that I’m brute forcing a solution at O(N²).
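To put the difference in concrete terms, here is a small illustrative example (not from the original post) of the kind of brute forcing mentioned above: checking an array for duplicates by comparing every pair, versus the textbook approach of remembering what has been seen in a hash set. For a million items, the pairwise version does roughly 5 × 10^11 comparisons.

```csharp
using System;
using System.Collections.Generic;

// Brute force: compares every pair, O(N^2).
static bool HasDuplicateBruteForce(int[] items)
{
    for (int i = 0; i < items.Length; i++)
        for (int j = i + 1; j < items.Length; j++)
            if (items[i] == items[j])
                return true;
    return false;
}

// Textbook alternative: a hash set gives expected O(N) time for the same answer.
static bool HasDuplicateHashed(int[] items)
{
    var seen = new HashSet<int>();
    foreach (int item in items)
        if (!seen.Add(item)) // Add returns false when the value is already present
            return true;
    return false;
}

var data = new int[] { 5, 3, 9, 1, 3 };
Console.WriteLine(HasDuplicateBruteForce(data)); // True
Console.WriteLine(HasDuplicateHashed(data));     // True
```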

