time to read 13 min | 2474 words

In my previous post, I explained what we are trying to do: create a way to carry a dictionary between transactions in RavenDB, allowing one write transaction to modify it while all other read transactions only observe the state of the dictionary as it was at publication time.

I want to show a couple of ways I tried solving this problem using the built-in tools in the Base Class Library. Here is roughly what I’m trying to do:


IEnumerable<object> SingleDictionary()
{
    var dic = new Dictionary<long, object>();
    var random = new Random(932);
    var v = new object();
    // number of transactions
    for (var txCount = 0; txCount < 1000; txCount++)
    {
        // operations in transaction
        for (int opCount = 0; opCount < 10_000; opCount++)
        {
            dic[random.NextInt64(0, 1024 * 1024 * 1024)] = v;
        }
        yield return dic; // publish the dictionary
    }
}

As you can see, we are running a thousand transactions, each of which performs 10,000 operations. We “publish” the state of the table at the end of each transaction.

This is just to set up a baseline for what I’m trying to do. I’m focusing solely on this one aspect: the table that is published. Note that I cannot actually use this particular code, because the dictionary would then be both mutable and shared across threads, and that is exactly what I cannot allow.

The easiest way to go about this is to just clone the dictionary. Here is what this would look like:


IEnumerable<object> ClonedDictionary()
{
    var dic = new Dictionary<long, object>();
    var random = new Random(932);
    var v = new object();
    // number of transactions
    for (var txCount = 0; txCount < 1000; txCount++)
    {
        // operations in transaction
        for (int opCount = 0; opCount < 10_000; opCount++)
        {
            dic[random.NextInt64(0, 1024 * 1024 * 1024)] = v;
        }
        // publish the dictionary
        yield return new Dictionary<long, object>(dic);
    }
}

This is basically the same code, but when I publish the dictionary, I’m going to create a new instance (which will be read-only). This is exactly what I want: to have a cloned, read-only copy that the read transactions can use while I get to keep on modifying the write copy.

The downside of this approach is twofold. First, cloning generates a lot of allocations. Second, the more items in the table, the more expensive it is to copy.

Instead, I can try using the ImmutableDictionary from the Base Class Library. Here is what that would look like:


IEnumerable<object> ClonedImmutableDictionary()
{
    var dic = ImmutableDictionary.Create<long, object>();

    var random = new Random(932);
    var v = new object();
    // number of transactions
    for (var txCount = 0; txCount < 1000; txCount++) 
    {
        // operations in transaction
        for (int opCount = 0; opCount < 10_000; opCount++) 
        {
            dic = dic.Add(random.NextInt64(0, 1024 * 1024 * 1024), v);
        }
        // publish the dictionary
        yield return dic;
    }
}

The benefit here is that the act of publishing is effectively a no-op: just send the immutable value out to the world. The downside of using immutable dictionaries is that each operation involves an allocation, and the underlying implementation (an AVL tree rather than a hash table, as we’ll see) is far less efficient than the regular dictionary.

I can try to optimize this a bit by using the builder pattern, as shown here:


IEnumerable<object> BuilderImmutableDictionary()
{
    var builder = ImmutableDictionary.CreateBuilder<long, object>();

    var random = new Random(932);
    var v = new object();
    // number of transactions
    for (var txCount = 0; txCount < 1000; txCount++)
    {
        // operations in transaction
        for (int opCount = 0; opCount < 10_000; opCount++)
        {
            builder[random.NextInt64(0, 1024 * 1024 * 1024)] = v;
        }
        // publish the dictionary
        yield return builder.ToImmutable();
    }
}

Now we only pay the immutable cost once per transaction, right? However, the underlying implementation is still an AVL tree, not a proper hash table. This means that not only is publishing the state more expensive, but reads are now slower as well. That is not something that we want.

The BCL recently introduced a FrozenDictionary, which is meant to be super efficient for a really common case of dictionaries that are accessed a lot but rarely written to. I delved into its implementation and was impressed by the amount of work invested into ensuring that this will be really fast.

Let’s see what that would look like for our scenario, shall we?


IEnumerable<object> FrozenDictionary()
{
    var dic = new Dictionary<long, object>();
    var random = new Random(932);
    var v = new object();
    // number of transactions
    for (var txCount = 0; txCount < 1000; txCount++)
    {
        // operations in transaction
        for (int opCount = 0; opCount < 10_000; opCount++)
        {
            dic[random.NextInt64(0, 1024 * 1024 * 1024)] = v;
        }
        // publish the dictionary
        yield return dic.ToFrozenDictionary();
    }
}

The good thing is that we are using a standard dictionary on the write side and publishing it once per transaction. The downside is that we need to pay a cost to create the frozen dictionary that is proportional to the number of items in the dictionary. That can get expensive fast.

After seeing all of those options, let’s check the numbers. The full code is in this gist.
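
The gist contains the full harness; as a minimal sketch of how the methods above might be wired up for the benchmark run (the class and wrapper names here are my own, not the gist’s code, and the sketch assumes the methods above are members of the same class):

using System.Collections.Generic;
using BenchmarkDotNet.Attributes;

public class PublishBenchmarks
{
    // The methods above are lazy (yield return), so each benchmark drains
    // the enumerable to force the work to actually run.
    private static void Drain(IEnumerable<object> source)
    {
        foreach (var _ in source)
        {
        }
    }

    [Benchmark(Baseline = true)]
    public void SingleDictionaryBench() => Drain(SingleDictionary());

    [Benchmark]
    public void ClonedDictionaryBench() => Drain(ClonedDictionary());

    [Benchmark]
    public void ClonedImmutableDictionaryBench() => Drain(ClonedImmutableDictionary());

    [Benchmark]
    public void BuilderImmutableDictionaryBench() => Drain(BuilderImmutableDictionary());

    [Benchmark]
    public void FrozenDictionaryBench() => Drain(FrozenDictionary());
}

// In Main: BenchmarkDotNet.Running.BenchmarkRunner.Run<PublishBenchmarks>();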

I executed all of those using BenchmarkDotNet; let’s see the results.

| Method                          | Mean          | Ratio    |
|---------------------------------|---------------|----------|
| SingleDictionaryBench           | 7.768 ms      | 1.00     |
| BuilderImmutableDictionaryBench | 122.508 ms    | 15.82    |
| ClonedImmutableDictionaryBench  | 176.041 ms    | 21.95    |
| ClonedDictionaryBench           | 1,489.614 ms  | 195.04   |
| FrozenDictionaryBench           | 6,279.542 ms  | 807.36   |
| ImmutableDictionaryFromDicBench | 46,906.047 ms | 6,029.69 |

Note that the difference in speed is absolutely staggering. The SingleDictionaryBench is a bad comparison; it is just filling a dictionary directly, with no publication cost at all. The cost of the BuilderImmutableDictionaryBench is more reasonable, given what it has to do.
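
One line in that table, ImmutableDictionaryFromDicBench, has no matching snippet above. Judging by the name, I assume it fills a regular dictionary and converts the whole thing to an ImmutableDictionary at publication time, roughly like the sketch below (my reconstruction, not the gist’s code). Rebuilding the entire immutable structure on every publish would explain why it is by far the slowest option.

IEnumerable<object> ImmutableDictionaryFromDic()
{
    var dic = new Dictionary<long, object>();
    var random = new Random(932);
    var v = new object();
    // number of transactions
    for (var txCount = 0; txCount < 1000; txCount++)
    {
        // operations in transaction
        for (int opCount = 0; opCount < 10_000; opCount++)
        {
            dic[random.NextInt64(0, 1024 * 1024 * 1024)] = v;
        }
        // publish by converting the entire mutable dictionary to an immutable one
        yield return dic.ToImmutableDictionary();
    }
}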

Just looking at the benchmark result isn’t sufficient. I implemented every one of those options in RavenDB and ran them under a profiler. The results are quite interesting.

Here is the version I started with, using a frozen dictionary. That is the right data structure for what I want: one thread mutates the data, then publishes the frozen results for others to use.

However, take a look at the profiler results! Don’t focus on the duration values; look at the percentage of time spent creating the frozen dictionary. That is 60%(!) of the total transaction time. That is… an absolutely insane number.

It is clear that the frozen dictionary isn’t suitable for our needs here. The ratio between reading and writing isn’t sufficient to justify the cost. The very trait that makes FrozenDictionary attractive is also what hurts us: it is more expensive to create than a normal dictionary precisely because it tries so hard to optimize read performance.

What about the ImmutableDictionary? Well, that is a complete non-starter. It is taking close to 90%(!!) of the total transaction runtime. I know that I called the frozen numbers insane; I should have chosen a different word, because now I have no words left to describe this.

Remember that one problem here is that we cannot just use the regular dictionary or a concurrent dictionary. We need to have a fixed state of the dictionary when we publish it. What if we use a normal dictionary, cloned?

This is far better, at about 40%, instead of 60% or 90%.

You have to understand, better doesn’t mean good. Spending that much of the transaction just publishing its state is beyond ridiculous.

We need to find another way to do this. Remember where we started? The PageTable in RavenDB that currently handles this is really complex.

I looked into my records and found this blog post from over a decade ago, discussing this exact problem. It certainly looks like this complexity is at least semi-justified.

I still want to be able to fix this… but it won’t be as easy as reaching out to a built-in type in the BCL, it seems.

time to read 4 min | 778 words

At the heart of RavenDB, there is a data structure that we call the Page Translation Table. It is one of the most important pieces inside RavenDB.

The page translation table is basically a Dictionary<long, Page>, mapping a page number to the actual page. The critical aspect of this data structure is that it is both concurrent and multi-version. That is, at any single point in time, there may be multiple versions of the table in use, each representing the state of the table at a different moment.

The way it works, a transaction in RavenDB generates a page translation table as part of its execution and publishes the table on commit. However, each subsequent table builds upon the previous one, so things become more complex. Here is a usage example (in Python pseudo-code):


table = {}

with wtx1 = write_tx(table):
  wtx1.put(2, 'v1')
  wtx1.put(3, 'v1')
  wtx1.publish(table)

# table has (2 => v1, 3 => v1)

with wtx2 = write_tx(table):
  wtx2.put(2, 'v2')
  wtx2.put(4, 'v2')
  wtx2.publish(table)

# table has (2 => v2, 3 => v1, 4 => v2)

This is pretty easy to follow, I think. The table is a simple hash table at this point in time.

The catch is when we mix read transactions as well, like so:


# table has (2 => v2, 3 => v1, 4 => v2)

with rtx1 = read_tx(table):

        with wtx3 = write_tx(table):
                wtx3.put(2, 'v3')
                wtx3.put(3, 'v3')
                wtx3.put(5, 'v3')

                with rtx2 = read_tx(table):
                        rtx2.read(2) # => gives, v2
                        rtx2.read(3) # => gives, v1
                        rtx2.read(5) # => gives, None

                wtx3.publish(table)

# table has (2 => v3, 3 => v3, 4 => v2, 5 => v3)
# but rtx2 still observes the values as they were when
# rtx2 was created

        rtx2.read(2) # => gives, v2
        rtx2.read(3) # => gives, v1
        rtx2.read(5) # => gives, None

In other words, until we publish a transaction, its changes don’t take effect. And any read transaction that was already started isn’t impacted. We also need this to be concurrent, so we can use the table in multiple threads (a single write transaction at a time, but potentially many read transactions). Each transaction may modify hundreds or thousands of pages, and we’ll only clear the table of old values once in a while (so it isn’t infinite growth, but it may certainly reach a respectable number of items).
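
To make those requirements concrete, here is the rough shape of the contract in C# (a sketch with my own names, not RavenDB’s actual API; Page is the page type mentioned above):

// Illustrative contract only - the names are mine, not RavenDB's.
public interface IPageTableSnapshot
{
    // Reads always see the state as it was when the snapshot was captured.
    bool TryGetPage(long pageNumber, out Page page);
}

public interface IPageTable
{
    // Single writer: stage changes privately...
    void SetPage(long pageNumber, Page page);

    // ...then make them visible, atomically, to snapshots captured afterwards.
    void Publish();

    // Many readers: each read transaction captures a fixed view when it starts.
    IPageTableSnapshot CurrentSnapshot();
}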

The implementation we have inside of RavenDB for this is complex! I tried drawing that on the whiteboard to explain what was going on, and I needed both the third and fourth dimensions to illustrate the concept.

Given these requirements, how would you implement this sort of data structure?

time to read 3 min | 440 words

We recently published an article on Getting started with GraphQL and RavenDB; it will walk you through setting up Hot Chocolate to create a RavenDB-based GraphQL endpoint in your system.
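
The article covers the full setup; as a very rough sketch of the idea (my own minimal example, not the article’s code - the Product class and the document store registration are assumed to exist elsewhere), a Hot Chocolate query type can resolve its data straight from a RavenDB async session:

using System.Collections.Generic;
using System.Threading.Tasks;
using HotChocolate;
using Raven.Client.Documents;

public class Query
{
    public async Task<List<Product>> GetProducts([Service] IDocumentStore store)
    {
        using var session = store.OpenAsyncSession();
        // Plain LINQ query against the Products collection
        return await session.Query<Product>().ToListAsync();
    }
}

// Wiring in Program.cs: builder.Services.AddGraphQLServer().AddQueryType<Query>();
//                       app.MapGraphQL();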

Here is what this looks like:


Another new feature is the New Database Wizard, which was completely redesigned and made much simpler. We have a great number of features & options, and the idea is that we want to give you better insight into what you can do with your system.

Here is a quick peek, but take a look at the link. I’m biased, of course, but I think the team did a really great job in exposing a really complex set of options in a very clear manner.


The About Page of RavenDB Studio has undergone major updates. In particular, we have made it easier to see when new versions are available, what changes each version includes, and to check for updates. Additionally, you can now review your license details and compare the features available in different editions.


Furthermore, we pushed an update to the About Page of RavenDB itself. Here we try to tell the story of RavenDB and how it came about. Beyond the narrative, our goal is to explain the design philosophy of RavenDB.

The idea is quite simple: we aim to be the database that you don’t have to think about. The one component in your system that you don’t need to worry about, the cog that just keeps on working. If we do our job right, we are a very boring database. Amusingly enough, the actual story behind it is quite interesting, and I would love to get your feedback on it.


We also published our new roadmap, which has a bunch of new goodies for you. I’m quite excited about this, and I hope you will be too (in a boring manner 🙂). Upcoming features include data governance features, Open Telemetry integration, extremely large clusters, and more.

One of our stated goals in the roadmap is better performance, with a focus on ARM hardware, which is taking the server / cloud world by storm. To that end, we are performing many “crimes against code” to squeeze every last erg of performance from the system.

Initial results are promising, but we still have some way to go before we can publicly disclose numbers.

As usual, I would appreciate your feedback about the roadmap and the new features in general.

time to read 1 min | 129 words

Watch Oren Eini, CEO of RavenDB, as he delves into the intricate process of constructing a database engine using C# and .NET. Uncover the unique features that make C# a robust system language for high-end system development. Learn how C# provides direct memory access and fine-grained control, enabling developers to seamlessly blend high-level concepts with intimate control over system operations within a single project. Embark on the journey of leveraging the power of C# and .NET to craft a potent and efficient database engine, unlocking new possibilities in system development.

I’m going deep into some of the cool stuff that you can do with C# and low level programming.

time to read 2 min | 326 words

Take a look at this wonderful example of foresightedness (or hubris).

In a little over ten years, Let’s Encrypt root certificates are going to expire. There are already established procedures for how to handle this from other Certificate Authorities, and I assume that there will be a well-communicated plan for this in advance.

That said, I’m writing this blog post primarily because I want to put the URL in the notes for the meeting above. Because in 10 years, I’m pretty certain that I won’t be able to recall why this is such a concerning event for us.

RavenDB uses certificates for authentication, usually generated via Let’s Encrypt. Since those certificates expire every 3 months, they are continuously replaced. When we talk about trust between different RavenDB instances, that can cause a problem. If the certificate changes every 3 months, how can I trust it?

RavenDB trusts a certificate directly, as well as any later version of that certificate assuming that the leaf certificate has the same key and that they have at least one shared signer. That is to handle the scenario where you replace the intermediate certificate (you can go up to the root certificate for trust at that point).
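
As a rough illustration of that rule (my own sketch, not RavenDB’s actual validation code), the check boils down to comparing the leaf’s public key and looking for a signer shared between the two chains:

using System.Linq;
using System.Security.Cryptography.X509Certificates;

static class CertificateTrust
{
    // Same public key on the leaf, plus at least one shared signer between the chains.
    public static bool IsTrustedReplacement(X509Certificate2 known, X509Certificate2 candidate)
    {
        if (known.GetPublicKeyString() != candidate.GetPublicKeyString())
            return false;

        using var knownChain = new X509Chain();
        using var candidateChain = new X509Chain();
        knownChain.Build(known);
        candidateChain.Build(candidate);

        var knownSigners = knownChain.ChainElements
            .Cast<X509ChainElement>()
            .Skip(1) // skip the leaf itself
            .Select(e => e.Certificate.Thumbprint)
            .ToHashSet();

        return candidateChain.ChainElements
            .Cast<X509ChainElement>()
            .Skip(1)
            .Any(e => knownSigners.Contains(e.Certificate.Thumbprint));
    }
}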

Depending on the exact manner in which the root certificate will be replaced, we need to verify that RavenDB is properly handling this update process. This meeting is set for over a year before the due date, which should give us more than enough time to handle this.

Right now, if they are using the same key on the new root certificate, it will just work as expected. If they opt for cross-signing with another root certificate, we need to ensure that we can verify the signatures on both chains. That is hard to plan for because things change.

In short, future Oren, be sure to double-check this in time.

time to read 3 min | 487 words

Corax is the new indexing and querying engine in RavenDB, which recently came out with RavenDB 6.0. Our focus when building Corax was on one thing: performance. I did a full talk explaining how it works from the inside out, available here, as well as a couple of podcasts.

Now that RavenDB 6.0 has been out for a while, we’ve had the chance to complete a few features that didn’t make the cut for the big 6.0 release. There is a host of small features for Corax, mostly completing tasks that were not included in the initial 6.0 release.

All these features are available in the 6.0.102 release, which went live in late April 2024.

The most important new feature for Corax is query plan visualization.

Let’s run the following query in the RavenDB Studio on the sample data set:


from index 'Orders/ByShipment/Location'
where spatial.within(ShipmentLocation, 
                  spatial.circle( 10, 49.255, 4.154, 'miles')
      )
and (Employee = 'employees/5-A' or Company = 'companies/85-A')
order by Company, score()
include timings()

Note that we are using the include timings() feature. If you configure this index to use Corax, issuing the above query will also give us the full query plan. In this case, you can see it here:

You can see exactly how the query engine has processed your query and the pipeline it has gone through.

We have incorporated many additional features into Corax, including phrase queries, scoring based on spatial results, and more complex sorting pipelines. For the most part, those are small but they fulfill specific needs and enable a wider range of scenarios for Corax.

It has been over six months since Corax went live with 6.0, and I can tell that it has been a successful feature. It performs its primary job well, being a faster and more efficient querying engine. And the best part is that it isn’t even something that you need to be aware of.

Corax has been the default indexing engine for the Development and Community editions of RavenDB for over 3 months now, and almost no one has noticed.

It’s a strange metric, I know, for a feature to be successful when no one is even aware of its existence, but that is a common theme for RavenDB. The whole point behind RavenDB is to provide a database that works, allowing you to forget about it.

time to read 1 min | 103 words

A couple of months ago I had the joy of giving an internal lecture to our developer group about Voron, RavenDB’s dedicated storage engine. In the lecture, I’m going over the design and implementation of our storage engine.

If you ever had an interest in how RavenDB’s transactional and high-performance storage works, that is the lecture for you. Note that this is aimed at our developers, so we are going deep.

You can find the slides here and here is the full video.

time to read 3 min | 566 words

We got an interesting question in the RavenDB Discussion:

We have Polo (shirts) products. Some customers search for Polo and others search for Polos. The term Polos exists in only a few of the descriptions and marketing info so the results are different.

Is there a way to automatically generate singular and plural forms of a term or would I have to explicitly add those?

What is actually requested here is a process known as stemming: turning a word into its root. That is a core concept in full-text search, and RavenDB allows you to make use of it.

The idea is that during indexing and queries, RavenDB will transform the search terms into a common stem and search on that. Let’s look at how this works, shall we?

The first step is to make an index named Products/Search with the following definition:


from p in docs.Products
select new { p.Name }

That is about as simple an index as you can get, but we still need to configure the indexing of the Name field on the index, like so:

You can see that I customized the Name field and marked it for full-text search using the SnowballAnalyzer, which is responsible for properly stemming the terms.
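
For reference, the same index can also be defined from code rather than through the Studio. A rough sketch (my own, assuming a Product class matching the documents, and using a placeholder name for the custom analyzer we are about to register) would look like this:

using System.Linq;
using Raven.Client.Documents.Indexes;

public class Products_Search : AbstractIndexCreationTask<Product>
{
    public Products_Search()
    {
        Map = products => from p in products
                          select new { p.Name };

        // Full-text search on Name, stemmed by the custom Snowball analyzer
        Indexes.Add(x => x.Name, FieldIndexing.Search);
        Analyzers.Add(x => x.Name, "MySnowballAnalyzer");
    }
}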

However, if you try to create this index, you’ll get an error. By default, RavenDB doesn’t include the SnowballAnalyzer, but that isn’t going to stop us. This is because RavenDB allows users to define custom analyzers.

In the database “Settings”, go to “Custom Analyzers”:

And there you can add a new analyzer. You can find the code for the analyzer in question in this Gist link.

You can also register analyzers by compiling them and placing the resulting DLLs in the RavenDB binaries directory. I find that having it as a single source file that we push to RavenDB in this manner is far cleaner.

Registering the analyzer via source means that you don’t need to worry about versioning, deploying to all the nodes in the cluster, or any such issues. It’s the responsibility of RavenDB to take care of this.

I produced the analyzer file by simply concatenating the relevant classes into a single file, basically creating a consolidated version containing everything required. That is usually done for C or C++ projects, but it is very useful in this case as well. Note that the analyzer in question must have a parameterless constructor. In this case, I just selected an English stemmer as the default one.
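
For illustration, the wrapper class at the heart of that file might look roughly like this (a sketch that assumes the Lucene Snowball classes are concatenated into the same file, as described above; the real code is in the linked Gist, and the class name here is a placeholder):

using Lucene.Net.Analysis.Snowball;

// Parameterless wrapper that fixes the stemmer to English, so RavenDB can instantiate it.
public class MySnowballAnalyzer : SnowballAnalyzer
{
    public MySnowballAnalyzer()
        : base(Lucene.Net.Util.Version.LUCENE_30, "English")
    {
    }
}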

With the analyzer properly registered, we can create the index and start querying on it.

As you can see, we are able to find both plural and singular forms of the term we are searching for.

To make things even more interesting, this functionality is available with both Lucene and Corax indexes, as Corax is capable of consuming Lucene Analyzers.

The idea behind full-text search in RavenDB is that you have a full-blown indexing engine at your fingertips, but none of the complexity involved. At the same time, you can utilize advanced features without needing to move to another solution, everything is in a single box.

time to read 2 min | 270 words

RavenDB is typically accessed directly by your application, using an X509 certificate for authentication. The same applies when you are connecting to RavenDB as a user.

Many organizations require that user authentication use more than a single factor (such as a password or a certificate). RavenDB now supports the ability to define Two Factor Authentication for access.

Here is what this looks like in the RavenDB Studio:

You are able to generate a certificate as well as register the Authenticator code in your device.

When you connect using the associated certificate, you won’t initially be able to access RavenDB. Instead, you’ll get an error message saying that you need to complete the Two Factor Authentication process. Here is what that looks like:

Once you complete the two factor authentication process, you can select how long we’ll allow access with the given certificate, and whether to allow access only from the current browser window (because you are accessing it directly) or from any client (if you want to access RavenDB from another device or via code).

Once the session duration expires, you’ll need to provide the authentication code again, of course.

This feature is meant specifically for certificates that are used by people directly. It is not meant for APIs or programmatic access. Those should either have a manual step to allow the certificate or utilize a secrets manager that can have additional steps and validations based on your actual requirements.

You can read more about this feature in the feature announcement.
