Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

time to read 3 min | 469 words

I was talking with a developer about their system architecture and they mentioned that they are going through some complexity at the moment. They are changing their architecture to support higher scaling needs. Their current architecture is fairly simple (a single app talking to a database), but in order to handle future growth, they are moving to a distributed microservice architecture. After talking with the dev for a while, I realized that they were in a particular industry that had a hard barrier for scale.

I’m not sure how much I can say, so let’s say that they are providing a platform to set up parties for newborns in a particular country. I went ahead and checked how many babies are born in that country, and the number has been pretty stable for the past decade, sitting at around 60,000 babies per year.

Remember, this company provides a specific service for newborns. And that service is only applicable in that country. And there are about 60,000 babies per year in that country. In this case, this is the time to do some math:

  • We’ll assume that all those births happen in a single month
  • We’ll assume that 100% of the babies will use this service
  • We’ll assume that we need to handle them within business hours only
  • 4 weeks x 5 business days x 8 business hours = 160 hours to handle 60,000 babies
  • 375 babies to handle per hour
  • Let’s assume that each baby requires 50 requests to handle
  • 18,750 requests / hour
  • 312 requests / minute
  • 5 requests / second

In other words, given the natural limit of their scaling (number of babies per year), and using very pessimistic accounting for the load distribution, we get to a number of requests to process that is ridiculously small.
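The back-of-the-envelope math above can be written down as a few lines of Python. This is only a sketch of the same arithmetic; the function and parameter names are mine, and the defaults are the pessimistic assumptions from the list.

```python
def peak_requests_per_second(
    births_per_year: int = 60_000,
    weeks: int = 4,
    business_days_per_week: int = 5,
    business_hours_per_day: int = 8,
    requests_per_baby: int = 50,
) -> float:
    """Worst case: a whole year of births squeezed into one month of business hours."""
    hours = weeks * business_days_per_week * business_hours_per_day  # 160 hours
    babies_per_hour = births_per_year / hours                        # 375 babies
    requests_per_hour = babies_per_hour * requests_per_baby          # 18,750 requests
    return requests_per_hour / 3600


print(f"{peak_requests_per_second():.1f} requests / second")
```

Changing any of the defaults (say, 500 requests per baby) is a quick way to see how far the assumptions have to be stretched before the load becomes interesting.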

It would be hard to not handle this properly on any server you care to name. In fact, you can get a machine for under $150 / month that has 8 cores. That gives you a core per request per second, with 3 to spare.

Even if we have to deal with spikes of 50 requests / second, any reasonable server (the < $150 / month one I mentioned) should be able to easily handle this.

About the only way for this system to get additional load is if there is a population explosion, at which point I assume that the developers will be busy handling nappies, not watching the CPU utilization.

For certain types of applications, there is a hard cap on the load you can be expected to handle. And you should absolutely take advantage of this. The more stuff you can avoid doing, the better off you are. And if you can make reasonable assumptions about your load, you don’t need to go crazy.

Simpler architecture means faster time to market, meaning that you can actually deliver value, rather than trying to prepare for the Babies’ Apocalypse.

time to read 2 min | 252 words

In my previous post, I asked you to find the bug in the following code:

This code looks okay, at a glance, but it turns out that this is a really nasty data corruption bug waiting to happen. Here is what the problematic usage looks like:

Do you see the error now?

If the operation times out, an exception will be raised, but the underlying operation isn’t over. We are using a shared pool, so the buffer we use may be handed over to someone else. At this point, they do something with the buffer, but the pending I/O operation will still read data into this buffer, meaning that there is probably going to be garbage in it when it is actually used.

For this to actually happen, you need to have a timed out operation, reuse of the buffer and the I/O operation completing at just the wrong time. So a sequence of highly unlikely events that would assuredly happen within an hour of pushing something like that to production. For fun, this will reliably happen the moment you have some network issues. So imagine that you have a slow node, which then causes memory corruption, which ends up being a visible bug (instead of maybe an aborted request) very rarely, and with no indication of how this happened.

How do you fix this? Like this:

This will use a cancellation token, which will cause the operation to be aborted at the stream level, meaning that we can safely reuse values that we passed to the underlying stream.
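The original listings are not reproduced here, so here is a small Python asyncio sketch of the same failure mode. It is not the original code. Every name in it (`slow_read`, `read_buggy`, `read_fixed`, `demo`) is hypothetical, and a `bytearray` stands in for the pooled buffer. The buggy version reports a timeout while the read is still pending; the fixed version cancels the read and waits for the cancellation to finish before the buffer can go back to the pool.

```python
import asyncio

_background = set()  # keeps abandoned reads referenced, like a real pending I/O


async def slow_read(buffer: bytearray, delay: float) -> None:
    """Stand-in for a stream read that fills `buffer` when the I/O completes."""
    await asyncio.sleep(delay)
    buffer[:4] = b"LATE"


async def read_buggy(buffer: bytearray, delay: float, timeout: float) -> None:
    task = asyncio.create_task(slow_read(buffer, delay))
    _background.add(task)
    done, _ = await asyncio.wait({task}, timeout=timeout)
    if not done:
        # The caller sees a timeout, but the read is still running.
        raise TimeoutError("read timed out")


async def read_fixed(buffer: bytearray, delay: float, timeout: float) -> None:
    task = asyncio.create_task(slow_read(buffer, delay))
    done, _ = await asyncio.wait({task}, timeout=timeout)
    if not done:
        task.cancel()  # abort the underlying operation...
        try:
            await task  # ...and wait until it is really gone
        except asyncio.CancelledError:
            pass
        raise TimeoutError("read timed out")


async def demo(read) -> bytes:
    pool = [bytearray(4)]
    buffer = pool.pop()
    try:
        await read(buffer, delay=0.05, timeout=0.01)
    except TimeoutError:
        pass
    finally:
        pool.append(buffer)  # the buffer goes back to the shared pool

    other = pool.pop()  # someone else gets the same buffer
    other[:4] = b"GOOD"
    await asyncio.sleep(0.1)  # the abandoned read, if any, completes here
    return bytes(other)


print(asyncio.run(demo(read_buggy)))  # the new owner's data was overwritten
print(asyncio.run(demo(read_fixed)))  # the new owner's data is intact
```

The sketch uses `asyncio.wait` for the timeout on purpose, because it does not cancel the pending task, which is the behavior the post warns about.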

time to read 4 min | 671 words

image

I got an email recently asking for my advice on how to approach the architecture on new projects. In particular, looking at typical architectural patterns, they are full of things like repositories, interfaces, components and multiple moving pieces. Usually they are marketed as items that will aid future extensibility or promote separation of concerns.

In fact, if you look at the image on the right, you’ll see a typical timeline for a non-trivial project. The amount of time that will be spent on the project architecture before we actually start writing any real code is obscene, in my eyes. This is the stage where we know the least about the project, and we are already spending so much time on it to get it moving. By the time we actually start getting real work done, the inertia and the amount of investment that we put into the infrastructure means that you usually can’t change it.

My approach for this is different. My goal at the beginning of the project is to get, as soon as possible, to the point where we have something that we can show to the user. That means that we don’t have time to do complex setups. Instead, I tend to follow the following architecture rule:

image

When I start a project, I write the minimal amount of infrastructure that I think that I can get away with. It is important to note that I’m writing this infrastructure with the explicit intent to throw it away. This is scaffolding, not permanent structure. Do you see the orange wheel in the picture?

That is meant to represent the abstraction layer between the infrastructure and the rest of the code. Once I have something that I can get away with, I can go ahead and solve real problems. If and when I realize that I can’t really go on the way I did, I can change the infrastructure and not touch any of the application code.

For example, let’s say that I’m building a backend service. There are a lot of decisions that I need to make when building these:

  • What is the transport mechanism? REST? gRPC? SOAP?
  • How do you handle authentication?
  • How do you handle auditing?
  • How do you… ?

I can spend weeks and months on making these decisions. And they are important, but they aren’t useful at the beginning of the project. What is worse, they are likely to change. Six months down the road, we might realize that we need to meet the requirements of a particular regulation, so we can’t use a particular transport. If we are wedded to it, we are done.

So my infrastructure would be the minimal amount of code required to get something done. Here is an example of how it can work:

I’m now able to write code that inherits from AbstractMessageHandler and does stuff, using strongly typed code, without really caring how I got the messages or how I send them back. This is probably not what we’ll end up using, but it is small, easily handled / replaced and I can change it without touching the application code.

I have also ensured that I’m able to lean on the compiler and use the proper types, avoiding dealing with strings and implicitly typed values.
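To make the shape of such a layer concrete, here is a minimal Python sketch of a message handler abstraction. It is not the code from the original post. Apart from the `AbstractMessageHandler` name, everything here (`CreateUser`, `UserCreated`, `dispatch`, the JSON envelope format) is a hypothetical stand-in. The point is that application code only sees typed messages, while the transport is confined to one small function that can be swapped later.

```python
import json
from abc import ABC, abstractmethod
from dataclasses import asdict, dataclass
from typing import Generic, TypeVar

TRequest = TypeVar("TRequest")
TResponse = TypeVar("TResponse")


class AbstractMessageHandler(ABC, Generic[TRequest, TResponse]):
    request_type: type  # the concrete message class this handler accepts

    @abstractmethod
    def handle(self, message: TRequest) -> TResponse:
        ...


@dataclass
class CreateUser:
    name: str
    email: str


@dataclass
class UserCreated:
    user_id: str


class CreateUserHandler(AbstractMessageHandler[CreateUser, UserCreated]):
    request_type = CreateUser

    def handle(self, message: CreateUser) -> UserCreated:
        # Application code: typed input, typed output, no transport details.
        return UserCreated(user_id="users/" + message.name.lower())


def dispatch(handlers: dict, raw: str) -> str:
    """Throwaway infrastructure: a JSON envelope in, a JSON response out."""
    envelope = json.loads(raw)
    handler = handlers[envelope["type"]]
    request = handler.request_type(**envelope["body"])
    return json.dumps(asdict(handler.handle(request)))


handlers = {"CreateUser": CreateUserHandler()}
request = '{"type": "CreateUser", "body": {"name": "Oren", "email": "o@example.com"}}'
print(dispatch(handlers, request))
```

Replacing `dispatch` with something that reads from a queue or a gRPC endpoint would not require touching `CreateUserHandler`.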

It took me about 15 minutes to create the above code, and my limit for setup time for a new project is one to three days. Afterward, we have to be able to write actual code, not infrastructure.

I’ll admit that it does take some thinking upfront. In the code above, I’ve decided to use a message handler pattern. In some cases, that wouldn’t be appropriate and I would need something else (for example, having an explicit channel to communicate between actors). Usually, however, the theme of the project is easier to decide on than the actual infrastructure.

time to read 4 min | 628 words

I posted about the @refresh feature in RavenDB, explaining why it is useful and how it can work. Now, I want to discuss a possible extension to this feature. It might be easier to show than to explain, so let’s take a look at the following document:

The idea is that in addition to the data inside the document, we also specify behaviors that will run at specified times. In this case, if the user is three days late in paying the rent, they’ll have a late fee tacked on. If enough time has passed, we’ll mark this payment as past due.

The basic idea is that in addition to just having a @refresh timer, you can also apply actions. And you may want to apply a set of actions, at different times. I think that the lease payment processing is a great example of the kind of use cases we envision for this feature. Note that when a payment is made, the code will need to clear the @refresh array, to avoid it being run on a completed payment.

The idea is that you can apply operations to the documents at a future time, automatically. This is a way to enhance your documents with behaviors and policies with ease. You don’t need to set up your own code to execute this, you can simply let RavenDB handle it for you.

Some technical details:

  • RavenDB will take the time from the first item in the @refresh array. At the specified time, it will execute the script, passing it the document to be modified. The @refresh item we are executing will be removed from the array. And if there are additional items, the next one will be scheduled for execution.
  • RavenDB looks only at the first element in the @refresh array. So if the items aren’t sorted by date, the first one will be executed and the document persisted again. The next one (which was due earlier than the first one) is already ready for execution, so it will be run on the next tick.
  • Once all the items in the @refresh array have been processed, RavenDB will remove the @refresh metadata property.
  • Modifications to the document because of the execution of @refresh scripts are going to be handled as normal writes. It is just that they are executed by RavenDB directly. In other words, features such as optimistic concurrency, revisions and conflicts are all going to apply normally.
  • If any of the scripts cause an error to be raised, the following will happen:
    • RavenDB will not process any future scripts for this document.
    • The full error information will be saved into the document with the @error property on the failing script.
    • An alert will be raised for the operations team to investigate.
  • The scripts can do anything that a patch script can do. In other words, you can put(), load(), del() documents in here.
  • We’ll also provide a debugger experience for this in the Studio, naturally.
  • Amusingly enough, the script is able to modify the document, which obviously includes the @refresh metadata property. I’m sure you can imagine some interesting possibilities for this.
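The rules above can be modeled in a few lines of Python. This is a toy simulation of the proposed semantics, not RavenDB code. The `At` and `Script` field names are assumptions for illustration, the "scripts" are plain Python functions looked up by name rather than patch scripts, and alerts are not modeled.

```python
from datetime import datetime


def process_refresh(doc: dict, now: datetime, scripts: dict) -> bool:
    """Run at most one due @refresh item; return True if the document changed."""
    meta = doc.get("@metadata", {})
    items = meta.get("@refresh")
    if not items:
        return False
    item = items[0]  # only the first element is ever considered
    if "@error" in item:
        return False  # a failed script blocks all future scripts
    if datetime.fromisoformat(item["At"]) > now:
        return False  # not due yet
    try:
        scripts[item["Script"]](doc)
    except Exception as error:
        item["@error"] = repr(error)  # keep the failure on the failing item
        return True
    remaining = meta.get("@refresh", [])
    if remaining and remaining[0] is item:
        remaining.pop(0)  # the executed item is removed
    if not remaining:
        meta.pop("@refresh", None)  # nothing left to schedule
    return True


def late_fee(doc: dict) -> None:
    doc["Amount"] += 50


def past_due(doc: dict) -> None:
    doc["Status"] = "PastDue"


SCRIPTS = {"late_fee": late_fee, "past_due": past_due}

payment = {
    "Amount": 1000,
    "Status": "Pending",
    "@metadata": {
        "@refresh": [
            {"At": "2020-01-04T00:00:00", "Script": "late_fee"},
            {"At": "2020-02-01T00:00:00", "Script": "past_due"},
        ]
    },
}
```

Calling `process_refresh(payment, now, SCRIPTS)` on every tick applies the late fee once its time has passed, then the past due marker, and finally drops the `@refresh` property.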

We also considered another option (look at the Script property):

The idea is that instead of specifying the script to run inline, we can reference a property on a document. The advantage is that we can apply changes globally much more easily. We can fix a bug in the script once. The disadvantage here is that you may be modifying a script for new values, but not accounting for the old documents that may be referencing it. I’m still in two minds about whether we should allow a script reference like this.

This is still an idea, but I would like to solicit your feedback on it, because I think that this can add quite a bit of power to RavenDB.

time to read 3 min | 532 words

Once you put a document inside RavenDB, this is pretty much it, as far as RavenDB is concerned. It will keep your data safe, allow you to query it, etc. But it doesn’t generally act upon it. There are a few exceptions, however.

RavenDB supports the @expires metadata attribute. This attribute allows you to specify a time at which RavenDB will automatically delete the document. This is very useful for expiring documents. The classic example is a password reset token, which should be valid for a period of time and then removed.

Here is what this looks like:

image

And you can configure the frequency at which we’ll check for expired documents in the studio.

image

Expiring documents, however, isn’t all that RavenDB can do. RavenDB also has an additional feature, refreshing documents. You can mark a document to be refreshed by specifying the @refresh metadata attribute, like so:

image

It is easy to understand what @expires does. At a given time, it will delete the document, because it expired. But what does refresh do? Well, at the specified time, a document with the @refresh metadata attribute will be updated by RavenDB to remove the @refresh metadata attribute from the document.

Yep, that is all. In other words, the document above would turn into:

image

That is all. Surely this is the most useless feature ever. You set a property that will be removed at a future time, but the only thing that the property can say is when to remove itself. What kind of feature is this?

Well, this is a case where by itself, this would be a pretty useless feature. But the point of this feature is that this will cause the document to be updated. At that point, it is a normal update, which means that:

  • The document will be re-indexed.
  • The document will be sent over ETL.
  • The document will be sent to the relevant subscriptions.

The last point is the most important one. Here is an example of a typical subscription:

As you can see, this is a pretty trivial subscription, but it filters out commands that are set to refresh. What does this mean? It means that if the @refresh attribute is set, we’ll ignore the document. But since RavenDB will automatically clear the attribute when the refresh timer is hit, we gain a powerful ability.

We now have the ability to process delayed commands. In other words, you can save a document with a refresh and have it processed by a subscription at a given time.
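As a rough model of how the two pieces fit together, here is a toy Python sketch. It is not the RavenDB client API or a real subscription definition. `refresh_tick` plays the role of the server clearing elapsed timers, `subscription_batch` plays the role of a subscription that filters out documents still waiting on @refresh, and both names are hypothetical.

```python
from datetime import datetime


def refresh_tick(docs: dict, now: datetime) -> list:
    """Clear @refresh on documents whose time has come; return the updated ids."""
    updated = []
    for doc_id, doc in docs.items():
        due = doc["@metadata"].get("@refresh")
        if due is not None and datetime.fromisoformat(due) <= now:
            del doc["@metadata"]["@refresh"]  # a normal update, seen by subscriptions
            updated.append(doc_id)
    return updated


def subscription_batch(docs: dict, changed_ids: list) -> list:
    """Deliver changed documents, skipping those that are still set to refresh."""
    return [i for i in changed_ids if "@refresh" not in docs[i]["@metadata"]]


commands = {
    "commands/1": {"Action": "SendEmail", "@metadata": {}},
    "commands/2": {
        "Action": "SendReminder",
        "@metadata": {"@refresh": "2020-01-02T00:00:00"},
    },
}
```

When both commands are first saved, only `commands/1` is delivered. Once the tick passes the refresh time, `commands/2` is updated and delivered as well, which is the delayed command behavior described above.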

Expanding on this, you can do the same using ETL. So you have a document that will be sent over to the ETL destination at a given time. You can also do the same for indexing as well.

And now this seemingly trivial / useless feature becomes a pivot for a whole new set of capabilities that you get with RavenDB.

time to read 3 min | 415 words

I ran into this article that talks about building a cache service in Go to handle millions of entries. Go ahead and read the article, there is also an associated project on GitHub.

I don’t get it. Rather, I don’t get the need here.

The authors seem to want to have a way to store a lot of data (for a given value of lots) that is accessible over REST. They need to be able to run 5,000 – 10,000 requests per second over this. And also be able to expire things.

I decided to take a look into what it would take to run this in RavenDB. It is pretty late here, so I was lazy. I ran the following command against our live-test instance:

image

This says to create 1,024 connections and get the same document. On the right you can see the live-test machine stats while this was running. It peaked at about 80% CPU. I should note that the live-test instance is pretty much the cheapest one that we could get away with, and it is far from me.

Ping time from my laptop to the live-test is around 230 – 250 ms. Right around the numbers that wrk is reporting. I’m using 1,024 connections here to compensate for the distance. What happens when I’m running this locally, without the huge distance?

image

So I can do more than 22,000 requests per second (on a 2016 era laptop, mind) with a max latency of 5.5 ms (which the original article called for as the average time). Granted, I’m simplifying things here, because I’m checking a single document and not including writes. But 5,000 – 10,000 requests per second are small numbers for RavenDB. Very easily achievable.

RavenDB even has the @expires feature, which allows you to specify a time a document will automatically be removed.

The nice thing about using RavenDB for this sort of feature is that millions of objects and gigabytes of data are not something that is of particular concern for us. Raise that by an order of magnitude, and that is our standard benchmark. You’ll need to raise it by a few more orders of magnitude before we start taking things seriously.

time to read 5 min | 822 words

This post asked an interesting question: why are hash tables so prevalent for in memory usage and (relatively) rare in the case of databases? There are some good points in the post, as well as in the Hacker News thread.

Given that I just did a spike of persistent hash table and have been working on database engines for the past decade, I thought that I might throw my own two cents into the ring.

B+Tree is a profoundly simple concept. You can explain it in 30 minutes, and it makes sense. There are some tricky bits to a proper implementation, for sure, but they are more related to performance than correctness.

Hash tables sound simple, but the moment you have to handle collisions gracefully, you are going to run into real challenges. It is easy to get into nasty bugs with hash tables, the kind that silently corrupt your state without you realizing it.

For example, consider the following code:

This is a hash table using open addressing with linear probing. Collisions are handled by placing the entry in the next available slot. And in this case, we have a problem. We want to put “ghi” in position zero, but we can’t, because it is already full. We move it to the first available location. That is well understood and easy. But when we delete “def”, we remove the entry from the array but forget to do fixups for the relocated “ghi”, so that value is now gone from the table, effectively. This is the kind of bug you need the moon to be in a certain position while a cat sneezes to figure out.
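The deletion bug described above is easy to reproduce. The following Python sketch is my own illustration, not the code from the original post. It uses a fixed, made-up hash function so that “abc” and “ghi” both want slot zero, and it shows one common fix, tombstones, next to the broken delete.

```python
_TOMBSTONE = object()  # marks a slot that used to hold an entry


class LinearProbingTable:
    def __init__(self, capacity, hash_fn, use_tombstones):
        self.slots = [None] * capacity
        self.hash_fn = hash_fn
        self.use_tombstones = use_tombstones

    def _probe(self, key):
        start = self.hash_fn(key) % len(self.slots)
        for offset in range(len(self.slots)):
            yield (start + offset) % len(self.slots)

    def put(self, key, value):
        free = None
        for index in self._probe(key):
            slot = self.slots[index]
            if slot is None:
                if free is None:
                    free = index
                break
            if slot is _TOMBSTONE:
                if free is None:
                    free = index  # reusable, but keep looking for the key
            elif slot[0] == key:
                free = index  # overwrite the existing entry
                break
        if free is None:
            raise RuntimeError("table is full")
        self.slots[free] = (key, value)

    def get(self, key):
        for index in self._probe(key):
            slot = self.slots[index]
            if slot is None:
                return None  # an empty slot ends the probe sequence
            if slot is not _TOMBSTONE and slot[0] == key:
                return slot[1]
        return None

    def delete(self, key):
        for index in self._probe(key):
            slot = self.slots[index]
            if slot is None:
                return
            if slot is not _TOMBSTONE and slot[0] == key:
                # Bug when use_tombstones is False: emptying the slot breaks
                # the probe chain for entries that were relocated past it.
                self.slots[index] = _TOMBSTONE if self.use_tombstones else None
                return


def demo(use_tombstones):
    positions = {"abc": 0, "def": 1, "ghi": 0}  # "ghi" collides with "abc"
    table = LinearProbingTable(4, positions.__getitem__, use_tombstones)
    for key in ("abc", "def", "ghi"):
        table.put(key, key.upper())
    table.delete("def")
    return table.get("ghi")


print(demo(use_tombstones=False))  # None: "ghi" is still stored but unreachable
print(demo(use_tombstones=True))   # GHI
```

Tombstones are only one option. Re-inserting the entries that follow the deleted slot (the "fixups" mentioned above) works too, at the cost of a more involved delete.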

A B+Tree also maps very nicely to a persistent model, but it is entirely non-obvious how you can go from the notion of a hash table in memory to one on disk. Extendible hashing exists, and has for a very long time. Literally for longer than I’ve been alive, but it is not very well known / generally used. It is a beautiful algorithm, mind you. But just mapping the concept to a persistence model isn’t enough. Typically, you also have a bunch of additional requirements from a disk data structure. In particular, concurrency in database systems is frequently tied closely to the structure of the tree (page level locks).

There is also the cost issue. When talking about disk based data access, we are rarely interested in the actual big-O complexity; we are far more interested in the number of disk seeks that are involved. Using extendible hashing, you’ll typically get 1 – 2 disk seeks. If the directory is in memory, you have only one, which is great. But with a B+Tree, you can easily make sure that the top levels of the tree will also be memory resident (similar to the extendible hash directory). That typically leads to 1 disk access to read the data, so in many cases, you get roughly the same performance with either option.
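A quick way to see why the top levels matter is to count them. The sketch below is a simplification: the fanout of 256 keys per page is an assumed round number, not a figure from any particular engine, and every page is treated as completely full.

```python
def btree_levels(entries: int, fanout: int) -> int:
    """Number of levels needed to address `entries` keys with full pages."""
    levels, capacity = 1, fanout
    while capacity < entries:
        capacity *= fanout
        levels += 1
    return levels


def disk_reads(entries: int, fanout: int, cached_levels: int) -> int:
    """Page reads per lookup when the top `cached_levels` levels stay in memory."""
    return max(btree_levels(entries, fanout) - cached_levels, 0)


print(btree_levels(1_000_000_000, 256))                # 4
print(disk_reads(1_000_000_000, 256, cached_levels=3))  # 1
```

Under these assumptions, a billion keys fit in four levels, and keeping the top three in memory leaves a single page read per lookup, which is the same ballpark as extendible hashing with an in-memory directory.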

Related to the cost issue, you have to also consider security risks. There have been a number of attacks against hash tables that relied on generating hash collisions. The typical in memory fix is to randomize the hash to avoid this, but if you are persistent, you have to use the same hash function forever. That means that an attacker can very easily kill your database server, by generating bad keys.

But these are all relatively minor concerns. The key issue is that B+Tree is just so much more useful. A B+Tree can allow me to:

  • Store / retrieve my data by key
  • Perform range queries
  • Index using a single column
  • Index using multiple columns (and then search based on full / partial key)
  • Iterate over the data in specified order

Hashes allow me to:

  • Store / retrieve my data by key

And that is pretty much it. So B+Tree can do everything that Hashes can, but also so much more. They are typically as fast where it matters (disk reads) and more than sufficiently fast regardless.
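To illustrate the difference, here is a Python sketch in which a sorted list stands in for the ordered leaf pages of a B+Tree. It is not a B+Tree implementation, and `SortedIndex` is a name made up for this example. The ordering is what enables range queries, partial key searches on a multi-column key and ordered iteration, none of which a hash table offers without a full scan.

```python
import bisect


class SortedIndex:
    """Keys kept in sorted order, as they would be across B+Tree leaf pages."""

    def __init__(self):
        self._keys = []
        self._values = []

    def put(self, key, value):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def get(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def range(self, low, high):
        """All entries with low <= key <= high, in key order."""
        start = bisect.bisect_left(self._keys, low)
        end = bisect.bisect_right(self._keys, high)
        return list(zip(self._keys[start:end], self._values[start:end]))


index = SortedIndex()
for key, value in [
    (("London", "Acme"), 1),
    (("Paris", "Bolt"), 2),
    (("London", "Zed"), 3),
    (("Berlin", "Core"), 4),
]:
    index.put(key, value)

# Exact lookup, the one thing a hash can also do.
print(index.get(("Paris", "Bolt")))

# Partial key search on a (city, name) index: everything in London.
print(index.range(("London",), ("London", "\uffff")))
```

With a plain `dict`, the London query would have to visit every entry, and the results would come back in no particular key order.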

Hashes are only good for that one particular scenario of doing lookup by exact key. That is actually a lot more limited than you might think.

Finally, and quite importantly, you have to consider the fact that B+Trees have certain access patterns that they excel at. For example, inserting sorted data into a B+Tree is going to be a joy. Scanning the B+Tree in order is also trivial and highly performant.

With hashes? There isn’t an optimal access pattern for inserting data into a hash. And while you can scan a hash at roughly the same cost as you would a B+Tree, you are going to get the data out of order. That means that it is a lot less useful than it would appear up front.

All of that said, hashes are still widely used in databases. But they tend to be used as specialty tools. Deployed carefully and for very specific tasks. This isn’t the first thing that you’ll reach for, you need to justify its use.

time to read 3 min | 542 words

One of our developers recently got a new machine, and we were excited to see what kind of performance we can get out of it. It is an AMD Ryzen 9, 12 cores @ 3.79 GHz with 32 GB of RAM. The disk used was a Samsung SSD 970 EVO Plus 500 GB.

This isn’t an official benchmark, to be fair. This is us testing how fast the machine is. As such, this is a plain vanilla Windows 10 machine, with no effort to perform any optimizations. Our typical benchmark involves loading all of Stack Overflow into RavenDB, so we’ll have enough data to work with. Here is what things looked like midway through:

image

As you can see, the write speed we are able to get is impressive.

We were able to insert all of Stack Overflow, a bit over 52 GB, in 3 and a half minutes, at a rate of about 300 MB / sec sustained.

Then we tested indexing.

  • Map/Reduce on users by registration month (source ~6 million users) – under a minute.
  • Full text search on users – two and a half minutes.
  • Simple index on questions by tag (over 18 million questions & answers) – 11.5 minutes.
  • Full text search on all questions and answers – 33 minutes.

Remember, these numbers are for indexing everything for the first time. It is worth noting that RavenDB dedicates a single thread per index, to avoid hammering the system with too much work. That means that these indexes were building concurrently with one another.

Here is the system utilization while this was going on:

image

Finally, we tested some other key scenarios (caching disabled in all of them):

  • Reading documents (small working set, representing recent questions) – 243,371 req / sec at 512 MB / sec.
  • Full random reads (data size exceeds memory, so disk hits) – 15,393.66 req / sec at 13.4 MB / sec.

These two are really interesting numbers. In the first one, we generate queries to specific documents over and over (with no caching). That means that RavenDB is able to answer them from memory directly. The idea is to simulate a common scenario of a working set that can fit entirely in memory.

The second one is different. The data size on disk is 52 GB and we have 32 GB available for us. We generate random queries here, for different documents each time. We ensure that the queries cannot be served directly from memory and that RavenDB will have to hit the disk. As you can see, even under this scenario, we are doing fairly well. As an aside, it helps that the disk is good. We tried running this on HDD once. The results were… not nice.

The final test we did was for writes, writing a small document to RavenDB. We got 118,000 writes / sec on a sustained basis, with about 32 MB / sec in data throughput. Note that we can do more by playing with the system configuration, but we are already at a high enough rate that it probably wouldn’t matter.

All in all, that is a pretty nice machine.

time to read 4 min | 698 words

A map/reduce index in RavenDB can be configured to output its value to a collection. This seems like a strange thing to want to do at first. We already got the results of the index, in the index. Why do we want to duplicate that by writing them to collections?

As it turns out, this is a pretty cool feature, because it enables us to do quite a lot. It means that we can apply anything that works on documents to the results of a map/reduce index. This list includes:

  • Map/Reduce – so you can create recursive / chained map/reduce operations.
  • ETL – so you can push aggregated data to another location, allowing distributed aggregation at scale easily.
  • Subscription / Changes – so you can get notified when an aggregated value has been changed.

The key thing about the list above is that none of them require you to know the id of the generated documents up front. Indeed, RavenDB uses document ids like the following for such documents:

image

Technically speaking, you can compute the id. RavenDB uses a predictable algorithm to generate such an id, but practically speaking, it can be hard to figure out exactly what the inputs are for the id generation. That means that certain document related features are not available. In particular, you can’t easily:

  • Include such a document
  • Load it directly (you have to query)

So we need a better option to deal with it. The way RavenDB solves this issue is by allowing you to specify a pattern for the output collection, like so:

image

As you can see, we have a map/reduce index that groups by the company and year (marked in blue). We output the collection to YearlySummary, as shown in the previous image.

The pattern (marked in red) specifies how we should name the output documents. Here is the result of this index:

image

And here is what this document looks like:

image

Huh?

This is strange, you probably think. This is the document we need to show the summary for companies/9-A in 1998, but there is no such data here. Instead, you’ll notice that the document collection is references (marked in red) and that it points to (marked in blue) the actual document with the data. Why do we do things this way?

A map/reduce index is free to output multiple results for the same reduce key, so we need to handle multiple documents here. We also have to deal with multiple reduce outputs that end up with the same pattern. For example, if we use map/reduce by day, but our pattern only specifies the month, we’ll have multiple reduce keys that end up with the same pattern.
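The indirection can be modeled with a short Python sketch. It only illustrates why a reference document holds a list of ids; it is not RavenDB’s implementation. The pattern syntax, the `ReduceOutputs` field name and the output document ids below are all assumptions made for the example.

```python
def reference_id(pattern: str, output: dict) -> str:
    """Fill the pattern from the fields of a reduce output."""
    return pattern.format(**output)


def build_references(pattern: str, outputs: dict) -> dict:
    """Group the ids of the output documents by the reference id they map to."""
    references = {}
    for doc_id, output in outputs.items():
        ref = references.setdefault(reference_id(pattern, output), {"ReduceOutputs": []})
        ref["ReduceOutputs"].append(doc_id)
    return references


# Hypothetical reduce outputs; the real ids are generated by the server.
outputs = {
    "YearlySummary/a1": {"Company": "companies/9-A", "Year": 1997, "Total": 1200},
    "YearlySummary/b2": {"Company": "companies/9-A", "Year": 1998, "Total": 3400},
}

by_company_and_year = build_references("sales/yearly/{Company}/{Year}", outputs)
by_company = build_references("sales/yearly/{Company}", outputs)

print(by_company_and_year["sales/yearly/companies/9-A/1998"])
print(by_company["sales/yearly/companies/9-A"])
```

With both fields in the pattern, each reference points to a single output. With only the company in the pattern, one reference collects the outputs for every year, which is why the reference has to hold a list.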

In practice, because RavenDB has great support for following documents by id, it doesn’t matter. Here is how I can use this index in a query:

This single query allows us to ask a question about companies (those that reside in London, in this case), as well as sales total data for a particular year. Note that this doesn’t do any joins or anything expensive. We have the information at hand, and can just use it.

You’ll notice that the pattern we specified is using both items that we reduce by. But that isn’t mandatory. We can also use this:

image

Here we only specify the company in the pattern. What would be the result?

image

Now we get the sales total for the company, on a per year basis.

We can now run the following query:

And this will give us the following output:

image

As you can imagine, this opens up quite a few possibilities for advanced features. In particular, it means that you can make it even easier for you to show and process aggregate information and work through complex object models.
