Ayende @ Rahien

Oren Eini, aka Ayende Rahien, is the CEO of Hibernating Rhinos LTD, which develops RavenDB, a NoSQL Open Source Document Database.

Get in touch with me:

oren@ravendb.net

+972 52-548-6969

time to read 4 min | 708 words

I read this blog post from Discord, which presents a really interesting approach to a problem that they ran into. Basically, in the cloud you have the choice between fast but ephemeral disks and persistent (and much slower) disks.

Technically speaking, you can try to get really fast persistent disks, but those are freaking expensive: Ultra disks on Azure, io2 disks on AWS, and Extreme persistent disks on GCP. The cost for a single 1TB disk with 25K IOPS would be around 1,800 USD / month, for example. That is just for the disk!

To give some context, an r6gd.16xlarge instance with 64 cores and 512 GB (!) of RAM as well as 3.8 TB of NVMe storage will cost you less than that. That machine can also do 80K IOPS.

At Discord scale, they run into the limitations of I/O in the cloud with their database and had to come up with a solution for that.

My approach would be to… not do anything, to be honest. They are using a database (ScyllaDB) that is meant to run on commodity hardware. That means that losing a node is an expected scenario. The rest of the nodes in the cluster will be able to pitch in and recover the data when the node comes back or is replaced. That looks like the perfect scenario for running with fast ephemeral disks, no?

There are two problems with ephemeral disks. First, they are ephemeral, which means that a hardware problem can make the whole disk go poof. Not an issue, that is why you are using a replicated database, no? We’ll get to that.

Another issue raised by Discord is that they rely on disk snapshots for backups and other important workflows. The nice thing about snapshotting a cloud disk is that it is typically fast and cheap to do. Otherwise you need to deal with higher-level backup systems at the database layer. The thing is, you probably do want those anyway, because restoring a system from a set of disk snapshots that were taken at different times and in various states is likely to be… challenging.

Discord’s solution to this is to create a set of ephemeral disks that are mirrored into a persistent disk. Reads go to the fast disks and writes are spread around. A failure on the ephemeral disks leads to recovering the data from the network disk.

Their post has the gory details and a lot more explanation besides. I wanted to write this post for a simple reason: as a database guy, this system scares me to pieces.

The issue is that I don’t know what kind of data consistency we can expect in such an environment. Do I get any guarantees about the equivalence of data between the fast disks and the network disk? Can they diverge from one another? What happens when you have partial errors?

Consider a transaction that modifies three different pages that end up going to three different fast disks + the network disk. If one of the fast disks has an error during the write, what is written to the persistent disk?

Can I get an out-of-date read from this system if we read from the network disk for some reason? That may mean that two pages that were written in one transaction come back as different versions. That will likely violate assumptions and invariants and can lead to all sorts of… interesting problems.

Given that this system is meant to handle failure modes, it sounds really scary, because it is an additional layer of complexity (and one that the database is unaware of) to deal with.

Aside from the snapshots, I assume that the reason for this behavior is to avoid the cost of rebuilding a node when there is a failure. I don’t have enough information to say what the failure rate is or what its impact on the overall system would be, but the solution provided is elegant, beautiful and, quite frankly, pretty scary to me.

There have been quite a few unknowns that we had to deal with in the realm of storage. But this adds a whole new layer of things that can break.

time to read 4 min | 608 words

Around 2017 we needed to test RavenDB with realistic datasets. That was the time that we were working hard on the 4.0 release, and we wanted to have some common dataset that was production quality (for all the benefits and complications that this brings) to play with.

A serious issue was that we needed that dataset to also be public, because we wanted to discuss its details. The default dataset people usually talk about in such a scenario is the Enron emails, but that is around half a million documents and quite small, all things considered.

Luckily for us, Stack Overflow has made their dataset publicly available in a machine-readable format. That means that we could take it, adapt it to RavenDB, and use it to test various aspects of our behavior with realistic data.

The data is distributed as a set of XML files, so I quickly wrote something that would convert the data to a JSON format and adapt the model to a more relational one. The end result is a dataset with 18 million documents and a hefty size of 52 GB. I remember that at the time, working with this data was a lengthy process. Importing the data took a long time and indexing even longer.

A few years later, this is still our go-to dataset for anything involving a non-trivial amount of data, but we have gotten to the point where the full process of working with it has shrunk significantly. It used to take 45+ minutes to import the data; now it takes less than 10, for example. Basically, we made RavenDB good enough that it wasn’t that much of a challenge.

Of course… Stack Overflow continues to publish their dataset… so I decided it was time to update our copy of the data. I no longer have the code that I used to do the initial import, but the entire process was fairly simple.

You can look at the code that is used to do the import here. This is meant to be quick & dirty code, mind you. It is about 500 lines of code and handles a surprisingly large number of edge cases.
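
The core of such a conversion is conceptually simple. The public dump stores each post as a <row> element with attributes, so something like the following sketch (simplified, and not the linked importer) streams the XML and emits one JSON document per row:

using System.Collections.Generic;
using System.IO;
using System.Text.Json;
using System.Xml;

using var reader = XmlReader.Create("Posts.xml");     // stream, never load the whole 50+ GB file
using var output = new StreamWriter("Posts.json");

while (reader.Read())
{
    if (reader.NodeType != XmlNodeType.Element || reader.Name != "row")
        continue;

    // Each <row> element (a single post) becomes one JSON document.
    var doc = new Dictionary<string, string>();
    for (int i = 0; i < reader.AttributeCount; i++)
    {
        reader.MoveToAttribute(i);
        doc[reader.Name] = reader.Value;
    }
    reader.MoveToElement();

    output.WriteLine(JsonSerializer.Serialize(doc));   // one JSON document per line
}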

You can find the actual data dump here.

And the explanation of the schema is here.

There is also a database diagram here.

In case you missed the point, the idea is that I want to remember how I did it in the future.

So far, I imported a bunch of Stack Exchange communities:

  • World Building – Just over 100K documents and 1 GB in size. Small enough to play with seamlessly.
  • Super User – 1.85 million documents, weighing in at 4 GB. I think we’ll use that as the default database for showing things off on the Raspberry Pi edition.
  • Stack Overflow – 40.5 million documents and nearing 150 GB in size. This is a great scenario for working with significant amounts of data. That is likely to be our new default benchmarking database.

The other advantage is that everyone is familiar with Stack Overflow. It makes for a great demo when we can pull up realistic data on the fly.

It already gave me some interesting details to explore. For example, enabling documents compression for the Super User community reduces the disk utilization to under 2 GB. That is a great space saving, and it means that we can easily fit the entire database on a small SD card and have a “RavenDB Server + Database in a box” as a Raspberry Pi.

The Stack Overflow dataset is 150 GB without compression; with documents compression, it drops to just 57 GB, which is all kinds of amazing.

They make for great demos.

time to read 6 min | 1183 words

For the past couple of years, we had a stealth project going on inside of RavenDB. That project is meant to re-architect the internals of how RavenDB handles queries. The goal is to have a major performance improvement for RavenDB indexing and queries.

We spent a lot of time thinking about architecting this. Design discussions for this feature go back to 2015, to give you some context. The codename for this project is: Corax.

Recently we finished wiring the new engine into RavenDB and for the first time in a long while, we could actually do a comparative benchmark between the two implementations. For this post, I’m going to focus solely on indexing performance, by the way.

Here are a couple of (very simple) indexes working on indexing 497 million documents.

image

You can see that the numbers are pretty good, but we just started. Here is what the numbers look like after about 7 million documents have been indexed:

image

You can see that Corax already opened up quite a gap between the two engines.

As a reminder, we have been optimizing our indexing process with Lucene for literally over a decade. We have done a lot to make things fast. Corax is still beating Lucene quite handily.

However, let’s take a look here. So far we have indexed ~16 million documents, and we can see that we are slowing down a bit:

image

That actually makes sense; we are doing quite a lot of work around here. It is hard to maintain the same speed when you aren’t working on a blank slate.

However, Corax was architected for speed, so while we weren’t surprised by the overall performance, we wanted more. We started analyzing what is going on. Quite quickly we figured out a truly stupendous issue in Corax.

One of the biggest problems when competing with Lucene is that it is a great library. It has certain design tradeoffs that I don’t like, but the key issue is that you can’t just build your own solution. You need to match or exceed whatever Lucene is doing.

One of the design decisions that has a major impact on how Lucene operates is that it uses an LSM model (log structured merge). This means that it writes data to immutable files (segments) and merges them occasionally. That means that deletes in Lucene are naturally handled during those merges. It also means that Lucene can get away with tracking a lot less data about the entries that it indexed. That reduces the overall disk space it requires.

Corax takes a different approach: we don’t do compaction, because that leads to occasional spikes in compute and I/O needs. Instead, Corax uses a steady progress model. That means that it needs to track more data than Lucene.

Our first Corax indexes took about 5 – 10 times more disk space than Lucene. That isn’t a percentage, that is five to ten times bigger. One of the ways we handle this is to use an adaptive compression algorithm. We look at the entries that are being indexed and compress them. We don’t do that blindly; we generate a dictionary to match the actual entries at hand and are able to achieve some spectacular compression rates. Corax still uses more disk space than Lucene, but now the difference is in percentages, rather than in multiples.

On a regular basis, we’ll also check whether the type of data that is being indexed has changed and whether we need to re-compute the dictionary. It turns out that we did that using a random sampling of the entries in the index. The sampling rate ranges from 1 in 10 to 1 in 100, depending on the size of the index.

Then we threw half a billion index entries at Corax, and merely checking whether the dictionary could be better would result in us computing a dictionary from over 5 million sampled entries. That was easily fixed, thankfully. We need to limit the scan not just in proportion to the size of the index but also globally. We can rely on the random nature of the sampling to give us a better dictionary next time, if needed. And it won’t stall the indexing process.
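
A sketch of what such a bounded sampling pass can look like (the rates and the cap here are made up for illustration, they are not Corax’s actual numbers):

using System;
using System.Collections.Generic;

IEnumerable<T> SampleForDictionary<T>(IEnumerable<T> entries, long indexedEntries, int maxSamples)
{
    // 1 in 10 for small indexes, up to 1 in 100 for large ones, as described above.
    int every = indexedEntries < 1_000_000 ? 10
              : indexedEntries < 100_000_000 ? 50
              : 100;

    var random = new Random();
    int taken = 0;
    foreach (var entry in entries)
    {
        if (taken >= maxSamples)
            yield break;                // global cap: never stall indexing on a huge index
        if (random.Next(every) == 0)    // random 1-in-N sampling
        {
            taken++;
            yield return entry;
        }
    }
}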

After jumping over the most obvious hurdles, the next stage is to pull out the profiler and see what kind of bottlenecks we have in the system. Here is the first thing that popped out to me:

image

Over 10% of the indexing time is spent on adding an item to CollectionOfBloomFilters. What is that?

Well, remember how I said that Lucene optimized its file structure to handle deletes better? One of the consequences of that is that deletes can be really expensive. If you are indexing a new document (which doesn’t need a delete), you can get a significant time saving by skipping that step. This is the role of the bloom filter here. Yes, even with that cost, for Lucene it is worth it.

For Corax… however, that isn’t the case. We can just skip that cost entirely.

10.75% performance boost
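
To make that tradeoff concrete, here is a minimal, illustrative bloom filter (not RavenDB’s actual type): it answers “was this document id possibly indexed before?”, and a definite “no” is what lets the Lucene path skip the expensive delete-before-add. Corax tracks its entries directly, so the whole check disappears.

using System;
using System.Collections;

class SeenDocuments
{
    private readonly BitArray _bits;

    public SeenDocuments(int bitCount) => _bits = new BitArray(bitCount);

    private int Hash(string id, int seed) =>
        (HashCode.Combine(id, seed) & 0x7FFFFFFF) % _bits.Length;

    public void Add(string id)
    {
        _bits[Hash(id, 1)] = true;
        _bits[Hash(id, 2)] = true;
    }

    // false = definitely never indexed (safe to skip the delete), true = maybe indexed.
    public bool MightContain(string id) => _bits[Hash(id, 1)] && _bits[Hash(id, 2)];
}

// Lucene path: if (seen.MightContain(id)) delete the old version, then add the document.
// Corax path:  just add the document - no delete, no filter, no ~10% overhead.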

Next… we have this dude:

image

That call is meant to update the number of records that Corax is holding in the index. We are updating a persistent value once for each entry that is indexed. But we can do that once for the entire batch!

2.81% performance boost
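
The shape of that change is roughly this (illustrative code, with a dictionary standing in for the persistent value):

using System.Collections.Generic;

class EntryCounterSketch
{
    private readonly Dictionary<string, long> _storage = new();   // stand-in for the persistent value

    public void IndexBatch(IReadOnlyList<object> batch)
    {
        _storage.TryGetValue("NumberOfEntries", out var count);
        foreach (var entry in batch)
        {
            // ... write the entry to the index ...
            count++;                              // cheap in-memory increment per entry
        }
        _storage["NumberOfEntries"] = count;      // a single persistent update per batch
    }
}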

Those are easy, no? What is next?

image

For each term that we process, we rent and return a buffer. For each term! That alone takes 1% of the indexing time. Utterly ridiculous. We can use a single buffer for the entire indexing operation, after all.

As for the IsAnalayzed property? That does some (trivial) computation, but we know that the value is immutable. Compute it once in the constructor and turn the property into a field.

1.33% performance boost
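
Both fixes look roughly like this (illustrative types, not Corax’s actual code): one buffer rented for the whole indexing operation instead of per term, and the immutable answer computed once in the constructor instead of on every property access.

using System;
using System.Buffers;

class TermWriterSketch : IDisposable
{
    private readonly byte[] _buffer;       // rented once for the entire indexing operation
    private readonly bool _isAnalyzed;     // used to be a property doing (trivial) work per call

    public TermWriterSketch(bool hasAnalyzer)
    {
        _buffer = ArrayPool<byte>.Shared.Rent(64 * 1024);
        _isAnalyzed = hasAnalyzer;         // computed once; the value never changes
    }

    public void WriteTerm(ReadOnlySpan<byte> term)
    {
        term.CopyTo(_buffer);              // no Rent/Return per term
        // ... hand the buffer off to the index ...
    }

    public void Dispose() => ArrayPool<byte>.Shared.Return(_buffer);
}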

Those are literally just the things we noticed in the first few minutes. After applying those changes, I reset the indexing and looked at the results after it ran for a while. And now that, I’ll admit, is far more gratifying.

image

It is really interesting to see the impact of seemingly minor changes like those. Especially because the architecture holds up quite well.

Corax is proceeding quite well and we have really great hopes for it. We need to hammer on it a bit more, but it is showing a lot of promise.

The really interesting thing is that all those changes (which ended up pretty much doubling the effective indexing speed) are relatively minor and easily fixed. That is despite the fact that we wrote Corax to be optimized; you always find surprises when you run the profiler, and sometimes they are very pleasant ones.

time to read 2 min | 342 words

We ran into an interesting scenario at work that I thought would make for a pretty good interview task. Consider a server that needs to proxy a request from the outside world to an internal service, something like this:

image

That isn’t that interesting. The interesting bit is that the network between the internal server and the proxy is running at 10Gb/sec and the external network is limited to 512Kb/sec.

Furthermore, the internal server expects the other side to just… take the load. It will blast the other side with as much data as it can, and if you can’t handle that, it will cut the connection. What this means is that for small requests, the proxy can successfully pass the data to the external client, but for larger ones, it is unable to read the data quickly enough to do so and the internal server will disconnect from it.

It is the responsibility of the proxy to manage that scenario. That is the background for this task. Practically speaking, this means that you have the code in the challenge skeleton (linked below), which works if the size is 8MB but fails if it is 64MB.

We have the SlowClientStream and the FastServerStream – which means that we are able to focus completely on the task at hand (ignoring networks, etc).

The requirement is to pass 64 MB of data between those two streams (which have conflicting requirements):

  • The FastServerStream requires that you read from it at a rate of about 31Kb/sec.
  • The SlowClientStream, on the other hand, will accept data at a maximum rate of about 30Kb/sec (though that rate varies over time).

You may not change the implementation of either stream (but may add behavior in separate classes).

You may not read the entire response from the server before sending to the client.

There is a memory limit of 32 MB on the maximum amount of memory you may use in the program.

How would you go about solving this?

The challenge skeleton is here.
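
For what it’s worth, here is one possible direction (a sketch under my own assumptions, not the intended or official answer): keep up with the fast server by spilling the backlog to a temporary file, and feed the slow client from that file as data becomes available. Memory stays at two small buffers no matter how far ahead the server gets, and the client starts receiving data immediately.

using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

static async Task ProxyAsync(Stream fastServer, Stream slowClient)
{
    var path = Path.GetTempFileName();
    using var writer = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.Read);
    using var spill = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);

    long available = 0;          // bytes spilled to disk so far
    bool doneReading = false;

    // Producer: drain the fast server at whatever rate it demands.
    var produce = Task.Run(async () =>
    {
        var buffer = new byte[64 * 1024];
        int read;
        while ((read = await fastServer.ReadAsync(buffer, 0, buffer.Length)) > 0)
        {
            await writer.WriteAsync(buffer, 0, read);
            await writer.FlushAsync();
            Interlocked.Add(ref available, read);
        }
        Volatile.Write(ref doneReading, true);
    });

    // Consumer: feed the slow client from the spill file, at the client's own pace.
    var buf = new byte[64 * 1024];
    long sent = 0;
    while (true)
    {
        if (sent == Interlocked.Read(ref available))
        {
            if (Volatile.Read(ref doneReading))
                break;                       // fully drained and the server is done
            await Task.Delay(10);            // nothing new yet, wait for the producer
            continue;
        }
        var n = await spill.ReadAsync(buf, 0, buf.Length);
        if (n == 0) { await Task.Delay(10); continue; }
        await slowClient.WriteAsync(buf, 0, n);
        sent += n;
    }

    await produce;
    // (cleanup of the temp file omitted for brevity)
}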

time to read 6 min | 1163 words

Today I had to look into a customer whose RavenDB instance was burning through a lot of I/O. The process is somewhat ingrained in me by this point, but I thought that it would make for a good blog post, so I can recall it next time.

Here is what this looks like from the point of view of the disk:

image

We are seeing a lot of reads in terms of MB/sec and a lot of write operations (but far less in terms of bandwidth). Those are the external details; can we figure out more? Of course.

We start our investigation by running:

sudo iotop -ao

This command gives us the accumulated I/O time for threads doing I/O. One of the important things that RavenDB does is tag its threads with the tasks that they are assigned. Here is a sample output:

  TID  PRIO  USER     DISK READ DISK WRITE>  SWAPIN      IO    COMMAND
 2012 be/4 ravendb    1748.00 K    143.81 M  0.00 %  0.96 % Raven.Server -c /ravendb/config/settings.json [Follower thread]
 9533 be/4 ravendb     189.92 M     86.07 M  0.00 %  0.60 % Raven.Server -c /ravendb/config/settings.json [Indexing of Use]
 1905 be/4 ravendb     162.73 M     72.23 M  0.00 %  0.39 % Raven.Server -c /ravendb/config/settings.json [Indexing of Use]
 1986 be/4 ravendb     154.52 M     71.71 M  0.00 %  0.38 % Raven.Server -c /ravendb/config/settings.json [Indexing of Use]
 9687 be/4 ravendb     185.57 M     70.34 M  0.00 %  0.59 % Raven.Server -c /ravendb/config/settings.json [Indexing of Car]
 1827 be/4 ravendb     172.60 M     65.25 M  0.00 %  0.69 % Raven.Server -c /ravendb/config/settings.json ['Southsand']

In this case, we see the top 6 threads in terms of I/O (for writes). We can see that we have a lot of indexing and document writes. That said, thread names in Linux are limited to 14 characters, so we probably need to give better names to them.
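
Tagging a thread can be as simple as this (an illustration, not RavenDB’s actual code; it assumes the runtime pushes the managed name down to the OS thread, which is what makes it show up in iotop’s COMMAND column):

using System.Threading;

var worker = new Thread(() =>
{
    // ... the I/O heavy work for this task ...
})
{
    // Shows up truncated by the OS, e.g. [Indexing of Use]
    Name = "Indexing of Users/ByName",
    IsBackground = true,
};
worker.Start();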

That is a task for later. Now, let’s look at the cost in terms of reads:

  TID  PRIO  USER    DISK READ>  DISK WRITE  SWAPIN      IO    COMMAND
11191 be/4 ravendb       2.09 G     31.75 M  0.00 %  7.58 % Raven.Server -c /ravendb/config/settings.json [.NET ThreadPool]
11494 be/4 ravendb    1353.39 M     14.54 M  0.00 % 19.62 % Raven.Server -c /ravendb/config/settings.json [.NET ThreadPool]
11496 be/4 ravendb    1265.96 M      4.97 M  0.00 % 16.56 % Raven.Server -c /ravendb/config/settings.json [.NET ThreadPool]
11211 be/4 ravendb    1120.19 M     42.66 M  0.00 %  2.83 % Raven.Server -c /ravendb/config/settings.json [.NET ThreadPool]
11371 be/4 ravendb    1114.50 M     35.25 M  0.00 %  5.00 % Raven.Server -c /ravendb/config/settings.json [.NET ThreadPool]
11001 be/4 ravendb    1102.55 M     43.35 M  0.00 %  3.12 % Raven.Server -c /ravendb/config/settings.json [.NET ThreadPool]
11340 be/4 ravendb     825.43 M     26.77 M  0.00 %  4.85 % Raven.Server -c /ravendb/config/settings.json [.NET ThreadPool]

That is a lot more complicated, however. Now we don’t know what task each thread is running, only that something is reading a lot of data.

We have the thread id, so we can make use of that to see what it is doing:

sudo strace -p 11191 -c

This command will track statistics on the system calls that are issued by the specified thread. I’ll typically let it run for 10 – 30 seconds and then hit Ctrl+C, giving me:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 90.90    3.868694         681      5681        82 futex
  8.28    0.352247           9     41035           sched_yield
  0.79    0.033589        1292        26           pwrite64
  0.03    0.001246          52        24         1 recvfrom
  0.01    0.000285         285         1           restart_syscall
  0.00    0.000000           0         2           madvise
  0.00    0.000000           0         2           sendmsg
------ ----------- ----------- --------- --------- ----------------
100.00    4.256061                 46771        83 total

I’m mostly interested in the pwrite64 system call here. RavenDB uses mmap() for most of its data access, so the reads are harder to track this way, but we can still get a lot of information from the output. Now I’m going to run the following command:

sudo strace -p 11191 -e trace=pwrite64

This will give us a trace of all the pwrite64() system calls from that thread, looking like this:

pwrite64(315, "\365\275"..., 4113, 51080761896) = 4113
pwrite64(315, "\344\371"..., 4113, 51080893512) = 4113

There is an strace option (-y) that can be used to show the file paths for system calls, but I forgot to use it. No worries, I can do:

sudo ls -lh /proc/11191/fd/315

Which will give me the details on this file:

lrwx------ 1 root root 64 Aug  7 09:21 /proc/11783/fd/315 -> /ravendb/data/Databases/Southsand/PeriodicBackupTemp/2022-08-07-03-30-00.ravendb-encrypted-full-backup.in-progress

And that tells me everything that I need to know. The reason we have high I/O is that we are generating a backup file. That explains why we are seeing a lot of reads (since we need to read in order to generate the backup).

The entire process is mostly about figuring out exactly what is going on, and RavenDB is very careful about leaving as many breadcrumbs as possible to make it easy to follow.

time to read 4 min | 629 words

A customer was experiencing large memory spikes in some cases, and we were looking into the allocation patterns of some of the queries that were involved. One of the things that popped up was a query that allocated just under 30GB of managed memory during its processing.

Let me repeat that, because it bears repeating. That query allocated 30(!) GB(!) during its execution. Now, that doesn’t mean that it was consuming 30 GB; it was just the allocations involved. Most of that memory was immediately discarded during the operation. But 30 GB of garbage to clean up puts a lot of pressure on the system. We took a closer look at the offending query. It looked something like this:

from index "Notifications/RoutingAndPriority"
where startsWith(Route, $routeKeyPrefix)
order by Priority desc

That does not seem like a query that should be all that expensive. But details matter, so we dove into this. For this particular query, the routes are a hierarchical structure that is unique for each message. Something like:

  • notifications/traffic/new-york-city/67a81019-941b-4d04-a0db-0559ed45343c
  • notifications/emergency/las-vegas/0a8e18fb-563b-4b6a-8e93-e10e08239656

And the queries that were generated used the city & topic to filter the information that they were interested in.

The customer in question had a lot of notifications going on at all times. And each one of their Route values was unique. Internally, RavenDB uses Lucene (currently) to handle searches, and Lucene uses an inverted index to execute queries.

The usual way to think about it is like this:

image

We have a list of terms (Brown, Green & Purple) and each of them has a list of the matching documents that contain that particular term.

The process of issuing a prefix query is then easy: scan all the terms that match the prefix and return their results. This is indeed what Lucene is doing. However… while it is doing that, it will do something like this:
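
(A rough approximation, using Lucene.Net 3.x style API names and omitting cleanup; this is a sketch, not RavenDB’s actual code.)

using System;
using Lucene.Net.Index;   // assuming the Lucene.Net 3.x package

static void PrefixScan(IndexReader reader, string field, string prefix, Action<int> onMatch)
{
    // First enumerator: walk the terms of the field, starting at the prefix.
    TermEnum terms = reader.Terms(new Term(field, prefix));
    do
    {
        Term term = terms.Term;             // note: a fresh string instance for every term
        if (term == null || term.Field != field || !term.Text.StartsWith(prefix))
            break;                          // we are past the prefix range, done

        // Second enumerator: walk the documents that contain this specific term.
        TermDocs docs = reader.TermDocs(term);
        while (docs.Next())
            onMatch(docs.Doc);
    } while (terms.Next());
}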

Pay close attention to what is actually happening here. There are two enumerators that we work with. One for the terms for the field and one for the documents for a specific term.

All of this is perfectly reasonable, but there is an issue. What happens when you have a lot of unique values? Well, then Lucene will have a lot of iterations of the loop. In this case, each term has just a single match, and Lucene is pretty good at optimizing searches for a specific term.

The actual problem is that Lucene allocates a string instance for each term. If we have 30 million notifications for New York’s traffic, that means that we’ll allocate 30 million strings during the processing of the query. We aren’t retaining these strings, mind you. They’ll be cleaned up by the GC quickly enough, but that is an additional cost that we don’t actually want.

Luckily, in this case, there is a much simpler solution. Given that the pattern of the routes is known, we can skip the unique portion of the route. That means that in our index, we’ll do something similar to:

Route = doc.Route.Substring(0, doc.Route.LastIndexOf('/') + 1)

Once that is done, the number of unique terms there would be negligible. There would be no more allocations galore to observe, and the overall system performance is much improved.

We looked into whether there is something that we can do with Lucene to avoid this allocations issue, but it is endemic to the way the API works. The longer-term plan is to fix that completely, of course. We are making great strides there already.

In short, if you are doing startsWith() queries or similar, pay attention to the number of unique terms that you have to go through. A simple optimization in the index, like the one above, can pay quite a bit of dividends.

time to read 1 min | 93 words

In RavenDB 5.4, we’re introducing new ETL features for Kafka and RabbitMQ. Now, instead of your documents just sitting there in your database, you can involve them in your messaging transactions. In this webinar, RavenDB CEO Oren Eini explains how these ETL tasks open up a whole new world of architectural patterns, and how they spare you from a lot of complexity when you want to involve your data in pub/sub or other messaging patterns.

time to read 1 min | 97 words

I spoke at Cloud Lunch & Learn about the basics of building a database from scratch. We took a storage engine and created a simple database within the span of an hour.

Covered in the talk are the details of how you can build the database, how to use indexes to speed up queries, and the manner in which a database interacts with its storage engine. I think it was a great talk, but let me know your feedback.

time to read 4 min | 653 words

RavenDB is written in C#, and as such, uses managed memory. As a database, however, we need granular control of our memory, so we also do manual memory management.

One of the key optimizations that we utilize to reduce the amount of overhead we have on managing our memory is using an arena allocator. That is a piece of memory that we allocate in one shot from the operating system and operate on. Once a particular task is done, we can discard that whole segment in one shot, rather than try to work out exactly what is going on there. That gives us a proper scope for operations, which means that missing a free in some cases isn’t the end of the world.

It also makes the code for RavenDB memory allocation super simple. Here is what this looks like:

image

Whenever we need to allocate more memory, we’ll just bump the allocator up. Initially, we didn’t even implement freeing memory, but it turns out that there are a lot of long-running processes inside of RavenDB, so we needed to reuse the memory inside the same operation, not just between operations.

The implementation of freeing memory is pretty simple, as well. If we return the last item that we allocated, we can just drop the next allocation position by how many bytes were allocated. For that matter, it also allows us to do incremental allocations. We can ask for some memory, then increase the allocation amount on the fly very easily.

Here is a (highly simplified) example of how this works:
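
(The names here are illustrative rather than RavenDB’s actual types, and offsets into a byte array stand in for raw pointers.)

using System;

class ArenaAllocatorSketch
{
    private readonly byte[] _block;     // one big block acquired up front
    private int _used;                  // the bump position

    public ArenaAllocatorSketch(int size) => _block = new byte[size];

    public (int Offset, int Size) Allocate(int size)
    {
        if (_used + size > _block.Length)
            throw new OutOfMemoryException("Arena exhausted");
        var allocation = (Offset: _used, Size: size);
        _used += size;                  // just bump the position forward
        return allocation;
    }

    // Freeing works directly only for the most recent allocation: roll the position back.
    public void Free((int Offset, int Size) allocation)
    {
        if (allocation.Offset + allocation.Size == _used)
            _used = allocation.Offset;
        // else: it would go to a per-size free list (shown in the next image)
    }

    // Growing the last allocation in place is just bumping a bit further.
    public (int Offset, int Size) Grow((int Offset, int Size) allocation, int newSize)
    {
        if (allocation.Offset + allocation.Size != _used || allocation.Offset + newSize > _block.Length)
            throw new InvalidOperationException("Can only grow the most recent allocation, within the block");
        _used = allocation.Offset + newSize;
        return (allocation.Offset, newSize);
    }

    public void Reset() => _used = 0;   // discard the whole scope in one shot
}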

As you can see, there isn’t much there. A key requirement here is that you need to return the memory back in the reverse order of how you allocated it. That is usually how it goes, but what if it doesn’t happen?

Well, then we can’t reuse the memory directly. Instead, we’ll place it in a free list. The actual allocations are done in powers of two, so that makes things easier. Here is what this actually looks like:

image

So if we free, but not from the top, we remember the location and can use it again. Note that for 2048 in the image above, we don’t have any free items.

I’m quite fond of this approach, since this is simple, easy to understand and has a great performance profile.  But I wouldn’t be writing this blog post if we didn’t run into issues, now would I?

A customer reported high memory usage (to the point of memory exhaustion) when doing a certain set of operations. That… didn’t make any sense, to be honest. That was a well-traveled code path; any issue there should have been found long ago.

They were able to send us a reproduction and the support team was able to figure out what was going on. The problem was that the code in question did a couple of things, which altogether led to an interesting issue.

  • It allocated and deallocated memory, but not always in the same order – this is fine, that is why we have the free list, after all.
  • It extended the memory allocation it used on the fly – perfectly fine and an important optimization for us.

Take a moment to consider how these two operations together could result in a problem…

Here is the sequence of events:

  • Loop:
    • Allocate(1024) -> $1
    • Allocate(256) -> $2
    • Grow($1, 4096) -> Success
    • Allocate(128) -> $3
    • Free($1) (4096)
    • Free($3) (128)
    • Free($2) (256)

What is going on here?

Well, the issue is that we allocated a 1KB buffer, but returned a 4KB buffer. That means that we add the returned buffer to the 4KB free list, but the allocations in the loop never ask for 4KB, so we cannot pull from that free list. Each iteration therefore bumps fresh memory from the arena while the 4KB free list keeps growing, until memory runs out.

Once found, it was an easy thing to do (detect this state and handle it), but until we figured it out, it was quite a mystery.

time to read 3 min | 488 words

I’m trying to compare indexing speed of Corax vs. Lucene. Here is an interesting result:

image

We have two copies of the same index, running in parallel on the same data. And we can clearly see that Lucene is faster. Not by a lot, but enough to warrant investigation.

Here is the core of the work for Lucene:

image

And here it is for Corax:

image

If you look at the results, you’ll see something really interesting.

For the Corax version, the MapItems.Execute() is almost 5% slower than the Lucene version.

And that really pisses me off. That is just flat out unreasonable to see.

And the reason for that is that the MapItems.Execute() is identical in both cases. The exact same code, and there isn’t any Corax or Lucene code there. But it is slower.

Let’s dig deeper, and we can see this interesting result. This is the Lucene version, and the highlighted portion is where we are reading documents for the indexing function to run:

image

And here is the Corax version:

image

And here it is two thirds more costly? Are you kidding me? That is the same freaking code and is utterly unrelated to the indexing.

Let’s dig deeper, shall we? Here is the cost breakdown for Lucene; I highlighted the important bits:

image

And here is the cost breakdown for Corax:

image

I have to explain a bit about what is going on here. RavenDB doesn’t trust the disk and will validate the data it reads from it the first time it loads a page.

That is what the UnlikelyValidatePage is doing.
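
Conceptually, the pattern is first-touch validation, something along these lines (an illustration, not RavenDB’s actual code):

using System;
using System.Collections;

class PageCacheSketch
{
    private readonly BitArray _validated;     // one bit per page: was its checksum verified?

    public PageCacheSketch(int pageCount) => _validated = new BitArray(pageCount);

    public ReadOnlySpan<byte> GetPage(int pageNumber)
    {
        var page = ReadPageFromMap(pageNumber);   // stand-in for reading the mmap'ed page
        if (!_validated[pageNumber])
        {
            ValidateChecksum(page);               // the expensive part, paid only by the first reader
            _validated[pageNumber] = true;
        }
        return page;
    }

    private ReadOnlySpan<byte> ReadPageFromMap(int pageNumber) => new byte[8192];   // placeholder
    private void ValidateChecksum(ReadOnlySpan<byte> page) { /* checksum verification omitted */ }
}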

What we are seeing in the profiler results is that both Corax and Lucene are calling GetPageInternal() a total of 3.69 million times, but Corax is actually paying the cost of page validation for the vast majority of them.

Corax validated over 3 million pages while Lucene validated only 650 thousand pages. The question is why?

And the answer is that Corax is faster than Lucene, so it is able to race ahead. When it races ahead, it will encounter pages first and validate them. When Lucene comes around and tries to index those documents, they have already been validated.

Basically, Lucene is surfing all the way forward on the wavefront of Corax’s work, and ends up doing a lot less work as a result.

What this means, however, is that we need to test both scenarios separately, on a cold boot. Because otherwise they will mess with each other’s results.
