Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

time to read 1 min | 135 words

I posted about our RavenDB C++ client a while ago, but I was really bad about making sure that we have regular updates. We have actually finished it already, and there is even an article about it available now. The article was written by Michael Yarichuk and covers getting started and some of the basic steps to get running in C++ with RavenDB.

We had a very simple challenge in building our C++ client. We want to give you the same level of comfort and the same feature set in C++ as you would get in a managed language, while keeping the level of performance you’ll expect from a C++ application. I think we have done so quite successfully. You can read the article for the full details. No gore included.

time to read 1 min | 164 words

I’ll be talking about Building a Grownup Database at the Big Data and Cloud Meetup in Santa Clara on March 18.

I’m going to show some of the features (and the thinking behind them) that went into making RavenDB a simpler database to develop against and operate.

Abstract:

A database is a complex, often fussy beast. For years, Oren Eini has made his living by fixing performance issues of various kinds. After seeing the same mistakes happen again and again, Oren decided to build his own database where these problems will never arise. RavenDB (https://ravendb.net/) started as a solution to the universal problems with relational models, and has been deployed in production for over a decade.
Oren Eini will talk about the kind of features that make RavenDB a grown up database:
-- It doesn't need a full-time babysitter
-- Uses AI automatic indexing and self optimizing engines
-- Understands the operational environment and adjusts to it without the need for a human in the loop
-- High Availability
-- Secured development

time to read 3 min | 434 words

RavenDB makes extensive use of certificates for authentication and encryption. They allow us to safely communicate between distributed instances without worrying about a man in the middle or eavesdroppers. Given the choices we had to implement authentication, I’m really happy with the results of choosing certificates as the foundation of our authentication infrastructure.

It would be too good, however, to expect to have no issues with certificates. The topic of this post is a puzzler. A user had chosen to use a self signed certificate for the nodes in the cluster, but was unable to authenticate between the servers unless they registered the certificate in the OS store.

That sounds reasonable, right? If this is a self signed certificate, we obviously don’t trust it, so we need this extra step to ensure that we do trust it. However, we designed RavenDB specifically to avoid this step. If you are using a self signed certificate, the server will trust its own certificate, and thus will trust anyone that is using the same certificate.

In this case, however, that wasn’t happening. For some reason, the code path that we use to ensure that we trust our own certificate was not being activated, and that was a puzzler indeed.

One of the things that RavenDB does on first startup is to try to connect to itself as a client. It checks whether that is successful or not. If not, we’ll try again, ignoring the registered root CAs. If we are successful at that point, we know what the issue is and ensure that we ignore the untrusted signer on the certificate. We only enable this code path if by default we don’t trust our own certificate.

Looking at the logs, we could see that we got a failure when talking to ourselves, some sort of a device not ready issue. That was strange. We hooked up strace to look into what was going on, but there was nothing wrong at the syscall level. Then we looked at the configuration and realized that the server was configured to use: https://ravendb-1.francecentral.cloudapp.azure.com/ but was actually hosted on https://ravendb-1-tst.francecentral.cloudapp.azure.com/

Do you see the difference?

The server was trying to contact itself using the configured hostname. It failed, because of a DNS issue, so it couldn’t contact itself to figure out that the certificate was untrusted. At that point, it didn’t install the hook and wouldn’t trust the self signed certificate.

So the issue started with investigating why nodes in the cluster don’t trust each other with a self signed certificate, and it turned out to be a simple configuration error.
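To make the failure mode concrete, here is a minimal sketch of the startup probe described above. This is not RavenDB's code: `probe_self`, the `Probe` enum and the `connect(verify)` callable are hypothetical names I made up for illustration, and the two fake connectors only simulate the situations from this post.

```python
import socket
import ssl
from enum import Enum
from typing import Callable


class Probe(Enum):
    TRUSTED = "certificate is trusted as is"
    TRUST_OWN_CERT = "install the hook that trusts our own certificate"
    UNREACHABLE = "could not reach ourselves, so no hook is installed"


def probe_self(connect: Callable[[bool], None]) -> Probe:
    """connect(verify) raises on failure; verify=False ignores the root CAs."""
    try:
        connect(True)
        return Probe.TRUSTED
    except OSError:
        pass  # covers TLS validation errors as well as DNS and socket failures
    try:
        connect(False)
        # Only the trust chain was in the way: safe to trust our own certificate.
        return Probe.TRUST_OWN_CERT
    except OSError:
        # We never saw the certificate at all (for example, a wrong hostname).
        return Probe.UNREACHABLE


def self_signed_server(verify: bool) -> None:
    # Simulates a reachable server with a self signed certificate.
    if verify:
        raise ssl.SSLCertVerificationError("self signed certificate")


def wrong_hostname(verify: bool) -> None:
    # Simulates the misconfigured hostname: DNS fails before TLS even starts.
    raise socket.gaierror("Name or service not known")
```

With these, `probe_self(self_signed_server)` ends up installing the hook, while `probe_self(wrong_hostname)` never gets that far, which is the behavior the user ran into.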

time to read 2 min | 281 words

Subscriptions in RavenDB give you a great way to handle backend business processing. You can register a query and get notified whenever a document that matches your query is changed. This works if the document actually exists, but what happens if you want to handle a business process relating to a document’s deletion?

I want to explicitly call out that I’m generally against deletion. There are very few business cases for it. But sometimes you have to (GDPR comes to mind) or you have an actual business reason for this.

A key property of deletion is that the data is gone, so how can you process deletions? A subscription will let you know when a document changes, but not when it is gone. Luckily, there is a nice way to handle this. First, you need to enable revisions on the collection in question, like so:

image

At this point, RavenDB will create revisions for all changed documents, and a revision is created for deletions as well. You can track deleted documents in the Revisions Bin in the Studio.

image

But how does this work with Subscriptions? If you try to run a subscription query at this point, you won’t find this employee. For that, you have to use a versioned subscription, like so:

image

And now you can subscribe to get notified whenever an employee is deleted.

time to read 2 min | 388 words

I recently had what amounted to a drive by code review. I was looking at code that wasn’t committed or in a PR, code that might not even have been saved to disk at the time that I saw it. I saw it while working with the developer on something completely different. And yet even a glance was enough to cause me to pause and make sure that this code will be significantly changed before it ever moves forward. The code in question is here:

What is bad about this code? No, it isn’t the missing ConfigureAwait(false), in that scenario we don’t need it. The problem is in the very first line of code.

This is meant to be public API. It will have consumers from outside our team. That means that the very first thing that we need to ensure is that we don’t expose our own domain model to the outside world.

There are multiple reasons for this. To start with, versioning is a concern. Sure, we have the /v1/ in the route, but there is nothing here that would break the build if we changed a part of our domain model that a third party client relies on. We have a compiler, we really want to be able to use it.

The second issue, which I consider more important, is that this leaks information that I may not really want to share. By exposing my full domain model to the outside world, I risk quite a bit. For example, I may have internal notes on the support ticket which I don’t want to expose to the public. Any field that I expose to the outside world is a compatibility concern, but any field that I add is a problem as well. This is especially true if I assume that those fields are private.

The fix is something like this:

Note that I have a class that explicitly defines the shape that I’m giving to the outside world. I also manually map between the internal and external fields. Using something like AutoMapper is not something that I want, because I want all of those decisions to be made explicitly. In particular, I want to be sure that every single field that I share with the outside world is shared in such a way that it is visible during PR reviews.
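The original code sample is not reproduced here, so as a rough illustration of the idea, here is a minimal sketch in Python. The `SupportTicket` fields, the `SupportTicketV1` class and the `to_public` function are all hypothetical names chosen for this example, not the code from the review.

```python
from dataclasses import asdict, dataclass


@dataclass
class SupportTicket:
    """Internal domain model (hypothetical fields)."""
    id: str
    title: str
    status: str
    internal_notes: str  # something we never want to hand to a third party


@dataclass(frozen=True)
class SupportTicketV1:
    """The shape we promise to consumers of the /v1/ route."""
    id: str
    title: str
    status: str


def to_public(ticket: SupportTicket) -> SupportTicketV1:
    # Deliberately mapped field by field: exposing a new field means
    # touching this function, which shows up in the PR diff.
    return SupportTicketV1(
        id=ticket.id,
        title=ticket.title,
        status=ticket.status,
    )


ticket = SupportTicket("tickets/1", "Cannot log in", "open", "legacy plan, handle with care")
payload = asdict(to_public(ticket))  # what would be serialized to the client
```

Adding a field to `SupportTicket` changes nothing in `payload` until someone edits `to_public`, and that is exactly the point.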

time to read 2 min | 260 words

These are not the droids you are looking for! – Obi-Wan Kenobi

Sometimes you need to find a set of documents not because of their own properties, but based on a related document. A good example may be needing to find all employees that have a blue Nissan car. Here is the actual model:

image

In SQL, we’ll want a query that goes like this:

This is something that you cannot express directly in RavenDB or RQL. Luckily, you aren’t going to be stuck, RavenDB has a couple of options for this. The first, and the most closely related to the SQL option is to use a graph query. That is how you will typically query over relationships in RavenDB. Here is what this looks like:

Of course, if you have a lot of matches here, you will probably want to do things in a more efficient manner. RavenDB allows you to do so using indexes. Here is what the index looks like:

The advantage here is that you can now query on the index in a very simple manner:

RavenDB will ensure that you get the right results, and changing the Car’s color will automatically update the index’s value.

The choice between these two comes down to frequency of change and how large the work is expected to be. The index favors more upfront work for faster query times while the graph query option is more flexible but requires RavenDB to do more on each query.
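The trade-off can be sketched outside of RavenDB entirely. The snippet below is plain Python over in-memory dictionaries, with a made-up Employee and Car model (the real model is in the image above), and it is not how RavenDB implements either option. It only shows the difference between following the reference on every query and doing that work once, up front.

```python
from collections import defaultdict

# Hypothetical documents: each employee references a car by id.
cars = {
    "cars/1": {"make": "Nissan", "color": "Blue"},
    "cars/2": {"make": "Nissan", "color": "Red"},
}
employees = {
    "employees/1": {"name": "Ann", "car": "cars/1"},
    "employees/2": {"name": "Bob", "car": "cars/2"},
}


def find_at_query_time(make, color):
    """Follow the Employee -> Car reference for every employee, on every query."""
    return sorted(
        emp_id
        for emp_id, emp in employees.items()
        if cars[emp["car"]]["make"] == make and cars[emp["car"]]["color"] == color
    )


def build_index():
    """Do the join once, keyed by the car properties we search on."""
    index = defaultdict(set)
    for emp_id, emp in employees.items():
        car = cars[emp["car"]]
        index[(car["make"], car["color"])].add(emp_id)
    return index


def find_with_index(index, make, color):
    # A single lookup; all of the work was paid for when the index was built.
    return sorted(index.get((make, color), ()))
```

In this sketch the index goes stale when a car is repainted and has to be rebuilt by hand. RavenDB takes care of that part for you, which is where the extra upfront work comes from.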

time to read 2 min | 254 words

We run a lot of benchmarks internally and sometimes it feels like there is a roaming band of performance focused optimizers that go through the office and try to find under utilized machines. Some people mine bitcoin for fun, in our office, we benchmark RavenDB and try to see if we can either break a record or break RavenDB.

Recently a new machine was… repurposed to serve as a benchmarking server. You can call it a rite of passage for most new machines here, I would say. The problem with that machine is that the client would error. Not only would it fail, but it would fail at the exact same interval. We tested that from multiple clients and from multiple machines and found that every 30 minutes on the dot, we’ll have an outage that lasted under one second.

Today I came to the office to news that the problem was found:

image

It seems that after 30 minutes of idle time (no user logged in), the machine would turn off the ethernet, regardless of whether there are active connections going on. Shortly afterward it would be woken up, of course, but it would be down just long enough for us to notice it.

In fact, I’m really happy that we got an error. I would hate to try to figure out latency spikes because of something like this, and I still don’t know how the team found the root cause.

time to read 3 min | 568 words

Compression is a nice way to trade off time for space. Sometimes this is desirable, especially as you get to the higher tiers of data storage. If your data is mostly archived, you can get significant savings in storage in exchange for a bit more CPU. This perfectly reasonable desire creates somewhat of a problem for RavenDB, because we have competing needs here. On the one hand, you want to compress a lot of documents together, to benefit from duplication between documents. On the other hand, we absolutely must be able to load a single document as fast as possible. That means that just taking 100MB of documents and compressing them in a naïve manner is not going to work, even if this is going to result in a great compression ratio. I have been looking at zstd recently to help solve this issue.

The key feature for zstd is the ability to train the model on some of the data, and then reuse the resulting dictionary to greatly increase the compression ratio.

Here is the overall idea. Given a set of documents (10MB or so) that we want to compress, we’ll train zstd on the first 16 documents and then reuse the dictionary to compress each of the documents individually. I have used a set of 52MB of JSON documents as the test data. They represent restaurant critics, I think, but I intentionally don’t really care about the data.

Raw data: 52.3 MB. Compressing it all with 7z gives us 1.08 MB. But that means that there is no way to access a single document without decompressing the whole thing.

Using zstd with the compression level of 3, I was able to compress the data to 1.8MB in 118 milliseconds. Choosing compression level 100 reduced the size to 1.02MB but took over 6 seconds to run.

Using zstd on each document independently, where each document is under 1.5 KB in size, gave me a total reduction from 52.3 MB to 6.8 MB. This is without the dictionary. And the compression took 97 milliseconds.

Using a dictionary whose size was set to 64 KB, computed from the first 128 documents, gave me a total size of 4.9 MB and took 115 milliseconds.

I should note that the runtime of the compression is variable enough that I’m pretty much going to call all of them the same.

I decided to use this on a different dataset and run this over the current senators dataset. Total data size is 563 KB and compressing it as a single unit would give us 54 KB. Compressing as individual values, on the other hand, gave us 324 KB.

When training zstd on the first 16 documents, with a 4 KB dictionary to generate, we got things down to 105 KB.

I still need to mull over the results, but I find them really quite interesting. Using a dictionary will complicate things, because the time to build the dictionary is non trivial. It can take twice as long to build the dictionary as it would to compress the data. For example, a 4 KB dictionary trained on 16 documents takes 233 milliseconds to build, but it only takes 138 milliseconds to compress 52 MB. It is also possible for the dictionary to make the compression rate worse, so that is fun.
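If you want to play with the general shape of this, here is a minimal sketch using only Python's standard library. Note the substitution: it uses zlib's preset dictionary support rather than zstd, and the "dictionary" is simply the first 16 documents concatenated, not a trained one. The documents are synthetic and made up for the example, so the sizes it produces say nothing about the numbers above.

```python
import json
import zlib

# Synthetic, made-up documents that share most of their structure.
docs = [
    json.dumps({
        "name": f"Restaurant {i}",
        "cuisine": "Italian",
        "borough": "Brooklyn",
        "grades": [{"grade": "A", "score": i % 20}],
    }).encode()
    for i in range(200)
]

# Stand-in for a trained dictionary: the raw bytes of the first 16 documents.
shared_dict = b"".join(docs[:16])


def compress_doc(doc: bytes, zdict: bytes = b"") -> bytes:
    """Compress a single document, optionally against a shared dictionary."""
    comp = zlib.compressobj(level=6, zdict=zdict) if zdict else zlib.compressobj(level=6)
    return comp.compress(doc) + comp.flush()


def decompress_doc(blob: bytes, zdict: bytes = b"") -> bytes:
    """Decompress a single document; needs the same dictionary it was written with."""
    decomp = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return decomp.decompress(blob) + decomp.flush()


plain_total = sum(len(compress_doc(d)) for d in docs)
dict_total = sum(len(compress_doc(d, shared_dict)) for d in docs)

# Random access: one document comes back without touching any of the others.
one_doc = decompress_doc(compress_doc(docs[42], shared_dict), shared_dict)
```

Each document is still compressed on its own, so loading one of them requires only that document's bytes plus the shared dictionary. The catch is the same as with zstd: the dictionary has to be stored and kept around for as long as any document compressed with it exists.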

Any other idea on how we can get both the space savings and the random access option would be greatly appreciated.

time to read 2 min | 311 words

RavenDB has always had optimistic concurrency; I consider this to be an important feature for building correct distributed and concurrent systems. However, RavenDB doesn’t implement pessimistic locking. At least, not explicitly. It turns out that we have all the components in place to support it. If you want to read more about what pessimistic locking actually is, this Stack Overflow answer has good coverage of the topic.

There are two types of pessimistic locking: offline and online. In the online mode, the database server will take an actual lock when modifying a record. That model works for a conversation pattern with the database, where you open a transaction and hold it open while you mutate the data. In today’s world, where most processing is handled using request / response (REST, RPC, etc), that kind of interaction is rare. Instead, you’ll typically want to use an offline pessimistic lock, that is, a lock that can live longer than a single transaction. With RavenDB, we build this feature on top of the usual optimistic concurrency as well as the document expiration feature.

Let’s take the classic example of pessimistic locking: reserving seats for a show. Once you have selected a seat, you have 15 minutes to complete the order, otherwise the seat will automatically be released. Here is the code to do this:

The key here is that we rely on the @expires feature to remove the seatLock document automatically. We use a well known document id to coordinate concurrent requests that try to get the same seat. The rest is just RavenDB’s usual optimistic concurrency behavior.

You have 15 minutes before the expiration and then it goes poof. From the point of view of implementing this feature, you’ll spend most of your time writing the edge cases, because from the point of view of RavenDB, there is really not much here at all.
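The pattern itself is small enough to sketch without a database. The snippet below is an in-memory stand-in, not the RavenDB client API: `TinyStore`, `ConcurrencyError` and `try_lock_seat` are hypothetical names, and expired documents are purged lazily on write here, where a real server would remove them on its own schedule.

```python
import time


class ConcurrencyError(Exception):
    """Raised when a document with the same id already exists."""


class TinyStore:
    """In-memory stand-in for a document store with create-only writes and expiration."""

    def __init__(self, clock=time.time):
        self._docs = {}
        self._clock = clock

    def _purge_expired(self):
        now = self._clock()
        for doc_id in [k for k, d in self._docs.items() if d["@expires"] <= now]:
            del self._docs[doc_id]

    def create(self, doc_id, doc, ttl_seconds):
        self._purge_expired()
        if doc_id in self._docs:
            # Optimistic concurrency: we expected no document with this id.
            raise ConcurrencyError(doc_id)
        self._docs[doc_id] = {**doc, "@expires": self._clock() + ttl_seconds}


def try_lock_seat(store, show, seat, user, ttl_seconds=15 * 60):
    # Well known id: every request for the same seat competes for the same document.
    lock_id = f"seatLocks/{show}/{seat}"
    try:
        store.create(lock_id, {"user": user}, ttl_seconds)
        return True
    except ConcurrencyError:
        return False  # someone else holds the seat for now


now = [0.0]  # fake clock so the example does not have to wait 15 minutes
store = TinyStore(clock=lambda: now[0])
first = try_lock_seat(store, "shows/1", "A7", "users/1")
second = try_lock_seat(store, "shows/1", "A7", "users/2")
now[0] = 15 * 60 + 1
after_expiry = try_lock_seat(store, "shows/1", "A7", "users/2")
```

The first request gets the seat, the second one is turned away, and once the 15 minutes are up the lock is gone and the seat can be taken again. Completing the order, releasing the lock early and similar edge cases are left out on purpose.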
