time to read 2 min | 306 words

RavenDB is a database, not a queue or a service bus. That said, you can make use of RavenDB subscriptions to get a very similar behavior to a service bus. Let’s see how much effort it will take us to implement backend processing using RavenDB only.

We assume that we have commands or messages that are written to the Commands collection and are handled via a subscription (which may have multiple concurrent workers). In terms of your messaging models, we have:
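Here is a minimal sketch of what such a model can look like (the enum and property names follow the list below; the example command itself is illustrative):

public enum CommandStatus
{
    Initial,
    Processing,
    Failed,
    Completed
}

public abstract class CommandBase
{
    public string Id { get; set; }
    public CommandStatus Status { get; set; } = CommandStatus.Initial;
    public int RetriesCount { get; set; } = 3;   // how many attempts are left
    public string Error { get; set; }            // last failure, if any
}

// An illustrative command; the payload is whatever your domain needs.
public class SendEmailCommand : CommandBase
{
    public string To { get; set; }
    public string Subject { get; set; }
    public string Body { get; set; }
}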

The CommandBase we have here defines the following infrastructure properties:

  • Status – enum [Initial, Processing, Failed, Completed] – default value is Initial
  • RetriesCount – int – default value is 3
  • Error – string – null by default

We can now define our subscription using the following query:

from Commands as c
where c.RetriesCount > 0 and c.Status != 'Completed' and c.'@metadata'.'@refresh' == null

This query is pretty simple, but it allows me to get all the documents that haven’t exceeded their retry count. The @refresh option allows me to register a command to be executed at a later point in time; see the documentation for the details. This is a feature that exists specifically to allow you to schedule commands with subscriptions.

In my subscription workers, I can now execute:
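A sketch of the shape this takes with the .NET client, assuming an initialized DocumentStore called store (the subscription name and the ProcessAsync handler are illustrative):

using Raven.Client.Documents.Subscriptions;

// Create the subscription (once) from the query shown above:
await store.Subscriptions.CreateAsync(new SubscriptionCreationOptions
{
    Name = "ProcessCommands",
    Query = @"from Commands as c
              where c.RetriesCount > 0 and c.Status != 'Completed'
                and c.'@metadata'.'@refresh' == null"
});

var worker = store.Subscriptions.GetSubscriptionWorker<CommandBase>(
    new SubscriptionWorkerOptions("ProcessCommands")
    {
        Strategy = SubscriptionOpeningStrategy.Concurrent // allow multiple workers
    });

await worker.Run(async batch =>
{
    using var session = batch.OpenAsyncSession();
    foreach (var item in batch.Items)
    {
        var cmd = item.Result;
        try
        {
            await ProcessAsync(cmd);               // your actual handling logic
            cmd.Status = CommandStatus.Completed;
        }
        catch (Exception e)
        {
            cmd.RetriesCount--;                    // the subscription query filters out
            cmd.Status = CommandStatus.Failed;     // commands with no retries left
            cmd.Error = e.ToString();
        }
    }
    await session.SaveChangesAsync();              // updated documents that still match
});                                                // the query will be sent to a worker again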

The code above is sufficient to get most of the way toward a robust message handling system.

I can easily see what messages are being processed, I can see how long they take, etc. I can see what failed and why. And I can see the history of commands.

That handles scenarios such as error handling and retries, gives you introspection into the state of the system, and lets you derive all the relevant numbers on throughput, capacity, etc.

It isn’t a complete solution, but for very little code, you can take this quite a long way.

time to read 2 min | 313 words

Enabling RavenDB’s revisions allows you to ask RavenDB to keep immutable copies of a document. We originally envisioned this feature as a way to have easy audit trails and a time travel feature. Revisions were meant to be something that you’ll typically access as the administrator, not something that we expected to be used in the normal course of events.

Usage in the field showed that users often want to make revisions a core part of the domain model. We have a user that uses revisions to mark the Approved (and thus, locked) version of a Plan document, for example. Another example is a payroll processing system where the Contract for a particular employee isn’t pointing to a document, but to a specific revision of that contract. Modifications to the contract have no impact on the employee unless they are explicitly moved to a new version of the contract.

Seeing all those use cases popping up for revisions, we added more features around revisions to make it easier to work with them (such as allowing you to explicitly create a revision on demand or enforcing revision policies after the fact).

In RavenDB 5.3 we have added the ability to include revisions as part of the query, so you get the same benefit of reduction in the number of remote calls as you get when working with document references. Let’s say that I want to calculate payroll for a set of employees; here is how I can do that (the Contract property contains the change vector of the specific version of the contract):

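The query was along these lines (the collection and property names are illustrative; the important part is the include clause):

from Employees as e
include revisions(e.Contract)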

In terms of API, this is how you use it:
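A sketch of the session usage, assuming an Employee.Contract property that holds the change vector of the relevant contract revision (the Employee and Contract types and the payroll call are hypothetical):

using System.Linq;

using var session = store.OpenSession();

var employees = session.Query<Employee>()
    .Include(builder => builder.IncludeRevisions(e => e.Contract)) // one round trip
    .ToList();

foreach (var employee in employees)
{
    // Served from the session, no additional remote call:
    var contract = session.Advanced.Revisions.Get<Contract>(employee.Contract);
    CalculatePayroll(employee, contract);   // hypothetical domain logic
}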

This turns a scenario where you had to make multiple remote calls into a single call for the whole process. Just the kind of improvement that I like to see.

time to read 3 min | 499 words

RavenDB is an OLTP database; it is meant to be the backend of business applications. There are some features in RavenDB that are meant for reporting purposes, but that is quite explicitly not our main focus. That is part of why RavenDB has such good integration with the rest of the environment: to give you the ability to use the best tool for the job.

With RavenDB 5.3, we are now allowing you to integrate directly with Power BI, so you can pull data from RavenDB to Power BI, write reports and in general utilize the full power of Power BI with your RavenDB data.

The image on the right, for example, is a report generated inside of Power BI on top of the sample data from RavenDB.

As you can imagine, I’m particularly stoked about this feature. Not only does it make reporting integration with RavenDB a lot simpler, but the way that we do it is quite interesting. Instead of diving into the technical details, it would probably be more fun to show you how it works, from the perspective of Power BI.

The first thing we need to do is connect Power BI to RavenDB. Using your existing Power BI system, you can simply click on Get Data and select:

[screenshot: the Get Data source selection dialog]

You’ll then need to provide the connection details:

[screenshot: the connection details dialog]

You’ll then be presented with the following dialog:

[screenshot: the navigator dialog, with RavenDB collections exposed as tables]

As you can see, we are translating the JSON documents inside of RavenDB to a columnar format for ease of processing inside of Power BI.

You can even take this further and issue RQL queries directly from inside of Power BI and transform the data. That means that you can utilize indexes, map/reduce operations, etc. Take a look:

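For example, an aggregation over the sample database’s Orders collection might look like this:

from Orders
group by ShipTo.Country
select ShipTo.Country as Country, count() as Count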

And the result inside of Power BI:

[screenshot: the query results, loaded as a table in Power BI]

As you can imagine, this is going to be a powerful tool in how you can work with your RavenDB data. You can also take this further and integrate that with Power BI on Azure, of course.

The way this works behind the scenes is that we now understand the PostgreSQL wire protocol. That means that we can be accessed from anywhere that can connect to Postgres. While the PostgreSQL protocol implementation is still marked as experimental, we have put the Power BI integration through its paces and we consider it stable enough for regular use.

Happy reporting.

This feature is part of the RavenDB 5.3 release (expected in mid November) and is available in the Enterprise edition of RavenDB.

time to read 4 min | 649 words

A really nice feature that we have in RavenDB 5.3 is support for wire protocol compatibility with PostgreSQL. That opens up RavenDB to the entire PostgreSQL ecosystem. You are now able to connect to a RavenDB instance using tools such as psql, Npgsql, etc. This feature is both surprisingly simple and incredibly complex at the same time.

The actual wire protocol from Postgres is well documented and pretty clean. Doing a clean room implementation of it is a straightforward process. Adding that to RavenDB, on the other hand, led to a number of interesting challenges. To start with, the protocol assumes, at the wire level, that all the rows that you return for a query have the same structure. This is a very reasonable assumption to make for a relational database protocol, but it doesn’t hold true when you are talking about a schema-less database such as RavenDB.

Then there is the fact that clients will generate all sorts of pretty scary queries before you even get to running a user’s query. For example, take a look at how Npgsql detects the capabilities of the database that it connects to. Just supporting the wire protocol isn’t sufficient; you also need to support quite a bit of additional behavior.

When we implemented this feature, we decided that we would support the wire protocol, so you’ll be able to connect, issue queries and get results. However, the query language itself is going to be RQL. We aren’t attempting to pretend that we are a PostgreSQL instance to the outside world, only to implement enough to make integration and compatibility work.

Here is an example of running a query through the Postgres integration.

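For example, here is the same idea from code, using Npgsql (the connection details are made up; note that the query text is RQL, not SQL):

using Npgsql;

var connString = "Host=localhost;Port=5433;Username=admin;Password=secret;Database=sample";
await using var conn = new NpgsqlConnection(connString);
await conn.OpenAsync();

// RavenDB speaks the Postgres *wire* protocol, but the query language is RQL:
await using var cmd = new NpgsqlCommand("from Employees select FirstName, LastName", conn);
await using var reader = await cmd.ExecuteReaderAsync();
while (await reader.ReadAsync())
    Console.WriteLine($"{reader.GetString(0)} {reader.GetString(1)}");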

This is an experimental feature, mind you. It is showing a lot of promise, but we want to get some more feedback from our users about which ways we should take it. The feature opens up many doors, but it also brings with it a non-trivial amount of complexity.

This feature requires that we open up another port to the world, so it is something that we require the user to explicitly allow. To enable this feature, you’ll need to set the following options in the settings.json configuration file:

"Integrations.PostgreSQL.Enabled": true
"Features.Availability" : "Experimental"

You can also control which port it will use via the Integrations.PostgreSQL.Port configuration option. We default to 5433 if none is specified.

At the current time, we only allow issuing queries, not modifying data, through the Postgres integration. This is something that we would very much like more feedback on: what kind of scenarios would you like to see supported for writes? What kind of writes do you expect to make at that point?

Finally, a word about security. The PostgreSQL protocol supports using TLS for encryption. When running in insecure mode, RavenDB will reject SSL/TLS connections from Postgres clients. When running in a secured mode (the default), the same server certificate that is used inside of RavenDB will also be used for the Postgres connection. However, while we usually require that the other side also authenticates using a client certificate, in the case of Postgres connections, we run into a problem. We found quite a few scenarios where, even though the Postgres protocol supports mutual authentication using client certificates, the clients don’t support it.

For that reason, we are allowing user & password authentication (on top of a TLS connection, obviously) for the Postgres connections at this time. Note that there is no correlation between the Postgres login and the access to any other RavenDB features (where client certificates are the only option).

This is part of the RavenDB 5.3 release (expected in mid November) and will be available in the Professional and Enterprise editions.

time to read 3 min | 564 words

RavenDB tries to be a good neighbor in your systems. RavenDB is typically used in polyglot solutions and we are often brought into existing ecosystems. One of the things that we do to make it easier to use RavenDB is to have a full suite of built-in tools for pushing data to other destinations.

For example, you can define an ETL process that will push document changes from RavenDB (potentially transforming & filtering them) to a relational database, another RavenDB instance, a data lake / OLAP system and much more.

In RavenDB 5.3 we have added Elasticsearch as an ETL target for RavenDB. If you are familiar with RavenDB ETL processes, the behavior is pretty much what you would expect. You select which collections you want to push to Elasticsearch, you provide a script that filters and transforms the data, and then you are done. From that point on, it is RavenDB’s responsibility to keep the Elasticsearch target up to date with any changes that are happening inside of RavenDB.

I’ll discuss the exact details of how to make it work shortly, but first I want to talk a bit about the usage scenario for this. Elasticsearch, just like RavenDB, uses Lucene behind the scenes to implement indexes. Unlike RavenDB, however, Elasticsearch is all about… well, searching. In that context, there is a pretty big overlap between RavenDB and Elasticsearch. In fact, one of the primary reasons we see people selecting RavenDB is that they no longer need to maintain multiple environments (one to store the data and an Elasticsearch cluster for searching on it); RavenDB is able to serve both needs in a single, highly integrated and performant package.

The most common scenario for the Elasticsearch ETL is when you already have an existing investment in Elasticsearch. RavenDB will naturally integrate into your environment, without needing to make any significant changes. That can enable you to start running queries, Kibana dashboards, etc. on your RavenDB documents.

Here is the transformation script:

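A minimal sketch of such a script, assuming the sample Orders collection and a target index named orders (the exact properties are up to you):

// Select what to send; the document id must be included explicitly:
loadToOrders({
    DocId: id(this),
    Company: this.Company,
    OrderedAt: this.OrderedAt,
    LinesCount: this.Lines.length
});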

And the configuration telling RavenDB where to go:

[screenshot: the ETL destination configuration, with the Elasticsearch node URL and the index definitions]

You can push multiple collections to multiple Elasticsearch indexes. It is important to note that you must include the RavenDB document Id as a property in the script and also set it in the destination index configuration. If the Elasticsearch index doesn't already exist, RavenDB will create it for you on the fly.

This is… pretty much it. The actual feature is fully fledged, of course. You get monitoring and tracking, it will run in high availability mode and will be assigned an owner node in the cluster, etc. If there is a failure on the Elasticsearch side, there is no data loss; RavenDB will wait for the target to come back up and push all the data that was changed in the meantime. The ETL process is an online process, which means that you can expect to see changes in RavenDB reflected in the Elasticsearch index within a few milliseconds of the transaction commit.

This feature is available in the Professional and Enterprise editions of RavenDB and will be included in the RavenDB 5.3 release, scheduled for mid November.

time to read 8 min | 1413 words

I’ll start by saying that this is not something that is planned in any capacity; I ran into this topic recently and decided to dig a little deeper. This post is mostly about the results of my research.

If you run a file sharing system, you are going to run into a problem very early on. Quite a lot of the files that people are storing are shared, and that is a good thing. Instead of storing the same file multiple times, you can store it once and just keep a reference counter. That is how RavenDB internally deals with attachments, for example. Two documents that have the same attachment (same content, not the same file name, obviously) will only have a single entry inside of the RavenDB database.

When you run a file sharing system, one of the features you really want to offer is the ability to save space in this manner. Another one is not being able to read the users’ files. For many reasons, that is a highly desirable property. For the users, because they can be assured that you aren’t reading their private files. For the file sharing system, because it simplifies operations significantly (if you can’t look at the files’ contents, there is a lot that you don’t need to worry about).

In other words, we want to store the users’ files, but we want to do that in an encrypted manner. It may sound surprising that the file sharing system doesn’t want to know what it is storing, but it actually does simplify things. For example, if you can look into the data, you may be asked (or compelled) to do so. I’m not talking about something like a government agency doing that, but even feature requests such as “do a virus scan on my files”, for example. If you literally cannot do that, and it is something that you present as an advantage to the user, that is much nicer to have.

The problem is that if you encrypt the files, you cannot know if they are duplicated. And then you cannot use the very important storage optimization technique of de-duplication. What can you do then?

This is where convergent encryption comes into play. The idea is that we’ll use an encryption system that will give us the same output for the same input even when using different keys. To start with, that sounds like a tall order, no?

But it turns out that it is quite easy to do. Consider the following symmetric key operations:
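The primitives in question amount to the following (illustrative signatures only; libsodium, for example, provides all three):

interface ISymmetricCrypto
{
    byte[] Hash(byte[] message);                                 // cryptographic hash (SHA-256, BLAKE2b, ...)
    byte[] Encrypt(byte[] key, byte[] nonce, byte[] message);    // authenticated symmetric encryption
    byte[] Decrypt(byte[] key, byte[] nonce, byte[] ciphertext); // rejects tampered ciphertexts
}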

One of the key aspects of modern cryptography is the use of a nonce, which ensures that for the same message and key, we’ll always get a different ciphertext. In this case, we need to go the other way around: to ensure that for different keys, the same content will give us the same output. Here is how we can utilize the above primitives for this purpose:
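A minimal sketch of the scheme, using .NET’s AesGcm; this illustrates the idea, it is not code from any real system:

using System.Security.Cryptography;

static (byte[] CipherText, byte[] Tag, byte[] KeyNonce, byte[] EncryptedKey, byte[] KeyTag)
    ConvergentEncrypt(byte[] file, byte[] userKey)
{
    // Step 1: derive the encryption key from the content itself, so the
    // same file always produces the same ciphertext, regardless of the user.
    byte[] contentKey = SHA256.HashData(file);
    byte[] zeroNonce = new byte[12];              // static nonce; safe here, see below
    byte[] cipherText = new byte[file.Length];
    byte[] tag = new byte[16];
    using (var aes = new AesGcm(contentKey))
        aes.Encrypt(zeroNonce, file, cipherText, tag);

    // Step 2: encrypt the per-file key with the user's own key and a random nonce.
    byte[] keyNonce = RandomNumberGenerator.GetBytes(12);
    byte[] encryptedKey = new byte[contentKey.Length];
    byte[] keyTag = new byte[16];
    using (var aes = new AesGcm(userKey))
        aes.Encrypt(keyNonce, contentKey, encryptedKey, keyTag);

    return (cipherText, tag, keyNonce, encryptedKey, keyTag);
}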

In other words, we have a two-step process. First, we compute the cryptographic hash of the message and use that as the key to encrypt the data, with a static nonce for this part (more on that later). We then take the hash of the file and encrypt that normally, with the secret key of the user and a random nonce. We can then push both the ciphertext and the nonce + encrypted key to the file sharing service. Two users, using different keys, will always generate the same output for the same input in this model. How is that?

Both users will compute the same cryptographic hash for the same file, of course. Using a static nonce completes the cycle and ensures that we’re basically running the same operation. We can then push both the encrypted file and our encrypted key for that to the file sharing system. The file sharing system can then de-duplicate the encrypted blob directly. Since this is the identical operation, we can safely do that. We do need to keep, for each user, the key to open that file, encrypted with the user’s key. But that is a very small value, so likely not an issue.

Now, what about this static nonce? The whole point of a nonce is that it is a value that you use once. How can we use a static value here safely? The problem that nonce is meant to solve is that with most ciphers, if you XOR the output of two cipher texts, you’ll get the difference between them. If they were encrypted using the same key and nonce, you’ll get the result of XOR between their plain texts. That can have catastrophic impact on the security of the system. To get anywhere here, you need to encrypt two different messages with the same key and nonce.

In this case, however, that cannot happen. Since we use the cryptographic hash of the content as the key, we know that any change in the message will give us a different key. That means that we never reuse a key for a different message, so there is no real point in using a nonce at all. Given that the cryptographic operation requires one, we can just pass a zeroed nonce and not think about it further.

This is a very attractive proposition, since it can amount to massive space savings. File sharing isn’t the only scenario where this is attractive. Consider the case of a messaging application, where you want to forward messages from one user to another. Memes and other viral content are a common scenario here. You can avoid having to re-upload the file many times, because even in an end to end encryption model, we can still avoid sharing the file contents with the storage host.

However, this leads to one of the serious issues that we have to cover for convergent encryption: for the same input, you’ll get the same output. That means that if an adversary knows the source file, it can tell whether a user has that file. In the context of a messaging application, that can spell trouble. Consider the following image, which is banned by the Big Meat industry:

[image: a picture of a salad]

Even with end to end encryption, if you use convergent encryption for media files, you can tell that a particular piece of content is accessed by a user. If this is a unique file, the server can’t really tell what is inside it. However, if this is a file that has a known source, we can detect that the user is looking at a picture of salads and immediately snitch to Big Meat.

This is called a confirmation of file attack, and it is the most obvious problem for this scenario. The other issue is that by using convergent encryption, you may allow an adversary to guess at values in the face of a known structure.

Let’s consider the following scenario: I have a service where users upload their data using convergent encryption. Given that many users may share the same file, we allow any user to download a file using:

GET /file?id=71e12496e9efe0c7e31332533cb53abf1a3677f2a802ef0c555b08ba6b8a0f9f

Now, let’s assume that I know what typical files are stored in the service. For example, something like this:

[image: a blank W-4 tax form – File:W4 completo ejemplo.jpg]

Looking at this form, there are quite a few variables that you can plug in here, right? However, we can generate options for all of those quite easily, and encryption is cheap, so we can speculate on the possible values. We can then run the convergent encryption on each possibility and fetch the result from the server. If we have a match, we have figured out the remaining information.
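A sketch of that attack; everything here is hypothetical, including how the file id is derived from the stored blob (the argument doesn’t depend on the exact scheme):

// Hypothetical dictionary attack against convergent encryption:
foreach (byte[] candidate in EnumerateFilledForms())    // plug guessed values into the known form
{
    byte[] contentKey = SHA256.HashData(candidate);     // the same derivation the uploader used
    byte[] blob = EncryptWithZeroNonce(contentKey, candidate);
    string id = Convert.ToHexString(SHA256.HashData(blob));  // assumed id scheme
    if (FileExistsOnServer(id))                         // GET /file?id=...
        Console.WriteLine("Guessed values confirmed.");
}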

Consider this another aspect of password cracking: we are using a set of algorithms that are quite explicitly meant to be highly efficient, so they lend themselves well to offline work.

That is enough on the topic, I believe. I don’t have any plans to do anything with this, but it was interesting to figure out. I actually started down this path by looking at this feature in WhatsApp:

[image: WhatsApp’s limit on forwarding a message many times]

It seems that this is a client-side enforced policy, rather than something that is handled via the protocol itself. I initially thought that this was done via convergent encryption, but it looks like it is just a counter in the message; the client side shows the warning as well as applies limits to it.

time to read 4 min | 664 words

Recently we had to tackle a seriously strange bug. A customer reported that under a specific set of circumstances, when loading the database with many concurrent requests, they would get an optimistic concurrency violation from RavenDB.

That is the sort of error that we look at and go: “Well, that is what you expect it to do, no?”

The customer code looked something like this:
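It was roughly shaped like this (a sketch with hypothetical names, not the customer’s actual code):

while (true)
{
    try
    {
        using var session = store.OpenSession();
        session.Advanced.UseOptimisticConcurrency = true;  // fail if someone else changed the doc

        var l = session.Load<Lock>("locks/" + lockName);
        if (l.AcquiredAt == null)
        {
            l.AcquiredAt = DateTime.UtcNow;   // take the lock
            session.SaveChanges();            // throws ConcurrencyException on conflict
            return;
        }
    }
    catch (ConcurrencyException)
    {
        // someone else got there first; retry
    }
}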

As you can imagine, this code is expected to deal with concurrency errors, since it is explicitly meant to run from multiple threads (and processes) all at once.

However, the user’s scenario had two separate types of workers running. One type was accessing the same set of locks, and there you might reasonably expect to get a concurrency error. The second type, however, ran on the main thread only, so there should have been no contention on it at all. Yet the user was getting a concurrency error on that particular lock, which shouldn’t happen.

Looking at the logs, it was even more confusing. Leaving aside that we could only reproduce this issue when we had a high contention rate, we saw some really strange details. Somehow, we had a write to the document coming out of nowhere?

It took a while to figure out what was going on, but we finally found the root cause. RavenDB internally does not execute transactions independently. That would be far too costly. Instead, we use a process called transaction merging.

Here is what this looks like:

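Conceptually, the merger behaves something like this (a sketch, not the actual RavenDB code):

// Many concurrent client operations are executed under a single write
// transaction, so the expensive disk flush is paid once for the whole batch.
while (true)
{
    List<IOperation> pending = DequeuePendingOperations();
    using (var tx = BeginWriteTransaction())
    {
        foreach (var op in pending)
            op.Execute(tx);
        tx.Commit();              // one disk write for all of them
    }
    foreach (var op in pending)
        op.NotifyCaller();
}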

The idea is that we need to write to the disk to commit the transaction. That is an expensive operation. By merging multiple concurrent transactions in this manner, we are able to significantly reduce the number of disk writes, increasing the overall speed of the system.

That works great, until you have an error. The most common error, of course, is a concurrency error. At this point, the entire merged transaction is rolled back and we will execute each transaction independently. In this manner, we suffer a (small) performance hit when running into such an error, but the correctness of the system is preserved.

The problem in this case would only happen when we had enough load to start doing transaction merging. Furthermore, it would only happen if the actual failure happened on the second transaction in the merged sequence, not the first one.

Internally, each transaction does something like this:
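Again as a sketch with hypothetical names, the important part is that the command object carries state across execution attempts:

// A merged transaction contains multiple operations, and the command
// object records the result that will be reported back to the caller.
class MergedBatchCommand
{
    public List<DocumentOperation> Operations;
    public string LastChangeVector;   // NOT reset when the batch is retried -- the bug

    public void Execute(Transaction tx)
    {
        foreach (var op in Operations)
            LastChangeVector = tx.Put(op.Id, op.ExpectedChangeVector, op.Document);
    }
}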

Remember that aside from merging transactions, a single transaction in RavenDB can also have multiple operations. Do you see the bug here?

The problem was that we successfully executed the transaction batch, but the next one in the same merged transaction failed. RavenDB did the right thing and executed the transaction batch again independently (this is safe to do, since we rolled back the previous transaction and are operating on a clean slate).

The problem is that the transaction batch itself had its own state. When we replied to the caller, we used the old state, the one that came from the transaction that was rolled back.

The only state we return to the user is the change vector, which is used for concurrency checks. What happened was that we got the right result, but with the old change vector. Everything worked, and the actual state of the database was fine. The only way to discover this issue is if you continue to make modifications to the same document on the same session. That is a rare scenario, since you typically discard the session after you save its changes.

In any other scenario, you’ll re-load the document from the server, which will give you the right change vector and make everything work.

Like all such bugs, when we look back at it, this is pretty obvious, but to get to this point was quite a long road.

time to read 1 min | 187 words

Following my previous post, which mentioned that you can save significantly on disk space if you store a plain text attachment using gzip, we got a feature request:

Perhaps in future attachments could have built-in compression as well?

The answer to that is no, but I thought that it is worth a post to explain why not.

Let’s consider the typical types of attachments that you’ll store in RavenDB. Based on experience, we usually see:

  • PDF files
  • Word / Excel / PowerPoint
  • Images (JPEG, PNG, GIF, etc)
  • Videos
  • Designs (floor plans, CAD / DWG, etc)
  • Text files

Aside from the text files, pretty much all the data you’ll store as an attachment is already compressed. In fact, you’ll be hard pressed today to find any file format that does not already have built-in compression.

Compressing already compressed data is… suboptimal. It will not usually lead to significant space savings and can actually make the file size larger. It also burns CPU cycles unnecessarily.

It is better to shift the responsibility to the users in this case, since they have a lot more information about what they actually put into RavenDB and won’t have to guess.
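To make “shifting the responsibility to the users” concrete, here is a minimal sketch of storing a gzipped text attachment with the .NET client (the document id, file name and server details are made up):

using System.IO;
using System.IO.Compression;
using Raven.Client.Documents;

using var store = new DocumentStore { Urls = new[] { "http://localhost:8080" }, Database = "Sample" };
store.Initialize();

using var session = store.OpenSession();
using var compressed = new MemoryStream();
using (var gzip = new GZipStream(compressed, CompressionMode.Compress, leaveOpen: true))
using (var file = File.OpenRead("notes.txt"))
    file.CopyTo(gzip);                        // compress the text before storing it
compressed.Position = 0;

// Store the gzipped bytes as a regular attachment:
session.Advanced.Attachments.Store("users/1-A", "notes.txt.gz", compressed, "application/gzip");
session.SaveChanges();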

time to read 3 min | 478 words

In distributed systems, the term Byzantine fault tolerance refers to working in an environment where the other nodes in the system are going to violate the invariants held by the system. Sometimes that is because of a bug, sometimes because of a hardware issue, and sometimes it is a malicious action.

A user called us to let us know about a serious issue: they have 100 documents in their database, but the index reports that there are 105 documents indexed. That was… puzzling. None of the avenues of investigation we tried helped us. There wasn’t any fanout on the index, for example.

That was… strange.

We looked at the index in more detail and we noticed something really strange. There were document ids in the index that weren’t in the database, and they were all at the end. So we had something like:

  • users/1 – users/100 – in the database and in the index
  • users/101 – users/105 – in the index, but not in the database

What was even stranger was that the values for the documents that were in the index didn’t match the values from the documents. For that matter, neither did the last indexed etag, which is how RavenDB knows which documents it still needs to index.

Overall, this is a really strange thing, and none of that is expected to happen. We asked the user for more details, and it turns out that they don’t have much. The error was reported in the field, and as they described their deployment scenario, we were able to figure out what was going on.

In certain cases, their end users will want to “reset” the system. The manner in which they do that is to shut down the application and then delete the RavenDB folder. Since all their state is in RavenDB, that brings them back up in a clean state. Everything works, and this is a documented manner in which they operate.

However, the reset instruction amounts to “delete the database files”, and the actual deletion is performed manually by the end user.

A RavenDB database is internally composed of a few directories, for data and for indexes. What would happen in the case where you deleted just the database file, but kept the index files? In that case, RavenDB will just accept that this is the case (soft delete is a thing, so we have to accept that) and open the index file. That index file, however, came from a different database, so a lot of invariants are broken. I’m actually surprised that it took this long to find out that this is problematic, to be honest.

We explained the issue, and then we spent some time ensuring that this will never be an invisible error. Instead, we will validate that the index is coming from the right source, or error explicitly. This is one bug that we won’t have to hunt ever again.

time to read 3 min | 451 words

A user called us with a strange bug report. He said that the SQL ETL process inside of RavenDB was behaving badly. It would write the data from the RavenDB server to the MySQL database, but then it would immediately delete it.

From the MySQL logs, the user showed:

2021-10-12 13:04:18 UTC:20.52.47.2(65396):root@ravendb:[1304]:LOG: execute <unnamed>: INSERT INTO orders ("id") VALUES ('Tab/57708-A')
2021-10-12 13:04:19 UTC:20.52.47.2(65396):root@ravendb:[1304]:LOG: execute <unnamed>: DELETE FROM orders WHERE "id" IN ('Tab/57708-A')

As you can imagine, that isn’t an ideal scenario for ETL processes. It is also something that should absolutely not happen. RavenDB issues the delete before the insert, obviously. In fact, for your viewing pleasure, here is the relevant piece of code:

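Its shape was roughly this (a reconstruction with hypothetical names):

foreach (var table in tablesToModify)
{
    ExecuteDeletes(table);   // deletes always go out first...
    ExecuteInserts(table);   // ...and only then the inserts
}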

It doesn’t get any clearer than that, right? We issue any deletes we have, then we send the inserts. But the user insisted that they were seeing this behavior, and I tend to trust them. Still, we couldn’t reproduce the issue, and the code in question dates to 2017, so I was pretty certain it was correct.

Then I noticed the user’s configuration. To define a SQL ETL process in RavenDB, you need to define which tables you’ll be writing to, as well as decide what data to send to those tables. Here is what this looked like:

[screenshot: the ETL table definitions, listing a table named “orders”]

And here is the script:

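The relevant part of the script looked like this (illustrative):

loadToOrders({        // note the casing: "Orders"
    Id: id(this)
});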

Do you see the problem? It might be better to use the explicit overload, which makes it clearer:

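That is, the same call using the overload that takes the table name as a string (again illustrative):

loadTo('Orders', {    // the table listing said "orders", the script says "Orders"
    Id: id(this)
});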

In the table listing, we had an “orders” table, but the script sent the data to the “Orders” table.

RavenDB is a case insensitive database, but in this case, what happened behind the scenes is that the code used a case sensitive dictionary to keep track of the tables that we are working with. That meant that what we did was roughly:
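A reconstruction; the dictionary below stands in for the real bookkeeping:

// The default Dictionary comparer is case sensitive, so "orders" and
// "Orders" end up as two distinct entries:
var changes_by_table = new Dictionary<string, List<string>>();

changes_by_table["orders"] = new List<string>();  // created from the table definitions
changes_by_table["Orders"] = new List<string>();  // created from the script's loadToOrders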

And basically, the changes_by_table dictionary had two tables in it: one from the table definitions and one from the script. When we validated that the tables from the script were fine, we did that using a case insensitive comparison, so that check passed properly.

To make it worse, the order of items in a dictionary is not predictable. If the iteration order had been the other way around, everything would have appeared to be working just fine.

We fixed the bug, but I found it really interesting that three separate people (all very experienced with the codebase) had a look and couldn’t figure out how this bug could happen. It wasn’t in the code; it was in what wasn’t there.
