Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

time to read 2 min | 345 words

After we built the SQL Migration Wizard for RavenDB 4.1, we started to field questions about assistance in migrating from more databases. As a result, we have introduced support for MongoDB and CosmosDB migration.

I’m going to walk you through how this works. First, you’ll need to download the Release Candidate of RavenDB 4.1. In addition to the zip package, you’ll need to download the Tools zip file as well.

Next, run RavenDB and create a new database, then go to Settings > Import Data and select From other. Here is what this will look like:

image

I went to shodan.io and found a publicly available MongoDB server to test this out. The process completed successfully and gave me a single document:

image

I guess someone already got to this instance.

More seriously, I scanned literally the first page in this listing and was able to connect and retrieve real documents from a few of them. That included what looked like users’ (hashed) passwords, among other details. I deleted the data.

At any rate, you can use this wizard to pull data from a MongoDB instance to your RavenDB database. RavenDB will handle all the details of the transfer for you. There is even the option to use a transformation script to shape the data as it goes into RavenDB.

You can do the same for CosmosDB, as you can see below:

image

These credentials I got from doing a GitHub search.

On the one hand, I’m really happy that the feature works. On the other hand, I’m pretty much in a state of despair from the state of security in general.

We are looking into other databases that our users want to migrate from. If you have such a need, please let us know.

time to read 2 min | 374 words

Subscriptions in RavenDB allow you to build persistent queries, batch operations and respond immediately to changes in your data. You can read more about them in this post, and I have dedicated a full chapter to discussing them in the book.

In RavenDB 4.1 we improved subscription support by adding the ability to include related documents directly as part of the subscription. Consider the following subscription:

image

The output of this subscription is going to be orders where the geographical coordinates of the shipping address are not known. We use that to enrich the data by adding the missing location data from the shipping address. This is important for features such as spatial searches, planning deliveries, etc. For the purpose of this post, we’ll assume that we accept users’ addresses, which do not have spatial information on them, and we use a background process to fill them in.

On the one hand, we want to add the location data as soon as possible, but on the other hand, we want to avoid making too many address resolution requests, so we don’t have to pay for them. Here is what we came up with to resolve this.

You can see that there are a bunch of interesting things in this code:

  • We can access the related company on the order using Load, and it will not trigger any network call.
  • If the company already has this address, we can use the known location, without having to make an expensive geo-location call.
  • If the company doesn’t have an address, we’ll fill this in for the next run.
  • If the company’s address doesn’t have a location, only then will we make a remote call to get the actual location data.
  • We don’t call Store() anywhere, because we are using the session instance from the batch. When we call SaveChanges(), all the changes to the loaded documents (either orders or companies) will be saved as a single transaction.
  • Because we update the Location field on the order, we won’t be called again with the updated order, since the subscription filters this.
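The decision flow in the list above can be sketched in plain Python. This is not the RavenDB client API: the `Address`, `Company` and `Order` classes, the `process_batch` function and the `geocode` callback are all hypothetical names, and the in-memory `companies` dictionary stands in for documents that the subscription already included. The branch for an order whose address differs from the company's is my own assumption about a reasonable fallback.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Location = Tuple[float, float]


@dataclass
class Address:
    line: str
    location: Optional[Location] = None


@dataclass
class Company:
    id: str
    address: Optional[Address] = None


@dataclass
class Order:
    id: str
    company_id: str
    ship_to: Address


def process_batch(orders: List[Order],
                  companies: Dict[str, Company],
                  geocode: Callable[[str], Location]) -> None:
    """Fill in missing order locations, paying for a lookup only when needed."""
    for order in orders:
        # Stands in for Load(): the company is already in memory, no network call.
        company = companies[order.company_id]
        if company.address is None:
            # The company has no address yet, so remember this one for the next run.
            company.address = Address(order.ship_to.line)
        if company.address.line == order.ship_to.line:
            if company.address.location is None:
                # Only now do we make the expensive remote call.
                company.address.location = geocode(order.ship_to.line)
            order.ship_to.location = company.address.location
        else:
            # Assumed fallback: a different address has to be resolved on its own.
            order.ship_to.location = geocode(order.ship_to.line)
```

With two orders shipping to the same address of the same company, the lookup runs once and both orders pick up the cached location.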

All in all, this is a pretty neat way to handle the scenario in a very efficient manner.

time to read 1 min | 190 words

Queries are one of the most important things you do in RavenDB, and analyzing queries to understand their costs is an important step in many optimization tasks.

I’m proud to say that for the most part, you don’t need to do this very often with RavenDB. Usually the query optimizer just works and gives you query speed that is fast enough that you don’t care how this is achieved. With RavenDB 4.1, we made it even easier to figure out what the costs of a query are. Here is a good example:

image

With the inclusion of “include timings()”, RavenDB will provide you with detailed stats on the costs of the relevant pieces in the query. The output in the studio looks like this:

image

You can see at a glance exactly how much time RavenDB spent on each part of your query. Armed with that information, you can set out to improve things a lot faster (pun intended).

time to read 2 min | 355 words

image

We put a lot of effort into making RavenDB’s default setup a secured one. The effort wasn’t so much about securing RavenDB itself, although we certainly spent a lot of time on that. Instead, a lot of the work went into making sure that the security will be usable. In other words, if you build a lock that no one can open, that isn’t a good lock, it is a horrible one. No one will use it. Indeed, the chief challenge in the design of the security mechanisms in RavenDB was making sure that they are secure and usable.

I think that we hit the right spot for the most part. You can see that it takes barely any effort to set up a secured RavenDB cluster in a few minutes. That works, if you are setting up RavenDB yourself. But what happens when you need an unattended setup? What happens if you are building a Docker container? What happens if you aren’t setting up a single RavenDB instance, but three hundred of them?

There are ways to handle this, by making sure that you are preparing the certificates and configuration ahead of time. But that is complex, and you still end up with the chicken and egg problem. How do you create a new node securely in such a way that it will know that you can trust it?

The feature itself is pretty small:

--Security.WellKnownCertificates.Admin=a909502dd82ae41433e6f83886b00d4277a32a7b

You can pass this argument as part of the RavenDB command line, in the settings.json file, in a Docker environment variable, etc.

The idea is simple. This tells RavenDB that the certificate matching this well known thumbprint is an administrator. As such, it will be trusted to perform all operations. A simple use case may be spawning three containers with this setting and using the well known certificate to connect them together and create a full cluster out of them.

For automatic deployment, this issue keeps popping up, because setting things up properly is hard. We hope that this feature will make things easier.
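To make the idea of matching a certificate against a well known thumbprint concrete, here is a small Python sketch. It is not RavenDB code. It assumes the thumbprint is the SHA-1 digest of the DER-encoded certificate rendered as hex, which is how .NET computes it and matches the 40 character value shown above. The function names are hypothetical.

```python
import hashlib
from typing import Iterable


def thumbprint(cert_der: bytes) -> str:
    """SHA-1 digest of the DER-encoded certificate, as lowercase hex."""
    return hashlib.sha1(cert_der).hexdigest()


def is_well_known_admin(cert_der: bytes, well_known: Iterable[str]) -> bool:
    """True if the presented certificate matches one of the configured thumbprints."""
    # Thumbprints are compared case-insensitively, since tools print them both ways.
    allowed = {t.lower() for t in well_known}
    return thumbprint(cert_der) in allowed
```

The point is that the server only needs the thumbprint ahead of time, not the certificate itself, which is what makes it easy to bake into a container image or a settings file.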

time to read 2 min | 286 words

image

A fun feature we have back in RavenDB is the ability to run RavenDB as part of your own application. Zero deployment, setup or hassle.

The following is the list of steps you will need:

  • Create a new project (.NET Core, .NET Framework, whatever).
  • Grab the pre-release bits from MyGet.
    Install-Package RavenDB.Embedded -Version 4.1.0 -Source https://www.myget.org/F/ravendb/api/v3/index.json
  • Ask the embedded client for a document store, and start working with documents.

And here is the code:

This is it. Your process has a dedicated RavenDB server which will run alongside your application. When your process is done, RavenDB will shut down and nothing is left running. You don’t need to set up anything, install services or configure permissions.

For extra fun (and reliability), RavenDB is actually running as a separate process. This means that if you are debugging an application that is using an embedded RavenDB, you can stop in the debugger and open the studio, inspecting the current state of the database, running queries, etc.

You have the full capabilities of RavenDB at your disposal. This means being able to use the studio, replicate to other nodes, backup & restore, database encryption, the works. The client API used is the usual one, which means that if you need to switch from running in embedded to server mode, you only need to change your document store initialization and you are done.

We are also working now on bringing this feature to additional platforms. You’ll soon be able to write the following:

This is a Python application that is using RavenDB in embedded mode, and you’ll be able to do so across the board (Java, Node.js, Go, Ruby, etc).

time to read 2 min | 204 words

One of the features that RavenDB exposes is the ability to get results in match order. In other words, you can write a query like this:

from Employees 
where 
    boost(Address.City = 'London', 10) 
or  boost(Address.City = 'Seattle', 5)
order by score()
include explanations()

In other words, find me all the employees in London or Seattle, but I want to get the London employees first. This is a pretty simple example. But there are cases where you may involve multiple clauses, full text search, matches on arrays, etc. Figuring out why the documents came back in the order that they did can be complex.
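To show the effect of the boosts on ordering, here is a deliberately simplified Python sketch. The real score is computed by the search engine and involves more factors than this. The sketch only sums the boost of every clause that matches, which is an assumption made for illustration, and the `score` and `run_query` names and sample data are made up.

```python
from typing import Dict, List, Tuple

Clause = Tuple[str, int]  # (city to match, boost factor)


def score(employee: Dict[str, str], clauses: List[Clause]) -> int:
    """Sum the boosts of all clauses this employee matches."""
    return sum(boost for city, boost in clauses if employee["city"] == city)


def run_query(employees: List[Dict[str, str]],
              clauses: List[Clause]) -> List[Tuple[str, int]]:
    """Return (name, score) for matching employees, best score first."""
    scored = [(score(e, clauses), e) for e in employees]
    matches = [pair for pair in scored if pair[0] > 0]
    # The sort is stable, so employees with equal scores keep their original order.
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [(e["name"], s) for s, e in matches]


employees = [
    {"name": "Nancy", "city": "Seattle"},
    {"name": "Anne", "city": "London"},
    {"name": "Janet", "city": "Kirkland"},
    {"name": "Steven", "city": "London"},
]
results = run_query(employees, [("London", 10), ("Seattle", 5)])
```

Here `results` lists the two London employees first, then the Seattle one, and drops the employee who matched neither clause.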

Luckily, RavenDB 4.1 gives you a new feature just for that. Look at the last line in the query: “include explanations()”. This will tell RavenDB that you want to dig into the actual details of the query, and in the user interface you’ll get:

image

And the full explanation for each is:

image

time to read 5 min | 903 words

image

One of the major features coming up in RavenDB 4.1 is the ability to do a cluster wide transaction. Up until this point, RavenDB’s transactions were applied at each node individually, and then sent over to the rest of the cluster. This follows the distributed model outlined in the Dynamo paper. In other words, writes are important, always accept them. This works great for most scenarios, but there are a few cases where the user might wish to explicitly choose consistency over availability. RavenDB 4.1 brings this to the table in what I consider to be a very natural manner.

This feature builds on the already existing compare exchange feature in RavenDB 4.0. The idea is simple. You can package a set of changes to documents and send them to the cluster. This set of changes will be applied to all the cluster nodes (in an atomic fashion) if they have been accepted by a majority of the nodes in the cluster. Otherwise, you’ll get an error and the changes will never be applied.
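The majority rule described above can be sketched as a toy model in Python. This is not how RavenDB implements its consensus protocol. Nodes are plain dictionaries, `submit` and `NoMajorityError` are hypothetical names, and nodes that are down are simply skipped rather than catching up later.

```python
from typing import Dict, List


class NoMajorityError(Exception):
    """Raised when too few nodes are reachable to accept the transaction."""


def majority(cluster_size: int) -> int:
    """Smallest number of nodes that is more than half of the cluster."""
    return cluster_size // 2 + 1


def submit(changes: Dict[str, dict], nodes: List[dict]) -> None:
    """Apply all changes on the reachable nodes, or nothing at all."""
    reachable = [n for n in nodes if n["up"]]
    if len(reachable) < majority(len(nodes)):
        raise NoMajorityError("no majority of nodes reachable, nothing was applied")
    for node in reachable:
        # The whole set of changes goes in together, never a part of it.
        node["docs"].update(changes)
```

In a three node cluster, one node being down still leaves a majority of two, so the write goes through. With two nodes down, the call fails and no node is modified.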

Here is the command that is sent to the server.

image

RavenDB ensures that this transaction will only be applied after a majority confirmation. So far, that is nice, but you could do pretty much the same thing with write assurance, a feature RavenDB has had for over five years. Where it gets interesting is the fact that you can make the operations in the transaction conditional. They will not be executed unless a certain (cluster wide) state has an expected value.

Remember that I said that cluster wide transactions build upon the compare exchange feature? Let’s see what we can do here. What happens if we want to state that a user’s name must be unique, cluster wide? Previously, we had the unique constraints bundle, but that didn’t work so well in a cluster and was removed in 4.0. Compare exchange was meant to replace it, but it was hard to use it with document modifications, because you didn’t have a single transaction boundary. Well, now you do.

Let’s see what I mean by this:

As you can see, we have a new command there: “ClusterTransaction.CreateCompareExchangeValue”. This is adding another command to the transaction. A compare exchange command. In this case, we are saying that we want to create a new value named “usernames/Arava” and set its value to the document id.

Here is the command that is sent to the server:

image

At this point, the server will accept this transaction and run it through the cluster. If a majority of the nodes are available, it will be accepted. This is just like before. The key here is that we are going to run all the compare exchange commands first. Here is the end result of this code:

image

We add both the compare exchange and the document (and the project document not shown) here as a single operation.

Here is the kicker. What happens if we run this code again?

You’ll get the following error:

Raven.Client.Exceptions.ConcurrencyException: Failed to execute cluster transaction due to the following issues: Concurrency check failed for putting the key 'usernames/Arava'. Requested index: 0, actual index: 1243

Nothing is applied and the transaction is rolled back.

In other words, you now have a way to provide consistent concurrency checks cluster wide, even in a distributed system. We made sure that a common scenario like uniqueness checks would be trivial to implement. The feature allows you to do in-transaction manipulation of the compare exchange values and ensure that document changes will only be applied if all the compare exchange operations (and you can have more than one) have passed.
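Here is a toy Python model of that behavior: compare exchange checks run first, and documents are only written if every check passes. It is an in-memory sketch, not the RavenDB API. The `ToyCluster` class, its method and the single `index` counter are assumptions made for illustration, with index 0 meaning "the key must not exist yet".

```python
from typing import Dict, List, Tuple


class ConcurrencyError(Exception):
    """Raised when a compare exchange value is not at the expected index."""


class ToyCluster:
    def __init__(self) -> None:
        self.compare_exchange: Dict[str, Tuple[int, str]] = {}  # key -> (index, value)
        self.documents: Dict[str, dict] = {}
        self.index = 0  # stands in for the cluster wide command index

    def cluster_transaction(self,
                            cmpxchg: List[Tuple[str, str, int]],
                            docs: Dict[str, dict]) -> None:
        """cmpxchg holds (key, value, expected_index); docs maps id to document."""
        # Run every compare exchange check before touching anything.
        for key, _value, expected in cmpxchg:
            actual = self.compare_exchange.get(key, (0, ""))[0]
            if actual != expected:
                raise ConcurrencyError(
                    f"Concurrency check failed for putting the key '{key}'. "
                    f"Requested index: {expected}, actual index: {actual}")
        # All checks passed, so apply the values and the documents together.
        self.index += 1
        for key, value, _expected in cmpxchg:
            self.compare_exchange[key] = (self.index, value)
        self.documents.update(docs)


cluster = ToyCluster()
cluster.cluster_transaction(
    [("usernames/Arava", "users/1", 0)],
    {"users/1": {"name": "Arava"}},
)
```

Running the same transaction a second time raises `ConcurrencyError`, because the key now exists at a non-zero index, and neither the compare exchange value nor any document is changed.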

We envision this being used for uniqueness, of course, but also for high value operations where consistency is more important than availability. A good example would be creating an order for a seat in a play. Multiple customers might try to purchase the same seat at the same time, and you can use this feature to ensure that you don’t double book it*. If you manage to successfully claim the seat, your order document is updated and you can proceed. Otherwise, the whole thing rolls back.

This can significantly simplify workflows where you might have failures mid operation, by giving you transactional guarantees around the whole cluster.

A cluster transaction can only delete or put documents; you cannot use a patch. This is because the result of the cluster transaction must be self contained and repeatable. A document modified by a cluster transaction may also take part in replication (including external replication). In fact, documents modified by cluster transactions behave just like normal documents. However, conflicts between documents modified by cluster transactions and modifications that weren’t made by a cluster transaction are always resolved in favor of the cluster transaction’s modifications. Note that there can never be a conflict between modifications made by cluster transactions. They are guaranteed proper sequence and ordering by the nature of running them through the consensus protocol.

* Yes, I know that this isn’t how it actually works, but it is a nice example.

time to read 2 min | 277 words

One of the things that we do in RavenDB is try to expose as much as possible the internal workings and logic inside RavenDB. In this case, the relevant feature we are trying to expose is the inner working of the query optimizer.

Consider the following query, running on a busy system.

image

This will go to the query optimizer, which needs to select the appropriate index to run this query on. However, this process is somewhat of a black box from the outside. Let me show you how RavenDB externalizes that decision.

image

You can see that there were initially three index candidates for this. The first one doesn’t index FirstName, so it was ruled out immediately. That gave us a choice of two suitable indexes.

The query optimizer selected the index that has the higher number of fields. This is done to route queries away from narrower indexes so they will be retired sooner.

This is a simple case. There are many other factors that may play into the query optimizer’s decision, such as when an index is stale because it was just created. The query optimizer will then choose another index until the stale index catches up with all its work.

To be honest, I mostly expect this to be of use when we explain how the query optimizer works. Of course, if you are investigating “why did you use this index and not that one” in production, this feature is going to be invaluable.
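The selection rules mentioned so far can be written down as a small Python sketch. This only covers the three rules from this post (the index must cover the queried fields, stale indexes are avoided when possible, wider indexes win) and ignores everything else the real optimizer considers. The `choose_index` function and the index names are hypothetical.

```python
from typing import List, Optional, Set


def choose_index(query_fields: Set[str], indexes: List[dict]) -> Optional[str]:
    """Pick an index name for the query, or None if no index covers it."""
    # Rule 1: an index that doesn't index every queried field is ruled out.
    candidates = [i for i in indexes if query_fields <= i["fields"]]
    if not candidates:
        return None
    # Rule 2: prefer indexes that are caught up, fall back to stale ones if needed.
    fresh = [i for i in candidates if not i["stale"]] or candidates
    # Rule 3: prefer the widest index, so narrower ones can be retired sooner.
    return max(fresh, key=lambda i: len(i["fields"]))["name"]


indexes = [
    {"name": "ByLastName", "fields": {"LastName"}, "stale": False},
    {"name": "ByFirstName", "fields": {"FirstName"}, "stale": False},
    {"name": "ByFirstAndLastName", "fields": {"FirstName", "LastName"}, "stale": False},
]
chosen = choose_index({"FirstName"}, indexes)
```

With these sample indexes the first one is ruled out immediately, and the wider of the two remaining candidates is chosen. Marking that wider index as stale would route the query to the narrower one instead.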

time to read 2 min | 238 words

This is actually an old feature that didn’t make the cut to enter 4.0. It is now back, and it is roaring. This is the kind of feature that is useful if you are utilizing RavenDB’s search capabilities. Let us assume that you want to search for something, but instead of querying for “give me all the active users” you want to actually… search. For example, you want to search for all employees with a BA in their bio. However, you don’t want to just get the matches, you want to show the user why each result matched.

That is the problem that highlighting is meant to solve. Consider the following query:

image

Which returns the following results:

image

Why did we get these particular employees? Let’s find out:

image

Now we are asking the server to highlight for us the reason for the match. You can see this in the studio directly, in the Highlight tab:

image

Using this approach, you can enrich the search results and provide a nicer experience for your users.
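To illustrate what highlighting produces, here is a minimal Python sketch that wraps the matched term in tags so a UI can render it. It is not how RavenDB computes highlights (the server works from the index and returns fragments). The `highlight` function, the `<b>` tags and the sample text are assumptions for the example.

```python
import re


def highlight(text: str, term: str, pre_tag: str = "<b>", post_tag: str = "</b>") -> str:
    """Wrap every whole-word, case-insensitive occurrence of term in tags."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    # Keep the original casing of the matched text inside the tags.
    return pattern.sub(lambda m: pre_tag + m.group(0) + post_tag, text)


bio = "Andrew received his BA in commerce and became a sales manager."
marked = highlight(bio, "ba")
```

The word boundary check matters: searching for "ba" marks the standalone "BA" but leaves a word such as "became" alone.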

time to read 3 min | 501 words

image

Documents are awesome; they allow you to model your data in a very natural way. At the same time, there are certain things that just don’t fit into the document model.

Consider the simple case of counting. This seems like it would be very obvious, right? As simple as 1+1. However, you need to also consider concurrency and distribution. Look at the image on the right. What you can see there is a document describing a software release. In addition to tracking the features that are going into the release, we also want to count various statistics about the release. In this example, you can see how many times a release was downloaded, how many times it was rated, etc.

I’ll admit that the stars rating is a bit cheesy, but it looks good and actually tests that we have good Unicode support.

Except for a slightly nicer way to show numbers on the screen, what does this feature give you? It means that RavenDB now natively understands how to count things. This means that you can increment (or decrement) a value without modifying the whole document. It also means that RavenDB will be able to automatically handle concurrency on the counters, even when running in a distributed system. This makes this feature suitable for cases where you:

  • want to increment a value
  • don’t care about (and usually explicitly desire) concurrency
  • may need to handle a very large number of operations

The case of the download counter or the rating votes is a classic example. Two separate clients may increment either of these values at the same time a third user is modifying the parent document. All of that is handled by RavenDB, the data is updated, distributed across the cluster and the final counter values are tallied.

Counters cannot cause conflicts and the only operation that you are allowed to do to them is to increment / decrement the counter value. This is a cumulative operation, which means that we can easily handle concurrency at the local node or cluster level by merging the values.
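One way to see why increments and decrements merge without conflicts is the following Python sketch. It assumes each node keeps its own running total and that the counter value is the sum of all the per-node totals. That layout is an assumption made for illustration, not a description of RavenDB's storage format, and the `CounterReplica` class is hypothetical.

```python
from typing import Dict


class CounterReplica:
    """One node's view of a single counter."""

    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        self.per_node: Dict[str, int] = {node_id: 0}

    def increment(self, delta: int = 1) -> None:
        # A node only ever changes its own entry (use a negative delta to decrement).
        self.per_node[self.node_id] += delta

    def receive_from(self, other: "CounterReplica") -> None:
        # Only the owner writes its entry, so copying it over cannot lose an update.
        self.per_node[other.node_id] = other.per_node[other.node_id]

    def value(self) -> int:
        return sum(self.per_node.values())


a, b = CounterReplica("A"), CounterReplica("B")
a.increment(3)   # three downloads recorded on node A
b.increment(2)   # two downloads recorded on node B at the same time
b.increment(-1)  # one of them is taken back
a.receive_from(b)
b.receive_from(a)
```

After the two nodes exchange their entries, both report the same total, regardless of the order in which the increments happened or were replicated.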

Other operations (deleting a counter, deleting the parent document) are of course non cumulative, but are much rarer and don’t typically need any sort of cooperative concurrency.

Counters are not standalone values but are strongly associated with their owning document. Much like the attachments feature, this means that you have a structured way to add additional data types to your documents. Use counters to, well… count. Use attachments to store binary data, etc. You are going to see a lot more of this in the future, since there are a few things in the pipeline that we are already planning to add.

You can use counters as a single operation (incrementing a value) or in a batch (incrementing multiple values, or even modifying counters and documents together). In all cases, the operation is transactional and will ensure full ACIDity.
