Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

Get in touch with me:

oren@ravendb.net +972 52-548-6969

time to read 2 min | 303 words

RavenDB has the notion of Custom Sorters: we allow you to inject your own logic into the sorting process, so you can run any complex logic you have around sorting. There are rarely good reasons to use that. One good use case is when you need to sort by an external value that mutates outside of your control. Let’s say that you have invoices in multiple currencies and you want to sort them by their value in USD. The catch is that you need them sorted by the current exchange rate. For that, you can use a custom sorter that uses the current value of the currency as the sorting mechanism.

I should point out that from a business perspective, you’ll typically want to use the value that you had for the order at the time the order was made, but that is not related to the custom sorting feature.

Let’s take another example, however. Consider the following Enum:

We want to sort by the education level of our candidates, but by default, we’ll be sorting using the textual value of the field. That isn’t what we want. We can define a custom sorter for that, but there is a far better option: just tell us what the order should be in the index.

Here is a good example:

What we are doing here is simple: we translate the textual value to a numeric one. When we query the index, we can filter by the textual value and sort by the sort value, giving us what we want. This is far simpler and more robust. If you need to add additional values down the line, it is obvious where they need to go. A custom sorter, on the other hand, is far more capable, but also more complex to operate.
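To make the idea concrete outside of RavenDB, here is a minimal Python sketch of the same technique: keep the textual value for filtering and store an explicit numeric rank next to it for sorting. The level names, the `EducationSort` field and the sample candidates are all assumptions made for illustration, not the definitions used in the actual index.

```python
from enum import Enum


class EducationLevel(Enum):
    # Hypothetical levels, declared in the order we want them sorted.
    HIGH_SCHOOL = "HighSchool"
    BACHELORS = "Bachelors"
    MASTERS = "Masters"
    DOCTORATE = "Doctorate"


# Explicit rank per textual value; adding a level means adding one enum member.
SORT_ORDER = {level.value: rank for rank, level in enumerate(EducationLevel)}


def to_index_entry(candidate: dict) -> dict:
    """Mimic an index entry: text value for filtering, numeric value for sorting."""
    return {
        "Name": candidate["Name"],
        "Education": candidate["Education"],
        "EducationSort": SORT_ORDER[candidate["Education"]],
    }


candidates = [
    {"Name": "Dana", "Education": "Masters"},
    {"Name": "Avi", "Education": "HighSchool"},
    {"Name": "Noa", "Education": "Doctorate"},
    {"Name": "Lev", "Education": "Bachelors"},
]

entries = [to_index_entry(c) for c in candidates]

# Sorting on the text gives alphabetical order, which is not what we want.
by_text = [e["Name"] for e in sorted(entries, key=lambda e: e["Education"])]
# Sorting on the numeric rank gives the intended order.
by_rank = [e["Name"] for e in sorted(entries, key=lambda e: e["EducationSort"])]
```

Sorting on the text puts Bachelors before HighSchool, while sorting on the rank follows the declared order of the levels.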

time to read 2 min | 306 words

RavenDB is a database, not a queue or a service bus. That said, you can make use of RavenDB subscriptions to get a very similar behavior to a service bus. Let’s see how much effort it will take us to implement backend processing using RavenDB only.

We assume that we have commands or messages, that are written to the Commands collection and are handled via a subscription (which may have multiple concurrent workers). In terms of your messaging models, we have:

The CommandBase we have here defines the following infrastructure properties:

  • Status – enum [Initial, Processing, Failed, Completed] – default value is Initial
  • RetriesCount – int – default value is 3
  • Error – string – null by default

We can now define our subscription using the following query:

from Commands as c
where c.RetriesCount > 0 and c.Status != 'Completed' and c.'@metadata'.'@refresh' == null

This query is pretty simple, but it allows me to get all the documents that haven’t been completed and haven’t exceeded their retry count. The @refresh option allows me to register a command to be executed at a later point in time. See the documentation for details; this feature exists specifically to allow you to schedule commands with subscriptions.

In my subscription workers, I can now execute:
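As a rough illustration of such a worker loop, here is a minimal sketch in plain Python, with an in-memory list standing in for the subscription batch. The `Command` class mirrors the infrastructure properties listed above. The names `matches_subscription`, `process_batch` and `flaky_handler` are hypothetical, and a real worker would save each change back to RavenDB instead of mutating objects in memory.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Command:
    # Mirrors the CommandBase infrastructure properties.
    Id: str
    Status: str = "Initial"  # Initial, Processing, Failed, Completed
    RetriesCount: int = 3
    Error: Optional[str] = None


def matches_subscription(cmd: Command) -> bool:
    """In-memory stand-in for the subscription filter (ignores @refresh)."""
    return cmd.RetriesCount > 0 and cmd.Status != "Completed"


def process_batch(batch: Iterable[Command], handler: Callable[[Command], None]) -> None:
    """Run the handler on each command and record the outcome on the command."""
    for cmd in batch:
        cmd.Status = "Processing"
        try:
            handler(cmd)
        except Exception as exc:
            # Keep the failure reason and use up one retry.
            cmd.Status = "Failed"
            cmd.Error = str(exc)
            cmd.RetriesCount -= 1
        else:
            cmd.Status = "Completed"
            cmd.Error = None


def flaky_handler(cmd: Command) -> None:
    # Hypothetical handler: one specific command always fails.
    if cmd.Id == "commands/2":
        raise ValueError("boom")


commands = [Command("commands/1"), Command("commands/2")]
for _ in range(5):
    batch = [c for c in commands if matches_subscription(c)]
    if not batch:
        break
    process_batch(batch, flaky_handler)
```

After the loop, the first command is Completed with its retries untouched, while the second is Failed with zero retries left and the error message recorded, so it no longer matches the filter.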

The code above is sufficient to get most of the way toward a robust message handling system.

I can easily see what messages are being processed, I can see how long they take, etc. I can see what failed and why. And I can see the history of commands.

That handles scenarios such as error handling, retries and introspection on the state of the system, and from there you can derive all the relevant numbers on throughput, capacity, etc.

It isn’t a complete solution, but for very little code, you can take this quite a long way.

time to read 2 min | 313 words

Enabling RavenDB’s revisions allows you to ask RavenDB to keep immutable copies of a document. We originally envisioned this feature as a way to have easy audit trails and a time travel feature. Revisions were meant to be something that you’ll typically access as the administrator, not something that we expected to be used in the normal course of events.

Usage in the field showed that users often want to make revisions a core part of the domain model. We have a user that uses revisions to mark the Approved (and thus, locked) version of a Plan document, for example. Another example is a payroll processing system where the Contract for a particular employee isn’t pointing to a document, but to a specific revision of that contract. Modifications to the contract have no impact on the employee unless they are explicitly moved to a new version of the contract.

Seeing all those use cases popping up for revisions, we added more features around revisions to make it easier to work with them (such as allowing you to explicitly create a revision on command or enforcing revision policies after the fact).

In RavenDB 5.3 we have added the ability to include revisions as part of the query, so you get the same benefit of reduction in the number of remote calls as you get when working with document references. Let’s say that I want to calculate payroll for a set of employees, here is how I can do that (the Contract property contains the change vector of the specific version of the contract):

image

In terms of API, this is how you use it:

This smooths out a scenario where you previously had to make multiple remote calls, collapsing them into a single one for the whole process. Just the kind of improvement that I like to see.

time to read 1 min | 150 words

JSON Patch is a feature that allows the frontend to send a set of changes on documents. If you are working with complex documents, that can result in a significant reduction in bandwidth. There are many scenarios where the client can modify a document on the browser, then produce the JSON Patch to make the server match the changes.

In RavenDB 5.3, we added direct support for implementing JSON Patch inside of RavenDB. The frontend code can forward the patch operations directly to RavenDB, where they will be executed by the database. Concurrent patches on the same document aren’t going to contend with one another and will be processed correctly. For example, in this post I’m using patch scripts to modify a document. I can do the same using JSON patch as well.

Here is how you can use the new ability:

Small improvement, but can make some scenarios much easier.
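To make the idea concrete, a JSON Patch (RFC 6902) is just a list of operations, each with an `op`, a `path` and usually a `value`. The following Python sketch applies the `add`, `replace` and `remove` operations to an in-memory document. It is an illustration of the patch format only, not RavenDB’s implementation or client API, and the `apply_patch` helper and the sample order document are assumptions.

```python
import copy
from typing import Any


def _resolve_parent(doc: Any, path: str):
    """Walk a JSON Pointer (RFC 6901) and return (parent container, final token)."""
    if not path.startswith("/"):
        raise ValueError(f"unsupported path: {path!r}")
    tokens = [t.replace("~1", "/").replace("~0", "~") for t in path[1:].split("/")]
    parent = doc
    for token in tokens[:-1]:
        parent = parent[int(token)] if isinstance(parent, list) else parent[token]
    return parent, tokens[-1]


def apply_patch(doc: dict, operations: list) -> dict:
    """Apply add / replace / remove operations to a copy of the document."""
    result = copy.deepcopy(doc)
    for op in operations:
        parent, key = _resolve_parent(result, op["path"])
        kind = op["op"]
        if isinstance(parent, list):
            if kind == "add":
                # "-" means "append to the end of the array".
                index = len(parent) if key == "-" else int(key)
                parent.insert(index, op["value"])
            elif kind == "replace":
                parent[int(key)] = op["value"]
            elif kind == "remove":
                del parent[int(key)]
            else:
                raise ValueError(f"unsupported op: {kind}")
        else:
            if kind == "add":
                parent[key] = op["value"]
            elif kind == "replace":
                if key not in parent:
                    raise KeyError(key)
                parent[key] = op["value"]
            elif kind == "remove":
                del parent[key]
            else:
                raise ValueError(f"unsupported op: {kind}")
    return result


order = {
    "Company": "companies/1",
    "Lines": [{"Product": "products/1", "Quantity": 1}],
    "Note": "rush",
}
patch = [
    {"op": "replace", "path": "/Lines/0/Quantity", "value": 3},
    {"op": "add", "path": "/Lines/-", "value": {"Product": "products/2", "Quantity": 1}},
    {"op": "remove", "path": "/Note"},
]
patched = apply_patch(order, patch)
```

The three operations are much smaller than the full document, which is where the bandwidth saving comes from when documents are large.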

time to read 3 min | 424 words

I like to think of myself as a database guy. My go-to joke about building user interfaces is that a <table> is all I need for layout (it’s not a joke). About a decade ago I just gave up on trying to follow what is going on in frontend land and accepted that I’ll reside in the backend from here on out.

Being ignorant of the ways you’ll write a modern frontend doesn’t affect the fact that I like to use a good user interface. I have seriously mixed feelings about the importance of RavenDB Studio to the project. On the one hand, I care that it is easy to use, obvious and functional. I love that it is beautiful and will generally make your life easier. And at the same time, I abhor the fact that it has such an impact on people’s decisions. I mean, the backend of RavenDB is absolutely beautiful, from a technical perspective. But everyone always talks about the studio.

Leaving aside my mini rant, we spend quite a lot of time and effort on the studio and the User Experience in general. This release is not an exception and we have a couple of major new updates to the studio.

One of the most common things you’ll do in the studio is run queries. In this release we have done a complete revamp of the automatic code completion for the RQL queries written in the studio. The new code assistance is available when writing any query in the Query view, the Patch view, and the Subscription Query. That was actually quite interesting, from a computer science perspective. We have a formal grammar for RQL now, which means that we can provide a much better experience for query editing. For example, take a look:

image

Full code completion assistance and better error handling directly in the studio make it easier to work with RavenDB for both developers and operations.

The second feature is the Identities page:

image

Identities have been a feature in RavenDB for a long time, and somehow they have never been front and center. Maybe the discoverability of the feature suffered? You can now create and edit identities directly in the studio, not just through the API.

image

time to read 2 min | 378 words

Most of the time, you’ll communicate with RavenDB using HTTP, making REST calls. When you are doing that, you can take advantage of HTTP compression. If the client indicates that it is able to handle it by sending an Accept-Encoding: gzip header, RavenDB will send the data to you compressed. Given that we are working with JSON texts, which compress very well, we are looking at pretty significant savings in network bandwidth. This has been the case for RavenDB for many years (I didn’t check, but at least a decade, I believe).
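As a quick way to see why JSON is such a good candidate for compression, here is a small Python sketch that gzips a batch of made-up documents using only the standard library. The document shape is an assumption for illustration, and the ratio you get will depend entirely on your own data.

```python
import gzip
import json

# Hypothetical documents: repeated property names and similar values,
# which is typical of JSON and is what makes it compress well.
documents = [
    {
        "Id": f"orders/{i}",
        "Company": f"companies/{i % 7}",
        "Lines": [
            {"Product": f"products/{i % 13}", "Quantity": i % 5 + 1, "PricePerUnit": 9.99}
        ],
        "ShippedAt": None,
    }
    for i in range(1000)
]

payload = json.dumps(documents).encode("utf-8")
compressed = gzip.compress(payload)

# Fraction of the original size that actually goes over the wire.
ratio = len(compressed) / len(payload)
print(f"{len(payload)} bytes -> {len(compressed)} bytes ({ratio:.1%} of original)")

# Compression is lossless: the receiver gets the exact same bytes back.
restored = gzip.decompress(compressed)
```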

There are certain cases, however, where RavenDB will use a binary protocol instead of HTTP. Those are usually scenarios where we are communicating directly with another RavenDB instance. All internal communications between RavenDB nodes will use direct TCP connections, and when using Subscriptions, the client will open a TCP connection to the server and use that on a long-term basis.

One of the fallacies of distributed computing is that bandwidth is infinite. One of the realities of cloud computing, on the other hand, is that you are paying for bandwidth. Even when you are running inside the same cloud region, network traffic across availability zones is still charged. As you can imagine, on active systems, you may notice that you are spending a lot of bandwidth on communication between the nodes of the cluster.

With RavenDB 5.3, we have added compression support for the replication and subscription connections. That means that replication and subscriptions will compress the data by default. We are using the Zstd algorithm. In our tests, it produced both a higher compression ratio and faster performance than GZip. You don’t have to do anything for this to work (although there is a configuration option "Server.Tcp.Compression.Disable" to disable that if you really want to). When you upgrade to RavenDB 5.3, the cluster will automatically start compressing all traffic.

In our tests, we are seeing an 85% (!) reduction in the amount of network traffic that we send out. That is something that I’m very much looking forward to seeing in our metrics once this is rolled out completely.

This is a RavenDB 5.3 feature (expected mid November) and will be available in the Professional and Enterprise editions of RavenDB.

time to read 3 min | 499 words

RavenDB is an OLTP database, it is meant to be the backend of business applications. There are some features in RavenDB that are meant for reporting purposes, but that is quite explicitly not our main focus. That is part of why RavenDB has such good integration with the rest of the environment, to give you the ability to use the best tool for the job.

With RavenDB 5.3, we are now allowing you to integrate directly with Power BI, so you can pull data from RavenDB to Power BI, write reports and in general utilize the full power of Power BI with your RavenDB data.

The image on the right, for example, is a report generated inside of Power BI on top of the sample data from RavenDB.

As you can imagine, I’m particularly stoked about this feature. Not only does it make reporting integration with RavenDB a lot simpler, the way that we do it is quite interesting. Instead of diving into the technical details, it would probably be more fun to show you how it works, from the perspective of Power BI.

The first thing we need to do is connect Power BI to RavenDB. Using your existing Power BI system, you can simply click on Get Data and select:

image

You’ll then need to provide the connection details:

image

You’ll then be presented with the following dialog:

image

As you can see, we are translating the JSON documents inside of RavenDB to a columnar format for ease of processing inside of Power BI.
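To give a feel for what such a translation involves, here is a minimal Python sketch that turns schema-less JSON documents into a fixed set of columns. It is only one possible approach under stated assumptions (nested objects become dotted column names, missing values become nulls), not a description of how RavenDB actually does it. The `flatten` and `to_table` helpers and the sample documents are hypothetical.

```python
def flatten(doc: dict, prefix: str = "") -> dict:
    """Flatten nested objects into dotted names, e.g. Address.City."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def to_table(docs: list) -> tuple:
    """Return (columns, rows) where every row has the same set of columns."""
    flat_docs = [flatten(d) for d in docs]
    columns = []
    for flat in flat_docs:
        for name in flat:
            if name not in columns:
                columns.append(name)  # keep first-seen order
    # Documents that lack a column get None (a null) in that position.
    rows = [[flat.get(name) for name in columns] for flat in flat_docs]
    return columns, rows


docs = [
    {"Name": "Acme", "Address": {"City": "Haifa", "Country": "Israel"}},
    {"Name": "Globex", "Phone": "555-0100"},
]
columns, rows = to_table(docs)
```

Here the two documents have different shapes, yet both rows end up with the same four columns, which is the kind of uniform result a tabular tool expects.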

You can even take this further and issue RQL queries directly from inside of Power BI and transform the data. That means that you can utilize indexes, map/reduce operations, etc. Take a look:

image

And the result inside of Power BI:

image

As you can imagine, this is going to be a powerful tool in how you can work with your RavenDB data. You can also take this further and integrate that with Power BI on Azure, of course.

The way this works behind the scenes is that we can now understand the PostgreSQL wire protocol. That means that we can now be accessed from anywhere that can connect to Postgres. While the PostgreSQL protocol implementation is still marked as experimental, we have put the Power BI integration through its paces and we consider that stable enough for regular use.

Happy reporting!

This feature is part of the RavenDB 5.3 release (expected in mid November) and is available in the Enterprise edition of RavenDB.

time to read 2 min | 273 words

Next week is Black Friday, which has reached global phenomenon status. It is a fun day for shoppers, and a nerve-wracking one for IT admins everywhere. It is not uncommon to see traffic double or triple, and the actual load (processing more heavyweight requests) can go up by an order of magnitude. Preparing for Black Friday can be harrowing, since you have a narrow window of opportunity and it is hard to know exactly where the stress points are.

This year, I decided to make your life easier, and RavenDB is offering a Black Friday Surge to all our customers. No, we aren’t offering you 50% off and everything must go. What we do instead is try to be of help.

This Black Friday (and Cyber Monday as well), we are offering all our customers double what they paid for. When running RavenDB on premises, if you purchased a RavenDB license for a 12-core cluster (running on 3 nodes of 4 cores each), we’ll offer you 30 days of double the core count. In other words, you can scale your system to be twice as powerful, and it won’t cost you a cent.

On the cloud, as well, we will provide users with credits to upgrade their clusters to the next level up (doubling their power) for a full week during the next 30 days. Again, there is no extra cost here.

You can register for the Surge here to request the upgrade and you’ll get twice as much power to handle the increased load.

Enjoy the power up!

time to read 4 min | 649 words

A really nice feature that we have in RavenDB 5.3 is support for wire protocol compatibility with PostgreSQL. That opens up RavenDB to the entire PostgreSQL ecosystem. You are now able to connect to a RavenDB instance using tools such as psql, Npgsql, etc. This feature is both surprisingly simple and incredibly complex at the same time.

The actual wire protocol from Postgres is well documented and pretty clean. Doing a clean room implementation of that is a straightforward process. Adding that to RavenDB, on the other hand, led to a number of interesting challenges. To start with, the protocol assumes, at the wire level, that all the rows that you return for a query have the same structure. This is a very reasonable assumption to make for a relational database protocol, but it doesn’t hold true when you are talking about a schema-less database such as RavenDB.

Then there is the fact that clients will generate all sorts of pretty scary queries before you even get to running a user’s query. For example, take a look at how Npgsql is detecting the capabilities of the database that it connects to. Just supporting the wire protocol isn’t sufficient, you also need to support quite a bit of additional behavior.

When we implemented this feature, we decided that we’ll support the wire protocol, so you’ll be able to connect, issue queries and get results. However, the query language itself is going to be RQL. We aren’t attempting to pretend that we are a PostgreSQL instance to the outside world, only implement enough to make integration and compatibility work.

Here is an example of running a query through the Postgres integration.

image

This is an experimental feature, mind you. It is showing a lot of promise, but we want to get some more feedback from our users about which ways we should take it. The feature opens up many doors, but it is also bringing with it a non trivial amount of complexity.

This feature requires that we open up another port to the world, which is something that we require the user to explicitly allow. To enable this feature, you’ll need to set the following options in the settings.json configuration file:

"Integrations.PostgreSQL.Enabled": true
"Features.Availability" : "Experimental"

You can also control which port it will use using the Integrations.PostgreSQL.Port configuration option. We default to 5433 if none is specified.
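If you manage settings.json from a script, the options above can be merged into an existing file along these lines. This is a minimal Python sketch: the three option names and the 5433 default come from the text above, while the `postgres_settings` and `merge_into` helpers and the `ServerUrl` value in the sample file are assumptions made for illustration.

```python
import json

DEFAULT_POSTGRES_PORT = 5433  # used by RavenDB when no port is specified


def postgres_settings(port: int = DEFAULT_POSTGRES_PORT) -> dict:
    """Build the options that turn on the experimental PostgreSQL integration."""
    settings = {
        "Integrations.PostgreSQL.Enabled": True,
        "Features.Availability": "Experimental",
    }
    # Only write the port when it differs from the default.
    if port != DEFAULT_POSTGRES_PORT:
        settings["Integrations.PostgreSQL.Port"] = port
    return settings


def merge_into(existing_json: str, port: int = DEFAULT_POSTGRES_PORT) -> str:
    """Merge the integration options into the text of an existing settings.json."""
    merged = json.loads(existing_json)
    merged.update(postgres_settings(port))
    return json.dumps(merged, indent=2)


# Hypothetical existing configuration file content.
existing = '{"ServerUrl": "http://127.0.0.1:8080"}'
updated = merge_into(existing, port=5434)
print(updated)
```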

At the current time, we only allow issuing queries, not modifying data, using the Postgres integration. This is something that we would very much like more feedback on: in what kind of scenarios would you like writes to be supported? What kind of writes do you expect to have at that point?

Finally, a word about security. The PostgreSQL protocol supports using TLS for encryption. When running in insecure mode, RavenDB will reject SSL/TLS connections from Postgres clients. When running in a secured mode (the default), the same server certificate that is used inside of RavenDB will also be used for the Postgres connection. However, while we usually require that the other side also authenticates using a client certificate, in the case of Postgres connections, we run into a problem. We found quite a few scenarios where, while the Postgres protocol supports mutual authentication using client certificates, the clients don’t support it.

For that reason, we are allowing user & password authentication (on top of a TLS connection, obviously) for the Postgres connections at this time. Note that there is no correlation between the Postgres login and the access to any other RavenDB features (where client certificates are the only option).

This is part of the RavenDB 5.3 release (expected in mid November) and will be available in the Professional and Enterprise editions.

time to read 3 min | 564 words

RavenDB tries to be a good neighbor in your systems. RavenDB is typically used in polyglot solutions and we are often brought in to existing ecosystems. One of the things that we do to make it easier to use RavenDB is to have a full suite of built-in tools that make it easy to push data to other destinations.

For example, you can define an ETL process that will push document changes from RavenDB (potentially transforming & filtering them) to a relational database, another RavenDB instance, a data lake / OLAP system and much more.

In RavenDB 5.3 we have added Elasticsearch as an ETL target for RavenDB. If you are familiar with RavenDB ETL processes, the behavior is pretty much the same as you would expect. You select which collections you want to push to Elasticsearch, you provide a script that filters and transforms the data and then you are done. From that point on, it is RavenDB’s responsibility to keep the Elasticsearch target up to date with any changes that are happening inside of RavenDB.

I’ll discuss the exact details on how to make it work shortly, but first I want to talk a bit about the usage scenario for this. Elasticsearch, just like RavenDB, is using Lucene behind the scenes to implement indexes. Unlike RavenDB, however, Elasticsearch is all about… well, searching. In that context, there is a pretty big overlap between RavenDB and Elasticsearch. In fact, one of the primary reasons we see people selecting RavenDB is that they now don’t need to maintain multiple environments (one to store the data and an Elasticsearch cluster for searching on that). RavenDB is able to handle both needs in a single highly integrated and performant package.

The most common scenario for Elasticsearch ETL is when you already have an existing investment in Elasticsearch. RavenDB will naturally integrate into your environment, without needing to make any significant changes. That can enable you to start running queries, Kibana dashboards, etc. on your RavenDB documents.

Here is the transformation script:

image

And the configuration telling RavenDB where to go:

image

You can push multiple collections to multiple Elasticsearch indexes. It is important to note that you must include the RavenDB document Id as a property in the script and also set it in the destination index configuration. If the Elasticsearch index doesn't already exist, RavenDB will create it for you on the fly.

This is… pretty much it. The actual feature is fully fledged, of course. You get monitoring and tracking, it will run in high availability mode and will be assigned an owner node in the cluster, etc. If there is a failure on Elasticsearch, there is no data loss, RavenDB will wait for the target to come back up and push all the data that was changed in the meantime. The ETL process is an online process, which means that you can expect to see changes in RavenDB reflected in Elasticsearch index within a few milliseconds of the transaction commit.

This feature is available in the Professional and Enterprise editions of RavenDB and will be included in the RavenDB 5.3 release, scheduled for mid November.
