Ayende @ Rahien

Oren Eini, aka Ayende Rahien, is the CEO of Hibernating Rhinos LTD, which develops RavenDB, a NoSQL open source document database.

time to read 2 min | 313 words

Enabling RavenDB’s revisions allows you to ask RavenDB to keep immutable copies of a document. We originally envisioned this feature as a way to have easy audit trails and a time travel feature. Revisions were meant to be something that you would typically access as an administrator, not something that we expected to be used in the normal course of events.

Usage in the field showed that users often want to make revisions a core part of the domain model. We have a user that uses revisions to mark the Approved (and thus, locked) version of a Plan document, for example. Another example is a payroll processing system where the Contract for a particular employee doesn’t point to the live contract document, but to a specific revision of it. Modifications to the contract have no impact on the employee unless they are explicitly moved to a new version of the contract.

Seeing all those use cases pop up for revisions, we added more features around them to make them easier to work with (such as allowing you to explicitly create a revision on demand or enforcing revision policies after the fact).

In RavenDB 5.3 we have added the ability to include revisions as part of the query, so you get the same reduction in the number of remote calls that you get when including document references. Let’s say that I want to calculate payroll for a set of employees. Here is how I can do that (the Contract property contains the change vector of the specific version of the contract):

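For illustration, a query along these lines can be used (this is a sketch, not the query from the post; it assumes an Employees collection whose documents keep the contract revision’s change vector in a Contract property, and it assumes the RQL include revisions() clause takes the path to that property):

from Employees as e
include revisions(e.Contract)

RavenDB returns the employees and, alongside them, the exact contract revisions those change vectors point to, all in a single round trip.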

In terms of API, this is how you use it:
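
Here is a minimal sketch using the .NET client (the Employee and Contract types are illustrative, and IncludeRevisions on the include builder is assumed to be the 5.3 way to express this):

// Illustrative document shapes, not from the original post:
// class Employee { public string Id; public string Contract; } // Contract holds the revision's change vector
// class Contract { public decimal BaseSalary; }

using (var session = store.OpenSession())
{
    var employees = session.Query<Employee>()
        .Include(builder => builder.IncludeRevisions(e => e.Contract))
        .ToList();

    foreach (var employee in employees)
    {
        // Already included - served from the session, no extra remote call.
        var contract = session.Advanced.Revisions.Get<Contract>(employee.Contract);
        // ... calculate payroll from the locked contract revision ...
    }
}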

This collapses a scenario where you previously had to make multiple remote calls into a single call for the whole process. Just the kind of improvement that I like to see.

time to read 1 min | 150 words

JSON Patch is a feature that allows the frontend to send a set of changes to documents. If you are working with complex documents, that can result in a significant reduction in bandwidth. There are many scenarios where the client can modify a document in the browser, then produce a JSON Patch that makes the server’s copy match those changes.

In RavenDB 5.3, we added direct support for implementing JSON Patch inside of RavenDB. The frontend code can forward the patch operations directly to RavenDB, where they will be executed by the database. Concurrent patches on the same document aren’t going to contend with one another and will be processed correctly. For example, in this post I’m using patch scripts to modify a document. I can do the same using JSON patch as well.

Here is how you can use the new ability:
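
Here is a minimal sketch (the JsonPatchOperation name is how I recall the client API and may differ slightly; the patch itself is a standard RFC 6902 document, here built with Microsoft.AspNetCore.JsonPatch, and the document id and paths are illustrative):

using Microsoft.AspNetCore.JsonPatch;

// Equivalent to the JSON Patch the frontend would send:
// [ { "op": "replace", "path": "/Lines/0/Quantity", "value": 3 },
//   { "op": "add",     "path": "/Comments/-",       "value": "gift wrap" } ]
var patch = new JsonPatchDocument();
patch.Replace("/Lines/0/Quantity", 3);
patch.Add("/Comments/-", "gift wrap");

// Forward the patch as-is to RavenDB; the server applies it to the document.
store.Operations.Send(new JsonPatchOperation("orders/830-A", patch));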

A small improvement, but one that can make some scenarios much easier.

time to read 3 min | 424 words

I like to think of myself as a database guy. My go-to joke about building user interfaces is that a <table> is all I need for layout (it’s not a joke). About a decade ago I just gave up on trying to follow what is going on in frontend land and accepted that I’ll reside in the backend from here on.

Being ignorant of the way you’d write a modern frontend doesn’t change the fact that I like to use a good user interface. I have seriously mixed feelings about the importance of the RavenDB Studio to the project. On the one hand, I care that it is easy to use, obvious and functional. I love that it is beautiful and will generally make your life easier. And at the same time, I abhor the fact that it has such an impact on people’s decisions. I mean, the backend of RavenDB is absolutely beautiful, from a technical perspective. But everyone always talks about the studio.

Leaving my mini rant aside, we spend quite a lot of time and effort on the studio and the user experience in general. This release is no exception, and we have a couple of major updates to the studio.

One of the most common things you’ll do in the studio is run queries. In this release we have done a complete revamp of the automatic code completion for the client-side RQL queries written in the studio. The new code assistance is available when writing any query in the Query view, the Patch view and the Subscription Query view. This was actually quite interesting from a computer science perspective: we now have a formal grammar for RQL, which means that we can provide a much better experience for query editing.


Full code completion assistance and better error handling directly in the studio make it easier to work with RavenDB, for both developers and operations.

The second feature is the Identities page:


Identities have been a feature in RavenDB for a long time, but somehow they have never been front and center. Maybe the discoverability of the feature suffered? You can now create, edit and modify identities directly in the studio, not just through the API.


time to read 2 min | 378 words

Most of the time, you’ll communicate with RavenDB using HTTP, making REST calls. When you are doing that, you can take advantage of compression. If the client indicates that it can accept compressed responses (by sending an Accept-Encoding: gzip header), RavenDB will send the data to you compressed (Content-Encoding: gzip). Given that we are working with JSON text, which compresses very well, we are looking at pretty significant savings in network bandwidth. This has been the case for RavenDB for many years (I didn’t check, but at least a decade, I believe).

There are certain cases, however, where RavenDB will use a binary protocol instead of HTTP. Those are usually scenarios where we are communicating directly with another RavenDB instance. All internal communication between RavenDB nodes uses direct TCP connections, and when using Subscriptions, the client will open a TCP connection to the server and use it on a long-term basis.

One of the fallacies of distributed computing is that bandwidth is infinite. One of the realities of cloud computing, on the other hand, is that you are paying for bandwidth. Even when you are running inside the same cloud region, cross-availability-zone network traffic is still charged for. As you can imagine, on active systems, you may notice that you are spending a lot of bandwidth on communication between the nodes of the cluster.

With RavenDB 5.3, we have added compression support for the replication and subscription connections. That means that replication and subscriptions will compress the data by default. We are using the Zstd algorithm; in our tests, it produced both a higher compression ratio and faster performance than GZip. You don’t have to do anything for this to work (although there is a configuration option, "Server.Tcp.Compression.Disable", to disable it if you really want to). When you upgrade to RavenDB 5.3, the cluster will automatically start compressing all traffic.
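
For example, opting out is just a matter of adding that option to settings.json (assuming it takes a boolean value):

"Server.Tcp.Compression.Disable": true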

In our tests, we are seeing an 85% (!) reduction in the amount of network traffic that we send out. That is something that I’m very much looking forward to seeing in our metrics once this is rolled out completely.

This is a RavenDB 5.3 feature (expected mid November) and will be available in the Professional and Enterprise editions of RavenDB.

time to read 3 min | 499 words

RavenDB is an OLTP database; it is meant to be the backend of business applications. There are some features in RavenDB that are meant for reporting purposes, but that is quite explicitly not our main focus. That is part of why RavenDB has such good integration with the rest of the environment: to give you the ability to use the best tool for the job.

With RavenDB 5.3, we are now allowing you to integrate directly with Power BI, so you can pull data from RavenDB to Power BI, write reports and in general utilize the full power of Power BI with your RavenDB data.

For example, I generated a report inside of Power BI on top of the sample data from RavenDB.

As you can imagine, I’m particularly stoked about this feature. Not only does it make reporting integration with RavenDB a lot simpler, but the way that we do it is also quite interesting. Instead of diving into the technical details, it would probably be more fun to show you how it works, from the perspective of Power BI.

The first thing we need to do is connect Power BI to RavenDB. Using your existing Power BI setup, you can simply click on Get Data and select the appropriate data source.


You’ll then need to provide the connection details:


You’ll then be presented with the following dialog:


As you can see, we are translating the JSON documents inside of RavenDB to a columnar format for ease of processing inside of Power BI.

You can even take this further and issue RQL queries directly from inside of Power BI and transform the data. That means that you can utilize indexes, map/reduce operations, etc. Take a look:

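For illustration, a query along these lines can be pasted in (a sketch against the sample data; the grouping is illustrative):

from Orders
group by Company
select key() as Company, count() as OrdersCount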

And the result inside of Power BI:


As you can imagine, this is going to be a powerful tool in how you can work with your RavenDB data. You can also take this further and integrate that with Power BI on Azure, of course.

The way this works behind the scenes is that we now understand the PostgreSQL wire protocol. That means that we can be accessed from anywhere that can connect to Postgres. While the PostgreSQL protocol implementation is still marked as experimental, we have put the Power BI integration through its paces and we consider it stable enough for regular use.

Happy reporting!

This feature is part of the RavenDB 5.3 release (expected in mid November) and is available in the Enterprise edition of RavenDB.

time to read 2 min | 273 words

Next week is Black Friday, which has become a global phenomenon. It is a fun day for shoppers, and a nerve-wracking one for IT admins everywhere. It is not uncommon to see traffic double or triple, and the actual load (processing more heavyweight requests) can go up an order of magnitude. Preparing for Black Friday can be a harrowing experience, since you have a narrow window of opportunity and it is hard to know exactly where the stress points are.

This year, I decided to make your life easier: RavenDB is offering a Black Friday Surge to all our customers. No, this isn’t a "50% off, everything must go" sale. What we are doing instead is trying to be of help.

This Black Friday (and Cyber Monday as well), we are offering all our customers double what they paid for. When running RavenDB on premises, if you purchased a RavenDB license for a 12-core cluster (running on 3 nodes of 4 cores each), we’ll offer you 30 days of double the core count. In other words, you can scale your system to be twice as powerful, and it won’t cost you a cent.

On the cloud, as well, we will provide users with credits to upgrade their clusters to the next level up (doubling their power) for a full week during the next 30 days. Again, there is no extra cost here.

You can register for the Surge here to request the upgrade and you’ll get twice as much power to handle the increased load.

Enjoy the power up!

time to read 4 min | 649 words

A really nice feature that we have in RavenDB 5.3 is support for wire protocol compatibility with PostgreSQL. That opens up RavenDB to the entire PostgreSQL ecosystem. You are now able to connect to a RavenDB instance using tools such as psql, Npgsql, etc. This feature is both surprisingly simple and incredibly complex at the same time.

The actual wire protocol from Postgres is well documented and pretty clean. Doing a clean room implementation of it is a straightforward process. Adding that to RavenDB, on the other hand, led to a number of interesting challenges. To start with, the protocol assumes, at the wire level, that all the rows that you return for a query have the same structure. This is a very reasonable assumption to make for a relational database protocol, but it doesn’t hold true when you are talking about a schema-less database such as RavenDB.

Then there is the fact that clients will generate all sorts of pretty scary queries before you even get to running a user’s query. For example, take a look at how Npgsql detects the capabilities of the database that it connects to. Just supporting the wire protocol isn’t sufficient; you also need to support quite a bit of additional behavior.

When we implemented this feature, we decided that we would support the wire protocol, so you’ll be able to connect, issue queries and get results. However, the query language itself is going to be RQL. We aren’t attempting to pretend to the outside world that we are a PostgreSQL instance; we only implement enough to make integration and compatibility work.

Here is an example of running a query through the Postgres integration.

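To give a feel for it, here is a sketch of doing the same from .NET with Npgsql (the connection details, credentials, database and query are all illustrative; note that the query text is RQL, not SQL):

using System;
using Npgsql;

// RavenDB's PostgreSQL endpoint listens on port 5433 by default (see below).
var connectionString = "Host=localhost;Port=5433;Database=Northwind;" +
                       "Username=myUser;Password=myPassword";

await using var conn = new NpgsqlConnection(connectionString);
await conn.OpenAsync();

// The query is RQL - RavenDB's query language - sent over the Postgres protocol.
using var cmd = new NpgsqlCommand("from Employees select FirstName, LastName", conn);
await using var reader = await cmd.ExecuteReaderAsync();
while (await reader.ReadAsync())
    Console.WriteLine($"{reader["FirstName"]} {reader["LastName"]}");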

This is an experimental feature, mind you. It is showing a lot of promise, but we want to get some more feedback from our users about which way we should take it. The feature opens up many doors, but it also brings with it a non-trivial amount of complexity.

This feature requires that we open up another port to the world, which is something that we require the user to explicitly allow. To enable this feature, you’ll need to set the following options in the settings.json configuration file:

"Integrations.PostgreSQL.Enabled": true
"Features.Availability" : "Experimental"

You can also control which port it will use with the Integrations.PostgreSQL.Port configuration option. We default to 5433 if none is specified.

At the current time, we only allow issuing queries through the Postgres integration; you cannot modify data through it. This is something that we would very much like more feedback on: in what kind of scenarios would you like writes to be supported? What kind of writes do you expect to issue at that point?

Finally, a word about security. The PostgreSQL protocol supports using TLS for encryption. When running in insecure mode, RavenDB will reject SSL/TLS connections from Postgres clients. When running in secured mode (the default), the same server certificate that is used by RavenDB itself will also be used for the Postgres connection. However, while we usually require that the other side also authenticates using a client certificate, in the case of Postgres connections we run into a problem. In quite a few scenarios we found that while the Postgres protocol supports mutual authentication using client certificates, clients don’t support it.

For that reason, we are allowing user & password authentication (on top of a TLS connection, obviously) for the Postgres connections at this time. Note that there is no correlation between the Postgres login and access to any other RavenDB features (where client certificates are the only option).

This is part of the RavenDB 5.3 release (expected in mid November) and will be available in the Professional and Enterprise editions.

time to read 3 min | 564 words

RavenDB tries to be a good neighbor in your systems. RavenDB is typically used in polyglot solutions, and we are often brought into existing ecosystems. One of the things that we do to make it easier to use RavenDB is to provide a full suite of built-in tools for pushing data to other destinations.

For example, you can define an ETL process that will push document changes from RavenDB (potentially transforming & filtering them) to a relational database, another RavenDB instance, a data lake / OLAP system and much more.

In RavenDB 5.3 we have added Elasticsearch as an ETL target for RavenDB. If you are familiar with RavenDB ETL processes, the behavior is pretty much what you would expect. You select which collections you want to push to Elasticsearch, you provide a script that filters and transforms the data, and then you are done. From that point on, it is RavenDB’s responsibility to keep the Elasticsearch target up to date with any changes that are happening inside of RavenDB.

I’ll discuss the exact details of how to make it work shortly, but first I want to talk a bit about the usage scenario for this. Elasticsearch, just like RavenDB, uses Lucene behind the scenes to implement indexes. Unlike RavenDB, however, Elasticsearch is all about… well, searching. In that context, there is a pretty big overlap between RavenDB and Elasticsearch. In fact, one of the primary reasons we see people selecting RavenDB is that they no longer need to maintain multiple environments (one to store the data and an Elasticsearch cluster to search it); RavenDB is able to meet both needs in a single, highly integrated and performant package.

The most common scenario for Elasticsearch ETL is when you already have an existing investment in Elasticsearch. RavenDB will naturally integrate into your environment, without needing any significant changes. That can enable you to start running queries, Kibana dashboards, etc. on your RavenDB documents.

The transformation script and the connection configuration follow the same pattern as RavenDB’s other ETL targets; a sketch of setting both up from the client is shown below.
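
Here is a sketch of what that setup can look like from the .NET client (the ETL class and property names here are indicative and may not match the final API exactly; the index name, script and connection details are illustrative):

using System.Collections.Generic;
// ETL types live under Raven.Client.Documents.Operations.ETL (names indicative).

// Connection string: where the Elasticsearch nodes live.
store.Maintenance.Send(new PutConnectionStringOperation<ElasticSearchConnectionString>(
    new ElasticSearchConnectionString
    {
        Name = "elastic",
        Nodes = new[] { "http://localhost:9200" }
    }));

// The ETL task: which collections to push, how to transform them,
// and which Elasticsearch index to load the results into.
store.Maintenance.Send(new AddEtlOperation<ElasticSearchConnectionString>(
    new ElasticSearchEtlConfiguration
    {
        Name = "Orders to Elasticsearch",
        ConnectionStringName = "elastic",
        ElasticIndexes = new List<ElasticSearchIndex>
        {
            // The index definition names the property that carries the RavenDB document id.
            new ElasticSearchIndex { IndexName = "orders", DocumentIdProperty = "DocId" }
        },
        Transforms = new List<Transformation>
        {
            new Transformation
            {
                Name = "OrdersToElastic",
                Collections = new List<string> { "Orders" },
                // The RavenDB document id must be included as a property in the output.
                Script = @"loadToOrders({
                               DocId: id(this),
                               Company: this.Company,
                               LinesCount: this.Lines.length
                           });"
            }
        }
    }));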

You can push multiple collections to multiple Elasticsearch indexes. It is important to note that you must include the RavenDB document Id as a property in the script and also set it in the destination index configuration. If the Elasticsearch index doesn't already exist, RavenDB will create it for you on the fly.

This is… pretty much it. The actual feature is fully fledged, of course. You get monitoring and tracking, it will run in high availability mode and will be assigned an owner node in the cluster, etc. If there is a failure on the Elasticsearch side, there is no data loss; RavenDB will wait for the target to come back up and push all the data that was changed in the meantime. The ETL process is an online process, which means that you can expect to see changes in RavenDB reflected in the Elasticsearch index within a few milliseconds of the transaction commit.

This feature is available in the Professional and Enterprise editions of RavenDB and will be included in the RavenDB 5.3 release, scheduled for mid November.

time to read 2 min | 253 words

About a year and a half ago, Kamran wrote an article showing how you can implement rate limits in RavenDB. He used both counters and expiration to handle this scenario. I think that incremental time series make this so much easier to work with, it’s not even funny.

Consider the following code:
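
Here is a sketch of the idea (not the exact code from the post; it assumes the incremental time series session API, IncrementalTimeSeriesFor with Increment and Get, and the document id, series name and limits are illustrative):

using System;
using System.Linq;
using Raven.Client.Documents;
using Raven.Client.Documents.Session.TimeSeries;

public static class RateLimiter
{
    public class RateLimit { } // backing document for the time series

    public static bool TryRecordRequest(IDocumentStore store, string clientId, int maxRequests = 50)
    {
        using var session = store.OpenSession();

        var docId = "rate-limits/" + clientId;
        // Time series hang off documents, so make sure the backing document exists.
        if (session.Advanced.Exists(docId) == false)
            session.Store(new RateLimit(), docId);

        // Incremental time series names carry the "INC:" prefix.
        var ts = session.IncrementalTimeSeriesFor(docId, "INC:Requests");

        // Sliding window: sum the increments recorded over the last 30 seconds.
        var now = DateTime.UtcNow;
        var entries = ts.Get(now.AddSeconds(-30), now) ?? Array.Empty<TimeSeriesEntry>();
        if (entries.Sum(e => e.Value) >= maxRequests)
            return false; // over the limit - reject the request

        // Record this request; concurrent increments merge without conflicts.
        ts.Increment(now, 1);
        session.SaveChanges();
        return true;
    }
}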

Those are roughly 20 lines of code, and that is all you need to handle a sliding window for rate limits. This will work quite nicely even with high levels of concurrency, and you can also build more interesting scenarios, such as a maximum of 50 requests in the past 30 seconds combined with a maximum of 200 requests in the past 5 minutes, etc. If you have ever tried to make sense of the way Let’s Encrypt handles rate limits, for example, you know just how crazy the complexity can get.

That is now all handled for you. We have the part where we record the number of operations under the limit, and the rest is limited only by your imagination.

For fun, you can now decide what you want to do with this data. If you are only interested in it for rate limiting purposes, you can tell RavenDB to simply delete the old data once it is no longer of use, using time series retention. You can also roll the data up to a 5-minute boundary and keep that information (monitoring, audits and billing purposes come to mind).
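
A sketch of what that configuration can look like, using the time series policy API as I recall it (the collection name, policy name and time spans are illustrative):

using System.Collections.Generic;
using Raven.Client.Documents.Operations.TimeSeries;
using Sparrow;

var config = new TimeSeriesConfiguration
{
    Collections = new Dictionary<string, TimeSeriesCollectionConfiguration>
    {
        ["RateLimits"] = new TimeSeriesCollectionConfiguration
        {
            // Keep the raw, per-request data for a single day only...
            RawPolicy = new RawTimeSeriesPolicy(TimeValue.FromDays(1)),
            // ...and keep a 5-minute rollup for a year (monitoring, audits, billing).
            Policies = new List<TimeSeriesPolicy>
            {
                new TimeSeriesPolicy("By5Minutes", TimeValue.FromMinutes(5), TimeValue.FromDays(365))
            }
        }
    }
};
store.Maintenance.Send(new ConfigureTimeSeriesOperation(config));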

The whole thing is far more cohesive and easier to work with.

time to read 2 min | 285 words

Everyone is on the cloud these days, and one of the things that I keep seeing pushed is the notion of usage-based billing: basically, the idea that you pay for what you use.

Let’s assume that we are building a software-as-a-service product where users can submit an image and we’ll do some computation on it. The actual details aren’t relevant. What matters is that your pricing model is based on how much time processing each image takes and how much memory is used. You are running this on many machines and need to figure out how to do billing at the end of the month. It turns out that this can be quite a challenge. With incremental time series, a lot of the details around that just go away.

Here is how you can implement this:
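
Here is a sketch of that flow (not the original snippet; the document ids, series name and the shape of the details document are illustrative, and it assumes an incremental time series entry can carry two values, runtime and memory):

using System;
using Raven.Client.Documents;

public static class UsageBilling
{
    public class BillingSummary { } // backing document for the usage time series

    public class RunDetails
    {
        public string UserId { get; set; }
        public DateTime At { get; set; }
        public double DurationMs { get; set; }
        public long MemoryUsedMb { get; set; }
    }

    public static void RecordRun(IDocumentStore store, string userId, string runId,
                                 TimeSpan duration, long memoryUsedMb)
    {
        using var session = store.OpenSession();

        var billingDocId = "billing/" + userId;
        if (session.Advanced.Exists(billingDocId) == false)
            session.Store(new BillingSummary(), billingDocId);

        var now = DateTime.UtcNow;
        // One increment per run, carrying both cost drivers: runtime and memory.
        session.IncrementalTimeSeriesFor(billingDocId, "INC:Usage")
            .Increment(now, new[] { duration.TotalMilliseconds, (double)memoryUsedMb });

        // The per-run details, stored in the same transaction.
        session.Store(new RunDetails
        {
            UserId = userId,
            At = now,
            DurationMs = duration.TotalMilliseconds,
            MemoryUsedMb = memoryUsedMb
        }, "runs/" + runId);

        session.SaveChanges();
    }
}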

You count the required memory as well as the actual runtime and record that in an incremental time series. We also store the details of that particular run in a separate document, in the same transaction (if the user cares about that level of detail). The interesting bit is that the data is immediately available for the user to see how much they are going to be billed.

Typically, a lot of time is spent figuring out how to record those details efficiently and then how to query and aggregate them. We have tested time series in RavenDB with billions of data points, and the internal format lends itself very well to aggregated queries.

Now you can take the code above, run it on 100s of machines, and it will all end up giving you the proper result in the end.
