Ayende @ Rahien

Oren Eini aka Ayende Rahien CEO of Hibernating Rhinos LTD, which develops RavenDB, a NoSQL Open Source Document Database.

Get in touch with me:

oren@ravendb.net

+972 52-548-6969

Posts: 7,252 | Comments: 50,433

Privacy Policy Terms
filter by tags archive
time to read 1 min | 195 words

imageConsider the image on the right, where we have three charges on separate months. This is a time series, showing charges over time. We can very easily issue queries that will give us the results of how much we paid in a time period, but what if we wanted to get the cumulative value. How much have I paid so far? Here is how this should look like:

image

However, that is not something that we provide in RavenDB. Luckily, we do provide a very flexible query engine, so we can make it happen anyway. Here is what the query will look like:

Note that we are using a JavaScript function to process the time series and run the computation that we want, and then we return an array, which is translated to multiple results set per document. Here is the result of this query:

image

time to read 2 min | 300 words

I give a lot of talks about performance and in those talks, I tend to emphasize the architectural impact of your choices. There is a huge tendency to focus on micro optimizations to get the performance you need, even though you can usually get orders of magnitude higher performance by making architectural changes.

Good architecture can still benefit from micro optimizations, however, and it is sometimes really surprising to see by how much. During a routine performance review, we identified a particular scenario as a performance issue. Here is the code in question:

This is being triggered when you are using a parameterized query, like this one:

 

And here is the profiler trace for that:

image

That is kind of ridiculous, to be honest. About 18% of the client side query process went into generating the name of the query. Opps, that is not really something that I expected.

And here is the optimized version:

Basically, we prepare, in advance, the most likely names, so we can find them as cheaply as possible. The result for that particular operation is impressive:

So we almost halved the costs that we have here, but what is more interesting is what happens at higher level of the stack…

image

This is the query processing portion, and you can see that the pretty minimal saving of 187 ms in the AddQueryParameter method is translated to a far greater saving down the line. The overall cost went down by almost 30%.

The probable reason is that we are now allocating slightly less, we saved a few allocations for each query parameter, and that in turn translated to a far better overall performance.

time to read 4 min | 780 words

imageIn architecture (physical building) there is a term called Desire Lanes. The idea is that users will take the path of least resistance, regardless of the intention of the architect. The image on the right is one that I have seen many times, and I got a chuckle out of that each time. That is certainly something that I have seen over and over again.

I had the chance recently to see how the exact same thing happens in two very different software systems. There was a need and the system didn’t allow it. The users found a way.

In the first instance, we are talking about a high security environment. The kind where you leave your phones and smart watches at the door, outside devices are absolutely prohibited. So far, this makes sense and there is a real need for that for their scenario.

The problem is that they also have a high degree of people who are working on that environment on a very transient basis. You may get people that show up for a day or two mostly (meeting, briefings, training) or for a couple of weeks at most. Those people need to be able to do… stuff with computers (take notes, present, plan, etc).

Given the high security environment, creating a user in the system takes a few days at least (involves security briefing, guidance, etc). Note that all involved have the right security clearances, that isn’t the issue. But before you can get a user account, policy dictates that you need to be briefed, login is done via smart cards + password only, etc. You can’t make that work if you have hundreds of people coming and going all the time.

The solution? There are a bunch of smart cards in a drawer belonging to former employees whose accounts were purposefully not deactivated. You get handed the card + password and can use the account for basic needs.

I assume that those accounts are locked, but I didn’t bother to verify that. It wouldn’t surprise me if they still had all their permissions and privileges.

From an IT security standpoint, I am horrified. That is a Bad Idea, but it is a solution to the issue at hand, providing computer access for short terms visitors without having to go through all the hoops the security policy dictates.

This is sadly a very widespread tactic in that organization, I have seen this in multiple branches in separate locations.

In the second scenario, there is a system to reserve appointments with doctors. The system has an app, where users can register for their appointments themselves. There is also the administration team that may also reserve appointments for patients. The system allows a doctor to define their hours of operations and then (as far as the system is concerned) it is first come & first served basis. The administration team, on the other hand, has to deal with a more complex situation.

For example, a common issue that I run into is that you can only set an appointment if you are registered in the system. What would you do with first time visitors? They are routinely setting things up through the administration team, but while they can reserve an appointment, they have to put someone that is registered in the system.

The solution for that is to use other people, typically the administration team will use their own accounts and set the appointment for themselves, to reserve the spot for the new patient. That can lead to some issues, for example, if the doctor has to cancel, the system will send notices to the scheduled patients. But the scheduled patient and the real patient are distinct. It also means that from a medical file perspective, certain people are “ditching” a lot of appointments.

In both cases, we can see that there is a need to do something that the system doesn’t allow (or actively trying to prevent). The end result is a solution, a sub optimal one, for sure, but something that works.

One of the key aspects for building proper systems for the long term is to listen and implement proper solutions for those sort of issues. In many cases, they are of pivotal importance for the end user, note that this is very much distinct from the customer. The customer is the one who pays, the end user is the one who is using the system. There is often a major disconnect between the two.

This is where you get Desire Features and workaround that become Official, to the point where some of those solutions are literally in the employee handbook.

time to read 2 min | 260 words

For many business domains, it is common to need to deal with hierarchies or graphs. The organization chart is one such common scenario, as is the family tree. It is common to want to use graph queries to deal with such scenarios, but I find that it is usually much easier to explicitly build your own queries..

Consider the following query, which will give me the entire hierarchy for a particular employee:

I can define my own logic for traversing from the document to the related documents, and I can do whatever I want there. You can also see that I’m including the related documents, here is how this looks like when I execute the query:

image

A single query gives me all the details I need to show the user with one roundtrip to the server.

Let’s go with a more complex example. The above scenario had a single path to follow, which is trivial. What happens if I have a more complex system, such as a family tree? I took the Games of Thrones data (easiest to work with for this demo) and threw that into RavenDB, and then executed the following query:

And that gives me the following output:

image

This is a pretty fun technique to explore, because you can run any arbitrary logic you need, and expressing things in an imperative manner is typically much more straightforward.

time to read 2 min | 303 words

RavenDB has the notion of Custom Sorters, basically, we allow you to inject your own logic into the sorting process. That allows you to run any complex logic you have around sorting.  There are rarely good reasons to want to use that. A good use case for that is when you need to sort by an external value that mutates outside of your control. Let’s say that you have invoices in multiple currencies. You want to sort them by their value in USD. The catch, you need them sorted on the current exchange rate. For that reason, you can use the custom sorter that would use the current value of the currency as the sorting mechanism.

I should point out that from a business perspective, you’ll typically want to use the value that you had for the order at the time the order was made, but that is not related the the custom sorting feature.

Let’s take another example, however. Consider the following Enum:

We want to sort by the education level of our candidates, but by default, we’ll be sorting using the textual value of the field. That isn’t what we want. We can define a custom sorter for that, but there is a far better option, just tell us what the order should be in the index.

Here is a good example:

What we are doing here is simple, we translate the textual value to a numeric one. When we query the index, we can filter by the textual value and sort by the sort value, giving us what we want. This is far simpler and more robust. If you need to add additional values down the line, it is obvious where they need to go. A custom sorter, on the other hand, is far more capable, but also more complex to operate.

time to read 2 min | 306 words

RavenDB is a database, not a queue or a service bus. That said, you can make use of RavenDB subscriptions to get a very similar behavior to a service bus. Let’s see how much effort it will take us to implement backend processing using RavenDB only.

We assume that we have commands or messages, that are written to the Commands collection and are handled via a subscription (which may have multiple concurrent workers). In terms of your messaging models, we have:

The CommandBase we have here defines the following infrastructure properties:

  • Status – enum [Initial, Processing, Failed, Completed] – default value is Initial
  • RetriesCount – int – default value is 3
  • Error – string – null by default

We can now define our subscription using the following query:

from Commands as c
where c.RetriesCount > 0  and c.Status != 'Completed' and c.’@metadata’.’@refresh’ == null

This query is pretty simple, but it allows me to get all the documents that haven’t exceeded their retry count. The @refresh option allows me to register a command to be executed at a later point in time. See the documentation here, this is a feature that exists specifically to allow you to schedule commands with subscriptions.

In my subscription workers, I can now execute:

The code above is sufficient to get most of the way toward a robust message handling system.

I can easily see what messages are being processing, I can see how long they take, etc. I can see what failed and why. And I can see the history of commands.

That handles scenarios such as error handling and retries, introspection on the state of the system and you can derive from here all the relevant numbers on throughput, capacity, etc.

It isn’t a complete solution, but for very little code, you can take this quite a long way.

time to read 2 min | 313 words

Enabling RavenDB’s revisions allows you to ask RavenDB to keep immutable copies of a document. We originally envisioned this feature as a way to have easy audit trails and a time travel feature. Revisions were meant to be something that you’ll typically access as the administrator, not something that we expected to be used in normal course of events.

Usage in the field showed that users often want to make revisions a core part of the domain model. We have a user that uses revisions to mark the Approved (and thus, locked) version of a Plan document, for example. Another example is a payroll processing system where the Contract for a particular employee isn’t pointing to a document, but to a specific revision of that contract. Modifications to the contract have no impact on the employee unless they are explicitly moved to a new version of the contract.

Seeing all those use cases popping up for revisions, we added more features around revisions to make it easier to work with them (such as allowing to explicitly create a revision on command or enforcing revisions policies after the fact).

In RavenDB 5.3 we have added the ability to include revisions as part of the query, so you get the same benefit of reduction in the number of remote calls as you get when working with document references. Let’s say that I want to calculate payroll for a set of employees, here is how I can do that (the Contract property contains the change vector of the specific version of the contract):

image

In terms of API, this is how you use it:

This smooths out a scenario where you had to deal with making multiple remote calls into a single one for the whole process. Just the kind of improvement that I like to see.

time to read 1 min | 150 words

JSON Patch is a feature that allows the frontend to send a set of changes on documents. If you are working with complex documents, that can result in a significant reduction in bandwidth. There are many scenarios where the client can modify a document on the browser, then produce the JSON Patch to make the server match the changes.

In RavenDB 5.3, we added direct support for implementing JSON Patch inside of RavenDB. The frontend code can forward the patch operations directly to RavenDB, where they will be executed by the database. Concurrent patches on the same document aren’t going to contend with one another and will be processed correctly. For example, in this post I’m using patch scripts to modify a document. I can do the same using JSON patch as well.

Here is how you can use the new ability:

Small improvement, but can make some scenarios much easier.

time to read 3 min | 424 words

I like to think about myself as a database guy. My go to joke about building user interfaces is that a <table> is all I need for layout (it’s not a joke). About a decade ago I just gave up on trying to follow what is going on in the frontend land and accepted that I’ll reside in the backend from here on after.

Being ignorant of the ways you’ll write a modern frontend doesn’t affect the fact that I like to use a good user interface. I have seriously mixed feelings about the importance of RavenDB Studio to the project. On the one hand, I care that it is easy to use, obvious and functional. I love that it is beautiful and will generally make your life easier. And at the same time, I abhor the fact that it has such an impact on people’s decisions. I mean, the backend of RavenDB is absolutely beautiful, from a technical perspective. But everyone always talk about the studio.

Leaving aside my mini rant, we spend quite a lot of time and effort on the studio and the User Experience in general. This release is not an exception and we have a couple of major new updates to the studio.

One of the most common things you’ll do in the studio is run queries. In this release we have done a complete revamp of the automatic code completion for the client-side RQL queries written in the studio.
The new code assistance is available when writing any query in the Query view, Patch view, and in the Subscription Query. That was actually quite interesting, from a computer science perspective. We have formal grammar for RQL now, for example, which means that we can provide much better experience for query editing. For example, take a look:

image

Full code completion assistance and better error handling directly at the studio makes it easier to work with RavenDB for both developers and operations.

The second feature is the Identities page:

image

Identities has been a feature in RavenDB for a long time, and somehow they have never been front and center. Maybe the discoverability of the feature suffered? You can now create, edit and modify the identities directly in the studio, not just through the API.

image

FUTURE POSTS

  1. Feature Design: ETL for Queues in RavenDB - 18 hours from now
  2. re: Why IndexedDB is slow and what to use instead - about one day from now
  3. Implementing a file pager in Zig: What do we need? - 3 days from now
  4. Production postmortem: The memory leak that only happened on Linux - 6 days from now
  5. Talk: Scalable architecture from the ground up - 7 days from now

And 6 more posts are pending...

There are posts all the way to Dec 22, 2021

RECENT SERIES

  1. Challenge (63):
    03 Nov 2021 - The code review bug that gives me nightmares–The fix
  2. Talk (6):
    23 Apr 2020 - Advanced indexing with RavenDB
  3. Production postmortem (32):
    17 Sep 2021 - The Guinness record for page faults & high CPU
  4. re (29):
    23 Jun 2021 - The performance regression odyssey
View all series

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats