Ayende @ Rahien

Oren Eini aka Ayende Rahien CEO of Hibernating Rhinos LTD, which develops RavenDB, a NoSQL Open Source Document Database.

time to read 4 min | 780 words

In architecture (physical buildings, that is) there is a term called Desire Lines. The idea is that users will take the path of least resistance, regardless of the intention of the architect. The canonical image is a dirt path worn into the grass, cutting the corner right next to the paved walkway; I have seen that picture many times, and I get a chuckle out of it every time. It is certainly something that I have seen over and over again.

I had the chance recently to see how the exact same thing happens in two very different software systems. There was a need and the system didn’t allow it. The users found a way.

In the first instance, we are talking about a high security environment. The kind where you leave your phones and smart watches at the door; outside devices are absolutely prohibited. So far, this makes sense, and there is a real need for it in their scenario.

The problem is that they also have a large number of people working in that environment on a very transient basis. You may get people who show up for a day or two (meetings, briefings, training) or for a couple of weeks at most. Those people need to be able to do… stuff with computers (take notes, present, plan, etc.).

Given the high security environment, creating a user in the system takes a few days at least (it involves a security briefing, guidance, etc.). Note that everyone involved has the right security clearances; that isn't the issue. But before you can get a user account, policy dictates that you need to be briefed, login is done via smart card + password only, etc. You can't make that work if you have hundreds of people coming and going all the time.

The solution? There are a bunch of smart cards in a drawer belonging to former employees whose accounts were purposefully not deactivated. You get handed the card + password and can use the account for basic needs.

I assume that those accounts are locked, but I didn’t bother to verify that. It wouldn’t surprise me if they still had all their permissions and privileges.

From an IT security standpoint, I am horrified. That is a Bad Idea, but it is a solution to the issue at hand: providing computer access for short-term visitors without having to jump through all the hoops the security policy dictates.

This is sadly a very widespread tactic in that organization; I have seen it in multiple branches in separate locations.

In the second scenario, there is a system to reserve appointments with doctors. The system has an app, where users can register for their appointments themselves. There is also an administration team that may reserve appointments on behalf of patients. The system allows a doctor to define their hours of operation, and then (as far as the system is concerned) it is a first come, first served basis. The administration team, on the other hand, has to deal with a more complex situation.

For example, a common issue that I run into is that you can only set an appointment if you are registered in the system. What would you do with first time visitors? They are routinely set up through the administration team, but while the team can reserve an appointment, it has to be booked under someone who is already registered in the system.

The solution is to use other people's accounts: typically the administration team will use their own accounts and set the appointment for themselves, reserving the spot for the new patient. That can lead to some issues. For example, if the doctor has to cancel, the system will send notices to the scheduled patients, but the scheduled patient and the real patient are distinct. It also means that, from a medical file perspective, certain people are "ditching" a lot of appointments.

In both cases, we can see that there is a need to do something that the system doesn't allow (or actively tries to prevent). The end result is a solution, a suboptimal one for sure, but something that works.

One of the key aspects of building proper systems for the long term is to listen and implement proper solutions for those sorts of issues. In many cases, they are of pivotal importance for the end user. Note that the end user is very much distinct from the customer: the customer is the one who pays, the end user is the one who is using the system. There is often a major disconnect between the two.

This is where you get Desire Features and workarounds that become official, to the point where some of those solutions are literally in the employee handbook.

time to read 2 min | 260 words

For many business domains, it is common to need to deal with hierarchies or graphs. The organization chart is one such common scenario, as is the family tree. It is tempting to reach for graph queries to deal with such scenarios, but I find that it is usually much easier to explicitly build your own queries.

Consider the following query, which will give me the entire hierarchy for a particular employee:
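
A sketch along these lines (an open async session is assumed, and the collection, property name, and document id used here, Employees, ReportsTo, and employees/9-A, are illustrative assumptions rather than the actual data model):

var hierarchy = await session.Advanced
    .AsyncRawQuery<dynamic>(@"
        declare function traverseHierarchy(e) {
            var chain = [e];
            var current = e;
            while (current.ReportsTo != null) {
                current = load(current.ReportsTo);   // walk up the management chain
                chain.push(current);
            }
            return { Chain: chain };
        }
        from Employees as e
        where id() = 'employees/9-A'
        select traverseHierarchy(e)
        include e.ReportsTo")
    .ToListAsync();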

I can define my own logic for traversing from the document to the related documents, and I can do whatever I want there. You can also see that I'm including the related documents. Here is how it looks when I execute the query:

(screenshot of the query results)

A single query gives me all the details I need to show the user with one roundtrip to the server.

Let’s go with a more complex example. The above scenario had a single path to follow, which is trivial. What happens if I have a more complex system, such as a family tree? I took the Game of Thrones data (easiest to work with for this demo), threw that into RavenDB, and then executed the following query:

And that gives me the following output:

(screenshot of the query output)

This is a pretty fun technique to explore, because you can run any arbitrary logic you need, and expressing things in an imperative manner is typically much more straightforward.

time to read 2 min | 303 words

RavenDB has the notion of Custom Sorters: basically, we allow you to inject your own logic into the sorting process. That allows you to run any complex sorting logic you have. There are rarely good reasons to use that. A good use case is when you need to sort by an external value that mutates outside of your control. Let’s say that you have invoices in multiple currencies and you want to sort them by their value in USD. The catch: you need them sorted by the current exchange rate. For that reason, you can use a custom sorter that would use the current value of the currency as the sorting mechanism.

I should point out that from a business perspective, you’ll typically want to use the value that the order had at the time it was made, but that is not related to the custom sorting feature.

Let’s take another example, however. Consider the following Enum:
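
A representative example (the enum name and values here are assumptions, standing in for whatever the actual model uses):

public enum EducationLevel
{
    None,
    HighSchool,
    Bachelors,
    Masters,
    Doctorate
}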

We want to sort by the education level of our candidates, but by default, we’ll be sorting by the textual value of the field. That isn’t what we want. We can define a custom sorter for that, but there is a far better option: just tell the index what the order should be.

Here is a good example:
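
A sketch of such an index (the Candidate class, the index name and the exact mapping are illustrative assumptions):

using System.Linq;
using Raven.Client.Documents.Indexes;

public class Candidate
{
    public string Name { get; set; }
    public EducationLevel Education { get; set; }
}

public class Candidates_ByEducation : AbstractIndexCreationTask<Candidate>
{
    public Candidates_ByEducation()
    {
        Map = candidates =>
            from c in candidates
            select new
            {
                c.Education,                                   // textual value, used for filtering
                EducationSort =                                // numeric value, used for ordering
                    c.Education == EducationLevel.HighSchool ? 1 :
                    c.Education == EducationLevel.Bachelors ? 2 :
                    c.Education == EducationLevel.Masters ? 3 :
                    c.Education == EducationLevel.Doctorate ? 4 : 0
            };
    }
}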

What we are doing here is simple: we translate the textual value to a numeric one. When we query the index, we can filter by the textual value and sort by the sort value, giving us what we want. This is far simpler and more robust. If you need to add additional values down the line, it is obvious where they need to go. A custom sorter, on the other hand, is far more capable, but also more complex to operate.

time to read 2 min | 306 words

RavenDB is a database, not a queue or a service bus. That said, you can make use of RavenDB subscriptions to get a very similar behavior to a service bus. Let’s see how much effort it will take us to implement backend processing using RavenDB only.

We assume that we have commands (or messages) that are written to the Commands collection and are handled via a subscription (which may have multiple concurrent workers). In terms of our messaging model, we have:
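
Something along these lines (the concrete command type is an illustrative assumption; the infrastructure properties match the list below):

public enum CommandStatus
{
    Initial,
    Processing,
    Failed,
    Completed
}

public abstract class CommandBase
{
    public string Id { get; set; }

    // Infrastructure properties, described below
    public CommandStatus Status { get; set; } = CommandStatus.Initial;
    public int RetriesCount { get; set; } = 3;
    public string Error { get; set; }
}

// An example command; a real system would have many of these
public class SendWelcomeEmailCommand : CommandBase
{
    public string UserId { get; set; }
}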

The CommandBase we have here defines the following infrastructure properties:

  • Status – enum [Initial, Processing, Failed, Completed] – default value is Initial
  • RetriesCount – int – default value is 3
  • Error – string – null by default

We can now define our subscription using the following query:

from Commands as c
where c.RetriesCount > 0 and c.Status != 'Completed' and c.'@metadata'.'@refresh' == null

This query is pretty simple, but it allows me to get all the documents that haven’t exceeded their retry count. The @refresh option allows me to register a command to be executed at a later point in time. See the documentation here; this is a feature that exists specifically to allow you to schedule commands with subscriptions.

In my subscription workers, I can now execute:
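
A sketch of what that might look like (the subscription name, the handler and the five minute retry delay are assumptions; the flow of marking completion, recording errors, decrementing RetriesCount and scheduling a retry via @refresh follows the description above):

using System;
using Raven.Client.Documents.Subscriptions;

var worker = store.Subscriptions.GetSubscriptionWorker<CommandBase>(
    new SubscriptionWorkerOptions("ProcessCommands")
    {
        Strategy = SubscriptionOpeningStrategy.Concurrent
    });

await worker.Run(async batch =>
{
    using var session = batch.OpenAsyncSession();
    foreach (var item in batch.Items)
    {
        var cmd = item.Result;
        try
        {
            await ExecuteAsync(cmd);                  // your actual message handling logic
            cmd.Status = CommandStatus.Completed;
        }
        catch (Exception e)
        {
            cmd.Status = CommandStatus.Failed;
            cmd.Error = e.ToString();
            cmd.RetriesCount--;

            // Schedule a retry by setting @refresh; the subscription will pick
            // the document up again once the refresh timestamp passes
            session.Advanced.GetMetadataFor(cmd)["@refresh"] =
                DateTime.UtcNow.AddMinutes(5);
        }
    }
    await session.SaveChangesAsync();
});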

The code above is sufficient to get most of the way toward a robust message handling system.

I can easily see what messages are being processed, I can see how long they take, etc. I can see what failed and why. And I can see the history of commands.

That handles scenarios such as error handling and retries, and gives you introspection into the state of the system; from here you can derive all the relevant numbers on throughput, capacity, etc.

It isn’t a complete solution, but for very little code, you can take this quite a long way.

time to read 2 min | 253 words

About a year and a half ago Kamran wrote an article showing how you can implement rate limits in RavenDB. He used both counters and expiration to handle this scenario. I think that incremental time series make this so much easier to work with, it’s not even funny.

Consider the following code:
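
A sketch of the idea (the document id, the time series name and the 50-requests-per-30-seconds limit are assumptions; it uses incremental time series, so it assumes the RateLimits/{clientId} document already exists):

using System;
using System.Linq;
using System.Threading.Tasks;
using Raven.Client.Documents.Session;

public async Task<bool> TryAcquireAsync(IAsyncDocumentSession session, string clientId)
{
    var docId = "RateLimits/" + clientId;

    // Incremental time series names are prefixed with "INC:"
    var ts = session.IncrementalTimeSeriesFor(docId, "INC:Requests");

    // Count the requests recorded inside the sliding window (the past 30 seconds)
    var entries = await ts.GetAsync(DateTime.UtcNow.AddSeconds(-30), DateTime.UtcNow);
    var total = entries?.Sum(e => e.Value) ?? 0;
    if (total >= 50)
        return false;                       // over the limit, reject the request

    // Record this request; increments from concurrent workers merge safely
    ts.Increment(DateTime.UtcNow, 1);
    await session.SaveChangesAsync();
    return true;
}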

Those are ~20 lines of code, and that is all you need to handle a sliding window for rate limits. This will work quite nicely even with high levels of concurrency. You can also establish more interesting scenarios, such as a maximum of 50 requests in the past 30 seconds combined with a maximum of 200 requests in the past 5 minutes, etc. If you have ever tried to make sense of the way Let’s Encrypt handles rate limits, for example, you can see how you can get some really crazy complexity.

That is now all handled for you. We have the part where we record the number of operations under the limit, and the rest is limited only by your imagination.

For fun, you can now decide what you want to do with this data. If you are only interested in it for rate limiting purposes, you can tell RavenDB to just delete the old data once it is no longer of use, using time series retention. You can also roll it up to a 5 minute boundary and keep that information (monitoring, audits and billing purposes come to mind).

The whole thing is far more cohesive and easier to work with.

time to read 3 min | 587 words

When you use a cache, you need to take several factors into account. There are several workload patterns that can turn the cache into a liability instead of an asset. One of the most common scenarios where you pay heavily for the cache but don’t benefit much is when you have a sequential access pattern that exceeds the size of the cache.

Consider the following scenario:
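
Something along these lines (LruCache, LoadFromDisk and Process are hypothetical stand-ins used for illustration):

var cache = new LruCache<int, Page>(capacity: 100);

for (var pass = 0; pass < 1_000; pass++)
{
    for (var key = 0; key < 128; key++)          // sequential access, keys 0 .. 127
    {
        if (cache.TryGet(key, out var page) == false)
        {
            page = LoadFromDisk(key);            // always taken: the entry was already evicted
            cache.Set(key, page);                // evicts the entry we'll need again shortly
        }
        Process(page);
    }
}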

In this case, the size is set to 100, but the keys are sequential in the range of 0 .. 127. We are basically guaranteed to never have a cache hit. What is the impact of such a cache, however?

Well, it will keep the references alive for longer, so they will end up in Gen2. On eviction, they will take longer to be discarded. In other words, adding a cache here will increase the amount of memory being used and raise CPU utilization (the GC has to do more work), while not adding any performance benefit at all. Removing the cache, on the other hand, will reduce both memory utilization and CPU costs.

This can be completely unintuitive at first glance, but it is a real scenario, and sadly something that we had experienced many times in RavenDB 3.x editions. In fact, a lot of the design of RavenDB 4.x was about fixing those kinds of issues.

Whenever you design a cache, you should consider what sort of adversity you have to face. Thinking of your users as adversaries who are intentionally trying to break your software is a good mindset to have. You get to avoid many pitfalls this way.

There are many other caching anti-patterns. For example, if you are using a distributed cache, the pattern of accesses to the cache may be more expensive than reading from the source. You end up making many (individually fast) remote calls to the cache to answer a value, instead of one (somewhat slower) query to the source. The network cost is typically huge, but discounted (see: Fallacies of Distributed Computing).

But for an in-memory cache, it is easy to forget that an overloaded cache is just a memory hog, not providing much value at all. In the previous posts, I discussed how I should use a buffer pool in conjunction with the cache. That is done because of this particular scenario: if the cache is overloaded and we discard values, we want to at least avoid doing additional allocations.

In many ways, a cache is a really complex piece of software, and there has been a lot of research into it. Here is another non-intuitive result: instead of evicting the least recently used (or least frequently used) value, select a value at random and evict it. Your performance can actually end up better.

Why is that? Look at the code above, and let’s assume that I’m evicting a random value from among the 25% least frequently used items. The fact that I’m doing that randomly means that there is a higher likelihood that some values will remain in the cache, even after they “should” have expired. And by the time I come back to them, they will still be in the cache and useful, instead of having been predictably evicted.

In many databases, cache management accounts for a huge part of the complexity. You usually have multiple levels of caches, and policies that move entries between them. I really liked this post, discussing the Postgres algorithm in great detail. It also covers some aspects of nearly hostile behavior that the cache has to guard against, to avoid pathological performance drops.

time to read 1 min | 119 words

Another code review comment; this time it is an error message being raised:

(screenshot of the error message, raised when the maximum number of connections is exceeded)

The comment I made here is that this error message lacks a very important detail: how do I fix this? In this case, the maximum number of connections is controlled by a configuration setting, so telling the user that, along with the actual configuration entry to change, is paramount.

If we give them all the information, we save the support call that would follow. And if the user doesn’t read the error message, the support engineer likely will, and will be able to close the ticket within moments.

time to read 1 min | 162 words

I just went over the following pull request, where I found this nugget:

(code snippet from the pull request, reporting only the exception’s Message property)

I admittedly have very firm views on the subject of error handling. The difference between good error handling and the merely mediocre can be ten times more lines of code, but also a hundred times less troubleshooting in production.

And a flat out rule that I have is that whenever you see a usage of the Exception.Message property, it means that you are throwing valuable information away. This is a pretty horrible thing to do.

You should never use the Message property; always use ToString(), so you’ll get the full details. Yes, that is relevant even if you are showing the error to the user. Because if you are showing an exception message to the user, you have let it bubble up all the way, so you might as well give the full details.
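
To make the rule concrete, here is a tiny illustration (the logger and the operation are placeholders, not from the code under review):

try
{
    ProcessOrder(order);
}
catch (Exception e)
{
    // Bad: drops the exception type, the stack trace and any inner exceptions
    logger.Error("Order processing failed: " + e.Message);

    // Good: string concatenation calls e.ToString(), so the full details are kept
    logger.Error("Order processing failed: " + e);
}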

time to read 8 min | 1413 words

I’ll start by saying that this is not something that is planned in any capacity. I ran into this topic recently and decided to dig a little deeper. This post is mostly about the results of my research.

If you run a file sharing system, you are going to run into a problem very early on. Quite a lot of the files that people are storing are shared, and that is a good thing. Instead of storing the same file multiple times, you can store it once and just keep a reference counter. That is how RavenDB internally deals with attachments, for example. Two documents that have the same attachment (same content, not the same file name, obviously) will only have a single entry inside of the RavenDB database.

When you run a file sharing system, one of the features you really want to offer is the ability to save space in this manner. Another one is not being able to read the users’ files. For many reasons, that is a highly desirable property. For the users, because they can be assured that you aren’t reading their private files. For the file sharing system, because it simplifies operations significantly (if you can’t look at the files’ contents, there is a lot that you don’t need to worry about).

In other words, we want to store the users’ files, but we want to do that in an encrypted manner. It may sound surprising that the file sharing system doesn’t want to know what it is storing, but it actually does simplify things. For example, if you can look into the data, you may be asked (or compelled) to do so. I’m not talking about something like a government agency doing that, but even feature requests such as “do a virus scan on my files”, for example. If you literally cannot do that, and it is something that you present as an advantage to the user, that is much nicer to have.

The problem is that if you encrypt the files, you cannot know if they are duplicated. And then you cannot use the very important storage optimization technique of de-duplication. What can you do then?

This is where convergent encryption comes into play. The idea is that we’ll use an encryption system that will give us the same output for the same input even when using different keys. To start with, that sounds like a tall order, no?

But it turns out that it is quite easy to do. Consider the following symmetric key operations:
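
For illustration, something like the following, using .NET’s AesGcm and SHA256 as stand-ins for whatever cryptographic library you prefer (a hedged sketch, not a vetted implementation):

using System.Security.Cryptography;

static byte[] Hash(byte[] data) => SHA256.HashData(data);

static (byte[] CipherText, byte[] Tag) Encrypt(byte[] key, byte[] nonce, byte[] plaintext)
{
    var cipherText = new byte[plaintext.Length];
    var tag = new byte[16];                      // GCM authentication tag
    using var aes = new AesGcm(key);             // 32 byte key => AES-256
    aes.Encrypt(nonce, plaintext, cipherText, tag);
    return (cipherText, tag);
}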

One of the key aspects of modern cryptography is the use of a nonce, which ensures that for the same message and key, we’ll always get a different ciphertext. In this case, we need to go the other way around: ensure that for different keys, the same content will give us the same output. Here is how we can utilize the above primitives for that purpose:
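
A sketch of the flow, under the same assumptions as above (the all-zero nonce is the “static nonce” discussed next; authentication tags are elided for brevity):

static (byte[] EncryptedFile, byte[] KeyNonce, byte[] EncryptedFileKey) ConvergentEncrypt(
    byte[] fileContents, byte[] userSecretKey)
{
    // Step 1: the file's hash becomes the encryption key, with a fixed all-zero nonce,
    // so identical files produce identical ciphertext regardless of who encrypts them
    byte[] fileKey = Hash(fileContents);
    byte[] zeroNonce = new byte[12];
    var (encryptedFile, _) = Encrypt(fileKey, zeroNonce, fileContents);

    // Step 2: encrypt the file key with the user's own secret key and a random nonce
    byte[] keyNonce = RandomNumberGenerator.GetBytes(12);
    var (encryptedFileKey, _) = Encrypt(userSecretKey, keyNonce, fileKey);

    return (encryptedFile, keyNonce, encryptedFileKey);
}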

In other words, we have a two step process. First, we compute the cryptographic hash of the message and use that as the key to encrypt the data. We use a static nonce for this part, more on that later. We then take the hash of the file and encrypt it normally, with the secret key of the user and a random nonce. We can then push both the ciphertext and the nonce + encrypted key to the file sharing service. Two users, using different keys, will always generate the same output for the same input in this model. How is that?

Both users will compute the same cryptographic hash for the same file, of course. Using a static nonce completes the cycle and ensures that we’re basically running the same operation. We can then push both the encrypted file and our encrypted key for that to the file sharing system. The file sharing system can then de-duplicate the encrypted blob directly. Since this is the identical operation, we can safely do that. We do need to keep, for each user, the key to open that file, encrypted with the user’s key. But that is a very small value, so likely not an issue.

Now, what about this static nonce? The whole point of a nonce is that it is a value that you use once. How can we use a static value here safely? The problem a nonce is meant to solve is that with most ciphers, if you XOR the output of two ciphertexts, you’ll get the difference between them. If they were encrypted using the same key and nonce, you’ll get the result of XORing their plaintexts. That can have a catastrophic impact on the security of the system. To get anywhere here, you need to encrypt two different messages with the same key and nonce.

In this case, however, that cannot happen. Since we use cryptographic hash of the content as the key, we know that any change in the message will ensure that we have a different key. That means that we never reuse the key, so there is no real point in using a nonce at all. Given that the cryptographic operation requires it, we can just pass a zeroed nonce and not think about it further.

This is a very attractive proposition, since it can amount to massive space savings. File sharing isn’t the only scenario where this is attractive. Consider the case of a messaging application where you want to forward messages from one user to another. Memes and other viral content are a common scenario here. You can avoid having to re-upload the file many times, because even in an end-to-end encryption model, we can still avoid sharing the file contents with the storage host.

However, this leads to one of the serious issues that we have to cover for convergent encryption. For the same input, you’ll get the same output. That means that if an adversary knows the source file, it can tell whether a user has that file. In the context of a messaging application, that can spell trouble. Consider the following image, which is banned by the Big Meat industry:

(image: a picture of a salad)

Even with end to end encryption, if you use convergent encryption for media files, you can tell that a particular piece of content is accessed by a user. If this is a unique file, the server can’t really tell what is inside it. However, if this is a file that has a known source, we can detect that the user is looking at a picture of salads and immediately snitch to Big Meat.

This is called a confirmation of file attack, and it is the most obvious problem in this scenario. The other issue is that by using convergent encryption, you may allow an adversary to guess at values when the overall structure is known.

Let’s consider the following scenario: I have a service where users upload their data using convergent encryption. Given that many users may share the same file, we allow any user to download a file using:

GET /file?id=71e12496e9efe0c7e31332533cb53abf1a3677f2a802ef0c555b08ba6b8a0f9f

Now, let’s assume that I know what typical files are stored in the service. For example, something like this:

(image: a sample filled-out W-4 tax form)

Looking at this form, there are quite a few variables that you can plug in here, right? However, we can generate options for all of those quite easily, and encryption is cheap, so we can speculate on the possible values. We can then run the convergent encryption on each possibility and fetch the result from the server. If we get a match, we have figured out the remaining information.

Consider this another variant of password cracking, except that here we are using a set of algorithms that are explicitly meant to be highly efficient, so they lend themselves to offline work.
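
As a hedged illustration of the kind of offline guessing this enables (the id scheme, the form rendering and the candidate enumeration are all assumptions; it reuses the Encrypt primitive sketched above):

foreach (var ssn in EnumerateCandidateSsns())
{
    // Reconstruct the file the victim plausibly uploaded, with one field varied
    byte[] guess = RenderW4Form(name: "John Doe", ssn: ssn);

    byte[] fileKey = SHA256.HashData(guess);
    var (encrypted, _) = Encrypt(fileKey, new byte[12], guess);

    // Assume the service addresses files by the hash of the encrypted content
    var id = Convert.ToHexString(SHA256.HashData(encrypted)).ToLowerInvariant();
    var response = await httpClient.GetAsync($"/file?id={id}");
    if (response.IsSuccessStatusCode)
        Console.WriteLine($"Match: the SSN on the form is {ssn}");
}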

That is enough on the topic, I believe. I don’t have any plans of doing something with that, but it was interesting to figure out. I actually started this looking at this feature in WhatsApp:

(screenshot: WhatsApp limiting how many times a message can be forwarded)

It seems that this is a client-side enforced policy, rather than something that is handled via the protocol itself. I initially thought that this was done via convergent encryption, but it looks like it is just a counter in the message, and the client side shows the warning as well as applies the limits.

time to read 1 min | 187 words

Following my previous post, which mentioned that you can save significantly on disk space if you store a plain text attachment using gzip, we got a feature request:

Perhaps in future attachments could have built-in compression as well?

The answer to that is no, but I thought that it is worth a post to explain why not.

Let’s consider the typical types of attachments that you’ll store in RavenDB. Based on experience, we usually see:

  • PDF files
  • Word / Excel / Power Point
  • Images (JPEG, PNG, GIF, etc)
  • Videos
  • Designs (floor plans, CAD / DWG, etc)
  • Text files

Aside from the text files, pretty much all the data you’ll store as an attachment is already compressed. In fact, you’ll be hard pressed today to find any file format that does not already have built-in compression.

Compressing already compressed data is… suboptimal. It will not usually lead to significant space savings and can actually make the file size larger. It also burns CPU cycles unnecessarily.

It is better to shift the responsibility to the users in this case, since they have a lot more information about what they actually put into RavenDB and won’t have to guess.
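
For example, a client-side approach along these lines (the document id and attachment name are assumptions), where you compress the plain text yourself before handing it to RavenDB:

using System.IO;
using System.IO.Compression;

var compressed = new MemoryStream();
using (var gzip = new GZipStream(compressed, CompressionLevel.Optimal, leaveOpen: true))
using (var source = File.OpenRead("meeting-notes.txt"))
{
    source.CopyTo(gzip);                          // gzip the plain text before storing it
}
compressed.Position = 0;

using var session = store.OpenSession();
session.Advanced.Attachments.Store("users/1-A", "meeting-notes.txt.gz", compressed, "application/gzip");
session.SaveChanges();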
