I spoke for over an hour about how you can build high-performance systems and how we utilize these techniques inside of RavenDB.
You can listen to me talk with Dan in the Unhandled Exception podcast, where we dug deep into the internals of database engines.
As usual, I would love your feedback.
I had a great time talking with Scott Hanselman about how we achieve great performance for RavenDB with .NET.
You can listen to the podcast here. As usual, I would love your feedback.
In this episode, we talk to Oren Eini from RavenDB. RavenDB is a NoSQL document database that offers high performance, scalability, and security. Oren shares his insights on why performance is not just a feature, but a service that developers and customers expect and demand. He also explains how RavenDB achieves fast and reliable data access, how it handles complex queries and distributed transactions, and how it leverages the cloud to optimize resource utilization and cost efficiency!
In my last post we implemented variable-sized encoding to be able to pack even more data into the page. We were able to achieve 40% better density because of that. This is pretty awesome, but we would still like to do better. There are two disadvantages to variable-sized integers:
- They may take more space than the actual raw numbers.
- The number of branches is high, and they are not predictable.
Given that we need to encode the key and value together, let’s see if we can do better. We know that both the key and the value are 8 bytes long. On little-endian systems, we can treat each number as a byte array.
Consider this number: 139,713,513,353, which is composed of the following bytes: [137, 7, 147, 135, 32, 0, 0, 0]. This is how it looks in memory. This means that we only need the first 5 bytes, not the last 3 zero ones.
It turns out that there is a very simple way to compute the number of used bytes, like so:
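Here is a minimal sketch of the idea in C# (the helper name is mine, and the actual implementation may differ, but the core trick is a leading-zero count):

```csharp
using System;
using System.Numerics;

// Count the bits in use, then round up to whole bytes. The `| 1`
// ensures that a value of zero still reports one used byte.
static int BytesNeeded(ulong value)
{
    int bitsUsed = 64 - BitOperations.LeadingZeroCount(value | 1);
    return (bitsUsed + 7) / 8;
}

Console.WriteLine(BytesNeeded(139_713_513_353)); // prints 5
```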
This compiles down to just a few branchless instructions (essentially a leading-zero count and a bit of arithmetic), which is about as tight as you can want it to be.
Of course, there is a problem. In order to read the value back, we need to store the number of bytes we used somewhere. Variable-sized integers use the top bit of each byte to mark whether more bytes follow. But we cannot do that here.
Remember, however, that we encode two numbers here, and each number is at most 8 bytes long. In binary, that means we need 4 bits to encode the length of each number. So if we take one additional byte, we can fit the lengths of both numbers into it.
The length of the key and the length of the value each fit in a nibble inside that byte. Here is what the encoding step looks like now:
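Again, a sketch rather than the actual RavenDB code (the names and the exact buffer layout are my assumptions), reusing BytesNeeded from above:

```csharp
using System;
using System.Buffers.Binary;

// Pack both lengths into a single leading byte (one nibble each),
// then copy only the bytes in use. We write the full 8-byte words
// unconditionally and advance only by the used bytes, so the buffer
// needs a little slack (up to 17 bytes). That is what keeps this
// free of branches.
static int Encode(Span<byte> buffer, ulong key, ulong val)
{
    int keyLen = BytesNeeded(key); // 1..8, from the sketch above
    int valLen = BytesNeeded(val);

    buffer[0] = (byte)((keyLen << 4) | valLen);

    BinaryPrimitives.WriteUInt64LittleEndian(buffer.Slice(1), key);
    BinaryPrimitives.WriteUInt64LittleEndian(buffer.Slice(1 + keyLen), val);

    return 1 + keyLen + valLen; // bytes actually consumed
}
```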
The assembly generated for this is just as tight: moves, shifts, and masks, with no branches at all. Which I’m really stoked about. As for decoding, we just have to go the other way around:
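In the same sketched style:

```csharp
using System;
using System.Buffers.Binary;

// Read the lengths byte, then reassemble each number by reading a
// full 8-byte word and masking off the bytes that belong to the
// next field. Inside a page there is always enough slack to read
// the full word, and again there are no branches.
static int Decode(ReadOnlySpan<byte> buffer, out ulong key, out ulong val)
{
    int keyLen = buffer[0] >> 4;
    int valLen = buffer[0] & 0xF;

    ulong keyMask = ulong.MaxValue >> ((8 - keyLen) * 8);
    ulong valMask = ulong.MaxValue >> ((8 - valLen) * 8);

    key = BinaryPrimitives.ReadUInt64LittleEndian(buffer.Slice(1)) & keyMask;
    val = BinaryPrimitives.ReadUInt64LittleEndian(buffer.Slice(1 + keyLen)) & valMask;

    return 1 + keyLen + valLen; // bytes consumed
}
```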
No branches, and really predictable code.
That is all great, but what about the sizes? We always take 4 additional bits per number, so a single additional byte for each entry we encode. With variable-sized integers, on the other hand, the moment we encode numbers beyond the 2GB range, we’re already winning. Encoding 3,221,241,856, for example, will cost us 5 bytes (since we limit each byte to 7 bits of payload), while the raw number needs only 4. The key advantage in our case is that if either the key or the value needs such an additional byte, we are at parity with the nibble method. If both of them need it, we are winning, since the nibble method uses a single additional byte while the variable-sized integers take two (one for each number).
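To make the comparison concrete, here is an illustrative tally, using the standard 7-bits-per-byte varint sizing:

```csharp
// Standard varint size: 7 payload bits per byte.
static int VarintSize(ulong value)
{
    int size = 1;
    while (value >= 0x80) { value >>= 7; size++; }
    return size;
}

// For key = value = 3,221,241,856 (just past the 2GB line):
//   varint:  VarintSize(key) + VarintSize(val)       = 5 + 5     = 10 bytes
//   nibble:  BytesNeeded(key) + BytesNeeded(val) + 1 = 4 + 4 + 1 =  9 bytes
```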
Now that we understand encoding and decoding, the rest of the code is basically the same. We just changed the internal format of the entry; nothing else about the code changes.
And the results?
For the realistic dataset, we can fit 759 entries versus 712 for the variable integer model.
For the full dataset, we can fit 752 entries versus 710 for the variable integer model.
That is an almost 7% improvement in size, but it also comes with a really important benefit: fewer branches.
This is the sort of code that runs billions of times a second. Reducing its latency has a profound impact on overall performance. One of the things we pay attention to in high-performance code is the number of branches. We are using superscalar CPUs, where multiple instructions may execute in parallel at the chip level. A branch may cause us to stall (we have to wait until the result is known before we can execute the next instruction), so the processor will try to predict what the result of the branch will be. If it is a highly predictable branch (an error check that is almost never taken, for example), there is very little cost to that.
The variable integer code, on the other hand, is nothing but branches, and as far as the CPU is concerned, there is no way to actually predict what the result will be, so it has to wait. Branchless or well-predicted code is a key aspect of high-performance code. And this approach can have a big impact.
As a reminder, we started at 511 items, and we are now at 759. The question is, can we do more?
I’ll talk about it in the next post…
Trevor Hunter from Kobo Rakuten is going to be speaking about Kobo’s usage of RavenDB in a webinar next Wednesday.
When I started at Kobo, we needed to look beyond the relational and into document databases. Our initial technology choice didn't work out for us in terms of reliability, performance, or flexibility, so we looked for something new and set on a journey of discovery, exploratory testing, and having fun pushing contender technologies to their limits (and breaking them!). In this talk, you'll hear about our challenges, how we evaluated the options, and our experience since widely adopting RavenDB. You'll learn about how we use it, areas we're still a bit wary of, and features we're eager to make more use of. I'll also dive into the key aspects of development - from how it affects our unit testing to the need for a "modern DBA" role on a development team.
About the speaker: Trevor Hunter: "I am a leader and coach with a knack for technology. I’m a Chief Technology Officer, a mountain biker, a husband, and a Dad. My curiosity to understand how things work and my desire to use that understanding to help others are the things I hope my kids inherit from me. I am currently the Chief Technology Officer of Rakuten Kobo. Here I lead the Research & Development organization where our mission is to deliver the best devices and the best services for our readers. We innovate, create partnerships, and deliver software, hardware, and services to millions of users worldwide."
You can register for the webinar here.
It’s very common to model your backend API as a set of endpoints that mirror your internal data model. For example, consider a blog engine, which may have:
- GET /users/{id}: retrieves information about a specific user, where {id} is the ID of the user
- GET /users/{id}/posts: retrieves a list of all posts made by a specific user, where {id} is the ID of the user
- POST /users/{id}/posts: creates a new post for a specific user, where {id} is the ID of the user
- GET /posts/{id}/comments: retrieves a list of all comments for a specific post, where {id} is the ID of the post
- POST /posts/{id}/comments: creates a new comment for a specific post, where {id} is the ID of the post
This mirrors the internal structure pretty closely, and it is very likely that you’ll arrive at an API similar to this if you start writing a blog backend. This represents the usual set of operations very clearly and easily.
The problem is that the blog example is attractive precisely because it is inherently limited. There isn’t really that much going on in a blog from a data modeling perspective. Let’s consider a restaurant and what its API would look like:
- GET /menu: Retrieves the restaurant's menu
- POST /orders: Creates a new order
- POST /orders/{order_id}/items: Adds items to an existing order
- POST /payments: Allows the customer to pay their bill using a credit card
This looks okay, right?
We sit at a table, grab the menu, and start ordering. From a REST perspective, we need to take into account that multiple users may add items to the same order concurrently.
That matters, because we may have bundles to take into account. John ordered the salad & juice, Jane the omelet, and Derek just got coffee. But coffee is already included in Jane’s order, so there is no separate charge for that. Here is what this will look like:
```
┌────┐┌────┐┌─────┐┌──────────────────────┐
│John││Jane││Derek││POST /orders/234/items│
└─┬──┘└─┬──┘└──┬──┘└─────┬────────────────┘
  │     │      │         │
  │    Salad & Juice     │
  │─────────────────────>│
  │     │      │         │
  │     │     Omelet     │
  │     │───────────────>│
  │     │      │         │
  │     │      │ Coffee  │
  │     │      │────────>│
```
The actual record we have in the end, on the other hand, looks like:
- Salad & Juice
- Omelet & Coffee
In this case, we want the concurrent nature of separate requests, since each user will be ordering at the same time, but the end result should be the final tally, not just an aggregation of the separate totals.
In the same sense, how would we handle payments? Can we do that in the same manner?
```
┌────┐┌────┐┌─────┐┌──────────────────┐
│John││Jane││Derek││POST /payments/234│
└─┬──┘└─┬──┘└──┬──┘└────────┬─────────┘
  │     │      │            │
  │     │     $10           │
  │────────────────────────>│
  │     │      │            │
  │     │      │ $10        │
  │     │──────────────────>│
  │     │      │            │
  │     │      │    $10     │
  │     │      │───────────>│
```
In this case, however, we are in a very different state. What happens in this scenario if one of those charges is declined? What happens if someone pays too much? What happens if there is a concurrent request to add an item to the order while the payment is underway?
When you have separate operations, you have to manage all of that somehow. Maybe with a distributed transaction coordinator, maybe by trusting the operator, or maybe by dumb luck, for a while. But this is actually an incredibly complex topic, and a lot of the complexity isn’t inherent to the problem itself; it comes from how we modeled the interaction with the server.
Here is the life cycle of an order:
- POST /orders: Creates a new order, returns the new order id
- ** POST /orders/{order_id}/items: Adds / removes items on an existing order
- ** POST /orders/{order_id}/submit: Submits all pending order items to the kitchen
- POST /orders/{order_id}/bill: Closes the order, computes the total charge
- POST /payments/{order_id}: Handles the actual payment (or payments)
I have marked with ** the two endpoints that may be called multiple times. Everything else can only be called once.
Consider the transactional behavior around this sort of interaction. Adding / removing items from the order can be done concurrently. But submitting the pending items to the kitchen is a boundary: a concurrent item addition will either be included (if it happened before the submission) or not (in which case it simply joins the next batch of pending items).
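To make that boundary concrete, here is a minimal sketch (the types are hypothetical, and a lock stands in for a real transaction):

```csharp
using System.Collections.Generic;
using System.Linq;

static void SubmitToKitchen(Order order)
{
    lock (order) // stand-in for the real transactional boundary
    {
        // Everything pending at this point goes out as one unit;
        // a concurrent addition either made it into the list before
        // we got here, or it stays pending for the next submit call.
        foreach (var item in order.Items.Where(i => !i.Submitted))
        {
            item.Submitted = true;
        }
    }
}

class OrderItem { public string Name = ""; public bool Submitted; }
class Order { public List<OrderItem> Items = new(); }
```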
We are also not going to make any determination on the offers / options that were selected by the diners until they actually move to the payment portion. Even the payment itself is handled via two interactions. First, we ask to get the bill for the order. This is the point at which we’ll compute the totals and figure out which bundles, discounts, etc., apply. The result of that call is the final tally. Second, we have the call to actually handle the payment. Note that this is one call, and the idea is that its content is going to be something like the following:
{ "order_id": "789", "total": 30.0, "payments": [ { "amount": 15.0, "payment_method": "credit_card", "card_number": "****-****-****-3456", "expiration_date": "12/22", "cvv": "123" }, { "amount": 10.0, "payment_method": "cash" }, { "amount": 5.0, "payment_method": "credit_card", "card_number": "****-****-****-5678", "expiration_date": "12/23", "cvv": "456" } ] }
The idea is that by submitting it all at once, we are removing a lot of complexity from the backend. We don’t need to worry about complex interactions, race conditions, etc. We can deal with just the issue of handling the payment, which is complicated enough on its own; no need to borrow trouble.
Consider the case where the second credit card charge is declined. What do we do then? We already charged the previous one, and we don’t want to issue a refund, naturally. The result here is a partial error, meaning that there will be a secondary submission to handle the remainder of the payment.
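Here is a sketch of how the backend might handle that (hypothetical names, with the actual charging logic left as a placeholder):

```csharp
using System.Collections.Generic;

// Apply the payments in order. On a declined charge we stop without
// refunding what already went through, and we report the remainder
// so the client can make a secondary submission for the rest.
static decimal ApplyPayments(decimal total, IReadOnlyList<Payment> payments)
{
    decimal remaining = total;
    foreach (var payment in payments)
    {
        if (!TryCharge(payment))
            break; // partial error: keep the earlier charges

        remaining -= payment.Amount;
    }
    return remaining; // > 0 means a follow-up payment request is needed
}

static bool TryCharge(Payment payment) =>
    true; // placeholder: call the payment provider here

record Payment(decimal Amount, string Method);
```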
From an architectural perspective, it makes the system a lot simpler to deal with, since you have well-defined scopes. I probably made it more complex than I should have, to be honest. We can simply make the entire process serial and forbid actual concurrency throughout the process. If we are dealing with humans, that is easy enough, since the latencies involved are short enough that they won’t be noticed. But I wanted to add the bit about making a part of the process fully concurrent, to deal with the more complex scenarios.
In truth, we haven’t made a big change in the system; we simply restructured the set of calls and the way you interact with the backend. But the end result is that the amount of code and complexity you have to juggle for your infrastructure needs is far smaller. On real-world systems, that also has the impact of reducing your latencies, because you are aggregating multiple operations and submitting them as a single shot. Things are also easier on the backend, because you don’t need complex transaction coordination or distributed locking.
It is a simple change, on its face, but it has profound implications.