Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by email or phone:

ayende@ayende.com

+972 52-548-6969

, @ Q j

Posts: 6,722 | Comments: 48,708

filter by tags archive

Travel schedule, 2019

time to read 2 min | 220 words

imageI’m going to be traveling extensively in the first part of 2019.

On January, I’m going to be in Sandusky, Ohio for CodeMash where I’ll be speaking about Extreme Performance Architecture, how we were able to refactor RavenDB to get insane level of performance. I’m going to be covering both low level details (there might be some assembly code) and the high level architecture that make it all possible.

On February, I’ll be in the RavenDB booth in the O’Reilly Software Architecture in New York. You should come and see what goodies we’ll have to show off at that stage. If you are in healthcare, we’ll also be at the HIMMS conference in Orlando where we’ll showing off some of the ways RavenDB make working on healthcare software easier.

I’m also going to have an event in New York in February (details to follow) in which I’ll speak about a grown up database. How RavenDB is able to run without a dedicated admin and what kind of behaviors you can expect from it.

On March, I’m going to be visiting the Gartner Data & Analytics conference in Orlando, you can ping me if you want to sit and have lunch there.

RavenDB C++ client: Laying the ground work

time to read 2 min | 371 words

The core concept underlying the RavenDB client API is the notion of Unit of Work. This provide core features such as change tracking and identity map. In all our previous clients, that was pretty easy to deal with, because the GC solved memory ownership and reflection gave us a lot of stuff basically for free.

Right now, what I want to achieve is the following:

It seems pretty simple, right? Because both the session and the caller code are going to share ownership on the passed User. Notice that we modify the user after we call store() but before save_changes(). We expect to see the modification in the document that is being generated.

The memory ownership is handled here by using shared_ptr as the underlying mode in which we accept and return data to the session. Now, let’s see how we actually deal with serialization, shall we? I have chosen to use nlohmann’s json for the project, which means that the provided API is quite nice. As a consumer, you’ll need to write your JSON serialization code, but it is fairly obvious how to do so, check this out:

Given that C++ doesn’t have reflection, I think that this represent a really nice handling of the issue. Now, how does this play with everything else? Here is what the skeleton of the session looks like:

There is a whole bunch of stuff that is going on here that we need to deal with.

First, we have the IEntityDetails interface, which is non generic. It is implemented by the generic class EntityDetails, which has the actual type that we are using and can then use the json serialization we defined to convert the entity to JSON. The rest are just details, we need to use a vector of shared_ptr, instead of the abstract class, because the abstract class has no defined size.

The generic store() method just capture the generic type and store it, and the rest of the code can work with the non generic interface.

I’m not sure how idiomatic this code is, or how performant, but at least as a proof of concept, it works to show that we can get a really good interface to our users in C++.

Abusing system flexibility to avoid paying collect tolls

time to read 3 min | 512 words

imageI’m going to feel like an old man for this post, but if you were born post 1995, it is likely that you have no idea what I’m talking about in this post, crazy as this sounds to me.

Before there was a phone in every pocket, there were land lines. It is like today’s phone, but much larger, you could only do voice calls and if you wanted to screen your calls you needed to buy another appliance. If you’ll watch the first few sessions of Friends, you’ll see how important a detail that can be. If you were out of the house or office and needed to place a call, you could use something called a public phone booth or a pay phone.

Sadly, the easiest way I can convey what this was is to invoke the Tardis. A small booth in which you had a public access phone. Because phone calls used to cost a lot, these phone had a way to drop some coins or tokens into the phone to pay for the phone call.

As a child, I didn’t have a wallet and still needed to occasionally make calls. Being stuck without cash at hand wasn’t such a strange thing so there was another way to perform the call. You could reverse the charge, instead of the person placing the call paying for it, you could call collect. In that case, the person answering the call would be paying for it. Naturally, since money is involved, you need the other party to accept the charge before actually charging them.

At some point in time, you called a special number and told the operator what number you wanted to do a collect call. The operator would ring this number and ask for permission to connect the call and charge the receiver. I think that the rate for a collect call was significantly higher than the normal call, so you wouldn’t normally do that.

As part of the system automation, the phone company replaced the manual operator collect call with an automated system. You would record a short message, which would be played to the other party. If they wanted to accept the call (and the charge), the could press 1 on the phone, or disconnect to avoid the charge.

As a kid, I quickly learned that instead of telling the other party who is calling and why (so they would accept the call), I could just tell them what my phone number is. In this way, they would write down the number, refuse the call and then call me back. That would avoid the collect toll charge.

I remember that at some point the phone company made the length of the collect hello message really short, but I got around that by speaking really fast (or sometimes by making two separate calls). I remember having to practice saying the phone number a few times to get it done in the right time.

System flexibility

time to read 3 min | 540 words

One of the absolutely most challenging things in designing software systems is that there is really no such thing is a perfect world. A business requirement that is set in stone turns out to be quite malleable. That can cause quite a big hassle for the development team, as they try to anticipate and address all aspects of change ahead of time.

A better alternative would be to not attempt to address all such issues in software, but in wetware.

I recently ordered lunch to go at a restaurant. I already paid and was waiting to get my order when the clerk double checked with me that the order is to go. The ordered was entered as if I was going to eat in the location, instead of taking the food away. After I confirmed that I want to take my order to go, I watched how the clerk fixed things up. She went to the kitchen window and shouted, “that last order, make it to go”. The kitchen staff double checked which order it was, then moved on with their tasks, eventually hanging me a baggie of tasty food to go.

On the way back, I kept wondering in my head how a software system would handle something like this. You’ll need to shred the idea of “an order is immutable once it is paid”, for example. Or you’ll need to add a side channel of additional instructions to the kitchen, etc.

Or, you can ignore the whole thing completely and shout at the cook. In software, that might mean that we’ll keep ourselves agile and provide a “Manual” mode in which a user can enter free text / instructions for the next person on the line to process this.

There are some cases where this would be a bad idea, but mostly these are involved not trusting your users to do their jobs. Sometimes, it is literally the software’s job to force the users to follow a specific path (usually because management decided that this must be so). However, a really important aspect of design is that it isn’t rigid, it allows the user to do their work, instead of working around the software. Part of that involves designing specific places where users can do stuff that you didn’t think that they would need.

For example, in an order, having a “Notes” text field that is editable even after the order is placed, which can be used for further communication. The idea is that you spend just a little bit of time to consider whatever scenarios you didn’t cover and try to give the user something (just bare minimum, maybe even below that), just to allow them to get by. The idea isn’t to provide a solution, but to get something that give the user a choice and that will raise enough feedback so you can plug this into the next iteration of your product.

Not having anything may mean that the users will solve their own problem using something else (email, or even just talking directly to one another) and we can’t have that*, obviously.

* It may sound silly, but in some cases, you literally can’t have that. Certain actions need to be logged, authorized and track appropriately for many purposes.

The perils of full system resource utilization

time to read 3 min | 471 words

The following quotes (or something very similar) came from our interactions with customers:

“We paid a lot of money for this hardware, why isn’t your database making full use of it?”

“The machine is peaking at 100% CPU, the sky is falling, help, NOW!”

This is a problem, because I can empathize with both sides. On the one hand, having just put a five or six figure sum into new hardware, it can be depressing to see is “going to waste”. On the other hand, seeing the system under high load gives you that sinking feeling that the boat is going to overturn at any moment and production will go down.

Balancing resource consumption is a really hard problem, mostly because we don’t have any control over our work intake. We can’t control how many requests we accept nor do we control what kind of work is being asked of us. Actually, that isn’t true. We could control that, but in most cases, that is a false distinction.

At some point, RavenDB had a max number of concurrent request limit, and users have hit that in the past. This resulted in angry calls from customers about RavenDB refusing requests. The fact that we did that to maintain the overall health of the system was immaterial. Refusing requests meant that the system (or some portion of it) was down. In those cases, it was actually better, from the customer’s perspective, for the whole thing to slow down a bit, as long as there were no errors.

Inside RavenDB, we attempt to manage our CPU consumption using separation of concerns. First, we have the processing of requests. The assumption is that such requests end up being waited on by an actual human, directly or indirectly, so we process them first, prioritizing them above almost everything else. The only thing that has higher priority is the cluster health and monitoring system, which ensure that all nodes are up, running and in the same state.

As it turns out, RavenDB have a lot of additional processes internally that can be given a lower priority under load. For example, indexing, which are something that RavenDB runs in the background, are something that we can increase the latency of to give more resources for request processing.

We have a lot of experience in balancing the overall needs, and I’m still not sure that I have a good answer here. The reason for this post is that I just analyzed a dump file where it looked like requests were waiting for indexing to complete, but they were actually starving the indexes from the CPU time that they needed to actually run.  The system progressed, but not fast enough for the user to not notice things.

Actually, that is the primary criteria that we use. If the system is slow, but no one notices, the system ain’t slow.

The design & challenges of a RavenDB C++ client

time to read 3 min | 468 words

When I wrote the first version of RavenDB, I was coming off about six years of intensive work on NHibernate. I wanted the same level of convenience that I had with a world class OR/M with non of the relational constraints (pun intended).

Given that I was working in a managed language, features such as change tracking, unit of work, etc. Since then, we created clients for: C#, Java, Python, Node.JS, Ruby and Go. A common feature of all these languages is that they all have automatic memory management. Go, in particular, has been interesting, because while it deals with explicit pointers, there is no need to deal with manually freeing memory.

We are now looking at what it would take to bring the same level of experience to a C++ client. For example, here is about the simplest CRUD scenario that I can think of:

This code isn’t showing something special, until you realize that when you want to translate it to C++, you’ll need to take into account the explicit memory ownership. Another issue to deal with is how we can implement seamless integration between business objects and JSON documents.

I looked at how this is handled in other similar databases, and the results seems to be, pretty badly.

At least, when I compare it to how much higher the level of the code is in C++. Now, it is possible that C++ developers like working at this level. And certainly, the RavenDB client APIs actually have user exposed layers that are similar to this, but this is something that you’ll usually not need. Ideally, I want to be able to give the same level of experience to the C++ client as well.

The issue of JSON serialization actually seems to be already well taken for already.  A user will need to define to_json and from_json functions to make this work, but given that C++ has no reflection, that seems reasonable to request. It also gives the user complete control over the serialization / deserialization process and avoid the process of “customizing” the JSON serialization, which you sometimes have to do.

The issue of memory ownership, though, it a bit more complex. I was thinking about exposing this via the following interface:

The idea is that the RavenDB C++ client will only deal with shared_ptr, with the idea that we can accept that the entities we manage may live longer than the lifetime of the session.

I’m no longer able to consider myself a C++ developer, and the dev we have started working on the C++ stuff is currently busy learning RavenDB itself, so I thought this would be a good time to ask for feedback.

Both on the kind of interface that you’ll like to see for C++ client and whatever this approach is going to work.

This code has expectations from the reader

time to read 2 min | 396 words

I just had a discussion with a colleague about a fix of non trivial code. The question was what comments should go into the code to explain what was going on.  If you care to know, this related to the prefetching strategy that is used by RavenDB to reduce the amount of I/O that is required (especially on slow disks). The details don’t actually matter. The problem is that there are multiple relatively complex issues there, from managing I/O to thread safety in the critical code path (using dirty reads intentionally), etc.

The problem with doing this is that the code is complex but it is a fairly straightforward progress from the kind of code we usually write in performance sensitive sections. The fear was by over commenting the code, we’ll get ourselves into a situation where we’ll be making the code too malleable to change. This is the kind of code that sits in the perf critical section, you change it after fasting for a day or two (with strong encouragement on meditation about little vs. big endian and why half endian is so rare).

In other words, in practice. You change it when you have reason, and you back up that change with a battery of performance tests. Anything from the usual benchmarks to running production loads on various machines to poring over system traces.

Given the amount of effort that is expected from any changes to this code, I consider it to be a good idea for people who read it to understand that there is a hurdle there that must be jumped before it should be modified. Thus, we decided to skip some of the comments on the reasoning behind the overall design. Here is the most important comment in this code, this is there to explain a particular choice of value and the reasoning that must be applied when it is changed.

What about the whole complexity of the prefetching in general? That isn’t document in code, because reading code comments scattered throughout will make it very hard to grok. This is detailed in the architecture guide that go over these details.

For myself, I find it really awesome to go over a codebase and figure out what reasoning lie behind the code. But when I have people working on my projects? It is better to give them a hand than a riddle.

Watch Federico Lois’s talk - Scratched metal

time to read 1 min | 114 words

Federico is the go to guy we have for all our performance issues, he talks about a lot of our challenges in this talk.

Micro-optimizations at the RavenDB vNext storage engine are critical to achieve 50K+ write requests per second on single node commodity hardware. In this talk we'll explore the use of the new hardware intrinsic introduced on CoreCLR 2.1 in the context of real-life critical path bottlenecks. We will touch on hardcore topics like CPU architecture and its effect on instruction latency and throughput, the effect of cache behaviors (hit/miss ratio, poisoning), prefetching, etc. The talk is aimed at engineers doing micro-optimization and high performance computing.

ChallengeThe loop that leaks–Answer

time to read 2 min | 328 words

In my previous post, I asked about the following code and what its output will be:

As it turns out, this code will output two different numbers:

  • On Debug – 134,284,904
  • On Release – 66,896

The behavior is consistent between these two modes.

I was pretty sure that I knew what was going on, but I asked to verify. You can read the GitHub issue if you want the spoiler.

I attached to the running program in WinDBG and issued the following command:

We care about the last line. In particular, we can see that all the memory is indeed in the byte array, as expected.

Next, let’s dump the actual instances that take so much space:

There is one large instance here that we care about, let’s see what is holding on to this fellow, shall we?

It looks like we have a reference from a local variable. Let’s see if we can verify that, shall we? We will use the clrstack command and ask it to give us the parameters and local variables, like so:

The interesting line is 16, which shows:

image

In other words, here is the local variable, and it is set to null. What is going on? And why don’t we see the same behavior on release mode?

As mentioned in the issue, the problem is that the JIT introduce a temporary local variable here, which the GC is obviously aware of, but WinDBG is not. This cause the program to hold on to the value for a longer period of time than expected.

In general, this should only be a problem if you have a long running loop. In fact, we do in some case, and in debug mode, that actually caused our memory utilization to go through the roof and led to this investigation.

In release mode, these temporary variables are rarer (but can still happen, it seems).

FUTURE POSTS

No future posts left, oh my!

RECENT SERIES

  1. Challenge (54):
    28 Sep 2018 - The loop that leaks–Answer
  2. Graphs in RavenDB (4):
    21 Sep 2018 - Graph modeling vs. document modeling
  3. Reviewing FASTER (9):
    06 Sep 2018 - Summary
  4. RavenDB 4.1 features (12):
    22 Aug 2018 - MongoDB & CosmosDB Migration Wizards
  5. Reading the NSA’s codebase (7):
    13 Aug 2018 - LemonGraph review–Part VII–Summary
View all series

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats