Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by email or phone:

ayende@ayende.com

+972 52-548-6969


Don’t shove that (cool) feature down my throat, please

time to read 2 min | 220 words

“I just found out that you can do Dancing Rhinos in 4D if you use FancyDoodad 2.43.3” started a conversation at the office. That is pretty cool, I’ll admit; getting Rhinos to dance at all is nice, and in 4D is probably nicer. I wasn’t aware that FancyDoodad had this feature at all. Great tidbit, and something to discuss over lunch or coffee.

The problem is that the follow-up was something along the lines of: “Now I wonder how we can use FancyDoodad’s cool feature for us. Do you think it can solve the balance issue for this problem?”

Well, this problem has nothing to do with Rhinos, wildlife, dancing or (hopefully) dimensional math. So while I can see that if you had a burning enough desire and only a hammer, you would be able to use FancyDoodad to try to solve this thing, I don’t see the point.

The fact that something is cool doesn’t mean that it:

  • Is useful.
  • Ought to go into our codebase.
  • Solves our actual problem.

So broaden your horizons as much as possible, learn as much as you can ingest. But remember that everything starts at minus one hundred points, and coolness on its own doesn’t affect that math.

Graphs in RavenDB: Query results

time to read 2 min | 381 words

We ran into an interesting design issue when building graph queries for RavenDB. The problem statement is fairly easy: should a document be allowed to be bound to multiple aliases in the query results, or just one? However, without context, the problem statement is not meaningful, so let’s talk about what the actual problem is. Consider the following graph: we have three documents, Arava, Oscar and Phoebe, and the following edges:

  • Arava Likes Oscar
  • Phoebe Likes Oscar

We now run the following query:

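In RavenDB’s graph query syntax, it would look roughly like this (a sketch; the Dogs collection name and the exact edge spelling are my assumptions):

    match (Dogs as A)-[Likes]->(Dogs as B)<-[Likes]-(Dogs as C)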

This query asks for a dog that likes another dog that is liked by a dog. Another way to express the same sentiment (indeed, how RavenDB actually considers this type of query) is to write it as follows:

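Something along these lines, in the same (assumed) syntax:

    match (Dogs as A)-[Likes]->(Dogs as B)
    and (Dogs as C)-[Likes]->(Dogs as B)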

When processing the and expression, we require that documents matched to the same alias be the same. Given the graph that we execute this on, what would you consider the right result?

Right now, we have the first option, in which a document can be matched to multiple different aliases in the same result, which would lead to the following results:

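Working the pattern out against the graph above, that gives four matches:

    A      | B     | C
    Arava  | Oscar | Arava
    Arava  | Oscar | Phoebe
    Phoebe | Oscar | Arava
    Phoebe | Oscar | Phoebe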

Note that in this case, the first and last entries match A and C to the same document.

The second option is to ensure that a document can only be bound to a single alias in the result, which would remove the duplicate results above and give us only:

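That leaves just:

    A      | B     | C
    Arava  | Oscar | Phoebe
    Phoebe | Oscar | Arava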

Note that in either case, position matters, and the minimum number of results this query will generate is two, because we need to consider different starting points for the pattern match on the graph.

What do you think we should do in such a case? Are there reasons to want one behavior or the other, and should it be something that the user selects?

Paranoid decisions and OMG customers

time to read 3 min | 427 words

I used to be a consultant for a long while, and that meant that I worked on a lot of customer projects. That led to me seeing and acting in some really strange ways.

Sometimes you go into a codebase and you can’t really believe what you see there. I think that this is similar to how an archeologist feels, seeing just remnants of something and having to deduce the forces that drove the people who built it. In some cases, what looks like bad code is actually a reaction to a bad policy that people are trying to work around.

I think that the strangest of these cases was when I was working for a customer that refused to let external consultants use their internal source control system. Apparently, they had sensitive stuff there that they couldn’t isolate, or something like that. They were using either Team Foundation Server or Visual SourceSafe, and I didn’t really want to use that source control anyway, so I didn’t push. I did worry about source control, so we had a shared directory being used as a Subversion repository; this was over a decade ago, mind.

So far, so good, and nothing really interesting to talk about. What killed me was that their operations team flat out refused to back up the Subversion folder. That folder was hosted on a shared server that belonged to the consulting company (but resided at the customer site), and they were unwilling either to back up a “foreign” computer or to provide us with a shared space to host Subversion that they would back up.

For a while, I would back up the Subversion repository every few days to my iPod, then take a copy of the entire source code history home with me. That wasn’t sustainable, and I was deeply concerned about the future of the project over time, so I also added a twist. As part of the build process, we packed the entire source directory of the codebase as an embedded resource into the binary. In this way, if the code was ever lost, which I considered to be a real possibility, I would have a way to recover it all.

After we handed off the project, I believe they moved the source to their own repository, so we never actually needed that, but I slept a lot better knowing that I had a second string to my bow.

What is your craziest story?

Managing a multi-version project

time to read 4 min | 707 words


As I’m writing this, we have the following branches in the main repository of RavenDB. Looking at their history, we have:

Branch | Last Commit  | Number of commits this year
-------|--------------|----------------------------
v1.0   | Feb 3, 2013  | 0
v2.0   | Oct 14, 2016 | 0
v2.5   | Oct 18, 2018 | 14
v3.0   | Aug 14, 2018 | 10
v3.5   | Oct 11, 2018 | 45
v4.0   | Oct 18, 2018 | 2,270
v4.1   | Oct 18, 2018 | 3,214
v4.2   | Oct 18, 2018 | 95

The numbers are actually really interesting. Branches v1.0 and v2.0 are legacy and no longer supported. Branch v2.5 is also legacy, but we have a few customers with support contracts that are still using it, so there are still minor bug fixes going on there occasionally. Most of the people on the 3.x line are using 3.5, which is now in maintenance mode, so you can see that there is very little work on the v3.0 branch and a bit of ongoing bug fixing for customers.

The bulk of the work is on the 4.x line. We released v4.0 in February of this year, and then switched to working on v4.1, which was released a couple of months ago. We actively started working on v4.2 this month. We are going to close down the v4.0 branch for new features at the end of this month and move it to maintenance mode as well.

In practical terms, we very rarely need to do cross-major-version work, but we do have a lot of parallel work across the previous, current, and next versions. In other words, the situation right now is that a bug fix has to go to at least v4.1 and v4.2, and usually to v4.0 as well. We have been trying several different ways to handle this task.

For v4.0 and v4.1 work, which went on in parallel for most of this year, we had the developers submit two pull requests for their changes, one for v4.0 and one for v4.1. This increased the amount of work each change took, but the cost was usually just a few minutes at PR submission time, since we could usually just cherry-pick the relevant changes and be done with it. The reason we did it this way was to avoid big merges as we move work between actively worked-on branches. That would require having someone dedicated just to handle that, and it was easier to do it inline, rather than in a big-bang fashion.

For the v4.2 branch, we are experimenting with something else. Most of the work is going on in the v4.1 branch at this point, mostly minor features and updates, while the v4.2 branch is seeing a much larger scope of changes. It doesn’t make sense to ask the team to send three PRs, and we are going to close down v4.0 this month anyway. What we are currently doing is designating a person who is in charge of merging the v4.1 changes to v4.2 on a regular basis. So far, we are still pretty close and there hasn’t been a big amount of changes. Depending on how it goes, we’ll either go back to dual PRs once v4.0 is retired from active status, or see if the regular merges can keep going.

For feature branches, the situation is more straightforward. We typically ask the owner of the feature to rebase on a regular basis on top of whatever the baseline is, and the responsibility to do that is on them.

A long feature branch for us can last up to a month or so, but we have had a few that took three months when it was a big change. I tend to really dislike those, and we are trying to get them down to much shorter timeframes. Most of the work doesn’t happen in feature branches; we’ll accept partial solutions (if they don’t impact anything else), and we tend to collaborate a lot more closely on code that is already merged rather than in independent branches.

Memory management goop in Windows & Linux

time to read 2 min | 323 words

Regardless of the operating system you use, you are going to get roughly the same services from each of them. In particular: process and memory isolation, managing the hardware, etc. It can sometimes be really interesting to see the difference between the operating systems’ approaches to solving the same problem. Case in point: how both Windows and Linux manage memory. Both of them run on the same hardware and do roughly the same thing, but they have very different styles, and this ends up having profound implications for the applications using them.

Consider what appears to be a very simple question: what stuff do I have in my RAM? Linux keeps track of the Resident Set Size on a per-mapping basis, which means that we are able to figure out how much of an mmap’ed file is actually in memory. Furthermore, we can figure out how much of the mmap data is clean, which means that it is easily discardable, and how much is dirty and needs to be written to disk. Linux exposes this information via the /proc/[pid]/smaps file.
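
For instance, a minimal sketch (my own illustration, not RavenDB’s actual code) of summing the clean and dirty counters across all the mappings of the current process:

    #include <fstream>
    #include <iostream>
    #include <sstream>
    #include <string>

    int main() {
        // Every mapping in smaps reports, among others, Shared_Clean,
        // Private_Clean, Shared_Dirty and Private_Dirty, in kB.
        std::ifstream smaps("/proc/self/smaps");
        std::string line, key;
        long value_kb = 0, clean_kb = 0, dirty_kb = 0;
        while (std::getline(smaps, line)) {
            std::istringstream fields(line);
            if (!(fields >> key >> value_kb))
                continue; // mapping header lines don't parse as "Key: N kB"
            if (key == "Shared_Clean:" || key == "Private_Clean:")
                clean_kb += value_kb;
            else if (key == "Shared_Dirty:" || key == "Private_Dirty:")
                dirty_kb += value_kb;
        }
        std::cout << "clean: " << clean_kb << " kB, dirty: " << dirty_kb << " kB\n";
    }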

On the other hand, Windows doesn’t seem to bother doing this tracking. You can get this information, but you need to ask for each page individually. This means that it isn’t feasible to check what percentage of the system memory is clean (mmap pages that haven’t been modified and can be cheaply discarded). Windows exposes this via the QueryWorkingSetEx method.
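
A minimal sketch (again, my illustration) of asking whether a single page is resident; note that checking a whole address space means one entry per page, which is why this is so much more expensive than reading smaps once:

    #include <windows.h>
    #include <psapi.h>  // link with psapi.lib
    #include <cstdio>

    int main() {
        static char page[4096];
        page[0] = 1; // touch the page so it is likely to be resident

        PSAPI_WORKING_SET_EX_INFORMATION info = {};
        info.VirtualAddress = page;

        // One call per page you want to inspect.
        if (QueryWorkingSetEx(GetCurrentProcess(), &info, sizeof(info))) {
            std::printf("resident: %d, shared: %d\n",
                        (int)info.VirtualAttributes.Valid,
                        (int)info.VirtualAttributes.Shared);
        }
    }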

As a result, we have to be more conservative on Windows when the system reports high memory usage. We know that our usage pattern means that a high amount of memory in use (coming from clean mmap pages) is fine. It is a small detail, but it has caused us to have to jump through several hoops when we are running under load. I guess that Windows doesn’t need this information, so it isn’t exposed, while on Linux it seems to be used by plenty of callers.

The mental weight of open pull requests

time to read 1 min | 108 words

I just merged two PRs into RavenDB, and for the first time in a while, I got this beautiful number:

[image: zero open pull requests]

For the past few months, we have been working on several long-running features, graph queries being the most obvious example. We are now at a stage where we are ready to pull all this work together, which means that all the long-running feature branches (and the discussions about them in the PRs) are merged into the next release branch.

And for a while, I can luxuriate in that wonderful feeling.

Travel schedule, 2019

time to read 2 min | 220 words

I’m going to be traveling extensively in the first part of 2019.

In January, I’m going to be in Sandusky, Ohio for CodeMash, where I’ll be speaking about Extreme Performance Architecture: how we were able to refactor RavenDB to get an insane level of performance. I’m going to be covering both the low-level details (there might be some assembly code) and the high-level architecture that makes it all possible.

In February, I’ll be at the RavenDB booth at the O’Reilly Software Architecture conference in New York. You should come and see what goodies we’ll have to show off at that stage. If you are in healthcare, we’ll also be at the HIMSS conference in Orlando, where we’ll be showing off some of the ways RavenDB makes working on healthcare software easier.

I’m also going to have an event in New York in February (details to follow) in which I’ll speak about a grown-up database: how RavenDB is able to run without a dedicated admin and what kind of behavior you can expect from it.

In March, I’m going to be visiting the Gartner Data & Analytics conference in Orlando; you can ping me if you want to sit and have lunch there.

RavenDB C++ client: Laying the ground work

time to read 2 min | 371 words

The core concept underlying the RavenDB client API is the notion of a Unit of Work. This provides core features such as change tracking and an identity map. In all our previous clients, that was pretty easy to deal with, because the GC solved memory ownership and reflection gave us a lot of stuff basically for free.

Right now, what I want to achieve is the following:
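
Roughly the following usage; this is my reconstruction from the description below, so the document_session and User names are assumptions, not the final API:

    document_session session;

    auto user = std::make_shared<User>();
    user->name = "Arava";
    session.store(user, "users/1");

    // modify after store() but before save_changes():
    user->name = "Arava Eini";

    // the generated document should include the modification
    session.save_changes();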

It seems pretty simple, right? That is because both the session and the caller code are going to share ownership of the passed User. Notice that we modify the user after we call store() but before save_changes(). We expect to see the modification in the document that is being generated.

Memory ownership is handled here by using shared_ptr as the underlying mode in which we accept and return data from the session. Now, let’s see how we actually deal with serialization, shall we? I have chosen to use nlohmann’s json for the project, which means that the provided API is quite nice. As a consumer, you’ll need to write your own JSON serialization code, but it is fairly obvious how to do so; check this out:
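
With nlohmann’s json, that serialization code looks roughly like this (the User type and its fields here are my own illustration):

    #include <nlohmann/json.hpp>
    #include <string>

    struct User {
        std::string name;
        std::string email;
    };

    // nlohmann::json finds these via argument-dependent lookup, so
    // json j = user; and auto u = j.get<User>(); both just work.
    void to_json(nlohmann::json& j, const User& u) {
        j = nlohmann::json{{"Name", u.name}, {"Email", u.email}};
    }

    void from_json(const nlohmann::json& j, User& u) {
        j.at("Name").get_to(u.name);
        j.at("Email").get_to(u.email);
    }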

Given that C++ doesn’t have reflection, I think that this represents a really nice handling of the issue. Now, how does this play with everything else? Here is what the skeleton of the session looks like:
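
Roughly (a sketch reconstructed from the explanation below; the exact names and details are assumptions):

    #include <memory>
    #include <string>
    #include <utility>
    #include <vector>
    #include <nlohmann/json.hpp>

    // Non-generic interface, so the session can hold entities of any type.
    struct IEntityDetails {
        virtual ~IEntityDetails() = default;
        virtual nlohmann::json serialize() const = 0;
    };

    // Generic implementation: remembers the concrete type, so it can use
    // the to_json() we defined for that type.
    template <typename T>
    struct EntityDetails : IEntityDetails {
        std::shared_ptr<T> entity;
        std::string id;

        EntityDetails(std::shared_ptr<T> e, std::string id)
            : entity(std::move(e)), id(std::move(id)) {}

        nlohmann::json serialize() const override {
            return nlohmann::json(*entity); // goes through our to_json()
        }
    };

    class document_session {
        // shared_ptr elements, because the abstract class has no defined size
        std::vector<std::shared_ptr<IEntityDetails>> entities;

    public:
        // store() captures the concrete type; everything else works
        // against the non-generic interface.
        template <typename T>
        void store(std::shared_ptr<T> entity, std::string id) {
            entities.push_back(std::make_shared<EntityDetails<T>>(
                std::move(entity), std::move(id)));
        }

        void save_changes() {
            for (const auto& e : entities) {
                nlohmann::json doc = e->serialize();
                (void)doc; // ... send the document to the server ...
            }
        }
    };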

There is a whole bunch of stuff that is going on here that we need to deal with.

First, we have the IEntityDetails interface, which is non-generic. It is implemented by the generic class EntityDetails, which has the actual type that we are using and can then use the JSON serialization we defined to convert the entity to JSON. The rest are just details: we need to use a vector of shared_ptr instead of the abstract class directly, because the abstract class has no defined size.

The generic store() method just captures the generic type and stores it, and the rest of the code can work with the non-generic interface.

I’m not sure how idiomatic this code is, or how performant it is, but at least as a proof of concept, it works to show that we can get a really good interface for our users in C++.

Abusing system flexibility to avoid paying collect tolls

time to read 3 min | 512 words

I’m going to feel like an old man for this post, but if you were born post-1995, it is likely that you have no idea what I’m talking about in this post, crazy as this sounds to me.

Before there was a phone in every pocket, there were landlines. A landline is like today’s phone, but much larger; you could only do voice calls, and if you wanted to screen your calls you needed to buy another appliance. If you watch the first few seasons of Friends, you’ll see how important a detail that can be. If you were out of the house or office and needed to place a call, you could use something called a public phone booth, or a pay phone.

Sadly, the easiest way I can convey what this was is to invoke the Tardis: a small booth in which you had a public access phone. Because phone calls used to cost a lot, these phones had a way to drop some coins or tokens into the phone to pay for the call.

As a child, I didn’t have a wallet and still needed to occasionally make calls. Being stuck without cash at hand wasn’t such a strange thing, so there was another way to place the call. You could reverse the charge: instead of the person placing the call paying for it, you could call collect. In that case, the person answering the call would be paying for it. Naturally, since money is involved, you need the other party to accept the charge before actually charging them.

At some point in time, you called a special number and told the operator what number you wanted to call collect. The operator would ring that number and ask for permission to connect the call and charge the receiver. I think that the rate for a collect call was significantly higher than a normal call, so you wouldn’t normally do that.

As part of the system automation, the phone company replaced the manual operator collect call with an automated system. You would record a short message, which would be played to the other party. If they wanted to accept the call (and the charge), they could press 1 on the phone, or disconnect to avoid the charge.

As a kid, I quickly learned that instead of telling the other party who was calling and why (so they would accept the call), I could just tell them what my phone number was. That way, they would write down the number, refuse the call, and then call me back. That avoided the collect toll charge.

I remember that at some point the phone company made the collect hello message really short, but I got around that by speaking really fast (or sometimes by making two separate calls). I remember having to practice saying the phone number a few times to get it done in the allotted time.

System flexibility

time to read 3 min | 540 words

One of the absolutely most challenging things in designing software systems is that there is really no such thing as a perfect world. A business requirement that is set in stone turns out to be quite malleable. That can cause quite a big hassle for the development team, as they try to anticipate and address all aspects of change ahead of time.

A better alternative would be to not attempt to address all such issues in software, but in wetware.

I recently ordered lunch to go at a restaurant. I had already paid and was waiting to get my order when the clerk double-checked with me that the order was to go. The order had been entered as if I were going to eat at the location, instead of taking the food away. After I confirmed that I wanted to take my order to go, I watched how the clerk fixed things up. She went to the kitchen window and shouted, “that last order, make it to go”. The kitchen staff double-checked which order it was, then moved on with their tasks, eventually handing me a baggie of tasty food to go.

On the way back, I kept wondering how a software system would handle something like this. You’ll need to shred the idea that “an order is immutable once it is paid”, for example. Or you’ll need to add a side channel of additional instructions to the kitchen, etc.

Or you can ignore the whole thing completely and shout at the cook. In software, that might mean keeping ourselves agile and providing a “Manual” mode in which a user can enter free text instructions for the next person in the line to process.

There are some cases where this would be a bad idea, but mostly those involve not trusting your users to do their jobs. Sometimes it is literally the software’s job to force the users to follow a specific path (usually because management decided that this must be so). However, a really important aspect of design is that it isn’t rigid; it allows the users to do their work, instead of working around the software. Part of that involves designing specific places where users can do stuff that you didn’t anticipate they would need.

For example, an order could have a “Notes” text field that is editable even after the order is placed, which can be used for further communication. The idea is that you spend just a little bit of time considering whatever scenarios you didn’t cover and try to give the user something (the bare minimum, maybe even below that), just to allow them to get by. The idea isn’t to provide a solution, but to give the user a choice and to raise enough feedback that you can plug this into the next iteration of your product.

Not having anything may mean that the users will solve their own problem using something else (email, or even just talking directly to one another), and we can’t have that*, obviously.

* It may sound silly, but in some cases you literally can’t have that. Certain actions need to be logged, authorized, and tracked appropriately for many purposes.
