Ayende @ Rahien

My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:


+972 52-548-6969

, @ Q c

Posts: 5,953 | Comments: 44,409

filter by tags archive

Comparing developers

Recently I had to try to explain to a non technical person how I rate the developers that I work with. In technical terms, it is easy to do:

int Compare(devA, devB, ctx)

But it is very hard to do:

int Compare(devA, devB);

var score = Evaluate(dev);

What do I mean by that? I mean that it is pretty hard (at least for me), to give an objective measure of a developer with the absence of anyone to compare him to, but it very easy to compare two developers, but even so, only in a given context.

An objective evaluation of a developer is pretty hard, because there isn’t much that you can objectively measure. I’m sure that no reader of mine would suggest doing something like measuring lines of code, although I wish it was as easy as that.

How do you measure the effectiveness of a developer?

Well, to start with, you need to figure out the area in which you are measuring them. Trying to evaluate yours truly on his HTML5 dev skills would be… a negative experience. But in their areas of expertise, measuring the effectiveness of two people is much easier. I know that if I give a particular task to Joe, he will get it done on his own. But if I give it to Mark, it will require some guidance, but finish it much more quickly.  And Scott is great at finding the root cause of a problem, but is prune to analysis paralysis unless prodded.

This came up when I tried to explain why a person spending 2 weeks on a particular problem was a reasonable thing, and that in many cases you need a… spark of inspiration for certain things to just happen.

All measurement techniques that I’m familiar with is subject to the observer effect, which means that you might get a pretty big and nasty surprise by people adapting their behavior to match the required observations.

The problem is that most of the time, development is about things like stepping one foot after the other, getting things progressively better by making numerous minor changes that has major effect. And then you have a need for brilliance. A customer with a production problem that require someone to have the entire system in their head all at once to figure out. A way to optimize a particular approach, etc.

And the nasty part is that there is very little way to actually get those sparks on inspiration. But there is usually a correlation between certain people and the number of sparks of inspiration per time period they get. And one person’s spark can lead another to the right path and then you have an avalanche of good ideas.

But I’ll talk about the results of this in another post Smile.

The business process of comparing the price of milk

A law recently came into effect in Israel that mandated all supermarket chains to publicize their full pricing data and keep it up to date. The idea is that doing so will allow users to easily compare prices and lower costs. That has led to some interesting complaints in the media.

“Evil corporations are publishing the data in an unreadable format instead of an accessible files”.

I was really amused by the unreadable format (pretty horrible XML, to tell you the truth) and the accessible files (Excel spreadsheets) definitions. But expecting the media to actually get something right is… flawed, probably.

At any rate, one of the guys in the office played around with the data, toying with the notion of creating a small app for that purpose. (Side node, there are currently at least three major efforts to have a price comparison app/site for this. One of them is by a client of us, but beside for some modeling advice, we have no relation to this).

So we sat down and talked about the kind of features that you will want from this kind of app. The first option, of course, is to define a cart of the products I want to buy, and then compare this cart on various chains and branches. (For each chain, there is a different price list, and prices are also different on a per branch basis).

So far, that is easy, but that isn’t really what you want to do. The next obvious steps would be to take into account sales and promotions. The easiest ones are sales such as 1+1 or 2+1.That require an optimizing engine that can suggest getting another bottle of coke to the two you already got (same price), or recommending the purchase of 10 packages of diapers (non perishable product, currently on significant sale).

But this is really just the start. Sales and promotions are complex, a typical condition for a sale would be a whole chicken for 1 NIS (if you purchase more than 150 NIS total). Or maybe you get complementary products, or… you get the point. You need to take into account all the various discounts you can get.

Speaking of discounts, that means that you now need to also consider not just static stuff, but also dynamic (per user). Maybe the user has a set of coupons that they can use, or maybe they have a club membership in a particular chain that has certain benefits (for example 3% discount on all the chain’s branded products and another 5% for any 5 other products preselected at the time you joined the club).

So you now need a recommendation engine that can run those kind of projections and make suggestions based on them.

Oh, and wait! We also need to consider substitutions. If I purchased Bamba (a common peanut butter snack in Israel, and one of the only common uses of peanut butter in Israel), which is shown on the left, I might actually want to get Shush, which is pretty much the same thing, only slightly less costly. Or maybe get Parpar, which is another replacement, which is even cheaper.

To make it easier to the people not well versed with Israeli snacks, that means that we want to offer RC Cola or Pepsi Cola instead of Coca Cola, because they are cheaper.

Of course, some people swear that the taste is totally different, so they can’t be replaced. But does anyone really care what brand of cleaning product they use to clean the floor? The answer, by the way, is yes, people really do care about that kind of stuff. But even when you have a specific brand, maybe you can get the 2 gallon bottle of cleaning solution instead of the 1 gallon, because it end up being cheaper.

This is just to outline the complexity inherit in this. To do this well, you need to do quite a lot. This is actually really interesting from a technical perspective.  But this isn’t a post about the technical side.

Instead, let us talk about money. In particular, how to make money here. You can assume that selling the app wouldn’t work, not even a freemium model. But an app that guide people on what store to buy? There are quite a lot of ways to make money here.

We’ll start with an interesting observation. If you are a gateway to Joe making money, that usually mean that that Joe is going to pay you. Something that is very obvious is club membership referral fees.

If I run your cart through my engine, and I can tell you “Dude, if you join chain Xyz club you can save 100NIS today and 15,000 NIS this year, just click on this button to join”, that is a pretty compelling argument. And the chain would pay a certain fee for everyone we registered for them. Another way to make money is to get the aggregated data and do stuff with it. Coming up with “chain Xyz is the cheapest” so they can use it on their ads is something that would be worth money.

Another way to do this is to run interference. Instead of going to the supermarket, we’ll just make that order for you, and it will be on your door in a few hours… and the app will make sure to always get it from the cheapest location.

There are also holidays, and they typically have big purchases for the dinners, etc. That means that we can build a cart for the holidays, check it with the various chains, then suggest buying them at chain Xyz. Of course, that was pre-negotiated so they would get the big sales (presumably we are ethical and make sure that that chain is really the cheapest).

Of course, all of this make an interesting assumption. That the chains care about the price you buy from them. They don’t really care about that, they care about their margins. That means that we can play on that. We can suggest a cheaper cart overall, but one that would be more profitable to the chain because the products that we suggest have lower price for the customer, but higher margin for the chain.

And we totally left out the notion of coupons and discounts for actually using the app. “Scan this screen to get a 7% discount for Coca Cola”, for example. This way, the people paying us aren’t just the chains, but the manufacturers as well.

And coming back to the technical side of things. Consider targeted discounts. If I know what kind of products you want to buy, we can get all the way to customer specific price listing. (I’ll sell you diapers at cost, and you’ll buy the store brand of toilet paper at the list price, instead of the usually discounted price, so the store profit is higher).

Nitpicker corner: I have zero knowledge of retail beyond that is need to actually purchase food. I have no idea if those options are valid, and this is purely a mental exercise. This is interesting because of that. There are some technical problems here (in the recommendations), but a lot more business problems (in the kind of partnerships and deals that you can make for the customers).

I didn’t spend a lot of time considering this, and I’m pretty sure that I missed quite a few options. But it is a good way to remember that most business problems have depth behind them, not just technical solutions.

The state of a failure condition

I’m looking over of a bunch of distributed algorithm discussion groups, and I recently saw several people making the same bad assumption. The issue is that in a distributed system, you have to assume that any communication between system can fail.

Because that is taken into account in any distributed algorithm, there is a school of thought that believe that errors shouldn’t generate replies. That is horrifying to me.

Let me give a concrete example. In the Raft algorithm, nodes will participate in an election in order to decide who is the leader. A node can decide to vote for a certain candidate, to reject a candidate or it may be down and not responsive. Since we have to handle the non responsive node anyway, it is easy to assume that we only need to reply to the candidate when we actually vote for it. After all, no reply is a negative reply already, no?

The issue with this design decision is that this is indeed correct, but it is also boneheaded*. There are two reasons here. The minor one is that a non reply will force us to wait until a pre-configured timeout happen, after which we can go into failure handling. But actually sending a reply when we know that we refuse to vote for a node can give that node more information, and cut down the time it takes for the node to respond to negative replies.

As important as that is, this isn’t really my main concern. My main concern here is that not sending a reply leaves the administrator trying to figure out what is going on with essentially zero data. On the other hand, if the node send a “you are missing X,Y and Z for me to consider you applicable”, that is something that can be traced, that can be shown and acted upon.

It may seem like a small thing, overall, but it is something with crucial importance for operations. Those are hard enough when you have a single node. When you have a distributed system, you have to plan for that explicitly.

* I am using this terminology intentionally. Anyone who don’t consider production support and monitoring for their software from the get go never had to support complex production systems, where every nugget of information can be crucial.

Special Offer29% discount for all our products

Well, it is nearly the 29 May, and that means that I have been married for four years.

To celebrate that, I am offering a 29% discount on all our products (RavenDB, NHibernate Profiler, Entity Framework Profiler).

All you have to do is purchase any of our products using the following coupon code:

4th Anniversary

This offer is valid to the end of the month only.

RavenDB ShardingAdding a new shard to an existing cluster, splitting the shard

In my previous post, we have increased the capacity of the cluster by moving all new work to the new set of servers. In this post, I want to deal with a slightly harder problem, how to handle it when it isn’t new data that is causing the issue, but existing data. So we can’t just throw a new server, but need to actually move data between nodes.

We started with the following configuration:

var shards = new Dictionary<string, IDocumentStore>
    {"Shared", new DocumentStore {Url ="http://rvn1:8080", DefaultDatabase = "Shared"}},
    {"EU", new DocumentStore {Url = "http://rvn2:8080", DefaultDatabase = "Europe"}},
    {"NA", new DocumentStore {Url = "http://rvn3:8080", DefaultDatabase = "NorthAmerica"}},

And what we want is to add another server for EU and NA. Our new topology would be:

var shards = new Dictionary<string, IDocumentStore>
    {"Shared", new DocumentStore {Url ="http://rvn1:8080", DefaultDatabase = "Shared"}},
    {"EU1", new DocumentStore {Url = "http://rvn2:8080", DefaultDatabase = "Europe1"}},
    {"NA1", new DocumentStore {Url = "http://rvn3:8080", DefaultDatabase = "NorthAmerica1"}},
    {"EU2", new DocumentStore {Url = "http://rvn4:8080", DefaultDatabase = "Europe2"}},
    {"NA2", new DocumentStore {Url = "http://rvn5:8080", DefaultDatabase = "NorthAmerica2"}},

There are a couple of things that we need to pay attention to. First, we no longer use the EU / NA shard keys, they have been removed in favor of EU1 & EU2 / NA1 & NA2. We’ll also change the sharding configuration so it would split the new data between the two new nodes for each region evenly (see previous post for the details on exactly how this is done). But what about the existing data? We need to have some way of actually moving the data. That is when our ops tools come into play.

We use the smuggler to move the data between the servers:

Raven.Smuggler.exe  between http://rvn2:8080 http://rvn2:8080 --database=Europe --database2=Europe1 --transform-file=transform-1.js --incremental
Raven.Smuggler.exe  between http://rvn2:8080 http://rvn4:8080 --database=Europe --database2=Europe2 --transform-file=transform-2.js --incremental
Raven.Smuggler.exe  between http://rvn3:8080 http://rvn3:8080 --database=NorthAmerica --database2=NorthAmerica1 --transform-file=transform-1.js --incremental
Raven.Smuggler.exe  between http://rvn3:8080 http://rvn5:8080 --database=NorthAmerica --database2=NorthAmerica2 --transform-file=transform-2.js --incremental

The commands are pretty similar, with just the different options, so let us try to figure out what is going on. We are asking the smuggler to move the data between two databases in an incremental fashion, while applying a transform script. The transform-1.js file looks like this:

function(doc) { 
    var id = doc['@metadata']['@id']; 
    var node = (parseInt(id.substring(id.lastIndexOf('/')+1)) % 2);

    if(node == 1)
        return null;

    doc["@metadata"]["Raven-Shard-Id"] = doc["@metadata"]["Raven-Shard-Id"] + (node+1);

    return doc;

And the tranasform-2.js is exactly the same except that it return early if node is 0. In this way, we are able to split the data into the two new servers.

Note that the reason we use an incremental approach means that we can do this, even if it takes a long while, then the window of time when we switch is very narrow, and require us to only pass the recently changed data.

That still leaves the question of how are we going to deal with old ids. We are still going to have things like “EU/customers/###” in the database, even if those documents are on one of the two new nodes. We handle this, like most low level sharding behaviors, by customizing the sharding strategy. In this case, we modify the PotentialsServersFor(…) method:

public override IList<string> PotentialShardsFor(ShardRequestData requestData)
    var potentialShardsFor = base.PotentialShardsFor(requestData);
    if (potentialShardsFor.Contains("EU"))
    if (potentialShardsFor.Contains("NA"))
    return potentialShardsFor;

In this case, we are doing a very simple thing, when the default shard resolution strategy detect that we want to go to the old EU node, we’ll tell it to go to both EU1 and EU2. A more comprehensive solution would narrow it down to the exact server, but that depend on how exactly you split the data, and is left as an exercise for the reader.

RavenDB ShardingAdding a new shard to an existing cluster, the easy way

Continuing on the theme of giving a full answer to interesting questions on the mailing list in the blog, we have the following issue. 

We have a sharded cluster, and we want to add a new node to the cluster, how do we do it? I’ll discuss a few ways in which you can handle this scenario. But first, let us lay out the actual scenario.

We’ll use the Customers & Invoices system, and we put the data in the various shard along the following scheme:

Customers Sharded by region
Invoices Sharded by customer
Users Shared database (not sharded)

We can configure this using the following:

var shards = new Dictionary<string, IDocumentStore>
    {"Shared", new DocumentStore {Url ="http://rvn1:8080", DefaultDatabase = "Shared"}},
    {"EU", new DocumentStore {Url = "http://rvn2:8080", DefaultDatabase = "Europe"}},
    {"NA", new DocumentStore {Url = "http://rvn3:8080", DefaultDatabase = "NorthAmerica"}},
ShardStrategy shardStrategy = new ShardStrategy(shards)
    .ShardingOn<Company>(company =>company.Region, region =>
        switch (region)
            case "USA":
            case "Canada":
                return "NA";
            case "UK":
            case "France":
                return "EU";
                return "Shared";
    .ShardingOn<Invoice>(invoice => invoice.CompanyId)
    .ShardingOn<User>(user=> "Shared");

So far, so good. Now, we have so much work that we can’t just have two servers for customers & invoices, we need more. We change the sharding configuration to include 2 new servers, and we get:

var shards = new Dictionary<string, IDocumentStore>
         {"Shared", new DocumentStore {Url = "http://rvn1:8080", DefaultDatabase = "Shared"}},
         {"EU", new DocumentStore {Url = "http://rvn2:8080", DefaultDatabase = "Europe"}},
         {"NA", new DocumentStore {Url = "http://rvn3:8080", DefaultDatabase = "NorthAmerica"}},
        {"EU2", new DocumentStore {Url = "http://rvn4:8080", DefaultDatabase = "Europe-2"}},
        {"NA2", new DocumentStore {Url = "http://rvn5:8080", DefaultDatabase = "NorthAmerica-2"}},

    var shardStrategy = new ShardStrategy(shards);
    shardStrategy.ShardResolutionStrategy = new NewServerBiasedShardResolutionStrategy(shards.Keys, shardStrategy)
        .ShardingOn<Company>(company => company.Region, region =>
            switch (region)
                case "USA":
                case "Canada":
                    return "NA";
                case "UK":
                case "France":
                    return "EU";
                    return "Shared";
        .ShardingOn<Invoice>(invoice => invoice.CompanyId)
        .ShardingOn<User>(user => user.Id, userId => "Shared");

Note that we have a new shard resolution strategy, what is that? This is how we control lower level details of the sharding behavior, in this case, we want to control where we’ll write new documents.

public class NewServerBiasedShardResolutionStrategy : DefaultShardResolutionStrategy
    public NewServerBiasedShardResolutionStrategy(IEnumerable<string> shardIds, ShardStrategy shardStrategy)
        : base(shardIds, shardStrategy)

    public override string GenerateShardIdFor(object entity, ITransactionalDocumentSession sessionMetadata)
        var generatedShardId = base.GenerateShardIdFor(entity, sessionMetadata);
        if (entity is Company)
            if (DateTime.Today < new DateTime(2015, 8, 1) ||
                DateTime.Now.Ticks % 3 != 0)
                return generatedShardId + "2";
            return generatedShardId;
        return generatedShardId;

What is the logic? If we have a new company, we’ll call the GenerateShardIdFor(entity) method, and for the next 3 months, we’ll create all new companies (and as a result, their invoices) in the new servers. After the 3 months have passed, we’ll still generate the new companies on the new servers at a rate of two thirds on the new servers vs. one third on the old servers.

Note that in this case, we didn’t have to modify any data whatsoever. And over time, the data will balance itself out between the servers. In my next post, we’ll deal with how we actually split an existing shard into multiple shards.

Accepting code from the community means accepting full responsibility for all time

Sometimes, we’ll reject a certain pull request from the community, not because it doesn’t meet our standards, or doesn’t do things properly. We’ll reject it because we don’t want to accept the responsibility for this.

This seems obvious, but I got a comment on my recent post saying:

If you e.g. say that you're willing to accept a new F# module within RavenDB that does scripted deploys and automation of various tasks, I bet people would jump in with enthusiasm.

I wouldn’t accept such a PR. Not because there is anything wrong with F#, or because it wouldn’t be valuable. I wouldn’t accept such a PR because none of the core team of RavenDB has great expertise in F#. Oh, we have a few guys that played with it, and would love to do some more. In fact, I’ve got a guy that is pushing hard for allowing RavenDB to run computations via F#. It is a pretty cool feature, and I’ll talk about that in detail in a future post.

But imagine the scenario outline in the comment. A F# module that does some automation, scripted deploys, etc in F#. We go over the code, we are happy with it, we have a need for this feature. And at that point, we would have to make sure that it is written in C#.

Why? What is wrong with F#?

Pretty much nothing, except that any piece of code we ship (yes, even pieces that were contributed) has support requirements. A 2 AM call for one of the support team on that component means that the person answering the phone need to: Figure out the problem, decide if this is a misuse / feature / bug, fix this by either suggesting work around or supplying a hot fix.

And if the component with the issue is written in F#, that means that you need to have all the support group (now just over 20 people) be able to understand and work with that. And to the nitpickers, yes, it doesn’t take a long while to learn a new language, but it takes a good long while before you can be effective with it, especially at 2AM. Now multiple that by 20+ people, and you might see where we are starting to have an issue.

There is also the community angle to consider, code that is written in a language more people know gets more contributors.

Now, I’ve actually have run an real live study on that. In 2005 I wrote a view engine for MonoRail (MVC framework for ASP.Net) written in Boo. (You can read all about that here). Naturally, building a DSL that is based on Boo, I wrote that in Boo. And it was mildly successful. It was a pure OSS project, with multiple contributors. And I was the sole person that would actually modify the Boo code. Boo code looks pretty much like Python, and there isn’t any FP aspect to it. I find it imminently approachable and easy to work with.

The Brail codebase had pretty much zero outside contributions. At that point, I ported the Brail codebase to C#, the post is from 2006. Note the discussion around the reasoning. After that happened, we started getting more outside contributors and people were actually willing to take a look at the code.

For fun, this is actually in a situation where anyone using Brail was actually writing Boo code. It isn’t like they weren’t aware of how it worked. They pretty much had to, because Brail was a very thin DSL over Boo code. But moving the codebase to C# significantly improved the level of involvement.

Something that I think people miss is the fact that this kind of decision has pretty much nothing to do with the language or its merits. It is all about the implications on the project as a whole.


No future posts left, oh my!


  1. The RavenDB Comic Strip (3):
    28 May 2015 - Part III – High availability & sleeping soundly
  2. Special Offer (2):
    27 May 2015 - 29% discount for all our products
  3. RavenDB Sharding (3):
    22 May 2015 - Adding a new shard to an existing cluster, splitting the shard
  4. Challenge (45):
    28 Apr 2015 - What is the meaning of this change?
  5. Interview question (2):
    30 Mar 2015 - fix the index
View all series


Main feed Feed Stats
Comments feed   Comments Feed Stats