Ayende @ Rahien


The business process of comparing the price of milk


A law recently came into effect in Israel that requires all supermarket chains to publish their full pricing data and keep it up to date. The idea is that doing so will allow consumers to easily compare prices and lower costs. That has led to some interesting complaints in the media.

“Evil corporations are publishing the data in an unreadable format instead of as accessible files.”

I was really amused by the unreadable format (pretty horrible XML, to tell you the truth) and the accessible files (Excel spreadsheets) definitions. But expecting the media to actually get something right is… flawed, probably.

At any rate, one of the guys in the office played around with the data, toying with the notion of creating a small app for that purpose. (Side note: there are currently at least three major efforts to build a price comparison app/site for this. One of them is by a client of ours, but beyond some modeling advice, we have no involvement in it.)

So we sat down and talked about the kind of features that you would want from this kind of app. The first option, of course, is to define a cart of the products I want to buy, and then compare this cart across the various chains and branches. (Each chain has a different price list, and prices also differ on a per-branch basis.)
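To make that baseline concrete, here is a minimal sketch of the naive comparison. The Branch and cart types are hypothetical, and it ignores sales, promotions and substitutions entirely:

using System.Collections.Generic;
using System.Linq;

// Minimal sketch: hypothetical types, no promotions, no substitutions.
public class Branch
{
    public string Chain { get; set; }
    public string Name { get; set; }
    public Dictionary<string, decimal> Prices { get; set; } // product id -> price in this branch
}

public static class CartComparer
{
    // cart: product id -> quantity
    public static List<Branch> OrderByCartPrice(IEnumerable<Branch> branches, Dictionary<string, int> cart)
    {
        return branches
            .Where(b => cart.Keys.All(p => b.Prices.ContainsKey(p))) // skip branches missing an item
            .OrderBy(b => cart.Sum(item => b.Prices[item.Key] * item.Value))
            .ToList();
    }
}

Everything that follows in this post is about how far you have to go beyond this trivial ordering.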

So far, that is easy, but that isn’t really all you want to do. The next obvious step would be to take sales and promotions into account. The easiest ones are sales such as 1+1 or 2+1. Those require an optimizing engine that can suggest getting another bottle of Coke to go with the two you already have (same price), or recommend buying 10 packages of diapers (a non-perishable product, currently on a significant sale).

But this is really just the start. Sales and promotions are complex; a typical condition for a sale would be a whole chicken for 1 NIS (if you purchase more than 150 NIS in total). Or maybe you get complimentary products, or… you get the point. You need to take into account all the various discounts you can get.

Speaking of discounts, that means that you now need to consider not just static pricing, but dynamic, per-user pricing as well. Maybe the user has a set of coupons they can use, or maybe they have a club membership in a particular chain that carries certain benefits (for example, a 3% discount on all the chain’s branded products and another 5% on any 5 other products preselected when you joined the club).

So you now need a recommendation engine that can run those kinds of projections and make suggestions based on them.

Oh, and wait! We also need to consider substitutions. If I purchased Bamba (a common peanut butter snack in Israel, and one of the only common uses of peanut butter in Israel), I might actually want to get Shush, which is pretty much the same thing, only slightly less costly. Or maybe get Parpar, which is another replacement, and even cheaper.

To make it easier for people not well versed in Israeli snacks: that means we want to offer RC Cola or Pepsi instead of Coca Cola, because they are cheaper.

Of course, some people swear that the taste is totally different, so they can’t be replaced. But does anyone really care what brand of cleaning product they use to clean the floor? The answer, by the way, is yes, people really do care about that kind of stuff. But even when you stick to a specific brand, maybe you can get the 2 gallon bottle of cleaning solution instead of the 1 gallon one, because it ends up being cheaper.

This is just to outline the complexity inherent in the problem. To do this well, you need to do quite a lot. It is actually really interesting from a technical perspective. But this isn’t a post about the technical side.

Instead, let us talk about money. In particular, how to make money here. You can assume that selling the app wouldn’t work, not even with a freemium model. But an app that guides people on where to shop? There are quite a lot of ways to make money with that.

We’ll start with an interesting observation. If you are the gateway to Joe making money, that usually means that Joe is going to pay you. The most obvious example is club membership referral fees.

If I run your cart through my engine, and I can tell you “Dude, if you join chain Xyz’s club you can save 100 NIS today and 15,000 NIS this year, just click this button to join”, that is a pretty compelling argument. And the chain would pay a certain fee for every member we register for them. Another way to make money is to take the aggregated data and do stuff with it. Coming up with “chain Xyz is the cheapest” so they can use it in their ads is something that would be worth money.

Another way to do this is to run interference. Instead of you going to the supermarket, we’ll just place that order for you, and it will be at your door in a few hours… and the app will make sure to always get it from the cheapest location.

There are also holidays, which typically mean big purchases for the dinners, etc. That means that we can build a cart for the holidays, check it with the various chains, then suggest buying it at chain Xyz. Of course, that was pre-negotiated so they would get the big sales (presumably we are ethical and make sure that that chain really is the cheapest).

Of course, all of this makes an interesting assumption: that the chains care about the price you pay them. They don’t really care about that, they care about their margins. That means that we can play on that. We can suggest a cheaper cart overall, but one that would be more profitable to the chain, because the products that we suggest have a lower price for the customer but a higher margin for the chain.

And we totally left out the notion of coupons and discounts for actually using the app. “Scan this screen to get a 7% discount on Coca Cola”, for example. This way, the people paying us aren’t just the chains, but the manufacturers as well.

And coming back to the technical side of things, consider targeted discounts. If I know what kind of products you want to buy, we can go all the way to customer-specific price listings. (I’ll sell you diapers at cost, and you’ll buy the store brand of toilet paper at list price, instead of the usually discounted price, so the store’s profit is higher.)

Nitpicker corner: I have zero knowledge of retail beyond what is needed to actually purchase food. I have no idea if those options are valid, and this is purely a mental exercise. That is what makes it interesting. There are some technical problems here (in the recommendations), but a lot more business problems (in the kind of partnerships and deals that you can make for the customers).

I didn’t spend a lot of time considering this, and I’m pretty sure that I missed quite a few options. But it is a good way to remember that most business problems have depth behind them, not just technical solutions.

The state of a failure condition


I’m looking over a bunch of distributed algorithm discussion groups, and I recently saw several people making the same bad assumption. The issue is that in a distributed system, you have to assume that any communication between systems can fail.

Because that is taken into account in any distributed algorithm, there is a school of thought that believes errors shouldn’t generate replies. That is horrifying to me.

Let me give a concrete example. In the Raft algorithm, nodes participate in an election in order to decide who the leader is. A node can decide to vote for a certain candidate, to reject a candidate, or it may be down and unresponsive. Since we have to handle the unresponsive node anyway, it is easy to assume that we only need to reply to the candidate when we actually vote for it. After all, no reply is a negative reply already, no?

The issue with this design decision is that it is indeed correct, but it is also boneheaded*. There are two reasons. The minor one is that a non-reply forces us to wait until a pre-configured timeout happens, after which we can go into failure handling. Actually sending a reply when we know that we refuse to vote for a node gives that node more information, and cuts down the time it takes the candidate to react to negative replies.

As important as that is, it isn’t really my main concern. My main concern here is that not sending a reply leaves the administrator trying to figure out what is going on with essentially zero data. On the other hand, if the node sends back a “you are missing X, Y and Z for me to consider you applicable”, that is something that can be traced, that can be shown and acted upon.
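To make the difference concrete, here is a hedged sketch of a vote request handler that always replies and says why it refuses. The VoteRequest / VoteResponse types and the currentTerm / lastLogIndex fields are made up for illustration; this is not taken from any actual Raft implementation:

// Sketch only: hypothetical types and fields, illustrating "always reply, with a reason".
public VoteResponse HandleRequestVote(VoteRequest request)
{
    if (request.Term < currentTerm)
    {
        // Instead of staying silent, tell the candidate exactly why it was rejected.
        // The candidate can react immediately, and the operator has something to trace.
        return new VoteResponse
        {
            VoteGranted = false,
            Term = currentTerm,
            Reason = string.Format("your term {0} is behind my term {1}", request.Term, currentTerm)
        };
    }
    if (request.LastLogIndex < lastLogIndex)
    {
        return new VoteResponse
        {
            VoteGranted = false,
            Term = currentTerm,
            Reason = string.Format("your log ends at {0}, mine ends at {1}", request.LastLogIndex, lastLogIndex)
        };
    }
    return new VoteResponse { VoteGranted = true, Term = currentTerm };
}

The Reason field costs almost nothing to fill in, and it is exactly the kind of nugget of information the footnote below is talking about.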

It may seem like a small thing, overall, but it is something with crucial importance for operations. Those are hard enough when you have a single node. When you have a distributed system, you have to plan for that explicitly.

* I am using this terminology intentionally. Anyone who doesn’t consider production support and monitoring for their software from the get go has never had to support complex production systems, where every nugget of information can be crucial.

Special Offer: 29% discount for all our products


Well, it is nearly the 29th of May, and that means that I have been married for four years.

To celebrate that, I am offering a 29% discount on all our products (RavenDB, NHibernate Profiler, Entity Framework Profiler).

All you have to do is purchase any of our products using the following coupon code:

4th Anniversary

This offer is valid until the end of the month only.

RavenDB Sharding: Adding a new shard to an existing cluster, splitting the shard


In my previous post, we increased the capacity of the cluster by moving all new work to a new set of servers. In this post, I want to deal with a slightly harder problem: how to handle it when it isn’t new data that is causing the issue, but existing data. We can’t just throw a new server at the problem, we need to actually move data between nodes.

We started with the following configuration:

var shards = new Dictionary<string, IDocumentStore>
{
    {"Shared", new DocumentStore {Url ="http://rvn1:8080", DefaultDatabase = "Shared"}},
    {"EU", new DocumentStore {Url = "http://rvn2:8080", DefaultDatabase = "Europe"}},
    {"NA", new DocumentStore {Url = "http://rvn3:8080", DefaultDatabase = "NorthAmerica"}},
};

And what we want is to add another server for EU and NA. Our new topology would be:

var shards = new Dictionary<string, IDocumentStore>
{
    {"Shared", new DocumentStore {Url ="http://rvn1:8080", DefaultDatabase = "Shared"}},
    {"EU1", new DocumentStore {Url = "http://rvn2:8080", DefaultDatabase = "Europe1"}},
    {"NA1", new DocumentStore {Url = "http://rvn3:8080", DefaultDatabase = "NorthAmerica1"}},
    {"EU2", new DocumentStore {Url = "http://rvn4:8080", DefaultDatabase = "Europe2"}},
    {"NA2", new DocumentStore {Url = "http://rvn5:8080", DefaultDatabase = "NorthAmerica2"}},
};

There are a couple of things that we need to pay attention to. First, we no longer use the EU / NA shard keys; they have been removed in favor of EU1 & EU2 / NA1 & NA2. We’ll also change the sharding configuration so it splits the new data evenly between the two new nodes for each region (see the previous post for the details on exactly how this is done). But what about the existing data? We need some way of actually moving that data. That is where our ops tools come into play.

We use the smuggler to move the data between the servers:

Raven.Smuggler.exe  between http://rvn2:8080 http://rvn2:8080 --database=Europe --database2=Europe1 --transform-file=transform-1.js --incremental
Raven.Smuggler.exe  between http://rvn2:8080 http://rvn4:8080 --database=Europe --database2=Europe2 --transform-file=transform-2.js --incremental
Raven.Smuggler.exe  between http://rvn3:8080 http://rvn3:8080 --database=NorthAmerica --database2=NorthAmerica1 --transform-file=transform-1.js --incremental
Raven.Smuggler.exe  between http://rvn3:8080 http://rvn5:8080 --database=NorthAmerica --database2=NorthAmerica2 --transform-file=transform-2.js --incremental

The commands are pretty similar, with just different options, so let us figure out what is going on. We are asking the smuggler to move the data between two databases in an incremental fashion, while applying a transform script. The transform-1.js file looks like this:

function(doc) {
    var id = doc['@metadata']['@id'];
    // take the numeric suffix of the document id and check its parity
    var node = (parseInt(id.substring(id.lastIndexOf('/')+1)) % 2);

    // odd ids are skipped here, they are handled by transform-2.js
    if(node == 1)
        return null;

    // rewrite the shard id, e.g. "EU" becomes "EU1"
    doc["@metadata"]["Raven-Shard-Id"] = doc["@metadata"]["Raven-Shard-Id"] + (node+1);

    return doc;
}

And transform-2.js is exactly the same, except that it returns early if node is 0. In this way, we are able to split the data between the two new servers.

Note that because we use an incremental approach, we can do this even if it takes a long while; the window of time when we actually switch over is very narrow, and only requires us to copy the recently changed data.

That still leaves the question of how we are going to deal with old ids. We are still going to have things like “EU/customers/###” in the database, even if those documents are now on one of the two new nodes. We handle this, like most low level sharding behaviors, by customizing the sharding strategy. In this case, we modify the PotentialShardsFor(…) method:

public override IList<string> PotentialShardsFor(ShardRequestData requestData)
{
    var potentialShardsFor = base.PotentialShardsFor(requestData);
    if (potentialShardsFor.Contains("EU"))
    {
        potentialShardsFor.Remove("EU");
        potentialShardsFor.Add("EU1");
        potentialShardsFor.Add("EU2");
    }
    if (potentialShardsFor.Contains("NA"))
    {
        potentialShardsFor.Remove("NA");
        potentialShardsFor.Add("NA1");
        potentialShardsFor.Add("NA2");
    }
    return potentialShardsFor;
}

In this case, we are doing a very simple thing: when the default shard resolution strategy detects that we want to go to the old EU node, we’ll tell it to go to both EU1 and EU2. A more comprehensive solution would narrow it down to the exact server, but that depends on how exactly you split the data, and is left as an exercise for the reader.
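As a starting point for that exercise, here is a hedged sketch that narrows single-document loads to one shard. It assumes the request exposes the document ids (Keys) and that the ids follow the same numeric-suffix convention we relied on in the transform scripts:

public override IList<string> PotentialShardsFor(ShardRequestData requestData)
{
    var potentialShardsFor = base.PotentialShardsFor(requestData);

    // When loading a single document by id, apply the same "suffix % 2" rule the
    // transform scripts used, and go to exactly one of the two new shards.
    if (requestData.Keys != null && requestData.Keys.Count == 1)
    {
        var id = requestData.Keys[0];
        int number;
        if (int.TryParse(id.Substring(id.LastIndexOf('/') + 1), out number))
        {
            var suffix = (number % 2) + 1; // even ids went to EU1 / NA1, odd ids to EU2 / NA2
            ReplaceShard(potentialShardsFor, "EU", "EU" + suffix);
            ReplaceShard(potentialShardsFor, "NA", "NA" + suffix);
            return potentialShardsFor;
        }
    }

    // Otherwise fall back to asking both halves of each split shard.
    ReplaceShard(potentialShardsFor, "EU", "EU1", "EU2");
    ReplaceShard(potentialShardsFor, "NA", "NA1", "NA2");
    return potentialShardsFor;
}

private static void ReplaceShard(IList<string> shards, string oldShard, params string[] newShards)
{
    if (shards.Contains(oldShard) == false)
        return;
    shards.Remove(oldShard);
    foreach (var shard in newShards)
        shards.Add(shard);
}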

RavenDB Sharding: Adding a new shard to an existing cluster, the easy way


Continuing the theme of giving a full answer on the blog to interesting questions from the mailing list, we have the following issue.

We have a sharded cluster, and we want to add a new node to the cluster. How do we do it? I’ll discuss a few ways in which you can handle this scenario. But first, let us lay out the actual scenario.

We’ll use the Customers & Invoices system, and we put the data in the various shards according to the following scheme:

  • Customers: sharded by region
  • Invoices: sharded by customer
  • Users: shared database (not sharded)

We can configure this using the following:

var shards = new Dictionary<string, IDocumentStore>
{
    {"Shared", new DocumentStore {Url = "http://rvn1:8080", DefaultDatabase = "Shared"}},
    {"EU", new DocumentStore {Url = "http://rvn2:8080", DefaultDatabase = "Europe"}},
    {"NA", new DocumentStore {Url = "http://rvn3:8080", DefaultDatabase = "NorthAmerica"}},
};
ShardStrategy shardStrategy = new ShardStrategy(shards)
    .ShardingOn<Company>(company => company.Region, region =>
    {
        switch (region)
        {
            case "USA":
            case "Canada":
                return "NA";
            case "UK":
            case "France":
                return "EU";
            default:
                return "Shared";
        }
    })
    .ShardingOn<Invoice>(invoice => invoice.CompanyId)
    .ShardingOn<User>(user => "Shared");

So far, so good. Now we have so much work that we can’t make do with just two servers for customers & invoices; we need more. We change the sharding configuration to include 2 new servers, and we get:

var shards = new Dictionary<string, IDocumentStore>
{
    {"Shared", new DocumentStore {Url = "http://rvn1:8080", DefaultDatabase = "Shared"}},
    {"EU", new DocumentStore {Url = "http://rvn2:8080", DefaultDatabase = "Europe"}},
    {"NA", new DocumentStore {Url = "http://rvn3:8080", DefaultDatabase = "NorthAmerica"}},
    {"EU2", new DocumentStore {Url = "http://rvn4:8080", DefaultDatabase = "Europe-2"}},
    {"NA2", new DocumentStore {Url = "http://rvn5:8080", DefaultDatabase = "NorthAmerica-2"}},
};

var shardStrategy = new ShardStrategy(shards);
shardStrategy.ShardResolutionStrategy = new NewServerBiasedShardResolutionStrategy(shards.Keys, shardStrategy)
    .ShardingOn<Company>(company => company.Region, region =>
    {
        switch (region)
        {
            case "USA":
            case "Canada":
                return "NA";
            case "UK":
            case "France":
                return "EU";
            default:
                return "Shared";
        }
    })
    .ShardingOn<Invoice>(invoice => invoice.CompanyId)
    .ShardingOn<User>(user => user.Id, userId => "Shared");

Note that we have a new shard resolution strategy here. What is that? This is how we control the lower level details of the sharding behavior; in this case, we want to control where we’ll write new documents.

public class NewServerBiasedShardResolutionStrategy : DefaultShardResolutionStrategy
{
    public NewServerBiasedShardResolutionStrategy(IEnumerable<string> shardIds, ShardStrategy shardStrategy)
        : base(shardIds, shardStrategy)
    {
    }

    public override string GenerateShardIdFor(object entity, ITransactionalDocumentSession sessionMetadata)
    {
        var generatedShardId = base.GenerateShardIdFor(entity, sessionMetadata);
        if (entity is Company)
        {
            // Until August 1st, send every new company to the new servers (EU2 / NA2).
            // After that, keep sending roughly two thirds of them there.
            if (DateTime.Today < new DateTime(2015, 8, 1) ||
                DateTime.Now.Ticks % 3 != 0)
            {
                return generatedShardId + "2";
            }
            return generatedShardId;
        }
        return generatedShardId;
    }
}

What is the logic? When we have a new company, we call the GenerateShardIdFor(entity) method, and for the next 3 months, we’ll create all new companies (and as a result, their invoices) on the new servers. After the 3 months have passed, we’ll still generate new companies on the new servers about two thirds of the time, and on the old servers the remaining third.

Note that in this case, we didn’t have to modify any data whatsoever. And over time, the data will balance itself out between the servers. In my next post, we’ll deal with how we actually split an existing shard into multiple shards.

Accepting code from the community means accepting full responsibility for all time


Sometimes we’ll reject a certain pull request from the community, not because it doesn’t meet our standards or doesn’t do things properly. We’ll reject it because we don’t want to accept the responsibility for it.

This seems obvious, but I got a comment on my recent post saying:

If you e.g. say that you're willing to accept a new F# module within RavenDB that does scripted deploys and automation of various tasks, I bet people would jump in with enthusiasm.

I wouldn’t accept such a PR. Not because there is anything wrong with F#, or because it wouldn’t be valuable. I wouldn’t accept such a PR because none of the core team of RavenDB has great expertise in F#. Oh, we have a few guys that have played with it, and would love to do some more. In fact, I’ve got a guy that is pushing hard for allowing RavenDB to run computations via F#. It is a pretty cool feature, and I’ll talk about it in detail in a future post.

But imagine the scenario outlined in the comment. An F# module that does some automation, scripted deploys, etc. We go over the code, we are happy with it, we have a need for this feature. And at that point, we would have to make sure that it is written in C#.

Why? What is wrong with F#?

Pretty much nothing, except that any piece of code we ship (yes, even pieces that were contributed) has support requirements. A 2 AM call for one of the support team about that component means that the person answering the phone needs to: figure out the problem, decide if this is a misuse / feature / bug, and fix it by either suggesting a workaround or supplying a hot fix.

And if the component with the issue is written in F#, that means that the whole support group (now just over 20 people) needs to be able to understand and work with it. And to the nitpickers: yes, it doesn’t take long to learn a new language, but it takes a good long while before you can be effective with it, especially at 2 AM. Now multiply that by 20+ people, and you might see where we are starting to have an issue.

There is also the community angle to consider: code that is written in a language more people know gets more contributors.

Now, I’ve actually run a real live study on that. In 2005 I wrote a view engine for MonoRail (an MVC framework for ASP.Net). (You can read all about that here.) Naturally, since it was a DSL based on Boo, I wrote it in Boo. And it was mildly successful. It was a pure OSS project, with multiple contributors, and I was the sole person that would actually modify the Boo code. Boo looks pretty much like Python, and there isn’t any FP aspect to it. I find it eminently approachable and easy to work with.

The Brail codebase got pretty much zero outside contributions. At that point, I ported the Brail codebase to C# (the post is from 2006). Note the discussion around the reasoning. After that happened, we started getting more outside contributors, and people were actually willing to take a look at the code.

The funny thing is that anyone using Brail was actually writing Boo code anyway. It isn’t like they weren’t aware of how it worked; they pretty much had to be, because Brail was a very thin DSL over Boo code. But moving the codebase to C# significantly improved the level of involvement.

Something that I think people miss is the fact that this kind of decision has pretty much nothing to do with the language or its merits. It is all about the implications on the project as a whole.

RavenDB Sharding: Enabling shards for an existing database


A question came up in the mailing list: how do we enable sharding for an existing database? I’ll deal with data migration in this scenario in a later post.

The scenario is that we have a very successful application, and we start to feel the need to move the data to multiple shards. Currently all the data is sitting on the RVN1 server. We want to add RVN2 and RVN3 to the mix. For this post, we’ll assume that we have the notion of Customers and Invoices.

Previously, we accessed the database using a simple document store:

var documentStore = new DocumentStore
{
	Url = "http://RVN1:8080",
	DefaultDatabase = "Shop"
};

Now we want to move to a sharded environment, so we want to write it like this: existing data is going to stay where it is, and new data will be sharded according to geographical location.

var shards = new Dictionary<string, IDocumentStore>
{
	{"Origin", new DocumentStore {Url = "http://RVN1:8080", DefaultDatabase = "Shop"}},//existing data
	{"ME", new DocumentStore {Url = "http://RVN2:8080", DefaultDatabase = "Shop_ME"}},
	{"US", new DocumentStore {Url = "http://RVN3:8080", DefaultDatabase = "Shop_US"}},
};

var shardStrategy = new ShardStrategy(shards)
	.ShardingOn<Customer>(c => c.Region)
	.ShardingOn<Invoice> (i => i.Customer);

var documentStore = new ShardedDocumentStore(shardStrategy).Initialize();

This wouldn’t actually work; we are going to have to do a bit more. To start with, what happens when we don’t have a 1:1 match between region and shard? That is where the translator becomes relevant:

.ShardingOn<Customer>(c => c.Region, region =>
{
    switch (region)
    {
        case "Middle East":
            return "ME";
        case "USA":
        case "United States":
        case "US":
            return "US";
        default:
            return "Origin";
    }
})

We basically say that we map several values to a single shard. But that isn’t enough. Newly saved documents are going to have the shard prefix in their ids, so saving a new customer and invoice in the US shard will show up as:

[image: new documents whose ids carry the US shard prefix]

But existing data doesn’t have this (it was created without sharding):

[image: existing documents whose ids have no shard prefix]

So we need to take some extra steps to let RavenDB know about them. We do this using the following two functions:

 Func<string, string> potentialShardToShardId = val =>
 {
     var start = val.IndexOf('/');
     if (start == -1)
         return val;
     var potentialShardId = val.Substring(0, start);
     if (shards.ContainsKey(potentialShardId))
         return potentialShardId;
     // this is probably an old id, let us use it.
     return "Origin";

 };
 Func<string, string> regionToShardId = region =>
 {
     switch (region)
     {
         case "Middle East":
             return "ME";
         case "USA":
         case "United States":
         case "US":
             return "US";
         default:
             return "Origin";
     }
 };

We can then register our sharding configuration like so:

  var shardStrategy = new ShardStrategy(shards)
      .ShardingOn<Customer, string>(c => c.Region, potentialShardToShardId, regionToShardId)
      .ShardingOn<Invoice, string>(x => x.Customer, potentialShardToShardId, regionToShardId); 

That takes care of handling both new and old ids, and lets RavenDB understand how to query things in an optimal fashion. For example, a query on all invoices for “customers/1” will only hit the RVN1 server.
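To illustrate, a query like the following sketch (using the Invoice type we shard on above) resolves “customers/1” to the Origin shard, because the id carries no shard prefix:

using (var session = documentStore.OpenSession())
{
    // "customers/1" has no shard prefix, so potentialShardToShardId maps it to "Origin",
    // and the query is sent to the RVN1 server only.
    var invoices = session.Query<Invoice>()
        .Where(invoice => invoice.Customer == "customers/1")
        .ToList();
}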

However, we aren’t done yet. New customers that don’t belong to the Middle East or the USA will still go to the old server, and we don’t want any modification to their ids there. We can tell RavenDB how to handle that like so:

var defaultModifyDocumentId = shardStrategy.ModifyDocumentId;
shardStrategy.ModifyDocumentId = (convention, shardId, documentId) =>
{
    if(shardId == "Origin")
        return documentId;

    return defaultModifyDocumentId(convention, shardId, documentId);
};
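Roughly speaking (the ids here are illustrative only), the effect is that documents routed to the new shards get the shard prefix, while documents that fall back to Origin keep the old-style ids:

using (var session = documentStore.OpenSession())
{
    // Resolves to the US shard, so the id gets the shard prefix, e.g. "US/customers/1"
    session.Store(new Customer { Region = "United States" });

    // Resolves to the Origin shard, so the id stays in the old format, e.g. "customers/2"
    session.Store(new Customer { Region = "Germany" });

    session.SaveChanges();
}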

That is almost the end. There is one final issue that we need to deal with: the old documents, created before we used sharding, don’t have the required sharding metadata. We can fix that using a store listener. So we have:

 var documentStore = new ShardedDocumentStore(shardStrategy);
 documentStore.RegisterListener(new AddShardIdToMetadataStoreListener());
 documentStore.Initialize();

Where the listener looks like this:

 public class AddShardIdToMetadataStoreListener : IDocumentStoreListener
 {
     public bool BeforeStore(string key, object entityInstance, RavenJObject metadata, RavenJObject original)
     {
         if (metadata.ContainsKey(Constants.RavenShardId) == false)
         {
             metadata[Constants.RavenShardId] = "Origin";// the default shard id
         }
         return false;
     }

     public void AfterStore(string key, object entityInstance, RavenJObject metadata)
     {
     }
 }

And that is it. I know that it seems like there is quite a lot going on here, but it basically breaks down into three actions that we take:

  • Modify the existing metadata to add the sharding server id via the listener.
  • Modify the document id convention so documents on the old server won’t have a designation (optional).
  • Modify the sharding configuration so we’ll understand that documents without a shard prefix actually belong to the Origin shard.

And that is pretty much it.
