Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

Get in touch with me:

oren@ravendb.net +972 52-548-6969

Posts: 7,546
|
Comments: 51,161
Privacy Policy · Terms
filter by tags archive
time to read 3 min | 488 words

Recently I had to try to explain to a non technical person how I rate the developers that I work with. In technical terms, it is easy to do:

int Compare(devA, devB, ctx)

But it is very hard to do:

int Compare(devA, devB);

var score = Evaluate(dev);

What do I mean by that? I mean that it is pretty hard (at least for me), to give an objective measure of a developer with the absence of anyone to compare him to, but it very easy to compare two developers, but even so, only in a given context.

An objective evaluation of a developer is pretty hard, because there isn’t much that you can objectively measure. I’m sure that no reader of mine would suggest doing something like measuring lines of code, although I wish it was as easy as that.

How do you measure the effectiveness of a developer?

Well, to start with, you need to figure out the area in which you are measuring them. Trying to evaluate yours truly on his HTML5 dev skills would be… a negative experience. But in their areas of expertise, measuring the effectiveness of two people is much easier. I know that if I give a particular task to Joe, he will get it done on his own. But if I give it to Mark, it will require some guidance, but finish it much more quickly.  And Scott is great at finding the root cause of a problem, but is prune to analysis paralysis unless prodded.

This came up when I tried to explain why a person spending 2 weeks on a particular problem was a reasonable thing, and that in many cases you need a… spark of inspiration for certain things to just happen.

All measurement techniques that I’m familiar with is subject to the observer effect, which means that you might get a pretty big and nasty surprise by people adapting their behavior to match the required observations.

The problem is that most of the time, development is about things like stepping one foot after the other, getting things progressively better by making numerous minor changes that has major effect. And then you have a need for brilliance. A customer with a production problem that require someone to have the entire system in their head all at once to figure out. A way to optimize a particular approach, etc.

And the nasty part is that there is very little way to actually get those sparks on inspiration. But there is usually a correlation between certain people and the number of sparks of inspiration per time period they get. And one person’s spark can lead another to the right path and then you have an avalanche of good ideas.

But I’ll talk about the results of this in another post Smile.

time to read 7 min | 1272 words

A law recently came into effect in Israel that mandated all supermarket chains to publicize their full pricing data and keep it up to date. The idea is that doing so will allow users to easily compare prices and lower costs. That has led to some interesting complaints in the media.

“Evil corporations are publishing the data in an unreadable format instead of an accessible files”.

I was really amused by the unreadable format (pretty horrible XML, to tell you the truth) and the accessible files (Excel spreadsheets) definitions. But expecting the media to actually get something right is… flawed, probably.

At any rate, one of the guys in the office played around with the data, toying with the notion of creating a small app for that purpose. (Side node, there are currently at least three major efforts to have a price comparison app/site for this. One of them is by a client of us, but beside for some modeling advice, we have no relation to this).

So we sat down and talked about the kind of features that you will want from this kind of app. The first option, of course, is to define a cart of the products I want to buy, and then compare this cart on various chains and branches. (For each chain, there is a different price list, and prices are also different on a per branch basis).

So far, that is easy, but that isn’t really what you want to do. The next obvious steps would be to take into account sales and promotions. The easiest ones are sales such as 1+1 or 2+1.That require an optimizing engine that can suggest getting another bottle of coke to the two you already got (same price), or recommending the purchase of 10 packages of diapers (non perishable product, currently on significant sale).

But this is really just the start. Sales and promotions are complex, a typical condition for a sale would be a whole chicken for 1 NIS (if you purchase more than 150 NIS total). Or maybe you get complementary products, or… you get the point. You need to take into account all the various discounts you can get.

Speaking of discounts, that means that you now need to also consider not just static stuff, but also dynamic (per user). Maybe the user has a set of coupons that they can use, or maybe they have a club membership in a particular chain that has certain benefits (for example 3% discount on all the chain’s branded products and another 5% for any 5 other products preselected at the time you joined the club).

So you now need a recommendation engine that can run those kind of projections and make suggestions based on them.

Oh, and wait! We also need to consider substitutions. If I purchased Bamba (a common peanut butter snack in Israel, and one of the only common uses of peanut butter in Israel), which is shown on the left, I might actually want to get Shush, which is pretty much the same thing, only slightly less costly. Or maybe get Parpar, which is another replacement, which is even cheaper.

To make it easier to the people not well versed with Israeli snacks, that means that we want to offer RC Cola or Pepsi Cola instead of Coca Cola, because they are cheaper.

Of course, some people swear that the taste is totally different, so they can’t be replaced. But does anyone really care what brand of cleaning product they use to clean the floor? The answer, by the way, is yes, people really do care about that kind of stuff. But even when you have a specific brand, maybe you can get the 2 gallon bottle of cleaning solution instead of the 1 gallon, because it end up being cheaper.

This is just to outline the complexity inherit in this. To do this well, you need to do quite a lot. This is actually really interesting from a technical perspective.  But this isn’t a post about the technical side.

Instead, let us talk about money. In particular, how to make money here. You can assume that selling the app wouldn’t work, not even a freemium model. But an app that guide people on what store to buy? There are quite a lot of ways to make money here.

We’ll start with an interesting observation. If you are a gateway to Joe making money, that usually mean that that Joe is going to pay you. Something that is very obvious is club membership referral fees.

If I run your cart through my engine, and I can tell you “Dude, if you join chain Xyz club you can save 100NIS today and 15,000 NIS this year, just click on this button to join”, that is a pretty compelling argument. And the chain would pay a certain fee for everyone we registered for them. Another way to make money is to get the aggregated data and do stuff with it. Coming up with “chain Xyz is the cheapest” so they can use it on their ads is something that would be worth money.

Another way to do this is to run interference. Instead of going to the supermarket, we’ll just make that order for you, and it will be on your door in a few hours… and the app will make sure to always get it from the cheapest location.

There are also holidays, and they typically have big purchases for the dinners, etc. That means that we can build a cart for the holidays, check it with the various chains, then suggest buying them at chain Xyz. Of course, that was pre-negotiated so they would get the big sales (presumably we are ethical and make sure that that chain is really the cheapest).

Of course, all of this make an interesting assumption. That the chains care about the price you buy from them. They don’t really care about that, they care about their margins. That means that we can play on that. We can suggest a cheaper cart overall, but one that would be more profitable to the chain because the products that we suggest have lower price for the customer, but higher margin for the chain.

And we totally left out the notion of coupons and discounts for actually using the app. “Scan this screen to get a 7% discount for Coca Cola”, for example. This way, the people paying us aren’t just the chains, but the manufacturers as well.

And coming back to the technical side of things. Consider targeted discounts. If I know what kind of products you want to buy, we can get all the way to customer specific price listing. (I’ll sell you diapers at cost, and you’ll buy the store brand of toilet paper at the list price, instead of the usually discounted price, so the store profit is higher).

Nitpicker corner: I have zero knowledge of retail beyond that is need to actually purchase food. I have no idea if those options are valid, and this is purely a mental exercise. This is interesting because of that. There are some technical problems here (in the recommendations), but a lot more business problems (in the kind of partnerships and deals that you can make for the customers).

I didn’t spend a lot of time considering this, and I’m pretty sure that I missed quite a few options. But it is a good way to remember that most business problems have depth behind them, not just technical solutions.

time to read 2 min | 391 words

I’m looking over of a bunch of distributed algorithm discussion groups, and I recently saw several people making the same bad assumption. The issue is that in a distributed system, you have to assume that any communication between system can fail.

Because that is taken into account in any distributed algorithm, there is a school of thought that believe that errors shouldn’t generate replies. That is horrifying to me.

Let me give a concrete example. In the Raft algorithm, nodes will participate in an election in order to decide who is the leader. A node can decide to vote for a certain candidate, to reject a candidate or it may be down and not responsive. Since we have to handle the non responsive node anyway, it is easy to assume that we only need to reply to the candidate when we actually vote for it. After all, no reply is a negative reply already, no?

The issue with this design decision is that this is indeed correct, but it is also boneheaded*. There are two reasons here. The minor one is that a non reply will force us to wait until a pre-configured timeout happen, after which we can go into failure handling. But actually sending a reply when we know that we refuse to vote for a node can give that node more information, and cut down the time it takes for the node to respond to negative replies.

As important as that is, this isn’t really my main concern. My main concern here is that not sending a reply leaves the administrator trying to figure out what is going on with essentially zero data. On the other hand, if the node send a “you are missing X,Y and Z for me to consider you applicable”, that is something that can be traced, that can be shown and acted upon.

It may seem like a small thing, overall, but it is something with crucial importance for operations. Those are hard enough when you have a single node. When you have a distributed system, you have to plan for that explicitly.

* I am using this terminology intentionally. Anyone who don’t consider production support and monitoring for their software from the get go never had to support complex production systems, where every nugget of information can be crucial.

FUTURE POSTS

  1. Partial writes, IO_Uring and safety - about one day from now
  2. Configuration values & Escape hatches - 5 days from now
  3. What happens when a sparse file allocation fails? - 7 days from now
  4. NTFS has an emergency stash of disk space - 9 days from now
  5. Challenge: Giving file system developer ulcer - 12 days from now

And 4 more posts are pending...

There are posts all the way to Feb 17, 2025

RECENT SERIES

  1. Challenge (77):
    20 Jan 2025 - What does this code do?
  2. Answer (13):
    22 Jan 2025 - What does this code do?
  3. Production post-mortem (2):
    17 Jan 2025 - Inspecting ourselves to death
  4. Performance discovery (2):
    10 Jan 2025 - IOPS vs. IOPS
View all series

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats
}