The bug in the random sort

time to read 2 min | 202 words

We needed to randomize a list of values, so I quickly wrote the following code:

What do you expect the output to be?

  • A –> 2431
  • B –> 2571
  • C -> 4998

The number vary slightly, but they are surprisingly consistent overall. C is about 50% likely to be at the head of the list, but I wanted the probability to be 33% obviously.

Can you figure out why this is the case? In order to sort, we need to compare the values. And we do that in a random fashion. We start by comparing A and B, and they have 50% change of either one of them being first.

Then we compare the winner (A | B) to C, and there is a 50% chance of C being first. In other words, because we compare C to the top once, it has a 50% chance of being first. But for A and B, they need to pass two 50% chances to get to the top, so they are only there 25% of the time.

When we figured out why this is happening, I immediately thought of the Monty Hall problem.

The right solution is the Fisher-Yates algorithm, and here is how you want to implement it.