The bug in the random sort

Mar 05 2020

We needed to randomize a list of values, so I quickly wrote the following code:
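The original listing is not reproduced here. As a stand-in, here is a minimal sketch of the kind of code described: sorting a three-item list with a comparer that flips a coin, then counting which value lands first. The class name, the `CountFirst` helper, and the run count of 10,000 are assumptions for illustration, and the exact distribution depends on how the runtime's sort orders its comparisons.

```csharp
using System;
using System.Collections.Generic;

public static class RandomSortDemo
{
    // Hypothetical helper: sorts ["A", "B", "C"] with a coin-flip comparer
    // `runs` times and counts which value ends up first.
    public static Dictionary<string, int> CountFirst(int runs, Random random)
    {
        var counts = new Dictionary<string, int> { ["A"] = 0, ["B"] = 0, ["C"] = 0 };
        for (int i = 0; i < runs; i++)
        {
            var list = new List<string> { "A", "B", "C" };
            // Broken on purpose: the comparer ignores its arguments
            // and returns "less" or "greater" at random.
            list.Sort((x, y) => random.Next(2) == 0 ? -1 : 1);
            counts[list[0]]++;
        }
        return counts;
    }

    public static void Main()
    {
        foreach (var pair in CountFirst(10_000, new Random()))
            Console.WriteLine($"{pair.Key} -> {pair.Value}");
    }
}
```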

What do you expect the output to be?

A -> 2431
B -> 2571
C -> 4998

The numbers vary slightly from run to run, but they are surprisingly consistent overall. C ends up at the head of the list about 50% of the time, whereas I obviously wanted each value to be there 33% of the time.

Can you figure out why this is the case? In order to sort, we need to compare the values, and we do that in a random fashion. We start by comparing A and B, and each of them has a 50% chance of being first.

Then we compare the winner (A | B) to C, and there is a 50% chance of C being first. In other words, because C is compared to the top only once, it has a 50% chance of being first. A and B, however, each need to win two 50% comparisons to get to the top, so each of them is there only 25% of the time (0.5 × 0.5 = 0.25).
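The reasoning above can be checked with a small simulation that does not involve the sort at all. This is a sketch under the assumptions stated in the text: one coin flip between A and B, then one between the winner and C. The class and method names are made up for illustration.

```csharp
using System;

public static class TwoRoundDemo
{
    // Simulates the two coin flips described above and returns how often
    // A, B and C finish first (indexes 0, 1 and 2).
    public static int[] Simulate(int runs, Random random)
    {
        var wins = new int[3];
        for (int i = 0; i < runs; i++)
        {
            // Round 1: A vs. B, 50/50.
            int winner = random.Next(2);   // 0 = A, 1 = B
            // Round 2: the round 1 winner vs. C, 50/50.
            if (random.Next(2) == 0)
                winner = 2;                // C takes the top spot
            wins[winner]++;
        }
        return wins;
    }

    public static void Main()
    {
        int[] wins = Simulate(10_000, new Random());
        Console.WriteLine($"A -> {wins[0]}, B -> {wins[1]}, C -> {wins[2]}");
    }
}
```

With enough runs, the counts should settle near a 25% / 25% / 50% split, which is the same shape as the numbers reported above.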

When we figured out why this was happening, I immediately thought of the Monty Hall problem.

The right solution is the Fisher-Yates algorithm, and here is how you want to implement it.
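For reference, a minimal sketch of a Fisher-Yates shuffle follows. It is not necessarily the implementation the post points to. The `ShuffleDemo` class and the choice to pass in the `Random` instance are assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;

public static class ShuffleDemo
{
    // In-place Fisher-Yates shuffle, walking from the end of the list.
    public static void Shuffle<T>(IList<T> list, Random random)
    {
        for (int i = list.Count - 1; i > 0; i--)
        {
            // Pick j uniformly from 0..i inclusive
            // (the upper bound of Random.Next is exclusive).
            int j = random.Next(i + 1);
            (list[i], list[j]) = (list[j], list[i]);
        }
    }

    public static void Main()
    {
        var list = new List<string> { "A", "B", "C" };
        Shuffle(list, new Random());
        Console.WriteLine(string.Join(", ", list));
    }
}
```

Each position is filled by a single uniform pick from the elements not yet placed, so every permutation is equally likely, provided the random source is uniform.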

Comments

05 Mar 2020
13:02 PM

Dennis

List.Sort also requires a stable ordering: once you have said a > b, it must always return that. LINQ's OrderBy(x => random.Next()) will work, however. Fisher-Yates is useful if you just want n random elements from a list that is much larger than n elements.

05 Mar 2020
13:25 PM

svick

I realize this is quick-and-dirty and broken code, but I still think there are a few things worth pointing out about it:

Repeatedly calling new Random() will work fine on .NET Core 2.0+, but not on older frameworks (including .NET Framework).
Providing a non-deterministic delegate to List<T>.Sort() can cause all sorts of problems and does not guarantee a reasonable result.
Fisher-Yates is the optimal solution, but assigning a random number to each value and sorting by that will also work and is easier to implement and understand: list.OrderBy(_ => new Random().Next()). (And a non-deterministic delegate is okay in this case, because Enumerable.OrderBy caches the keys, while List<T>.Sort does not cache the results of the comparison.)

05 Mar 2020
14:28 PM

Pop Catalin

The reverse for loop you picked as the Fisher-Yates example is such a weird implementation. It works perfectly fine with an increasing for:

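public static void Shuffle<T>(List<T> list)
{
    var rnd = new Random();
    for (int i = 0; i < list.Count; i++)
    {
        // Pick a random index from the not-yet-fixed tail [i, Count).
        var ri = rnd.Next(i, list.Count);
        // Swap the picked element into position i.
        T val = list[i];
        list[i] = list[ri];
        list[ri] = val;
    }
}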

05 Mar 2020
15:41 PM

Oren Eini

Dennis,

OrderBy is streaming, so it needs to keep track of all items. It basically creates a separate list of indexes and runs based on that.

See: https://github.com/dotnet/runtime/blob/master/src/libraries/System.Linq/src/System/Linq/OrderedEnumerable.cs#L28

06 Mar 2020
13:04 PM

Olek

The Stack Overflow example you are pointing to is quite badly broken. It used to be a correct implementation, but as of the last edit on Nov 18, 2019 it's completely broken. What's especially funny is that the "Try this code" link contains a test case that does a nice job of hiding the problem with the code.

In the linked code, after one shuffle the last element of the array will be either N, with probability (N-1)/N, or 1, with probability 1/N.

11 Mar 2020
14:10 PM

Budulis

list = list.OrderBy(_ => Guid.NewGuid()).ToList()

Oren Eini

CEO of RavenDB