Knuth-Fisher-Yates Shuffling

Random numbers are hard to work with and they are harder to debug. In this post I will demonstrate the testing of a shuffle function in Python. We first implement an unbiased version of Knuth-Fisher-Yates shuffle and then implement two biased ones. I will then use the functional properties of python to use the same tester for all three .

What is the Knuth-Fisher-Yates shuffling? This shuffling is the correct method for shuffling a list or array. In this method you have a list and you want to shuffle it such that each element is equally likely to appear in any cell. Here is how it works, you have an iterator that goes through the elements of the list one by one, every time you pick an element you generate a random number that is between the position of that element and the position of the last element (inclusive), then you swap the two elements. It is easy to see that each element is equally likely to be appearing in any position in the final list. Below is the correct implementation of the Knuth shuffle in Python.

Allowing the element to be able to be swapped by itself is important. So is not allowing it to be swapped by anything before it. It is, in fact, relatively easy to come up with biased implementations of this function. I have implemented two methods that might look correct at the first look but they are biased and should be avoided. The first implementation takes the element but swaps it with any arbitrary element in the list. The second biased implementation does not allow the element to be swapped by itself.

How can we see this bias and how can we test these functions?
Testing functions that use random number generators are hard. One way of testing these three functions is to look at the probability of each element appearing as the first element of the shuffled list. Ideally we expect the elements to be uniformly distributed in the shuffled list. The tester uses the functional nature of python and here we pass a function to our functions to test.

Output of this shows that the probabilities of any element appearing in the first position is only close to each other in the correct implementation and any of the biased implementations shows a nonuniform distribution.

The complete code is below:

See the follow up post here

One thought on “Knuth-Fisher-Yates Shuffling

  1. Pingback: How to unit test randomized algorithms using information entropy and t-test – data science tools and techniques

Leave a Reply

Your email address will not be published. Required fields are marked *