How to unit test randomized algorithms using information entropy and t-test

In the previous post we looked at how the Knuth-Fisher-Yates shuffle works and how we can test it in a very quick way. We implemented the correct version and two incorrect versions and to test them we looked at the distribution of the elements in the first cell for our list to make sure it is uniform. The output of the three functions were as below:

It is relatively easy to see that the biased shuffle 2 is very different than the other two. But how can we really differentiate between the biased shuffle 1 and the correct implementation? They both seem to return uniform outputs. Besides, how can we come up with automated unit tests that we can run and stop the incorrect implementations from being used?

Using Maximum Entropy:

One way of testing these two implementations is to use the maximum entropy principle. The information entropy is defined as H = \sum p_i \log_2 p_i . Based on this definition, a more uniform distribution will have a higher entropy value than a non uniform distribution and spiky distribution and we use this property to compare the two implementations.

In the following code we print the average entropy of the output of four different functions: a biased implementation that looks actually pretty good if we naively look at its output, a correct implementation by us, Python’s internal implementation, and finally an ideal uniform output.

The output of this code is as below:

The incorrect implementation in fact has the smallest entropy, suggesting that it is biased. Our correct implementation has a higher entropy than Python’s internal shuffle function, does that mean we are better than the internal shuffle? Besides, if we did not have the correct implementation, the entropy of the incorrect implementation would have looks close enough to the entropy of the python shuffle. In this case we can use t-statistics and p-value of a t-test.

Using t-test for unit testing:
The Welch’s t-test or the Student t-test can help us see if the output of our shuffle functions are different from each other. We run each shuffle a couple of times and then run a t-test between the outputs. If the p-value is smaller than some threshold then the outputs are different. Below is the python and scipy codes for doing that:

The output of this piece of code is as follows. As we see the p-value of the t-test between Python shuffle and our own correct shuffle is above the 0.05 threshold but when we do a t-test between our biased implementation and the correct shuffle it is statistically significant highlighting the entropy value for our biased algorithm is in fact smaller than the correct python implementation.

I have found t-test to be a very interesting way to unit test randomized functions. In this post I use the information entropy and the t-test procedure to unit test a shuffle function. However, you would need to have at least 100 correct outputs from a correct and unbiased shuffle function to be able to run your t-test otherwise your t-test will not have enough power to detect bias. Using t-test has its own drawbacks though. Every 20 or so times that you run your tests, even the correct implementation will not pass the test since p-value might randomly be below 0.05 every 20 times when there is no true difference. The following the the whole script that I used here. I increased the degrees of freedom (the number of times we run the shuffle functions) to 10000 and the p-value still was not statistical significance showing a good behavior in our case.

The absolution value of the normal distribution
|N(μ, σ2)|

Let’s start with an interesting question. If I split a normal distribution in half and find the center for each half and then subtract the left center from the center of the right half what will I get?


Surprisingly, I have run into this problem twice in my daily work and I thought it would make an interesting blog post. The first time I had to solve this problem was when I was implementing this paper. The analytical form of this solution is relatively easy to derive, but we will end up doing some simplifications.

\int_{0}^{\infty} x . \tfrac{2}{\sigma \sqrt{2 \pi }}e^{- \tfrac{(x-\mu )^2}{2\sigma ^2}} dx - \int_{- \infty}^{0} x . \tfrac{2}{\sigma \sqrt{2 \pi }}e^{- \tfrac{(x-\mu )^2}{2\sigma ^2}} dx

The 2 in the numerator for \tfrac{2}{\sigma \sqrt{2 \pi }} is there to transfrom the half-normal distribution into a proper probability distribution that adds up to one. The expression can now be simplified to

= 4 \int_{0}^{\infty} x . \tfrac{1}{\sigma \sqrt{2 \pi }}e^{- \tfrac{(x-\mu )^2}{2\sigma ^2}} dx

mu and sigma are constant parameters therefore

= 4 . \tfrac{1}{\sigma \sqrt{2 \pi }} \int_{0}^{\infty} x e^{- \tfrac{(x-\mu )^2}{2\sigma ^2}} dx

There is no analytical solution for this (you would end up with the error function). However, we can easily realize that mu is not going to change the results and the expression will have the same value for any arbitrary value of mu. We can basically transfer the two half-normal distributions along the x axis such that the final normal distribution is zero-mean. For an standard normal (mu =0 , sigma = 1), we arrive at the following expression that is easily solvable.

= 4 . \tfrac{1}{\sqrt{2 \pi }} \int_{0}^{\infty} x e^{- \tfrac{x^2}{2}} dx

This expression will result in

= 2 \sqrt{\tfrac{2}{\pi}}\left [ -e^{\tfrac{-x^2}{2}} \right ]_{0}^{\infty} = 2 \sqrt{\tfrac{2}{\pi}}

It is therefore easy to show that for any other normal distribution with sigma = σ the expression for the mean will be as the following:

= 2 \sigma \sqrt{\tfrac{2}{\pi}}

I usually evaluate my analytical solution with a little simulation code. The following code, although not optimized for performance, can show us that the analytical solution is correct.


The output of this little code is as follows. And it is sufficiently close to 1.595769, the theoretical value.

The absolute value of the normal distribution follows a distribution called the half-normal distribution. Using the same method, it is easy to show that the mean value for the half-normal distribution is

=  \sigma \sqrt{\tfrac{2}{\pi}}

How do you interpret the Benford’s Law? a practical technique for detecting instrumentation and logging problems

Benford’s Law is a mesmerizing behavior in numbers and statistics. Besides, it is pretty practical too. It says if you take any count statistic (like the population of cities in the world) and take the first, left-most, digit and calculate the distribution of these digits then they are distributed according to log10(1+1/d) in which d is the digit.

It is pretty powerful and extremely practical. It has been mainly used in detecting fraud. For example you can use it in detecting election fraud by running it on the number of votes in each box. I sometimes use it to see if our data logging is done correctly. Turns out that if your logger skipped some data and did not write some of your data to your logs then the Benford’s law breaks and you can detect your logging problem by applying this law to your data (without even examining your logger or instrumentation logic).

My interpretation of the Benford’s law
Surprisingly the reason why Benford’s law works is still very much a mystery. The way I look at it is by imagining a counter and a Poisson clock. The counter starts from 1 and goes up 1 at every tick, however the Poisson clock is rigged and instead of ticking according to an exponential random variable with a fixed Lambda λ, it ticks with a random λ at every tick. When the clock ticks we take the counter value, take its first digit and reset it to 1 again.

The code that comes below implements this logic very efficiently and the result agrees with the Benford’s Law well.

However what Benford’s Law says is that you can come up with various simulation logics and get results that agree well with what I got from my Poisson process. I am now wondering what other processes can be used to generate a sequence of numbers according to the Benford’s Law?