How do you interpret Benford’s Law? A practical technique for detecting instrumentation and logging problems

Benford’s Law is a mesmerizing pattern in numbers and statistics, and it is pretty practical too. It says that if you take a count statistic that spans several orders of magnitude (like the populations of cities in the world), take the first (left-most) digit of each value, and calculate the distribution of these digits, then digit d appears with probability log10(1 + 1/d).
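
To make the formula concrete, here is a minimal Python sketch that tabulates log10(1 + 1/d) for d = 1..9 and compares it with observed first-digit frequencies. The helper names and the choice of data (the first 1000 powers of 2, a sequence known to follow the law closely) are my own illustration and not part of the original analysis.

```python
import math
from collections import Counter


def benford_probabilities():
    """Expected first-digit probabilities log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(n):
    """Left-most decimal digit of a positive integer."""
    return int(str(n)[0])


def first_digit_frequencies(values):
    """Observed share of each first digit among positive integers."""
    counts = Counter(first_digit(v) for v in values)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}


# Illustrative data only (an assumption of this sketch): powers of 2.
powers = [2 ** n for n in range(1, 1001)]
expected = benford_probabilities()
observed = first_digit_frequencies(powers)

# Columns: digit, expected share, observed share.
for d in range(1, 10):
    print(d, round(expected[d], 3), round(observed[d], 3))
```

The nine expected probabilities sum to 1, and digit 1 alone accounts for about 30% of the values.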

The law is powerful and has mainly been used to detect fraud. For example, you can use it to look for election fraud by running it on the number of votes in each ballot box. I sometimes use it to check whether our data logging is done correctly. It turns out that if your logger skipped some data and did not write it to your logs, Benford’s law often breaks, so you can detect the logging problem by applying the law to your data (without even examining your logger or instrumentation logic).
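
A check like this could be sketched as follows. Everything here is an assumption made for illustration: the data is synthetic (log-uniform counts standing in for something like events per session), the broken logger is a hypothetical one that stops writing after 500 events per session, and the Pearson chi-square comparison against the 5% critical value for 8 degrees of freedom (15.507) is just one reasonable way to measure the deviation.

```python
import math
import random

# Upper 5% point of the chi-square distribution with 8 degrees of freedom.
CHI2_CRITICAL_8DF_5PCT = 15.507


def first_digit(n):
    """Left-most decimal digit of a positive integer."""
    return int(str(n)[0])


def benford_chi_square(counts):
    """Pearson chi-square statistic of first digits against Benford's law."""
    observed = [0] * 10
    for c in counts:
        if c > 0:  # a zero count has no leading digit
            observed[first_digit(c)] += 1
    total = sum(observed)
    stat = 0.0
    for d in range(1, 10):
        expected = total * math.log10(1 + 1 / d)
        stat += (observed[d] - expected) ** 2 / expected
    return stat


def looks_like_benford(counts):
    """True when the deviation stays below the 5% critical value."""
    return benford_chi_square(counts) < CHI2_CRITICAL_8DF_5PCT


rng = random.Random(42)
# Synthetic stand-in for per-session event counts (hypothetical data).
clean_counts = [int(10 ** rng.uniform(0, 6)) for _ in range(5000)]
# Hypothetical failure: the logger drops everything after 500 events.
capped_counts = [min(c, 500) for c in clean_counts]

print("clean log :", round(benford_chi_square(clean_counts), 1))
print("capped log:", round(benford_chi_square(capped_counts), 1))
```

In this sketch the capped log piles up values starting with 5, so its statistic lands far above the threshold. Note that a logger dropping events uniformly at random would roughly rescale every count, and a rescaled Benford data set tends to stay Benford, so this kind of check is best suited to losses that distort the counts unevenly.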

My interpretation of Benford’s law
Surprisingly, the reason why Benford’s law works is not intuitively obvious. The way I look at it is by imagining a counter and a Poisson clock. The counter starts at 1 and goes up by 1 at every time step. The Poisson clock, however, is rigged: instead of ticking after an exponential waiting time with a fixed rate λ, it draws a new random λ for every tick. When the clock ticks, we read the counter value, record its first digit, and reset the counter to 1.

The code below implements this logic, and the result agrees well with Benford’s law.
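
A minimal sketch of the counter-and-clock idea in Python, under assumptions that the description above leaves open: the counter advances once per unit of time, and λ is drawn log-uniformly between 10^-9 and 10^-3 for each tick. The function name, the range of λ, and the number of ticks are my own choices for illustration. The distribution of λ matters: other choices may not reproduce the law as closely.

```python
import math
import random


def first_digit(n):
    """Left-most decimal digit of a positive integer."""
    return int(str(n)[0])


def simulate_rigged_clock(n_ticks, seed=0, min_exp=-9.0, max_exp=-3.0):
    """First-digit shares of the counter read at each tick of the clock.

    Assumption: lambda is log-uniform on [10**min_exp, 10**max_exp].
    """
    rng = random.Random(seed)
    counts = [0] * 10
    for _ in range(n_ticks):
        lam = 10 ** rng.uniform(min_exp, max_exp)  # fresh random rate
        wait = rng.expovariate(lam)                # exponential waiting time
        counter = 1 + int(wait)                    # counter started at 1
        counts[first_digit(counter)] += 1          # read, then reset to 1
    return [counts[d] / n_ticks for d in range(1, 10)]


simulated = simulate_rigged_clock(100_000)

# Columns: digit, simulated share, Benford probability.
for d, share in enumerate(simulated, start=1):
    print(d, round(share, 3), round(math.log10(1 + 1 / d), 3))
```

Because the counter is reset after every tick, each reading depends only on the waiting time for that tick, so the loop can draw the waiting times independently instead of stepping the counter one unit at a time.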

What Benford’s Law suggests, however, is that you can come up with many different simulation schemes and still get results that agree well with what I got from my Poisson process. I am now wondering what other processes can be used to generate a sequence of numbers that follows Benford’s Law.
