Benford’s Law is a mesmerizing behavior in numbers and statistics. Besides, it is pretty practical too. It says if you take any count statistic (like the population of cities in the world) and take the first, left-most, digit and calculate the distribution of these digits then they are distributed according to `log10(1+1/d)`

in which ` d `

is the digit.

It is pretty powerful and extremely practical. It has been mainly used in detecting fraud. For example you can use it in detecting election fraud by running it on the number of votes in each box. I sometimes use it to see if our data logging is done correctly. Turns out that if your logger skipped some data and did not write some of your data to your logs then the Benford’s law breaks and you can detect your logging problem by applying this law to your data (without even examining your logger or instrumentation logic).

**My interpretation of the Benford’s law **

Surprisingly the reason why Benford’s law works is still very much a mystery. The way I look at it is by imagining a counter and a Poisson clock. The counter starts from 1 and goes up 1 at every tick, however the Poisson clock is rigged and instead of ticking according to an exponential random variable with a fixed Lambda λ, it ticks with a random λ at every tick. When the clock ticks we take the counter value, take its first digit and reset it to 1 again.

The code that comes below implements this logic very efficiently and the result agrees with the Benford’s Law well.

However what Benford’s Law says is that you can come up with various simulation logics and get results that agree well with what I got from my Poisson process. I am now wondering what other processes can be used to generate a sequence of numbers according to the Benford’s Law?

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 |
#include <iostream> #include <time.h> #include <iomanip> // setprecision() #include <math.h> //log using namespace std; double urand() { //returns a uniform random variable from (0,1) return ((double) rand() / (RAND_MAX)); } double randExponential(double lambda) { //Generates an exponential random variable with parameter lambda //x = -log(1-urand)/lambda if (lambda==0) throw overflow_error("Divide by zero exception"); double randnumber = urand(); return -1.00*log(1-randnumber)/lambda; } int main() { srand (time(NULL)); long digits [9] = {0,0,0,0,0,0,0,0,0}; //the number of appearances of each digit as first digit long total = pow(10,9); // total number of simulations for (long i=0; i<total; i++) { double lambda = rand(); double randexpoNumber = randExponential(lambda); int firstdigit = floor(randexpoNumber/pow(10, floor(log10(randexpoNumber)))); digits[firstdigit-1] += 1; } for (int i= 0; i<9; i++) { cout << i+1 << " -- Simulated Probability: " << setprecision(3)<< ((double) digits[i]/total) << ", Benford Law Probability: " << log10(1+1.00/(i+1)) << endl; } return 0; } |

1 2 3 4 5 6 7 8 9 |
1 -- Simulated Probability: 0.306, Benford Law Probability: 0.301 2 -- Simulated Probability: 0.171, Benford Law Probability: 0.176 3 -- Simulated Probability: 0.12, Benford Law Probability: 0.125 4 -- Simulated Probability: 0.095, Benford Law Probability: 0.0969 5 -- Simulated Probability: 0.0794, Benford Law Probability: 0.0792 6 -- Simulated Probability: 0.0681, Benford Law Probability: 0.0669 7 -- Simulated Probability: 0.0596, Benford Law Probability: 0.058 8 -- Simulated Probability: 0.0531, Benford Law Probability: 0.0512 9 -- Simulated Probability: 0.0475, Benford Law Probability: 0.0458 |