The story of two lovers and the beauty of statistics

[The original title for this post was “The story of two lovers, joint densities and the beauty of statistics”. I dropped the “joint densities” from the title]

Recently a friend of mine gave me a puzzle. The puzzle is as follows:

Many years ago there were two people, secretly in love, and were traveling with a group of friends who did not know they are in love. The large group of friends were going around Europe by bus. Stopping in major cities and staying in hotels. In the long hallway of hotels, every person would get a separate room. Every night one of these lovers would sneak out of their room and go to the other’s room for an innocent cuddling. The puzzle is: if the long hallway is L meters long and rooms are uniformly spread along the hallway. What would be the average length that one of them needs to tiptoe to reach the other one.

Depending on how you want to solve this problem this can be an easy or a relatively hard problem. We will solve it two ways. First we can solve it with simulation and next we can validate the simulation with an analytical solution.

Let’s first simplify the problem. Assume the location of one of them is X \sim U(0,L) and the other is located at Y \sim U(0,L). We want to find the average value for $latex Z= |X-Y|

And the histogram of Z values is shown below. Z is following a triangular distribution.
z values

If R is your favorite language then you might find the rgl and akima libraries pretty useful here. You can produce visualizations of Z values (vs X and Y) using R.

Z valuesZ vs (X, Y)

This problem can be solved in 6 lines of python or R (perhaps two line if you don’t mind a compressed notation). But can how can we find a closed form solution for this?

Analytical solution
We have X \sim U(0,L) and Y \sim U(0,L). We want to find the average value for E[Z]= E[|X-Y|]. Additionally we know that Y and X are independent X \perp Y. The probability density function for X and Y are f_X(x) = 1/L and f_Y(y) = 1/L. Since the two variables are independent then the expectation of the joint probability is in the form of:

E[Z] = \int_0^L \int_0^L |x-y|f_X(x)f_Y(y)dy dx

By replacing f_X(x) = 1/L and f_Y(y) = 1/L we have:

E[Z] = \int_0^L \int_0^L |x-y| \frac{1}{L^2} dy dx

We can now remove the absolute value by conditioning on the value of X-Y. Meaning, if X-Y>0 then |X-Y| = X-Y otherwise |X-Y| = Y-X.

E[Z] = \frac{1}{L^2} \int_0^L \int_0^x (x-y) dy dx + \int_0^L \int_x^L (y-x) dy dx


E[Z] = \frac{1}{L^2} \int_0^L (xy-1/2y^2)_{y=0}^{y=x} dx + \int_0^L (1/2y^2-xy)_{y=x}^{y=L} dx

This is equal to \frac{1}{3} L. So the two lovers would, on average, walk one third of the hallway to see each other.

Also see:
Average Distance Between Random Points on a Line

Python Tips: Converting a string to a list of characters

Sometimes you may need to convert a string to a list of characters (for example you may want to calculate a hash value of the string using the folding method). If you want to break the string on spaces then you can simply use mystring.split() . For splitting the string into individual characters you can simply use list(mystring) .

However, in practice we almost never build our own hash function. Python has a default hash() function. For cryptography you might use the Hashlib module though.

Interesting questions: Anagram Numbers

I found this old question in the archives of the British informatics olympiads

Question 1: Anagram Numbers
An anagram number is a number that can be multiplied by at least one single digit number (other than 1) to become an anagram of itself. Any anagrams formed by multiplying an anagram number by a digit are said to be generated by that anagram number. Two numbers are anagrams of each other if they can both be formed by rearranging the same combination of digits.

For example:

  • 1246878 is an anagram number; multiplying by 6 generates 7481268 or by 7 generates 8728146. These numbers all contain a single 1, 2, 4, 6, 7 and two 8s.
  • 1246879 is not an anagram number.

1(a) [ 25 marks ]

Write a program which reads in a single number (between 1 and 123456789 inclusive) and determines if it is an anagram number. If the number is not an anagram number you should output the word NO. If it is an anagram number you should output each single digit it can be multiplied by to make an anagram of itself.

  • Sample run 1
    2 4 5 7 8
  • Sample run 2

1(b) [ 2 marks ]

85247910 is generated by which anagram numbers?
1(c) [ 3 marks ]
How many anagram numbers between 100,000 and 999,999 contain no duplicated digits?

Here is a quick solution to this program. I am wondering if there is any solution to this that is more efficient?


Python Basics – A quick python review

I have been working with Python since 2007 on various projects. I have mostly used it on prototyping projects, some web development and some data analysis work. Python has changed a lot. However I still need to sometimes go and review the basics. Here are some useful notes that I have taken over many years. In case you also like to remind yourself of some of the basic operations and techniques in Python these might come in handy. I am also interested to know some of the simple Python principles and techniques that you find useful

Python Variables

For deleting a variable you can use delete

Python style guide uses underscores for specific purposes _ is used for a throw away variable or for referring to the previous results in the interactive Python. Leading __var__ are usually used for reserved variables. Leading single underscore is usually used for internal variables _var.


Keys in Dictionaries
Python has a very easy to use dictionary data structure. Dictionaries are highly efficient that provide O(1) insertion and read operations (as opposed to O(n) in a list). However, it is very easy to accidentally degrade your dictionary to a list and dramatically impact its efficiency. Let’s assume you like to check to know whether your dictionary contains a key. There are two ways to do that. The first method is checking key in dict and the second one is key in dict.keys(). Both will return the correct value. However, the second method is dramatically slower than the first since dict.keys() is in fact a list and not a dictionary anymore

IF Statements 

Unlike other programing languages there is no multi branching in Python (no “case” and “switch” statements). Python also uses “elif” for else if

Various Assignment Operations

Zip and Map

Zip and Map are two of the most useful functions in Python. Zip allows you to make tuples of data from lists. Map allows you to apply functions to tuples.

You can define your own function by using lambda. Below is the same sum function

Python Classes

In Python the __init__ method is called automatically at the instantiation stage. Every time Python instantiates an object it will call this method.

List Comprehensions in Python
List comprehensions are one of the most elegant language constructs in Python. Let’s assume we want to produce a list of all numbers between 0 and 100 that are divisible by both 5 and 3. This can be done by the following code

List Slicing in Python
Python has excellent syntaxt for working with its lists. Lists in Python are all zero-indexed so the first element of a list a = [1,2,3] can be found by a[0]. You can also take a range of elements, for example if you like to take the second element all the way to the end you would use a[1:]. Interestingly if you like to take the second element until the end but excluding the last element you would use a[1:-1]. List slicing notation also accepts steps, for example if you like to take the former list but slip every other element, the you would use a[1:-1:2]. You can also use negative steps. See this link for more.

The following code sample allows you to generate bigrams for a sentence

Object Serialization in Python
Python has two modules for serialization cPickle and Marshal. You can compress each serialized object by using the gzip module.

Raising an Exception in Python
If you like to manually raise an arbitrary exception then you can do the following