The Probability that One Normal Random Variable is Greater than Another

Suppose you have a Normal (Gaussian) random variable A with mean = 2150 and standard deviation = 70, and a second Normal random variable B with mean = 2000 and standard deviation = 70. What is the probability that a value drawn from A is greater than a value drawn from B?

The idea is that if you select an A value it will usually be about 2150 but could be as low as about 2150 – 3*70 = 1940 or as high as 2150 + 3*70 = 2360. Similarly, B will usually be near 2000 but could be as low as 2000 – 3*70 = 1790 or as high as 2000 + 3*70 = 2210. A will be greater than B most of the time, but the two ranges overlap, so there’s a chance A could be less than B.

What is the probability that A > B? I wrote a little Python program to solve the problem in two ways. In the first approach, I used the NumPy random.normal() function to draw 100,000 sample pairs (one A value and one B value each) and computed the fraction of pairs where A > B. Using this approach I got P(A>B) = 0.9351.
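The simulation can be sketched roughly as follows. This is not the original program, just a minimal illustration under a few assumptions: the function name simulate_prob_a_gt_b and the fixed seed are made up for the example, A and B are drawn independently, and it uses NumPy's newer default_rng().normal() generator rather than the legacy random.normal() call (both draw from the same distribution).

```python
import numpy as np

def simulate_prob_a_gt_b(mean_a, sd_a, mean_b, sd_b, n_trials=100_000, seed=0):
    """Estimate P(A > B) by simulation (hypothetical helper, not from the post)."""
    # Seeded generator so the run is repeatable; the seed value is arbitrary.
    rng = np.random.default_rng(seed)
    # Draw n_trials independent values of A and of B.
    a = rng.normal(mean_a, sd_a, n_trials)
    b = rng.normal(mean_b, sd_b, n_trials)
    # The fraction of pairs with A > B estimates the probability.
    return float(np.mean(a > b))

p_sim = simulate_prob_a_gt_b(2150, 70, 2000, 70)
print(f"Simulated P(A > B) = {p_sim:.4f}")
```

Because this is a random estimate, the last digit or two will change with the seed and the number of trials.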

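In the second approach, I used the SciPy stats.norm.sf() function to get the result directly. The key math trick is to look at the distribution of A – B, because A > B exactly when A – B > 0. As it turns out, if A and B are independent, A – B is also Normal, the mean of A – B is just u_A – u_B, and the std of A – B is sqrt(s_A^2 + s_B^2). Here that gives a mean of 150 and a std of sqrt(70^2 + 70^2), which is about 98.99. Using that approach, I got the same answer.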
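Here is a minimal sketch of the direct calculation, again not the original program. The helper name prob_a_gt_b is made up for illustration, and the code assumes A and B are independent, which is what makes the sqrt(s_A^2 + s_B^2) formula valid. The survival function sf(x) is 1 – cdf(x), so sf(0) for the A – B distribution is P(A – B > 0).

```python
import math
from scipy.stats import norm

def prob_a_gt_b(mean_a, sd_a, mean_b, sd_b):
    """P(A > B) for independent Normal A and B (hypothetical helper)."""
    # A - B is Normal with these parameters when A and B are independent.
    mean_diff = mean_a - mean_b
    sd_diff = math.sqrt(sd_a**2 + sd_b**2)
    # sf(0) = 1 - cdf(0) = P(A - B > 0) = P(A > B).
    return float(norm.sf(0.0, loc=mean_diff, scale=sd_diff))

p_exact = prob_a_gt_b(2150, 70, 2000, 70)
print(f"Exact P(A > B) = {p_exact:.4f}")
```

If A and B were correlated, the std of A – B would also involve their covariance, and this shortcut would not apply as written.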

This was a relatively easy problem for me, because I used to teach Statistics in college so I knew about the difference between two Normal distributions, and I have a lot of experience with NumPy and SciPy so I knew there’d be helpful functions (I just had to do a little searching through the documentation).

So, what’s the point? Ultimately, I want to look at rating sports teams by using these probability techniques to generate ratings where the ratings will be those that give the highest probability of observed results (“maximum likelihood estimation”, MLE). But that’s a few blog posts down the road.



Numeric ratings are very useful. But sometimes a written, subjective evaluation of a product is more informative. From Amazon.com
