In software engineering, a basic knowledge of statistics is often useful. When I was a university professor some years ago, I was always careful to explain to my students the difference between correlation and covariance. As usual, a concrete example is the best way to illustrate the idea. Both correlation and covariance are measures of how closely related two variables are. For example, suppose you have two variables, X and Y, and four data pairs:
X  Y
-----
2  5
0  5
2  9
8  9
There are actually several different types of coefficients of correlation, but the most common is usually called Pearson's product-moment correlation coefficient, and is usually given by the symbol r. This is what most students encounter in an introductory statistics class, usually when studying linear regression. One of many equivalent equations for r is r = Σ(Xi - Xm)(Yi - Ym) / [ sqrt(Σ(Xi - Xm)^2) * sqrt(Σ(Yi - Ym)^2) ] where Xi means each X value and Xm is the mean of all the X values. For the data above, if you compute r you get 0.6667.
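The equation above translates directly into code. Here is a minimal Python sketch (the function name pearson_r is my own choice, not a standard library call) that computes r for the example data:

```python
import math

def pearson_r(xs, ys):
    # Pearson's product-moment correlation coefficient.
    # Numerator: sum of products of deviations from the means.
    # Denominator: product of the square roots of the summed
    # squared deviations for X and for Y.
    n = len(xs)
    xm = sum(xs) / n
    ym = sum(ys) / n
    num = sum((x - xm) * (y - ym) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - xm) ** 2 for x in xs)) * \
          math.sqrt(sum((y - ym) ** 2 for y in ys))
    return num / den

xs = [2, 0, 2, 8]
ys = [5, 5, 9, 9]
print(pearson_r(xs, ys))  # 0.6666... which rounds to 0.6667
```

In a real program you would likely call a library routine instead (for example, scipy.stats.pearsonr), but writing it out once makes the formula concrete.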
The coefficient of covariance does not have a standard symbol. One of several equations for the coefficient of covariance is cov = Σ(Xi - Xm)(Yi - Ym) / n where n is the number of data pairs. For the data above, if you compute cov you get 4.0000.
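The covariance calculation is even simpler than correlation because there is no normalizing denominator. A minimal sketch (again, the function name is my own):

```python
def covariance(xs, ys):
    # Population covariance: average of the products of
    # deviations from the means. Note the divisor is n;
    # sample covariance would divide by n - 1 instead.
    n = len(xs)
    xm = sum(xs) / n
    ym = sum(ys) / n
    return sum((x - xm) * (y - ym) for x, y in zip(xs, ys)) / n

print(covariance([2, 0, 2, 8], [5, 5, 9, 9]))  # 4.0
```

For the example data the deviation products are (-1)(-2) + (-3)(-2) + (-1)(2) + (5)(2) = 16, and 16 / 4 = 4.0000.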
These two statistics are closely related. In fact, the coefficient of correlation is exactly the covariance divided by the product of the standard deviations of X and Y. But when should you use which statistic? In general, the coefficient of correlation is a better choice in most situations because its value is always normalized to the range [-1.0, +1.0], which makes it easy to interpret, while the coefficient of covariance is unbounded and depends on the scale of the data.
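The relationship between the two statistics can be verified numerically. This sketch computes the covariance and the population standard deviations for the example data, then recovers r:

```python
import math

xs = [2, 0, 2, 8]
ys = [5, 5, 9, 9]
n = len(xs)
xm, ym = sum(xs) / n, sum(ys) / n

# Population covariance and population standard deviations.
cov = sum((x - xm) * (y - ym) for x, y in zip(xs, ys)) / n
sd_x = math.sqrt(sum((x - xm) ** 2 for x in xs) / n)
sd_y = math.sqrt(sum((y - ym) ** 2 for y in ys) / n)

# r = cov / (sd_x * sd_y); the n terms cancel, so it doesn't
# matter whether you use population (n) or sample (n-1) forms,
# as long as you are consistent.
r = cov / (sd_x * sd_y)
print(r)  # 0.6666... matching the direct computation of r
```

Here cov = 4.0, sd_x = 3.0, and sd_y = 2.0, so r = 4.0 / 6.0 = 0.6667, agreeing with the direct computation.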