ANOVA Using C# in Visual Studio Magazine

I wrote an article titled “ANOVA Using C#” in the August 2022 edition of Microsoft Visual Studio Magazine. See https://visualstudiomagazine.com/articles/2022/08/17/anova-csharp.aspx.

Analysis of variance (ANOVA) is a classical statistics technique that’s used to infer whether the means (averages) of three or more groups are all equal, based on samples from the groups. For example, suppose there are three different introductory computer science classes at a university. Each class is taught by the same teacher but uses a different textbook. You want to know whether student performance is the same in all three classes.

My article walks through an example where:

Group 1: 3, 4, 6, 5
Group 2: 8, 12, 9, 11, 10, 8
Group 3: 13, 9, 11, 8, 12
Mean 1: (3 + 4 + 6 + 5) / 4 = 18 / 4 = 4.50
Mean 2: (8 + 12 + . . + 8) / 6 = 58 / 6 = 9.67
Mean 3: (13 + 9 + . . + 12) / 5 = 53 / 5 = 10.60
Overall: (3 + 4 + . . + 12) / 15 = 129 / 15 = 8.60
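
The arithmetic above is easy to check in code. Here is a minimal sketch, not the code from the article; the class name MeansDemo and the array-of-arrays layout are my own assumptions for illustration.

```csharp
using System;

class MeansDemo  // hypothetical name, for illustration only
{
    static void Main()
    {
        // Sample data from the example, one array per group.
        double[][] data = new double[][] {
            new double[] { 3, 4, 6, 5 },
            new double[] { 8, 12, 9, 11, 10, 8 },
            new double[] { 13, 9, 11, 8, 12 }
        };

        double totalSum = 0.0;
        int totalCount = 0;
        for (int i = 0; i < data.Length; ++i)
        {
            double sum = 0.0;
            foreach (double x in data[i]) sum += x;
            double mean = sum / data[i].Length;
            Console.WriteLine("Mean " + (i + 1) + ": " + mean.ToString("F2"));
            totalSum += sum;
            totalCount += data[i].Length;
        }

        // The overall mean is taken over all 15 values. Because the groups
        // have different sizes, it is not the average of the three group means.
        double overall = totalSum / totalCount;
        Console.WriteLine("Overall: " + overall.ToString("F2"));
    }
}
```

Running the sketch should reproduce the four means listed above. Note that the overall mean (8.60) differs from the plain average of the three group means because the groups have unequal sizes.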

Group means and an overall mean are computed. The means are used to compute the sum of squares between groups (SSb) and the sum of squares within groups (SSw). The SSb and SSw values are divided by their degrees of freedom to give the mean square values MSb and MSw. The ratio MSb / MSw is the F-statistic. The F-statistic is used to compute a p-value, which is 0.0004 for the example data.
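
The chain from means to p-value can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the article's implementation. In particular, the p-value line uses a closed-form shortcut for the F distribution that holds only when the between-groups degrees of freedom equal 2 (that is, exactly three groups). A general solution needs a full F-distribution function. The class name AnovaSketch is hypothetical.

```csharp
using System;

class AnovaSketch  // hypothetical name, for illustration only
{
    static void Main()
    {
        double[][] data = new double[][] {
            new double[] { 3, 4, 6, 5 },
            new double[] { 8, 12, 9, 11, 10, 8 },
            new double[] { 13, 9, 11, 8, 12 }
        };

        int k = data.Length;  // number of groups
        int n = 0;            // total number of values
        double total = 0.0;
        double[] means = new double[k];
        for (int i = 0; i < k; ++i)
        {
            double sum = 0.0;
            foreach (double x in data[i]) sum += x;
            means[i] = sum / data[i].Length;
            total += sum;
            n += data[i].Length;
        }
        double overall = total / n;

        // SSb: squared distance of each group mean from the overall mean,
        // weighted by group size. SSw: squared distance of each value
        // from its own group mean.
        double ssb = 0.0, ssw = 0.0;
        for (int i = 0; i < k; ++i)
        {
            ssb += data[i].Length * (means[i] - overall) * (means[i] - overall);
            foreach (double x in data[i])
                ssw += (x - means[i]) * (x - means[i]);
        }

        int dfb = k - 1;  // degrees of freedom between groups
        int dfw = n - k;  // degrees of freedom within groups
        double msb = ssb / dfb;
        double msw = ssw / dfw;
        double f = msb / msw;

        Console.WriteLine("SSb = " + ssb.ToString("F4") + "  SSw = " + ssw.ToString("F4"));
        Console.WriteLine("MSb = " + msb.ToString("F4") + "  MSw = " + msw.ToString("F4"));
        Console.WriteLine("F   = " + f.ToString("F4"));

        // ASSUMPTION: this shortcut is valid only when dfb == 2.
        // In that case the right-tail area of the F distribution
        // reduces to (1 + 2F / dfw)^(-dfw / 2).
        if (dfb == 2)
        {
            double p = Math.Pow(1.0 + 2.0 * f / dfw, -dfw / 2.0);
            Console.WriteLine("p   = " + p.ToString("F4"));
        }
        else
        {
            Console.WriteLine("p-value shortcut not applicable: dfb is not 2");
        }
    }
}
```

For the example data, the sketch should give SSb of about 94.07, SSw of about 35.53, an F-statistic of about 15.88, and a p-value that rounds to 0.0004. A handy sanity check is that SSb + SSw equals the total sum of squared deviations from the overall mean (129.6 here).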

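Loosely speaking, the p-value is the likelihood that all three means are the same. More precisely, it’s the probability of seeing an F-statistic at least as large as the computed one if all three population means really were equal. Because the p-value is so small, the conclusion is that the means are not all the same. Looking at the data, it appears that the mean of Group 1 is smaller than the means of Group 2 and Group 3.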

The results of ANOVA are probabilistic and should be interpreted conservatively. For real-world data, the computed p-value is only an indication of the likelihood that the source population means are all the same. For small p-values (where “small” depends on your particular problem scenario), an appropriate conclusion is something like, “the sample data suggest that it’s unlikely that the population means are all the same.” For large p-values, an appropriate conclusion is, “the sample data suggest that all k source populations likely have the same mean,” where k is the number of groups.

One significant weakness of ANOVA is that it’s often impossible to test the assumptions that the data sources are Gaussian distributed and have equal variances.
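
To make the equal-variance assumption concrete, here is a minimal sketch that computes the sample variance of each group in the example data. This is only an informal look, not a formal test. With four to six values per group, the numbers can’t confirm or rule out equal population variances, which is exactly the weakness described above. The class name VarianceCheck is hypothetical.

```csharp
using System;

class VarianceCheck  // hypothetical name, for illustration only
{
    static void Main()
    {
        double[][] data = new double[][] {
            new double[] { 3, 4, 6, 5 },
            new double[] { 8, 12, 9, 11, 10, 8 },
            new double[] { 13, 9, 11, 8, 12 }
        };

        for (int i = 0; i < data.Length; ++i)
        {
            int n = data[i].Length;

            double sum = 0.0;
            foreach (double x in data[i]) sum += x;
            double mean = sum / n;

            double ss = 0.0;
            foreach (double x in data[i]) ss += (x - mean) * (x - mean);

            // Sample variance uses n - 1 in the denominator.
            double variance = ss / (n - 1);
            Console.WriteLine("Group " + (i + 1) + " sample variance: " + variance.ToString("F2"));
        }
    }
}
```

For the example data the three sample variances come out to roughly 1.67, 2.67 and 4.30. They differ, but with samples this small the differences could easily be due to chance.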



Analysis of Variance is based on the variability of sample data. One of my favorite book series is the Mars series by Edgar Rice Burroughs. Here are three covers of the third book in the series, “The Warlord of Mars” (1914), that have high visual variability. Left: By artist Robert Abbett. Center: By artist Michael Whelan. Right: By artist Gino D’Achille.

