One of the core principles of software testing is that in order to determine a pass/fail result for a test case, after executing the test case you must compare the actual result or state of the system under test (SUT) with an expected result or state. For example, if you are testing the + functionality of a calculator, and the test case inputs are 3.0 and 4.0 (so the expected result is 7.0), then you exercise the SUT to get an actual result and compare that actual result with the expected result.

In statistics, the most common way to compare a set of observed values with a set of expected values is to use the well-known chi-square test. However, what is not so well known is that the chi-square statistic is actually an approximation to the log likelihood statistic. The chi-square test was developed in the days before calculators, when computing logarithms was difficult. Anyway, the point is, in software testing, if you want to compare how close a set of actual values is to a set of expected values, you should probably use the log likelihood g-test rather than the chi-square test.

The g statistic is given by 2 * sum-over-i(Oi * ln(Oi / Ei)), where Oi is an observed value and Ei is the corresponding expected value. For example, suppose you have some system which should emit the three values (4.0, 4.0, 4.0). These are the expected values. Now if the actual results are (3.0, 4.0, 6.0), then the g statistic is 2 * [(3.0 * ln(3.0/4.0)) + (4.0 * ln(4.0/4.0)) + (6.0 * ln(6.0/4.0))] = 3.139. The closer the g statistic is to 0, the closer the actual results are to the expected results; you can look up specific probabilities if necessary.
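The calculation above can be sketched in a few lines of Python. This is only a minimal illustration, not code from any particular test harness; the function name `g_statistic` is my own invention, and skipping zero observed values (treating their term as 0) is an assumption of the sketch.

```python
import math

def g_statistic(observed, expected):
    """Return 2 * sum-over-i(Oi * ln(Oi / Ei)) for paired values."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    total = 0.0
    for o, e in zip(observed, expected):
        if e <= 0:
            raise ValueError("expected values must be positive")
        # Assumption: an observed value of 0 contributes nothing to the sum,
        # which also avoids computing ln(0).
        if o > 0:
            total += o * math.log(o / e)
    return 2.0 * total

expected = [4.0, 4.0, 4.0]
actual = [3.0, 4.0, 6.0]
print(round(g_statistic(actual, expected), 3))  # 3.139
```

When the actual values match the expected values exactly, every ln term is ln(1) = 0, so the function returns 0.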
Hi, Can you elaborate in what way you have applied this statistical method in software testing? Thanks, Bertrand
Recently I was asked to test some data mining software that automatically places SQL data into clusters of similar data. For numeric data this is not a problem. But for categorical data such as (Red, Large), testing how well the data has been clustered into groups is not so easy. Part of the approach I used was to look at the frequencies of the clustered data. If the clustering was random (and therefore not very effective), you'd expect roughly the same number of each SQL data column value in every cluster. By computing the g statistic for the actual clustering results compared to an even distribution (these are the implied expected results), I was able to compute a measure of quality for the system under test.

So the g statistic can be used in any testing situation where the SUT generates a set of values, and you can determine a meaningful set of expected values.
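To make the idea concrete, here is a rough sketch of comparing cluster frequencies against an even distribution. It is not the actual test code; the cluster assignments, the variable names, and the helper `g_vs_even` are all made up for illustration, and the sketch looks at just one column value ("Red").

```python
import math
from collections import Counter

def g_vs_even(counts):
    """g statistic of observed counts against an even spread of the same total."""
    total = sum(counts)
    expected = total / len(counts)  # implied expected count per cluster
    # Assumption: a cluster with a count of 0 contributes nothing to the sum.
    return 2.0 * sum(o * math.log(o / expected) for o in counts if o > 0)

# Hypothetical data: the cluster ID assigned to each row whose color is "Red".
red_cluster_ids = [0, 0, 0, 0, 0, 0, 1, 2, 0, 0, 1, 0]
num_clusters = 3

tally = Counter(red_cluster_ids)
counts = [tally.get(c, 0) for c in range(num_clusters)]  # [9, 2, 1]

g = g_vs_even(counts)
print(counts, g)
```

A g value near 0 means the "Red" rows are spread evenly across the clusters, which is what random clustering would tend to produce. A larger g value means the value is concentrated in a few clusters.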