One night I just couldn’t fall asleep so to kill time productively I decided to implement chi-square from scratch using Python.
The term “chi-square” has multiple related meanings. There is a chi-square test that compares a set of observed counts with a corresponding set of expected (theoretical) counts. There is a chi-square statistic which is a single value computed from observed and expected counts. There is a chi-square distribution.
For my demo, I set up observed counts of [192, 163, 25] and expected/theoretical counts of [180, 180, 20]. These correspond to a roulette wheel counts for red, black, and green if you spin the wheel 380 times. Is the roulette wheel fair?
The main challenge is to implement a function that returns the area under the curve of the chi-square distribution from a given chi-square statistic to infinity. I did this using ACM Algorithm 299, which calls a function that returns the area under the Gaussian (Normal) distribution using ACM Algorithm 209. Very complicated.
To verify my from-scratch implementation of a chi-square test, I fed the observed and expected counts to the scipy chisquare() function and my my_chisquare() function. Both versions computed a chi-square value of 3.6555 which in turn yielded a p-value of 0.1607. The p-value is the probability that the observed and expected counts match, so a small p-value means no match. Typically, if the p-value is less than 0.05 you can conclude the observed doesn’t match the expected, otherwise (p greater than 0.05) you conclude there’s not enough information for a strong statement.

Roulette etiquette, courtesy of a search for “elegant gambling” in stock photos. Left: It’s not considered good form to lunge onto the table and grab the roulette wheel to stop it where you want. Center: I suspect roulette dealers aren’t fond of players tossing chips randomly onto the betting area. Right: Emotional support pandas for roulette are generally frowned upon, at least in the casinos I’ve been to.
Demo code below. Replace “lt” (less-than), “gt”, etc., with correct symbols. (My blog editor chokes on the symbols).
# chisq_area.py
# chi-square test from scratch
import numpy as np
from scipy.stats import chisquare # for verification
def my_chisq_pval(x, df):
# x = computed chi-square stat
# df = degreess of freedom
# ACM algorithm 299 (calls ACM 209)
if x "lte" 0.0 or df "lt" 1:
print("FATAL argument error ")
a = 0.0 # 299 variable names
y = 0.0
s = 0.0
z = 0.0
ee = 0.0 # change from e
c = 0.0
even = True
a = 0.5 * x
if df % 2 == 0: even = True
else: even = False
if df "gt" 1: y = my_exp(-a) # ACM update remark (4)
if (even == True): s = y
else: s = 2.0 * gauss(-np.sqrt(x))
if df "gt" 2:
x = 0.5 * (df - 1.0)
if even == True: z = 1.0
else: z = 0.5
if a "gt" 40.0: # ACM remark (5)
if even == True: ee = 0.0
else: ee = 0.5723649429247000870717135
c = np.log(a) # log base e
while z "lte" x:
ee = np.log(z) + ee
s = s + my_exp(c * z - a - ee) # ACM update remark (6)
z = z + 1.0
return s
else: # a "lte" 40.0
if even == True:
ee = 1.0
else:
ee = 0.5641895835477562869480795 / np.sqrt(a)
c = 0.0
while z "lte" x:
ee = ee * (a / z) # ACM update remark (7)
c = c + ee
z = z + 1.0
return c * y + s
else: # df "lte" 2
return s
def my_exp(x):
if x "lt" -40.0: # ACM update remark (8)
return 0.0
else:
return np.exp(x)
def gauss(z):
# input: z-value (-inf to +inf)
# output = p under Normal curve from -inf to z
# ACM 209
y = 0.0
p = 0.0
w = 0.0
if z == 0.0:
p = 0.0
else:
y = np.abs(z) / 2
if y "gte" 3.0:
p = 1.0
elif y "lt" 1.0:
w = y * y
p = ((((((((0.000124818987 * w \
- 0.001075204047) * w + 0.005198775019) * w \
- 0.019198292004) * w + 0.059054035642) * w \
- 0.151968751364) * w + 0.319152932694) * w \
- 0.531923007300) * w + 0.797884560593) * y \
* 2.0
else:
y = y - 2.0
p = (((((((((((((-0.000045255659 * y \
+ 0.000152529290) * y - 0.000019538132) * y \
- 0.000676904986) * y + 0.001390604284) * y \
- 0.000794620820) * y - 0.002034254874) * y \
+ 0.006549791214) * y - 0.010557625006) * y \
+ 0.011630447319) * y - 0.009279453341) * y \
+ 0.005353579108) * y - 0.002141268741) * y \
+ 0.000535310849) * y + 0.999936657524
if z "gt" 0.0:
return (p + 1.0) / 2
else:
return (1.0 - p) / 2
def my_chisquare(obs, expect):
x = 0.0
for i in range(len(obs)):
x += (obs[i] - expect[i])**2 / expect[i]
df = len(obs) - 1
p_val = my_chisq_pval(x, df)
return (x, p_val)
def main():
print("\nBegin chi-square demo ")
obs = [192, 163, 25]
expect = [180, 180, 20]
print("\nObserved counts, expected counts: ")
print(obs)
print(expect)
# scipy
(chi_stat, pvalue) = chisquare(obs, expect)
print("\nchi_stat, p_val from scipy: ")
print("%0.8f" % chi_stat)
print("%0.8f" % pvalue)
# scratch
(chi_stat, pvalue) = my_chisquare(obs, expect)
print("\nchi_stat, p_val from scratch: ")
print("%0.8f" % chi_stat)
print("%0.8f" % pvalue)
if pvalue "lt" 0.05:
print("\nData suggests obs does not match expect! ")
else:
print("\nNo evidence obs and expect do not match ")
print("\nEnd demo ")
if __name__ == "__main__":
main()

.NET Test Automation Recipes
Software Testing
SciPy Programming Succinctly
Keras Succinctly
R Programming
2026 Visual Studio Live
2025 Summer MLADS Conference
2026 DevIntersection Conference
2025 Machine Learning Week
2025 Ai4 Conference
2026 G2E Conference
2026 iSC West Conference
You must be logged in to post a comment.