A regression model is one that predicts a single numeric value, such as the income of a person based on their sex, age, State of residence, and political leaning.
In many scenarios, a single overall accuracy metric, computed on the training and test data, is good enough. For example:
Evaluating model accuracy (within 0.07)
accuracy on train data = 0.9350
accuracy on test data = 0.7250
But in some scenarios, it’s better to compute the accuracy of the trained model for various intervals of the dependent/target variable. For example:
Accuracy on training data (within 0.07 of true):
      from        to correct   wrong   count    accuracy
      0.00  25000.00       4       0       4      1.0000
  25000.00  50000.00      74      10      84      0.8810
  50000.00  75000.00      99       3     102      0.9706
  75000.00 100000.00      10       0      10      1.0000

Accuracy on test data (within 0.07 of true):
      from        to correct   wrong   count    accuracy
      0.00  25000.00       1       0       1      1.0000
  25000.00  50000.00      10       6      16      0.6250
  50000.00  75000.00      17       5      22      0.7727
  75000.00 100000.00       1       0       1      1.0000
I put together a demo using the Python API version of the LightGBM (lightweight gradient boosting machine) library.
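LightGBM isn't part of the Python standard library, so it must be installed separately, typically with pip install lightgbm.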
I used a synthetic dataset. The raw data looks like:
# people_raw.txt
# sex, age, state, income, politics
#
F  24  michigan  29500.00  liberal
M  39  oklahoma  51200.00  moderate
F  63  nebraska  75800.00  conservative
M  36  michigan  44500.00  moderate
F  27  nebraska  28600.00  liberal
. . .
The encoded data looks like:
1, 24, 0, 29500.00, 2
0, 39, 2, 51200.00, 1
1, 63, 1, 75800.00, 0
0, 36, 0, 44500.00, 1
1, 27, 1, 28600.00, 2
. . .
Sex is encoded as M = 0, F = 1. State is ordinal encoded as Michigan = 0, Nebraska = 1, Oklahoma = 2. Political leaning is ordinal encoded as conservative = 0, moderate = 1, liberal = 2. Because LightGBM is a tree-based system, it's not necessary to normalize numeric predictor variables such as age, or numeric target/dependent variables like income.
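The encoding can be expressed as simple lookup dictionaries. Here is a minimal sketch of the scheme; the dictionary names and the encode_row() helper are mine, for illustration only (the demo data was encoded in a separate preprocessing step):

# minimal sketch of the ordinal encoding scheme; the dictionary
# names and encode_row() are illustrative, not part of the demo
sex_enc = {'M': 0, 'F': 1}
state_enc = {'michigan': 0, 'nebraska': 1, 'oklahoma': 2}
politics_enc = {'conservative': 0, 'moderate': 1, 'liberal': 2}

def encode_row(sex, age, state, income, politics):
  return (sex_enc[sex], age, state_enc[state],
    income, politics_enc[politics])

# encode_row('F', 24, 'michigan', 29500.00, 'liberal')
# gives (1, 24, 0, 29500.00, 2)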
There are 200 training items and 40 test items.
The LightGBM regression model is created and trained like so:
import numpy as np
import lightgbm as lgbm

print("Creating and training LightGBM regression model ")
params = {
  'objective': 'regression',  # not needed
  'boosting_type': 'gbdt',    # default
  'num_leaves': 31,           # default
  'learning_rate': 0.05,      # default = 0.10
  'feature_fraction': 1.0,    # default
  'min_data_in_leaf': 2,      # default = 20
  'random_state': 0,
  'verbosity': -1
}
model = lgbm.LGBMRegressor(**params)
model.fit(x_train, y_train)
print("Done ")
When computing accuracy for a regression model, you must specify how close a prediction has to be to the true target value in order to be considered correct. For my demo, I used 7%: a prediction counts as correct if it's within plus or minus 7% of the true value. The percentage to use varies greatly according to the specific problem being examined.
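In code, the closeness test is a one-line relative-error predicate. A minimal sketch (the is_correct() name is mine; the same expression appears inside the accuracy functions below):

import numpy as np

# correct when the prediction is within pct_close (e.g., 0.07)
# of the true value, in relative terms
def is_correct(pred, y, pct_close):
  return np.abs(pred - y) < np.abs(pct_close * y)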
My accuracy-by-interval function is named accuracy_matrix() and is:
def accuracy_matrix(model, data_x, data_y,
    pct_close, points):
  n_intervals = len(points) - 1
  result = np.zeros((n_intervals,2), dtype=np.int64)
  # n_corrects in col [0], n_wrongs in col [1]
  for i in range(len(data_x)):
    x = data_x[i].reshape(1, -1)
    y = data_y[i]            # true income
    pred = model.predict(x)  # predicted income []
    interval = 0
    for j in range(n_intervals):  # j, to avoid reusing i
      if y >= points[j] and y < points[j+1]:
        interval = j; break
    if np.abs(pred[0] - y) < np.abs(pct_close * y):
      result[interval][0] += 1
    else:
      result[interval][1] += 1
  return result
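Calling model.predict() one data item at a time is easy to follow but slow. For large datasets, the same matrix can be computed with a single batched predict() call, using np.searchsorted() to find the interval for each true value. An alternative sketch, not part of the demo:

import numpy as np

def accuracy_matrix_vec(model, data_x, data_y, pct_close, points):
  # one batched predict() call instead of one call per item
  preds = model.predict(data_x)
  correct = np.abs(preds - data_y) < np.abs(pct_close * data_y)
  # interval index k such that points[k] <= y < points[k+1]
  k = np.searchsorted(points, data_y, side='right') - 1
  k = np.clip(k, 0, len(points) - 2)  # guard out-of-range incomes
  result = np.zeros((len(points) - 1, 2), dtype=np.int64)
  for itv, ok in zip(k, correct):
    result[itv][0 if ok else 1] += 1
  return result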
I implemented a separate show_acc_matrix() function to display a computed accuracy matrix:
def show_acc_matrix(am, points):
  h = "     from        to correct   wrong   count    accuracy"
  print(" " + h)
  for i in range(len(am)):
    print("%10.2f" % points[i], end="")
    print("%10.2f" % points[i+1], end="")
    print("%8d" % am[i][0], end="")
    print("%8d" % am[i][1], end="")
    count = am[i][0] + am[i][1]
    print("%8d" % count, end="")
    if count == 0:
      acc = 0.0
    else:
      acc = am[i][0] / count
    print("%12.4f" % acc)
Calling the functions looks like:
inc_pts = \
[0.00, 25000.00, 50000.00, 75000.00, 100000.00]
am_train = \
accuracy_matrix(model, x_train, y_train, 0.07, inc_pts)
print("\nAccuracy on training data (within 0.07 of true):")
show_acc_matrix(am_train, inc_pts)
am_test = \
accuracy_matrix(model, x_test, y_test, 0.07, inc_pts)
print("\nAccuracy on test data (within 0.07 of true):")
show_acc_matrix(am_test, inc_pts)
The inc_pts list means compute accuracy for four intervals of target incomes: $0 to $25,000, $25,000 to $50,000, $50,000 to $75,000, and $75,000 to $100,000.
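The points are arbitrary; the same functions work with any number of intervals. For example, ten $10,000-wide intervals (illustrative, not part of the demo):

# illustrative: a finer-grained analysis with ten intervals
inc_pts10 = [10000.0 * i for i in range(11)]  # 0, 10000, ..., 100000
am10 = accuracy_matrix(model, x_test, y_test, 0.07, inc_pts10)
show_acc_matrix(am10, inc_pts10)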
The results, shown above, suggest that the model is more accurate for high income values than for low income values, although the first and last intervals contain too few items to draw strong conclusions.

In real life, predicting a person’s income is very difficult. Research shows that one moderately strong correlate is a good sense of humor (which maps to high IQ and resilience), paradoxically combined with a certain amount of selfishness. Here are three movies that aren’t well-regarded by critics but I kind of like.
Left: In “The Beverly Hillbillies” (1993), the Clampett family finds oil on their land. They move to Beverly Hills and humor ensues as a con artist tries, but fails, to steal the Clampetts’ new-found riches. I give this movie a B grade.
Center: In “Overboard” (1987), a spoiled rich woman falls off her yacht and loses her memory. Poor handyman Dean finds her and convinces her that she’s his wife. Of course, in the end they fall in love. I give this movie a B grade.
Right: In “Trading Places” (1983), rich Louis Winthorpe is scammed out of his money by the crooked Duke brothers. After hitting rock bottom, Louis orchestrates a comeback scam with the help of lady-of-the-evening Ophelia, street con man Billy Ray, and Louis’ old butler. I give this movie a B+ grade.
Demo code.
# people_income_lgbm.py
# predict income from sex, age, State, politics
import numpy as np
import lightgbm as lgbm
# -----------------------------------------------------------
def accuracy(model, data_x, data_y, pct_close):
  n = len(data_x)
  n_correct = 0; n_wrong = 0
  for i in range(n):
    x = data_x[i].reshape(1, -1)
    y = data_y[i]            # true income
    pred = model.predict(x)  # predicted income []
    if np.abs(pred[0] - y) < np.abs(pct_close * y):
      n_correct += 1
    else:
      n_wrong += 1
  return (n_correct * 1.0) / (n_correct + n_wrong)
# -----------------------------------------------------------
def accuracy_matrix(model, data_x, data_y,
    pct_close, points):
  n_intervals = len(points) - 1
  result = np.zeros((n_intervals,2), dtype=np.int64)
  # n_corrects in col [0], n_wrongs in col [1]
  for i in range(len(data_x)):
    x = data_x[i].reshape(1, -1)
    y = data_y[i]            # true income
    pred = model.predict(x)  # predicted income []
    interval = 0
    for j in range(n_intervals):  # j, to avoid reusing i
      if y >= points[j] and y < points[j+1]:
        interval = j; break
    if np.abs(pred[0] - y) < np.abs(pct_close * y):
      result[interval][0] += 1
    else:
      result[interval][1] += 1
  return result
# -----------------------------------------------------------
def show_acc_matrix(am, points):
  h = "     from        to correct   wrong   count    accuracy"
  print(" " + h)
  for i in range(len(am)):
    print("%10.2f" % points[i], end="")
    print("%10.2f" % points[i+1], end="")
    print("%8d" % am[i][0], end="")
    print("%8d" % am[i][1], end="")
    count = am[i][0] + am[i][1]
    print("%8d" % count, end="")
    if count == 0:
      acc = 0.0
    else:
      acc = am[i][0] / count
    print("%12.4f" % acc)
# -----------------------------------------------------------
def main():
  # 0. get started
  print("\nBegin People predict income using LightGBM ")
  print("Predict income from sex, age, State, politics ")
  np.random.seed(1)

  # 1. load data
  # sex, age, State, income, politics
  #  0    1     2       3        4
  print("\nLoading train and test data ")
  train_file = ".\\Data\\people_train.txt"
  test_file = ".\\Data\\people_test.txt"
  x_train = np.loadtxt(train_file, usecols=[0,1,2,4],
    delimiter=",", comments="#", dtype=np.float64)
  y_train = np.loadtxt(train_file, usecols=3,
    delimiter=",", comments="#", dtype=np.float64)
  x_test = np.loadtxt(test_file, usecols=[0,1,2,4],
    delimiter=",", comments="#", dtype=np.float64)
  y_test = np.loadtxt(test_file, usecols=3,
    delimiter=",", comments="#", dtype=np.float64)

  np.set_printoptions(precision=0, suppress=True)
  print("\nFirst few train data: ")
  for i in range(3):
    print(x_train[i], end="")
    print(" | " + str(y_train[i]))
  print(". . . ")

  # 2. create and train model
  print("\nCreating and training LightGBM regression model ")
  params = {
    'objective': 'regression',  # not needed
    'boosting_type': 'gbdt',    # default
    'num_leaves': 31,           # default
    'learning_rate': 0.05,      # default = 0.10
    'feature_fraction': 1.0,    # default
    'min_data_in_leaf': 2,      # default = 20
    'random_state': 0,
    'verbosity': -1
  }
  model = lgbm.LGBMRegressor(**params)  # scikit API
  model.fit(x_train, y_train)
  print("Done ")

  # 3. evaluate model
  print("\nEvaluating model accuracy (within 0.07) ")
  acc_train = accuracy(model, x_train, y_train, 0.07)
  print("accuracy on train data = %0.4f " % acc_train)
  acc_test = accuracy(model, x_test, y_test, 0.07)
  print("accuracy on test data = %0.4f " % acc_test)

  inc_pts = [0.00, 25000.00, 50000.00, 75000.00, 100000.00]
  am_train = accuracy_matrix(model, x_train, y_train,
    0.07, inc_pts)
  print("\nAccuracy on training data (within 0.07 of true):")
  show_acc_matrix(am_train, inc_pts)

  am_test = accuracy_matrix(model, x_test, y_test,
    0.07, inc_pts)
  print("\nAccuracy on test data (within 0.07 of true):")
  show_acc_matrix(am_test, inc_pts)

  # 4. use model
  print("\nPredicting income for M 35 Oklahoma moderate ")
  x = np.array([[0, 35, 2, 1]], dtype=np.float64)
  y_pred = model.predict(x)
  print("\nPredicted income = %0.2f " % y_pred[0])

  print("\nEnd demo ")

# -----------------------------------------------------------

if __name__ == "__main__":
  main()
Training data:
# people_train.txt
# sex (M = 0, F = 1)
# age
# State (Michigan = 0, Nebraska = 1, Oklahoma = 2)
# income
# politics (conservative = 0, moderate = 1, liberal = 2)
#
1, 24, 0, 29500.00, 2
0, 39, 2, 51200.00, 1
1, 63, 1, 75800.00, 0
0, 36, 0, 44500.00, 1
1, 27, 1, 28600.00, 2
1, 50, 1, 56500.00, 1
1, 50, 2, 55000.00, 1
0, 19, 2, 32700.00, 0
1, 22, 1, 27700.00, 1
0, 39, 2, 47100.00, 2
1, 34, 0, 39400.00, 1
0, 22, 0, 33500.00, 0
1, 35, 2, 35200.00, 2
0, 33, 1, 46400.00, 1
1, 45, 1, 54100.00, 1
1, 42, 1, 50700.00, 1
0, 33, 1, 46800.00, 1
1, 25, 2, 30000.00, 1
0, 31, 1, 46400.00, 0
1, 27, 0, 32500.00, 2
1, 48, 0, 54000.00, 1
0, 64, 1, 71300.00, 2
1, 61, 1, 72400.00, 0
1, 54, 2, 61000.00, 0
1, 29, 0, 36300.00, 0
1, 50, 2, 55000.00, 1
1, 55, 2, 62500.00, 0
1, 40, 0, 52400.00, 0
1, 22, 0, 23600.00, 2
1, 68, 1, 78400.00, 0
0, 60, 0, 71700.00, 2
0, 34, 2, 46500.00, 1
0, 25, 2, 37100.00, 0
0, 31, 1, 48900.00, 1
1, 43, 2, 48000.00, 1
1, 58, 1, 65400.00, 2
0, 55, 1, 60700.00, 2
0, 43, 1, 51100.00, 1
0, 43, 2, 53200.00, 1
0, 21, 0, 37200.00, 0
1, 55, 2, 64600.00, 0
1, 64, 1, 74800.00, 0
0, 41, 0, 58800.00, 1
1, 64, 2, 72700.00, 0
0, 56, 2, 66600.00, 2
1, 31, 2, 36000.00, 1
0, 65, 2, 70100.00, 2
1, 55, 2, 64300.00, 0
0, 25, 0, 40300.00, 0
1, 46, 2, 51000.00, 1
0, 36, 0, 53500.00, 0
1, 52, 1, 58100.00, 1
1, 61, 2, 67900.00, 0
1, 57, 2, 65700.00, 0
0, 46, 1, 52600.00, 1
0, 62, 0, 66800.00, 2
1, 55, 2, 62700.00, 0
0, 22, 2, 27700.00, 1
0, 50, 0, 62900.00, 0
0, 32, 1, 41800.00, 1
0, 21, 2, 35600.00, 0
1, 44, 1, 52000.00, 1
1, 46, 1, 51700.00, 1
1, 62, 1, 69700.00, 0
1, 57, 1, 66400.00, 0
0, 67, 2, 75800.00, 2
1, 29, 0, 34300.00, 2
1, 53, 0, 60100.00, 0
0, 44, 0, 54800.00, 1
1, 46, 1, 52300.00, 1
0, 20, 1, 30100.00, 1
0, 38, 0, 53500.00, 1
1, 50, 1, 58600.00, 1
1, 33, 1, 42500.00, 1
0, 33, 1, 39300.00, 1
1, 26, 1, 40400.00, 0
1, 58, 0, 70700.00, 0
1, 43, 2, 48000.00, 1
0, 46, 0, 64400.00, 0
1, 60, 0, 71700.00, 0
0, 42, 0, 48900.00, 1
0, 56, 2, 56400.00, 2
0, 62, 1, 66300.00, 2
0, 50, 0, 64800.00, 1
1, 47, 2, 52000.00, 1
0, 67, 1, 80400.00, 2
0, 40, 2, 50400.00, 1
1, 42, 1, 48400.00, 1
1, 64, 0, 72000.00, 0
0, 47, 0, 58700.00, 2
1, 45, 1, 52800.00, 1
0, 25, 2, 40900.00, 0
1, 38, 0, 48400.00, 0
1, 55, 2, 60000.00, 1
0, 44, 0, 60600.00, 1
1, 33, 0, 41000.00, 1
1, 34, 2, 39000.00, 1
1, 27, 1, 33700.00, 2
1, 32, 1, 40700.00, 1
1, 42, 2, 47000.00, 1
0, 24, 2, 40300.00, 0
1, 42, 1, 50300.00, 1
1, 25, 2, 28000.00, 2
1, 51, 1, 58000.00, 1
0, 55, 1, 63500.00, 2
1, 44, 0, 47800.00, 2
0, 18, 0, 39800.00, 0
0, 67, 1, 71600.00, 2
1, 45, 2, 50000.00, 1
1, 48, 0, 55800.00, 1
0, 25, 1, 39000.00, 1
0, 67, 0, 78300.00, 1
1, 37, 2, 42000.00, 1
0, 32, 0, 42700.00, 1
1, 48, 0, 57000.00, 1
0, 66, 2, 75000.00, 2
1, 61, 0, 70000.00, 0
0, 58, 2, 68900.00, 1
1, 19, 0, 24000.00, 2
1, 38, 2, 43000.00, 1
0, 27, 0, 36400.00, 1
1, 42, 0, 48000.00, 1
1, 60, 0, 71300.00, 0
0, 27, 2, 34800.00, 0
1, 29, 1, 37100.00, 0
0, 43, 0, 56700.00, 1
1, 48, 0, 56700.00, 1
1, 27, 2, 29400.00, 2
0, 44, 0, 55200.00, 0
1, 23, 1, 26300.00, 2
0, 36, 1, 53000.00, 2
1, 64, 2, 72500.00, 0
1, 29, 2, 30000.00, 2
0, 33, 0, 49300.00, 1
0, 66, 1, 75000.00, 2
0, 21, 2, 34300.00, 0
1, 27, 0, 32700.00, 2
1, 29, 0, 31800.00, 2
0, 31, 0, 48600.00, 1
1, 36, 2, 41000.00, 1
1, 49, 1, 55700.00, 1
0, 28, 0, 38400.00, 0
0, 43, 2, 56600.00, 1
0, 46, 1, 58800.00, 1
1, 57, 0, 69800.00, 0
0, 52, 2, 59400.00, 1
0, 31, 2, 43500.00, 1
0, 55, 0, 62000.00, 2
1, 50, 0, 56400.00, 1
1, 48, 1, 55900.00, 1
0, 22, 2, 34500.00, 0
1, 59, 2, 66700.00, 0
1, 34, 0, 42800.00, 2
0, 64, 0, 77200.00, 2
1, 29, 2, 33500.00, 2
0, 34, 1, 43200.00, 1
0, 61, 0, 75000.00, 2
1, 64, 2, 71100.00, 0
0, 29, 0, 41300.00, 0
1, 63, 1, 70600.00, 0
0, 29, 1, 40000.00, 0
0, 51, 0, 62700.00, 1
0, 24, 2, 37700.00, 0
1, 48, 1, 57500.00, 1
1, 18, 0, 27400.00, 0
1, 18, 0, 20300.00, 2
1, 33, 1, 38200.00, 2
0, 20, 2, 34800.00, 0
1, 29, 2, 33000.00, 2
0, 44, 2, 63000.00, 0
0, 65, 2, 81800.00, 0
0, 56, 0, 63700.00, 2
0, 52, 2, 58400.00, 1
0, 29, 1, 48600.00, 0
0, 47, 1, 58900.00, 1
1, 68, 0, 72600.00, 2
1, 31, 2, 36000.00, 1
1, 61, 1, 62500.00, 2
1, 19, 1, 21500.00, 2
1, 38, 2, 43000.00, 1
0, 26, 0, 42300.00, 0
1, 61, 1, 67400.00, 0
1, 40, 0, 46500.00, 1
0, 49, 0, 65200.00, 1
1, 56, 0, 67500.00, 0
0, 48, 1, 66000.00, 1
1, 52, 0, 56300.00, 2
0, 18, 0, 29800.00, 0
0, 56, 2, 59300.00, 2
0, 52, 1, 64400.00, 1
0, 18, 1, 28600.00, 1
0, 58, 0, 66200.00, 2
0, 39, 1, 55100.00, 1
0, 46, 0, 62900.00, 1
0, 40, 1, 46200.00, 1
0, 60, 0, 72700.00, 2
1, 36, 1, 40700.00, 2
1, 44, 0, 52300.00, 1
1, 28, 0, 31300.00, 2
1, 54, 2, 62600.00, 0
Test data:
# people_test.txt
#
0, 51, 0, 61200.00, 1
0, 32, 1, 46100.00, 1
1, 55, 0, 62700.00, 0
1, 25, 2, 26200.00, 2
1, 33, 2, 37300.00, 2
0, 29, 1, 46200.00, 0
1, 65, 0, 72700.00, 0
0, 43, 1, 51400.00, 1
0, 54, 1, 64800.00, 2
1, 61, 1, 72700.00, 0
1, 52, 1, 63600.00, 0
1, 30, 1, 33500.00, 2
1, 29, 0, 31400.00, 2
0, 47, 2, 59400.00, 1
1, 39, 1, 47800.00, 1
1, 47, 2, 52000.00, 1
0, 49, 0, 58600.00, 1
0, 63, 2, 67400.00, 2
0, 30, 0, 39200.00, 0
0, 61, 2, 69600.00, 2
0, 47, 2, 58700.00, 1
1, 30, 2, 34500.00, 2
0, 51, 2, 58000.00, 1
0, 24, 0, 38800.00, 1
0, 49, 0, 64500.00, 1
1, 66, 2, 74500.00, 0
0, 65, 0, 76900.00, 0
0, 46, 1, 58000.00, 0
0, 45, 2, 51800.00, 1
0, 47, 0, 63600.00, 0
0, 29, 0, 44800.00, 0
0, 57, 2, 69300.00, 2
0, 20, 0, 28700.00, 2
0, 35, 0, 43400.00, 1
0, 61, 2, 67000.00, 2
0, 31, 2, 37300.00, 1
1, 18, 0, 20800.00, 2
1, 26, 2, 29200.00, 2
0, 28, 0, 36400.00, 2
0, 59, 2, 69400.00, 2
