An Example When a Machine Learning Regression Model Has a Negative R2 Score

If you have a machine learning regression model that predicts a single numeric value, the three most common ways to evaluate the model are mean squared error (MSE), accuracy, and coefficient of determination (R2).

Note that root mean squared error (RMSE) is just the square root of MSE. This is useful when the target variable to predict has units, such as dollars. MSE has units “dollars-squared” but RMSE has units “dollars”.

MSE is reasonably interpretable. For example, if MSE = 0, the model predicts perfectly. But MSE has no upper limit, and MSE depends on how the target values are scaled.
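Here is a minimal sketch of the MSE and RMSE ideas in plain Python. The mse() helper and the dollar values are made up for illustration. The second half expresses the same data in thousands of dollars to show how MSE changes with scale.

```python
import math

def mse(targets, preds):
    # average of the squared differences between target and predicted values
    return sum((t - p) ** 2 for t, p in zip(targets, preds)) / len(targets)

targets = [100.0, 200.0, 300.0]  # hypothetical target values, in dollars
preds = [110.0, 190.0, 320.0]    # hypothetical predictions, in dollars

mse_dollars = mse(targets, preds)      # units are dollars-squared
rmse_dollars = math.sqrt(mse_dollars)  # units are dollars

# the same data expressed in thousands of dollars
targets_k = [t / 1000.0 for t in targets]
preds_k = [p / 1000.0 for p in preds]
mse_k = mse(targets_k, preds_k)

print("MSE  (dollars)   = %0.4f" % mse_dollars)
print("RMSE (dollars)   = %0.4f" % rmse_dollars)
print("MSE  (thousands) = %0.8f" % mse_k)
```

Dividing every value by 1,000 divides MSE by 1,000,000, even though the predictions are exactly as good in relative terms.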

Accuracy is very interpretable. For example, if accuracy = 75% and there are 200 data items, the model predicts 150 out of 200 correctly. But accuracy requires an arbitrary closeness threshold (perhaps within 10% of the target value) that determines if a prediction is correct or not.
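One possible way to code this kind of accuracy is sketched below. The accuracy() function, its pct_close parameter, and the data values are assumptions made for illustration, not a standard library API.

```python
def accuracy(targets, preds, pct_close=0.10):
    # a prediction counts as correct if it is within pct_close of the target
    n_correct = 0
    for t, p in zip(targets, preds):
        if abs(p - t) <= abs(pct_close * t):
            n_correct += 1
    return n_correct / len(targets)

targets = [0.50, 0.80, 0.20, 0.60]  # hypothetical target values
preds = [0.52, 0.70, 0.21, 0.61]    # hypothetical predictions

acc_10 = accuracy(targets, preds, pct_close=0.10)
acc_15 = accuracy(targets, preds, pct_close=0.15)
acc_01 = accuracy(targets, preds, pct_close=0.01)

print("accuracy within 10%% = %0.4f" % acc_10)
print("accuracy within 15%% = %0.4f" % acc_15)
print("accuracy within  1%% = %0.4f" % acc_01)
```

The same four predictions give three different accuracy values depending on the threshold, which is why the choice of threshold matters so much.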

R2 is sort of a cross between MSE and accuracy. It’s often stated that “R2 is a value between 0 and 1 where higher values indicate a more accurate model.” So, an R2 score of 1 means a model predicts perfectly (which almost never happens in practice). R2 is calculated as 1 – (SSres / SStot). The SSres is the sum of the squared differences between the target y values and the predicted y values. The SStot is the sum of the squared differences between the target y values and the average of the target y values. (Note that R2 is not the same as the classical statistics r2 for correlation.)

The R2 score isn’t affected by data scaling (good) but R2 can be negative (not good), and there’s no theoretical limit to how negative R2 can be.
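The scaling claim can be checked with a few lines of plain Python. The r2_from_lists() helper and the data values are hypothetical; the helper just follows the 1 – (SSres / SStot) formula.

```python
def r2_from_lists(targets, preds):
    # R2 = 1 - (SSres / SStot); assumes the targets are not all identical
    mean_t = sum(targets) / len(targets)
    ss_res = sum((t - p) ** 2 for t, p in zip(targets, preds))
    ss_tot = sum((t - mean_t) ** 2 for t in targets)
    return 1.0 - (ss_res / ss_tot)

targets = [0.34, 0.86, 0.79, 0.56]  # hypothetical target values
preds = [0.30, 0.80, 0.85, 0.50]    # hypothetical predictions

# multiply targets and predictions by the same factor
targets_scaled = [1000.0 * t for t in targets]
preds_scaled = [1000.0 * p for p in preds]

r2_original = r2_from_lists(targets, preds)
r2_scaled = r2_from_lists(targets_scaled, preds_scaled)

print("R2 original = %0.4f" % r2_original)
print("R2 scaled   = %0.4f" % r2_scaled)
```

Scaling multiplies both SSres and SStot by the same factor (1,000 squared here), so the ratio and therefore R2 stay the same, up to floating point rounding.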

As far as I know, all scikit-learn library regression classes (LinearRegression, Ridge, KernelRidge, GradientBoostingRegressor, etc.) have a built-in score() method that gives the R2 value for the trained model on whatever data you pass to it. Implementing R2 from scratch is easy.

Suppose you have a set of training data. The simplest possible regression model is to just predict the average of the target y values in the training data, for any input x. For that model, SSres equals SStot on the training data, so R2 = 1 – 1 = 0. But if you have a really bad regression model that predicts even worse than just returning the average of the target y values, SSres is greater than SStot and the R2 score will be negative.
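A small sketch of the predict-the-average idea, in plain Python. The r2_from_lists() helper, the target values, and the deliberately bad constant prediction of 2.0 are all made up for illustration.

```python
def r2_from_lists(targets, preds):
    # R2 = 1 - (SSres / SStot); assumes the targets are not all identical
    mean_t = sum(targets) / len(targets)
    ss_res = sum((t - p) ** 2 for t, p in zip(targets, preds))
    ss_tot = sum((t - mean_t) ** 2 for t in targets)
    return 1.0 - (ss_res / ss_tot)

targets = [0.34, 0.86, 0.79, 0.56]  # hypothetical training target values
mean_y = sum(targets) / len(targets)

# baseline model: always predict the average of the targets
preds_mean = [mean_y] * len(targets)
r2_mean = r2_from_lists(targets, preds_mean)

# a worse model: always predict a value far from every target
preds_bad = [2.0] * len(targets)
r2_bad = r2_from_lists(targets, preds_bad)

print("R2 of mean predictor = %0.4f" % r2_mean)
print("R2 of bad predictor  = %0.4f" % r2_bad)
```

The baseline lands at zero because its SSres is the same sum as SStot. Any model whose squared errors add up to more than SStot falls below zero.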

Here’s an example:

x0    x1    y        pred y   ss res   ss tot
0.2   0.3   0.34      0.19    0.0225   0.0841
0.6   0.5   0.86      0.61    0.0625   0.0529
0.3   0.7   0.79      0.44    0.1225   0.0256
0.6   0.2   0.56      0.46    0.0100   0.0049
0.1   0.6   0.61      0.31    0.0900   0.0004
0.3   0.4   0.49      0.29    0.0400   0.0196
0.5   0.5   0.75      0.50    0.0625   0.0144
0.2   0.6   0.64      0.34    0.0900   0.0001
----------------------------------------------
sum         5.04              0.5000   0.2020
mean        0.63

R2 = -1.4752

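The target y values are calculated by an unseen function y = (x0 * x0) + x1. The predicted y values in the table correspond to the terrible model y’ = (x0 * x0) + (0.5 * x1). Each "ss res" entry is (y – pred y) squared, and each "ss tot" entry is (y – 0.63) squared, where 0.63 is the average of the target y values. R2 = 1 – (0.5000 / 0.2020) = 1 – 2.4752 = -1.4752.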
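The arithmetic in the table can be reproduced with a short plain Python sketch. The target values are computed from y = (x0 * x0) + x1, and the predicted values are copied directly from the "pred y" column. The variable names are my own.

```python
# x0 and x1 columns from the example table
xs = [(0.2, 0.3), (0.6, 0.5), (0.3, 0.7), (0.6, 0.2),
      (0.1, 0.6), (0.3, 0.4), (0.5, 0.5), (0.2, 0.6)]

# target values from the unseen function y = (x0 * x0) + x1
ys = [(x0 * x0) + x1 for (x0, x1) in xs]

# "pred y" column copied from the table
preds = [0.19, 0.61, 0.44, 0.46, 0.31, 0.29, 0.50, 0.34]

mean_y = sum(ys) / len(ys)
ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
ss_tot = sum((y - mean_y) ** 2 for y in ys)
r2 = 1.0 - (ss_res / ss_tot)

print("mean y = %0.4f" % mean_y)
print("SSres  = %0.4f" % ss_res)
print("SStot  = %0.4f" % ss_tot)
print("R2     = %0.4f" % r2)
```

Because SSres (0.5000) is about 2.5 times larger than SStot (0.2020), the model does worse than predicting 0.63 for every item, and R2 comes out negative.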

The moral of this blog post is that there’s no single best way to evaluate a regression model.

Evaluating a regression model is mostly objective. Evaluating the name of a hotel is subjective. But here are a couple of names of foreign hotels that don’t resonate well in English.

Demo program.

# ridge_scikit_scratch_r2.py
# scratch R2 with scikit Ridge
# just for fun

import numpy as np
from sklearn.linear_model import Ridge

# Ridge(alpha=1.0, *, fit_intercept=True, copy_X=True,
# max_iter=None, tol=0.0001, solver='auto', positive=False,
# random_state=None)

# -----------------------------------------------------------

def r2(model, data_X, data_y):
  n = len(data_X)
  ss_res = 0.0
  ss_tot = 0.0
  mean_y = np.mean(data_y)
  for i in range(n):
    x = data_X[i]
    y = data_y[i]
    pred_y = model.predict([x])[0]
    ss_res += (y - pred_y) * (y - pred_y)  # one predict() per item is inefficient
    ss_tot += (y - mean_y) * (y - mean_y)

  result = 1.0 - (ss_res / ss_tot)  # assumes ss_tot is non-zero
  return result

# -----------------------------------------------------------

print("\nBegin from-scratch R2 demo ")

np.set_printoptions(precision=4, suppress=True,
    floatmode='fixed')

print("\nLoading synthetic train (20) data ")
train_Xy = np.loadtxt(".\\Data\\synthetic_train_20.txt",
  usecols=[0,1,2,3,4,5], delimiter=",")
train_X = train_Xy[:,[0,1,2,3,4]]
train_y = train_Xy[:,5]

print("\nFirst three train X: ")
for i in range(3):
  print(train_X[i])
print("\nFirst three train y: ")
for i in range(3):
  print("%0.4f " % train_y[i])

print("\nCreating scikit Ridge model ")
alpha = 1.0
print("Using default L2 alpha = %0.4f " % alpha)
model = Ridge(alpha=alpha, solver='sag',
  fit_intercept=True, random_state=0)
print("Done ")

print("\nTraining scikit Ridge model ")
print("Using SAG with default training params ")
model.fit(train_X, train_y)
print("Done. Used " + str(model.n_iter_) + " iterations" )

print("\nModel weights: ")
print(model.coef_)
print("Model bias = %0.4f " % model.intercept_)

r2_scikit = model.score(train_X, train_y)
r2_scratch = r2(model, train_X, train_y)

print("\nR2 using built-in score() = %0.4f " % r2_scikit)
print("R2 using from-scratch = %0.4f " % r2_scratch)

print("\nEnd demo ")
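
The from-scratch r2() function above calls predict() once per data item. A vectorized alternative is sketched below. The r2_vectorized() name and the ConstantModel stand-in class are hypothetical and are only there so the sketch runs without the demo data file.

```python
import numpy as np

def r2_vectorized(model, data_X, data_y):
    # one predict() call for all rows instead of one call per row
    preds = model.predict(data_X)
    ss_res = np.sum((data_y - preds) ** 2)
    ss_tot = np.sum((data_y - np.mean(data_y)) ** 2)  # assumes non-zero
    return 1.0 - (ss_res / ss_tot)

class ConstantModel:
    # hypothetical stand-in for a trained scikit model; not part of the demo
    def __init__(self, value):
        self.value = value

    def predict(self, X):
        # returns the same value for every row of X
        return np.full(len(X), self.value)

X = np.array([[0.2, 0.3], [0.6, 0.5], [0.3, 0.7], [0.6, 0.2]])
y = np.array([0.34, 0.86, 0.79, 0.56])

r2_mean_model = r2_vectorized(ConstantModel(np.mean(y)), X, y)
print("R2 for mean predictor = %0.4f" % r2_mean_model)
```

In the demo, a call like r2_vectorized(model, train_X, train_y) should give the same value as the loop version, since both compute the same two sums.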

Demo data:

# synthetic_train_20.txt
#
-0.1660,  0.4406, -0.9998, -0.3953, -0.7065,  0.4840
 0.0776, -0.1616,  0.3704, -0.5911,  0.7562,  0.1568
-0.9452,  0.3409, -0.1654,  0.1174, -0.7192,  0.8054
 0.9365, -0.3732,  0.3846,  0.7528,  0.7892,  0.1345
-0.8299, -0.9219, -0.6603,  0.7563, -0.8033,  0.7955
 0.0663,  0.3838, -0.3690,  0.3730,  0.6693,  0.3206
-0.9634,  0.5003,  0.9777,  0.4963, -0.4391,  0.7377
-0.1042,  0.8172, -0.4128, -0.4244, -0.7399,  0.4801
-0.9613,  0.3577, -0.5767, -0.4689, -0.0169,  0.6861
-0.7065,  0.1786,  0.3995, -0.7953, -0.1719,  0.5569
 0.3888, -0.1716, -0.9001,  0.0718,  0.3276,  0.2500
 0.1731,  0.8068, -0.7251, -0.7214,  0.6148,  0.3297
-0.2046, -0.6693,  0.8550, -0.3045,  0.5016,  0.2129
 0.2473,  0.5019, -0.3022, -0.4601,  0.7918,  0.2613
-0.1438,  0.9297,  0.3269,  0.2434, -0.7705,  0.5171
 0.1568, -0.1837, -0.5259,  0.8068,  0.1474,  0.3307
-0.9943,  0.2343, -0.3467,  0.0541,  0.7719,  0.5581
 0.2467, -0.9684,  0.8589,  0.3818,  0.9946,  0.1092
-0.6553, -0.7257,  0.8652,  0.3936, -0.8680,  0.7018
 0.8460,  0.4230, -0.7515, -0.9602, -0.9476,  0.1996