Machine Learning Regression Model Evaluation: MSE vs. RMSE vs. R2

The goal of a machine learning regression problem is to predict a single numeric value. For example, you might want to predict the annual income of a person based on their age, bank account balance, years of work experience, and so on.

When evaluating a regression prediction model, the three main metrics are mean squared error (MSE), root mean squared error (RMSE), and R-squared, aka coefficient of determination, aka R2. In most scenarios, I use MSE.

MSE is computed as the average of the squared differences between predicted y values and actual y values: MSE = sum((y - y')^2) / n, where y is an actual target value, y' is the corresponding predicted value, and n is the number of data items.

RMSE is just the square root of MSE. If the target y values have units, such as dollars or inches, then MSE has squared units, such as dollars-squared or inches-squared. This isn’t a big deal but sometimes it’s nice to have the original units as the evaluation metric.
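The two metrics can be sketched in a few lines of C#. This is a minimal standalone sketch, not the demo's code; the class and method names (RegressionMetrics, MeanSquaredError, RootMeanSquaredError) are my own, and the sketch assumes the predicted values have already been computed.

```csharp
using System;

public static class RegressionMetrics
{
  // MSE: average of squared differences between actual and predicted y
  public static double MeanSquaredError(double[] actualY,
    double[] predictedY)
  {
    double sum = 0.0;
    for (int i = 0; i < actualY.Length; ++i)
    {
      double diff = actualY[i] - predictedY[i];
      sum += diff * diff;
    }
    return sum / actualY.Length;
  }

  // RMSE: just the square root of MSE, restoring the original units
  public static double RootMeanSquaredError(double[] actualY,
    double[] predictedY)
  {
    return Math.Sqrt(MeanSquaredError(actualY, predictedY));
  }
}
```
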

On the other hand, many regression techniques are designed to minimize MSE, because MSE is differentiable everywhere (so those models can be trained using stochastic gradient descent), while RMSE is not differentiable when the error is zero and its gradients are messier to work with. Therefore, it makes sense to evaluate a regression model using the same metric the model is trained on.
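To make the connection to training concrete, here is a minimal sketch of one stochastic gradient descent update for a linear regression model trained to minimize squared error. The names lrnRate, weights, and bias echo the demo output below, but the implementation details are my own assumptions, not the demo's actual code.

```csharp
using System;

public static class SgdSketch
{
  // one SGD update for linear regression on a single training item;
  // minimizes squared error (predY - actualY)^2
  public static void UpdateWeights(double[] weights, ref double bias,
    double[] x, double actualY, double lrnRate)
  {
    // predicted y = dot(weights, x) + bias
    double predY = bias;
    for (int j = 0; j < weights.Length; ++j)
      predY += weights[j] * x[j];

    // gradient of (predY - y)^2 wrt weights[j] is 2 * (predY - y) * x[j];
    // the constant 2 is absorbed into the learning rate
    double err = predY - actualY;
    for (int j = 0; j < weights.Length; ++j)
      weights[j] -= lrnRate * err * x[j];
    bias -= lrnRate * err;
  }
}
```

Because the squared-error gradient is simple and defined everywhere, this update works at every point; the square root in RMSE would only complicate the gradient without changing which weights minimize the error.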

For MSE and RMSE, smaller values are better (meaning better model prediction).

Now, what about R2, the coefficient of determination? It is computed as R2 = 1.0 - (u / v), where u = sum((y - y')^2) and v = sum((y - y'')^2), where y is an actual target value, y' is the corresponding predicted value, and y'' is the average of the actual target y values.

For R2, larger values are better.

In the abstract, R2 measures how well the model predicts relative to guessing the average of the target y values. This is sometimes described as the proportion of the variance explained by the model.

The regression models in the scikit-learn code library, such as LinearRegression, KernelRidge, and many others, have a built-in score() method that gives R2 rather than MSE or RMSE. I've never seen an explanation of why scikit uses R2 rather than the simpler MSE for the score() method. I suspect that R2 is used because it doesn't change under different normalizations of the training data, but MSE does vary (I think, but I'm not 100% sure).

In the end, I don’t think it matters too much which metric — MSE, RMSE, R2 — is used to evaluate a regression model, as long as you’re consistent. If one of these three metrics was clearly superior to the others, then it would be the only one used.

I put together a quick demo, using a linear regression model:

Begin C# linear regression demo

Loading synthetic train (200) and test (40) data
Done

First three train X:
 -0.1660  0.4406 -0.9998 -0.3953 -0.7065
  0.0776 -0.1616  0.3704 -0.5911  0.7562
 -0.9452  0.3409 -0.1654  0.1174 -0.7192

First three train y:
  0.4840
  0.1568
  0.8054

Setting lrnRate = 0.0010
Setting maxEpochs = 100

Creating and training Linear Regression model
epoch =     0  MSE =   0.3364
epoch =    20  MSE =   0.0663
epoch =    40  MSE =   0.0520
epoch =    60  MSE =   0.0509
epoch =    80  MSE =   0.0508
Done

Coefficients/weights:
-0.2652  0.0333  -0.0457  0.0357  -0.1145
Bias/constant: 0.3620

Evaluating model

Accuracy train (within 0.15) = 0.6500
Accuracy test (within 0.15) = 0.7750

MSE train = 0.0508
MSE test = 0.0443

RMSE train = 0.2255
RMSE test = 0.2106

R2 train = 0.9267
R2 test = 0.9302

Predicting for x =
  -0.1660   0.4406  -0.9998  -0.3953  -0.7065

Predicted y = 0.5332

End demo

Code for the R2 method, using C#:

public double RSquared(double[][] dataX, double[] dataY)
{
  // coefficient of determination
  // R2 = 1 - [sum of (act - pred)^2 / sum of (act - mean_act)^2]

  // 1. compute mean of actual y values
  int n = dataX.Length;
  double sum = 0.0;
  for (int i = 0; i < n; ++i)
    sum += dataY[i];
  double meanActual = sum / n;

  // 2. accumulate squared residuals and squared deviations from mean
  double sumTop = 0.0;
  double sumBot = 0.0;
  for (int i = 0; i < n; ++i)
  {
    double predY = this.Predict(dataX[i]);
    sumTop += (dataY[i] - predY) * (dataY[i] - predY);
    sumBot += (dataY[i] - meanActual) * (dataY[i] - meanActual);
  }
  return 1.0 - (sumTop / sumBot);
}

The complete code and data can be found at: jamesmccaffrey.wordpress.com/2024/12/31/linear-regression-from-scratch-using-csharp/.
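The demo output also reports an accuracy metric: a prediction counts as correct if it is within a fixed tolerance (0.15 in the demo) of the actual value. Here is a minimal standalone sketch of that idea; the class and method names are my own, and unlike the RSquared() method above, this version takes precomputed predictions rather than calling a Predict() method.

```csharp
using System;

public static class AccuracyMetric
{
  // fraction of predictions within tolerance of the actual value;
  // the demo above uses tolerance = 0.15
  public static double Accuracy(double[] actualY, double[] predictedY,
    double tolerance)
  {
    int numCorrect = 0;
    for (int i = 0; i < actualY.Length; ++i)
    {
      if (Math.Abs(predictedY[i] - actualY[i]) <= tolerance)
        ++numCorrect;
    }
    return (numCorrect * 1.0) / actualY.Length;
  }
}
```
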



Measuring machine learning model error is relatively straightforward. It’s not so easy to measure animal photo error, but you know it when you see it. Duck, goat, stingray.


This entry was posted in Machine Learning.

1 Response to Machine Learning Regression Model Evaluation: MSE vs. RMSE vs. R2

  1. Thorsten Kleppe says:

    Hi James, great post, the R2 method looks cool and I learned a lot. But why not mention the MAE, would you always prefer squared error measuring?
