The goal of a machine learning regression problem is to predict a single numeric value. For example, you might want to predict the annual income of a person based on their age, bank account balance, years of work experience, and so on.
When evaluating a regression prediction model, the three main metrics are mean squared error (MSE), root mean squared error (RMSE), and R-squared, aka coefficient of determination, aka R2. In most scenarios, I use MSE.
MSE is computed as the average of the squared differences between predicted y values and actual y values: MSE = sum((y – y')^2) / n, where y is an actual target value, y' is the corresponding predicted value, and n is the number of data items.
RMSE is just the square root of MSE. If the target y values have units, such as dollars or inches, then MSE has squared units, such as dollars-squared or inches-squared. This isn’t a big deal but sometimes it’s nice to have the original units as the evaluation metric.
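The two formulas above are easy to sketch in C#. The class and method names here are my own, for illustration only, not code from the demo program:

```csharp
using System;

public static class RegressionMetrics
{
  // MSE = sum((y - y')^2) / n
  public static double Mse(double[] actual, double[] predicted)
  {
    double sum = 0.0;
    for (int i = 0; i < actual.Length; ++i)
    {
      double diff = actual[i] - predicted[i];
      sum += diff * diff;  // squared difference for item i
    }
    return sum / actual.Length;
  }

  // RMSE is just the square root of MSE
  public static double Rmse(double[] actual, double[] predicted)
  {
    return Math.Sqrt(Mse(actual, predicted));
  }
}
```

For example, with actual y values (1.0, 2.0, 3.0) and predictions (1.5, 2.0, 2.5), the squared differences are 0.25, 0.00, 0.25, so MSE = 0.50 / 3 ≈ 0.1667 and RMSE ≈ 0.4082, back in the original units.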
On the other hand, many regression techniques are designed to minimize MSE, because MSE has a simple, well-behaved derivative and so those models can be trained using stochastic gradient descent. The square root in RMSE makes the gradient messier and is not differentiable when the error is exactly zero (although, because the square root is monotonic, minimizing MSE also minimizes RMSE). Therefore, it makes sense to evaluate a regression model using the same metric the model is trained on.
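To make the differentiability point concrete, here is a minimal sketch of one SGD update for a linear model trained on squared error. The gradient of (y' – y)^2 with respect to weight w[j] is just 2 * (y' – y) * x[j]. All names here are illustrative and are not the demo program's actual code:

```csharp
using System;

public static class LinearSgd
{
  // One stochastic gradient descent update step for a linear model
  // y' = w . x + b, trained to minimize per-item squared error.
  public static void SgdStep(double[] weights, ref double bias,
    double[] x, double y, double lrnRate)
  {
    // compute the prediction y'
    double predY = bias;
    for (int j = 0; j < weights.Length; ++j)
      predY += weights[j] * x[j];

    // gradient of (y' - y)^2 w.r.t. w[j] is 2 * (y' - y) * x[j]
    double grad = 2.0 * (predY - y);
    for (int j = 0; j < weights.Length; ++j)
      weights[j] -= lrnRate * grad * x[j];
    bias -= lrnRate * grad;  // gradient w.r.t. bias is 2 * (y' - y)
  }
}
```

With one weight initialized to 0.0, bias 0.0, input x = 1.0, target y = 1.0, and learning rate 0.10, the prediction is 0.0, the gradient term is -2.0, and one step moves both the weight and the bias to 0.20, toward the target.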
For MSE and RMSE, smaller values are better (meaning better model prediction).
Now, what about R2, the coefficient of determination? It is computed as R2 = 1.0 – (u / v), where u = sum((y – y')^2) and v = sum((y – y'')^2), where y is an actual target value, y' is the predicted value, and y'' is the average of the actual target y values.
For R2, larger values are better.
In the abstract, R2 measures how well the model predicts relative to guessing the average of the target y values. This is sometimes described as the proportion of the variance explained by the model.
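To make the "guessing the average" interpretation concrete, here is a tiny worked example. The data values are made up for illustration and the helper name is hypothetical:

```csharp
using System;

public static class R2Demo
{
  // Compute R2 = 1 - (u / v) where u = sum((y - y')^2) and
  // v = sum((y - mean)^2). Note that dividing both u and v by n
  // shows R2 = 1 - (MSE / variance of the actual y values).
  public static double ComputeR2(double[] actual, double[] pred)
  {
    // mean of the actual y values
    double mean = 0.0;
    foreach (double y in actual) mean += y;
    mean /= actual.Length;

    double u = 0.0;  // sum of (actual - predicted)^2
    double v = 0.0;  // sum of (actual - mean)^2
    for (int i = 0; i < actual.Length; ++i)
    {
      u += (actual[i] - pred[i]) * (actual[i] - pred[i]);
      v += (actual[i] - mean) * (actual[i] - mean);
    }
    return 1.0 - (u / v);
  }
}
```

For actual y = (1.0, 2.0, 3.0, 4.0) and predictions (1.1, 1.9, 3.2, 3.8), u = 0.10 and v = 5.00, so R2 = 0.98. A model that simply predicted the mean (2.5) for every item would have u = v and R2 = 0.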
The scikit-learn library regression models, such as LinearRegression, KernelRidge, and many others, have a built-in score() method that gives R2 rather than MSE or RMSE. I've never seen an explanation of why scikit uses R2 rather than the simpler MSE for the score() method. I suspect that R2 is used because it doesn't vary for different normalizations of the training data, but MSE does vary (I think, but I'm not 100% sure).
In the end, I don’t think it matters too much which metric — MSE, RMSE, R2 — is used to evaluate a regression model, as long as you’re consistent. If one of these three metrics was clearly superior to the others, then it would be the only one used.
I put together a quick demo, using a linear regression model:
Begin C# linear regression demo

Loading synthetic train (200) and test (40) data
Done

First three train X:
 -0.1660  0.4406 -0.9998 -0.3953 -0.7065
  0.0776 -0.1616  0.3704 -0.5911  0.7562
 -0.9452  0.3409 -0.1654  0.1174 -0.7192

First three train y:
  0.4840
  0.1568
  0.8054

Setting lrnRate = 0.0010
Setting maxEpochs = 100

Creating and training Linear Regression model
epoch =  0  MSE = 0.3364
epoch = 20  MSE = 0.0663
epoch = 40  MSE = 0.0520
epoch = 60  MSE = 0.0509
epoch = 80  MSE = 0.0508
Done

Coefficients/weights:
 -0.2652  0.0333 -0.0457  0.0357 -0.1145
Bias/constant: 0.3620

Evaluating model
Accuracy train (within 0.15) = 0.6500
Accuracy test (within 0.15) = 0.7750
MSE train = 0.0508   MSE test = 0.0443
RMSE train = 0.2255  RMSE test = 0.2106
R2 train = 0.9267    R2 test = 0.9302

Predicting for x = -0.1660 0.4406 -0.9998 -0.3953 -0.7065
Predicted y = 0.5332

End demo
Code for the R2 method, using C#:
public double RSquared(double[][] dataX, double[] dataY)
{
  // coefficient of determination:
  // 1 - sum((actual - pred)^2) / sum((actual - mean)^2)

  // 1. compute mean of actual y values
  int n = dataX.Length;
  double sum = 0.0;
  for (int i = 0; i < n; ++i)
    sum += dataY[i];
  double meanActual = sum / n;

  // 2. accumulate numerator and denominator sums
  double sumTop = 0.0;
  double sumBot = 0.0;
  for (int i = 0; i < n; ++i)
  {
    double predY = this.Predict(dataX[i]);
    sumTop += (dataY[i] - predY) * (dataY[i] - predY);
    sumBot += (dataY[i] - meanActual) * (dataY[i] - meanActual);
  }
  return 1.0 - (sumTop / sumBot);
}
The complete code and data can be found at: jamesmccaffrey.wordpress.com/2024/12/31/linear-regression-from-scratch-using-csharp/.

Measuring machine learning model error is relatively straightforward. It’s not so easy to measure animal photo error, but you know it when you see it. Duck, goat, stingray.

Hi James, great post. The R2 method looks cool and I learned a lot. But why not mention MAE? Would you always prefer a squared-error measure?