I wrote an article titled “Quadratic Regression with SGD Training Using C#” in the January 2026 edition of Microsoft Visual Studio Magazine. See https://visualstudiomagazine.com/articles/2026/01/21/quadratic-regression-with-sgd-training-using-csharp.aspx.
The goal of a machine learning regression problem is to predict a single numeric value. For example, you might want to predict an employee’s salary based on age, height, high school grade point average, and so on. There are approximately a dozen common regression techniques. The most basic technique is called linear regression, or sometimes multiple linear regression, where the “multiple” indicates two or more predictor variables.
The form of a basic linear regression prediction model is y’ = (w0 * x0) + (w1 * x1) + . . + (wn * xn) + b, where y’ is the predicted value, the xi are predictor values, the wi are weights, and b is the bias. Quadratic regression extends linear regression. The form of a quadratic regression model is y’ = (w0 * x0) + . . + (wn * xn) + (wj * x0 * x0) + . . + (wk * x0 * x1) + . . + b. There are derived predictors that are the square of each original predictor, and interaction terms that are the product of each possible pair of original predictors.
Compared to basic linear regression, quadratic regression can handle more complex data. Compared to the most powerful regression techniques such as neural network regression, quadratic regression often has slightly worse prediction accuracy, but has much better model interpretability.
There are several ways to train a quadratic regression model, including stochastic gradient descent (SGD), pseudo-inverse training, closed-form inverse training, L-BFGS optimization training, and so on. The demo program uses SGD training, which is iterative and requires a learning rate and a maximum number of epochs. These two parameter values must be determined by trial and error.
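The heart of SGD training is simple: for each training item, compute the predicted y’, compute the error gradient, and nudge each weight and the bias in the direction that reduces the error. Here is a minimal C# sketch of one training epoch. It's an illustration, not the exact demo code, and it assumes each trainX[i] row already holds the expanded predictors (originals, squares, interactions):

// simplified SGD sketch -- assumes "using System;" at the top of the file
static void TrainEpoch(double[][] trainX, double[] trainY,
  double[] wts, ref double bias, double lrnRate, Random rnd)
{
  int n = trainX.Length;
  int[] indices = new int[n];
  for (int i = 0; i < n; ++i) indices[i] = i;
  for (int i = n - 1; i > 0; --i)  // Fisher-Yates shuffle so that
  {                                // items are visited in random order
    int j = rnd.Next(0, i + 1);
    (indices[i], indices[j]) = (indices[j], indices[i]);
  }
  foreach (int idx in indices)
  {
    double[] x = trainX[idx];
    double yPred = bias;                // compute y' = sum(wi * xi) + b
    for (int k = 0; k < wts.Length; ++k)
      yPred += wts[k] * x[k];
    double grad = yPred - trainY[idx];  // gradient of 0.5 * (y' - y)^2
    for (int k = 0; k < wts.Length; ++k)
      wts[k] -= lrnRate * grad * x[k];  // nudge each weight
    bias -= lrnRate * grad;             // nudge the bias
  }
}

The lrnRate parameter controls how big each nudge is. Too large and training can overshoot; too small and training is very slow. This is why the learning rate and maximum epochs values must be found by trial and error.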
The output of the demo program is:
Begin C# quadratic regression with SGD training

Loading synthetic train (200) and test (40) data
Done

First three train X:
 -0.1660  0.4406 -0.9998 -0.3953 -0.7065
  0.0776 -0.1616  0.3704 -0.5911  0.7562
 -0.9452  0.3409 -0.1654  0.1174 -0.7192
First three train y:
  0.4840  0.1568  0.8054

Creating quadratic regression model
Setting lrnRate = 0.001
Setting maxEpochs = 1000

Starting SGD training
epoch =    0  MSE = 0.0957
epoch =  200  MSE = 0.0003
epoch =  400  MSE = 0.0003
epoch =  600  MSE = 0.0003
epoch =  800  MSE = 0.0003
Done

Model base weights:
 -0.2630  0.0354 -0.0420  0.0341 -0.1124
Model quadratic weights:
  0.0655  0.0194  0.0051  0.0047  0.0243
Model interaction weights:
  0.0043  0.0249  0.0071  0.1081 -0.0012
 -0.0093  0.0362  0.0085 -0.0568  0.0016
Model bias/intercept:
  0.3220

Evaluating model
Accuracy train (within 0.10) = 0.8850
Accuracy test (within 0.10) = 0.9250
MSE train = 0.0003
MSE test = 0.0005

Predicting for x = -0.1660 0.4406 -0.9998 -0.3953 -0.7065
Predicted y = 0.4843

End demo
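The demo evaluates the trained model using a custom accuracy metric. Here is a sketch of one common way to implement it, assuming -- this is my reading, not stated explicitly above -- that "within 0.10" means a predicted value counts as correct when it is within 10% of the true target value:

// hypothetical accuracy implementation -- the demo may define
// closeness differently (e.g., within an absolute 0.10)
static double Accuracy(double[][] dataX, double[] dataY,
  Func<double[], double> predict, double pctClose)
{
  int numCorrect = 0;
  for (int i = 0; i < dataX.Length; ++i)
  {
    double yPred = predict(dataX[i]);
    if (Math.Abs(yPred - dataY[i]) < Math.Abs(pctClose * dataY[i]))
      ++numCorrect;  // predicted value is close enough to the target
  }
  return (numCorrect * 1.0) / dataX.Length;
}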
Suppose, as in the demo data, there are five predictors, aka features: (x0, x1, x2, x3, x4). The prediction equation for basic linear regression is:
y' = (w0 * x0) + (w1 * x1) + (w2 * x2) + (w3 * x3) + (w4 * x4) + b
The wi are model weights (aka coefficients), and b is the model bias (aka intercept). The values of the weights and the bias must be determined by training, so that predicted y’ values are close to the known, correct y values in a set of training data.
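In code, the linear prediction is just a weighted sum plus the bias. A minimal C# sketch (the method name is my own, for illustration):

// basic linear regression prediction: y' = sum(wi * xi) + b
static double PredictLinear(double[] x, double[] wts, double bias)
{
  double sum = bias;
  for (int i = 0; i < x.Length; ++i)
    sum += wts[i] * x[i];
  return sum;
}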
Basic linear regression is simple, but it can’t predict well for data that has an underlying non-linear structure, and it can’t deal with data that has hidden interactions between the xi predictors.
The prediction equation for quadratic regression with five predictors is:
y' = (w0 * x0) + (w1 * x1) + (w2 * x2) + (w3 * x3) +
(w4 * x4) +
(w5 * x0*x0) + (w6 * x1*x1) + (w7 * x2*x2) +
(w8 * x3*x3) + (w9 * x4*x4) +
(w10 * x0*x1) + (w11 * x0*x2) + (w12 * x0*x3) +
(w13 * x0*x4) + (w14 * x1*x2) + (w15 * x1*x3) +
(w16 * x1*x4) + (w17 * x2*x3) + (w18 * x2*x4) +
(w19 * x3*x4) + b
The squared (aka “quadratic”) xi^2 terms handle non-linear structure. If there are n predictors, there are also n squared terms. The xi * xj terms between all possible pairs of original predictors handle interactions between predictors. If there are n predictors, there are (n * (n-1)) / 2 interaction terms.
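One simple way to implement quadratic regression is to expand each raw input vector into the full set of derived predictors, and then apply the ordinary linear prediction to the expanded vector. Here is a sketch of the expansion. The method name is mine; the demo output displays base, quadratic, and interaction weights as separate arrays, but the idea is the same:

// expand raw predictors into [originals, squares, pairwise interactions];
// for n inputs the result has n + n + (n * (n-1)) / 2 values
static double[] Expand(double[] x)
{
  int n = x.Length;
  double[] result = new double[n + n + (n * (n - 1)) / 2];
  int k = 0;
  for (int i = 0; i < n; ++i)      // original predictors
    result[k++] = x[i];
  for (int i = 0; i < n; ++i)      // squared ("quadratic") terms
    result[k++] = x[i] * x[i];
  for (int i = 0; i < n - 1; ++i)  // interaction terms xi * xj, i < j
    for (int j = i + 1; j < n; ++j)
      result[k++] = x[i] * x[j];
  return result;
}

For five predictors the expanded vector has 5 + 5 + 10 = 20 values, which matches the 20 weights (w0 through w19) in the equation above, so a prediction is just PredictLinear(Expand(x), wts, bias) using the sketch shown earlier.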
Quadratic regression has a nice balance of prediction power and interpretability. The model weights/coefficients are easy to interpret. If the predictor values have been normalized to the same scale, larger weight magnitudes mean larger effects, and the signs of the weights indicate the direction of the effects.

Quadratic regression is a classical machine learning technique that still has a lot of appeal. Classical science fiction magazines often featured covers with giant insects. Here are three with giant ants. Left: “Amazing Stories”, Fall 1928. Center: “Thrilling Wonder Stories”, December 1938. Right: The German “Utopia” was a series of magazines / short novels, published every other week, from 1953 to 1968. This is #192 from September 15, 1959.

