I wrote an article titled “Quadratic Regression with SGD Training Using JavaScript” in the March 2026 edition of Microsoft Visual Studio Magazine. See https://visualstudiomagazine.com/articles/2026/03/11/quadratic-regression-with-sgd-training-using-javascript.aspx.
The goal of a machine learning regression problem is to predict a single numeric value. For example, you might want to predict an employee’s salary based on age, IQ test score, high school grade point average, and so on. There are approximately a dozen common regression techniques. The most basic technique is called linear regression.
The form of a basic linear regression prediction model is y' = (w0 * x0) + (w1 * x1) + . . . + b, where y' is the predicted value, the xi are predictor values, the wi are weights, and b is the bias. Quadratic regression extends linear regression. The form of a quadratic regression model is y' = (w0 * x0) + . . . + (wn * xn) + (wj * x0 * x0) + . . . + (wk * x0 * x1) + . . . + b. There are derived predictors that are the square of each original predictor, and interaction terms that are the product of each possible pair of original predictors.
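The article's source code isn't reproduced here, but a minimal sketch of the prediction computation might look like the following (the function name, parameter names, and weight ordering are my own assumptions, not necessarily the article's):

```javascript
// Sketch of a quadratic regression prediction (illustrative, not the
// article's exact code). baseWts has n entries, quadWts has n entries,
// interWts has n*(n-1)/2 entries ordered by pairs (0,1), (0,2), ..., (n-2,n-1).
function predict(x, baseWts, quadWts, interWts, bias) {
  let sum = bias;
  const n = x.length;
  for (let i = 0; i < n; ++i)        // base terms: wi * xi
    sum += baseWts[i] * x[i];
  for (let i = 0; i < n; ++i)        // squared terms: wj * xi * xi
    sum += quadWts[i] * x[i] * x[i];
  let k = 0;
  for (let i = 0; i < n; ++i)        // interaction terms: wk * xi * xj, i < j
    for (let j = i + 1; j < n; ++j)
      sum += interWts[k++] * x[i] * x[j];
  return sum;
}
```

With all quadratic and interaction weights set to zero, the function reduces to ordinary linear regression.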
Compared to basic linear regression, quadratic regression can handle more complex data. And compared to the most powerful regression techniques that are designed to handle complex data, quadratic regression often has slightly worse prediction accuracy, but is much easier to implement and train, and has much better model interpretability.
My article presents a demo of quadratic regression, implemented from scratch, trained with stochastic gradient descent (SGD), using the JavaScript language. The output of the demo program is:
Begin quadratic regression with SGD training using node.js JavaScript

Loading train (200) and test (40) from file

First three train X:
 -0.1660  0.4406 -0.9998 -0.3953 -0.7065
  0.0776 -0.1616  0.3704 -0.5911  0.7562
 -0.9452  0.3409 -0.1654  0.1174 -0.7192
First three train y:
  0.4840  0.1568  0.8054

Creating quadratic regression model
Setting lrnRate = 0.001
Setting maxEpochs = 1000

Starting SGD training
epoch =    0  MSE = 0.0925  acc = 0.0100
epoch =  200  MSE = 0.0003  acc = 0.9050
epoch =  400  MSE = 0.0003  acc = 0.8850
epoch =  600  MSE = 0.0003  acc = 0.8850
epoch =  800  MSE = 0.0003  acc = 0.8850
Done

Model base weights:
 -0.2630  0.0354 -0.0420  0.0341 -0.1124
Model quadratic weights:
  0.0655  0.0194  0.0051  0.0047  0.0243
Model interaction weights:
  0.0043  0.0249  0.0071  0.1081 -0.0012
 -0.0093  0.0362  0.0085 -0.0568  0.0016
Model bias: 0.3220

Computing model accuracy
Train acc (within 0.10) = 0.8850
Test acc (within 0.10) = 0.9250
Train MSE = 0.0003
Test MSE = 0.0005

Predicting for x = -0.1660  0.4406 -0.9998 -0.3953 -0.7065
Predicted y = 0.4843

End demo
The demo data is synthetic and was generated by a 5-10-1 neural network with random weight and bias values. The accuracy function scores a prediction as correct if it’s within 10% of the correct target value.
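The accuracy function itself isn't shown in this post, but a plausible sketch of "correct if within 10% of the target" looks like this (the function and parameter names are my own):

```javascript
// Sketch of the accuracy measure described above (assumed implementation):
// a prediction counts as correct if it's within pctClose (e.g. 0.10, i.e.
// 10%) of the corresponding target value.
function accuracy(preds, targets, pctClose) {
  let numCorrect = 0;
  for (let i = 0; i < preds.length; ++i) {
    if (Math.abs(preds[i] - targets[i]) < Math.abs(pctClose * targets[i]))
      ++numCorrect;
  }
  return numCorrect / preds.length;  // proportion correct in [0, 1]
}
```

Note that a percent-closeness measure like this behaves badly when target values are at or near zero, which is one reason some implementations use an absolute tolerance instead.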
The squared (aka "quadratic") xi^2 terms handle non-linear structure; if there are n original predictors, there are n squared terms. The xi * xj interaction terms, one for each possible pair of original predictors, handle interactions between predictors; if there are n predictors, there are (n * (n-1)) / 2 interaction terms.
Therefore, in general, if there are n original predictor variables, there are a total of n + n + (n * (n-1))/2 model weights and one bias. Behind the scenes, the derived xi^2 squared terms and the derived xi*xj interaction terms are computed programmatically on-the-fly, as opposed to explicitly creating an augmented dataset.
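The weight count above is easy to verify in code. A tiny helper (my own, for illustration):

```javascript
// Total number of weights for a quadratic regression model with n
// original predictors: n base + n squared + n*(n-1)/2 interaction.
function numWeights(n) {
  return n + n + (n * (n - 1)) / 2;
}

// For the demo's 5 predictors this gives 5 + 5 + 10 = 20 weights,
// matching the 5 base, 5 quadratic, and 10 interaction weights in
// the demo output (plus one bias).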
Quadratic regression has a nice balance of prediction power and interpretability. The model weights/coefficients are easy to interpret. If the predictor values have been normalized to the same scale, larger magnitudes mean larger effect, and the sign of each weight indicates the direction of the effect. But you should be a bit careful when interpreting the interaction weights. For example, if an interaction weight is positive, it can indicate an increase in y when the corresponding pair of predictor values are both positive or both negative.

I wrote this blog post while I was visiting Japan. I’m not a huge fan of the Japanese manga style of art, but I do like several animated movies from Studio Ghibli. Three of my favorites feature a young girl as the central character, reflecting Japan’s mildly creepy (to me anyway) cultural focus on young girls.
Left: In “Spirited Away” (2001), a 10-year-old girl named Chihiro must save her parents from a witch named Yubaba. Great art and a totally weird plot. My grade = A-.
Center: In “My Neighbor Totoro” (1988), university professor Tatsuo Kusakabe, his sick wife Yasuko, and his two young daughters Satsuki and Mei, move into an old house near a hospital. Totoro is one of several spirits who live nearby. My grade = A-.
Right: In “Kiki’s Delivery Service” (1989), Kiki is a young witch who runs a delivery service, aided by her cat Jiji. My grade = A-.

