“Quadratic Regression with SGD Training Using JavaScript” in Visual Studio Magazine

I wrote an article titled “Quadratic Regression with SGD Training Using JavaScript” in the March 2026 edition of Microsoft Visual Studio Magazine. See https://visualstudiomagazine.com/articles/2026/03/11/quadratic-regression-with-sgd-training-using-javascript.aspx.

The goal of a machine learning regression problem is to predict a single numeric value. For example, you might want to predict an employee’s salary based on age, IQ test score, high school grade point average, and so on. There are approximately a dozen common regression techniques. The most basic technique is called linear regression.

The form of a basic linear regression prediction model is y’ = (w0 * x0) + (w1 * x1) + . . . + (wn * xn) + b, where y’ is the predicted value, the xi are predictor values, the wi are weights, and b is the bias. Quadratic regression extends linear regression. The form of a quadratic regression model is y’ = (w0 * x0) + . . . + (wn * xn) + (wj * x0 * x0) + . . . + (wk * x0 * x1) + . . . + b. There are derived predictors that are the square of each original predictor, and interaction terms that are the product of each possible pair of original predictors.
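The prediction equation is easy to implement. Here is a minimal JavaScript sketch; the function name and the layout of the weights into three separate arrays are my assumptions for illustration, not necessarily how the article's demo code is organized:

```javascript
// Quadratic regression prediction: linear terms, squared terms,
// pairwise interaction terms, plus a bias.
// x: array of n predictor values
// wBase, wQuad: weight arrays of length n
// wInter: weight array of length n*(n-1)/2
// b: bias (a single number)
function predict(x, wBase, wQuad, wInter, b) {
  const n = x.length;
  let sum = b;
  for (let i = 0; i < n; i++) {
    sum += wBase[i] * x[i];         // linear term
    sum += wQuad[i] * x[i] * x[i];  // squared term
  }
  let k = 0;  // index into the interaction weights
  for (let i = 0; i < n; i++) {
    for (let j = i + 1; j < n; j++) {
      sum += wInter[k++] * x[i] * x[j];  // interaction term
    }
  }
  return sum;
}
```

Note that the interaction weights are stored in a flat array and walked with a single index k, which matches the n*(n-1)/2 count of all (i, j) pairs with i < j.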

Compared to basic linear regression, quadratic regression can handle more complex data. And compared to the most powerful regression techniques that are designed to handle complex data, quadratic regression often has slightly worse prediction accuracy, but is much easier to implement and train, and has much better model interpretability.

My article presents a demo of quadratic regression, implemented from scratch, trained with stochastic gradient descent (SGD), using the JavaScript language. The output of the demo program is:

Begin quadratic regression with SGD training
 using node.js JavaScript

Loading train (200) and test (40) from file

First three train X:
 -0.1660   0.4406  -0.9998  -0.3953  -0.7065
  0.0776  -0.1616   0.3704  -0.5911   0.7562
 -0.9452   0.3409  -0.1654   0.1174  -0.7192

First three train y:
   0.4840
   0.1568
   0.8054

Creating quadratic regression model

Setting lrnRate = 0.001
Setting maxEpochs = 1000

Starting SGD training
epoch =      0  MSE = 0.0925  acc = 0.0100
epoch =    200  MSE = 0.0003  acc = 0.9050
epoch =    400  MSE = 0.0003  acc = 0.8850
epoch =    600  MSE = 0.0003  acc = 0.8850
epoch =    800  MSE = 0.0003  acc = 0.8850
Done

Model base weights:
 -0.2630  0.0354 -0.0420  0.0341 -0.1124

Model quadratic weights:
  0.0655  0.0194  0.0051  0.0047  0.0243

Model interaction weights:
  0.0043  0.0249  0.0071  0.1081 -0.0012 -0.0093
  0.0362  0.0085 -0.0568  0.0016

Model bias: 0.3220

Computing model accuracy

Train acc (within 0.10) = 0.8850
Test acc (within 0.10) = 0.9250

Train MSE = 0.0003
Test MSE = 0.0005

Predicting for x =
  -0.1660    0.4406   -0.9998   -0.3953   -0.7065
Predicted y = 0.4843

End demo

The demo data is synthetic and was generated by a 5-10-1 neural network with random weight and bias values. The accuracy function scores a prediction as correct if it’s within 10% of the correct target value.

The squared (aka “quadratic”) xi^2 terms handle non-linear structure. If there are n predictors, there are also n squared terms. The xi * xj terms between all possible pairs of original predictors handle interactions between predictors. If there are n predictors, there are (n * (n-1)) / 2 interaction terms.

Therefore, in general, if there are n original predictor variables, there are a total of n + n + (n * (n-1))/2 model weights and one bias. For the demo, which has n = 5 predictors, that's 5 + 5 + 10 = 20 weights. Behind the scenes, the derived xi^2 squared terms and the derived xi*xj interaction terms are computed programmatically on-the-fly, as opposed to explicitly creating an augmented dataset.
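Because a quadratic regression model is linear in its weights, the SGD gradient of the squared error for each weight is just (y’ - y) times the associated (possibly derived) predictor term. Here is a sketch of one training epoch that computes the derived terms on the fly; the function name, parameter layout, and the omission of a per-epoch data shuffle are my simplifications, not necessarily how the demo code works:

```javascript
// One SGD pass over the training data for quadratic regression,
// using squared-error loss. Weight arrays are updated in place;
// the updated bias is returned.
// dataX: array of rows, each an array of n predictors; dataY: targets.
function trainEpoch(dataX, dataY, wBase, wQuad, wInter, bias, lrnRate) {
  const n = wBase.length;
  for (let r = 0; r < dataX.length; r++) {
    const x = dataX[r];

    // forward pass: compute y' with derived terms on the fly
    let yPred = bias;
    for (let i = 0; i < n; i++)
      yPred += wBase[i] * x[i] + wQuad[i] * x[i] * x[i];
    let k = 0;
    for (let i = 0; i < n; i++)
      for (let j = i + 1; j < n; j++)
        yPred += wInter[k++] * x[i] * x[j];

    // update: each weight moves opposite its error gradient,
    // which is (y' - y) times that weight's input term
    const delta = yPred - dataY[r];
    for (let i = 0; i < n; i++) {
      wBase[i] -= lrnRate * delta * x[i];
      wQuad[i] -= lrnRate * delta * x[i] * x[i];
    }
    k = 0;
    for (let i = 0; i < n; i++)
      for (let j = i + 1; j < n; j++)
        wInter[k++] -= lrnRate * delta * x[i] * x[j];
    bias -= lrnRate * delta;
  }
  return bias;
}
```

Calling this function maxEpochs times, with a small lrnRate such as the demo's 0.001, gives the overall SGD training loop.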

Quadratic regression has a nice balance of prediction power and interpretability. The model weights/coefficients are easy to interpret. If the predictor values have been normalized to the same scale, larger magnitudes mean larger effect, and the sign of a weight indicates the direction of the effect. But you should be a bit careful when interpreting the interaction weights. If an interaction weight is positive, it can indicate an increase in y when the corresponding pair of predictor values are both positive or both negative.



I wrote this blog post while I was visiting Japan. I’m not a huge fan of the Japanese manga style of art, but I do like several animated movies from Studio Ghibli. Three of my favorites feature a young girl as the central character, reflecting Japan’s mildly creepy (to me anyway) cultural focus on young girls.

Left: In “Spirited Away” (2001), a 10-year-old girl named Chihiro must save her parents from a witch named Yubaba. Great art and a totally weird plot. My grade = A-.

Center: In “My Neighbor Totoro” (1988), university professor Tatsuo Kusakabe, his sick wife Yasuko, and his two young daughters Satsuki and Mei, move into an old house near a hospital. Totoro is one of several spirits who live nearby. My grade = A-.

Right: In “Kiki’s Delivery Service” (1989), Kiki is a young witch who runs a delivery service, aided by her cat Jiji. My grade = A-.

