The Origin and History of scikit-learn Diabetes Dataset

Bottom line: The scikit Diabetes Dataset was compiled by Stanford professor Brad Efron while he was helping Dr. T. McLaughlin analyze data for a medical study that was later published in “Differentiation Between Obesity and Insulin Resistance in the Association with C-Reactive Protein” (2002).

The scikit-learn library has a nice collection of datasets for experiments. One of the most popular datasets for regression is the Diabetes Dataset. See scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html.

Note: The scikit Diabetes Dataset is different from the Pima Indians Diabetes Dataset.

The Diabetes Dataset has 442 items. Each item represents a patient and has 10 predictor values followed by a target value to predict. The data looks like:

59, 2, 32.1, 101.00, 157,  93.2, 38, 4.00, 4.8598, 87, 151
48, 1, 21.6,  87.00, 183, 103.2, 70, 3.00, 3.8918, 69,  75
72, 2, 30.5,  93.00, 156,  93.6, 41, 4.00, 4.6728, 85, 141
. . . 

The 10 predictor variables are age, sex, body mass index, blood pressure, serum cholesterol, low-density lipoproteins, high-density lipoproteins, total cholesterol, triglycerides, and blood sugar. The target value in the last column is a quantitative measure of diabetes disease progression one year after baseline.
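
For reference, here is a minimal sketch of loading the dataset with scikit-learn. By default, load_diabetes() returns predictor values that have been mean-centered and scaled; the scaled=False argument (available in newer scikit-learn versions) returns the raw values shown above.

# minimal sketch -- assumes a recent scikit-learn version
from sklearn.datasets import load_diabetes

data = load_diabetes(scaled=False)  # omit scaled=False for older versions
print(data.data.shape)       # (442, 10)
print(data.feature_names)    # ['age', 'sex', 'bmi', 'bp', 's1', ..., 's6']
print(data.data[0])          # raw predictor values for the first patient
print(data.target[0])        # 151.0 -- the diabetes measure in the last column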

Note: The sex encoding isn’t explained, but I suspect male = 1, female = 2, because there are 235 values of 1 and 206 values of 2.

The scikit page points to a Web page by Dennis Boos (NCSU) that has a link to the raw data. Boos’ page states that the data came from a Web page owned by Trevor Hastie (Stanford), but that link is dead. Boos’ page also states that the Diabetes Data was first referenced in the research paper “Least Angle Regression” (2004), by B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani.

I enjoy tracking down original sources. So I wrote an email message to Efron, Hastie, Johnstone and Tibshirani (all are at Stanford) asking for information. Brad Efron graciously replied very quickly. He explained that the Diabetes Dataset came from data analysis that he was helping Dr. T. McLaughlin (Stanford) with. The results appeared in the research paper “Differentiation Between Obesity and Insulin Resistance in the Association with C-Reactive Protein” (2002), by T. McLaughlin, F. Abbasi, C. Lamendola, L. Liang, G. Reaven, P. Schaaf, P. Reaven. That paper has the comment, “The authors express their appreciation to Bradley Efron, PhD, for statistical assistance with the manuscript.”

To summarize, the scikit Diabetes Dataset was originally generated by Dr. T. McLaughlin and colleagues. Then B. Efron, who was helping with the analysis, compiled 442 data items (it’s not clear if this was a subset or the entire research dataset). Then the raw data was posted on a Web site by Efron’s colleague T. Hastie, and shortly afterwards the data was also posted on a Web site by D. Boos. The Hastie page vanished at some point, leaving the Boos page as the primary source of the data. Later, the scikit library fetched the diabetes data from Boos’ page (most likely in 2010), and the data is now widely available.

An interesting investigation!

Note: I discovered that the default target to predict, the diabetes score in the last column of the dataset, cannot be predicted with meaningful accuracy. But the variables in columns [4], [5], [6], [7], and [8] can be predicted nicely.



I enjoy history of all kinds, but especially the history of computer science and the history of early science fiction movies. Here are three of my favorite science fiction movies from the 1950s that feature alien flying saucers. All three movies have fascinating histories.

Left: “The Atomic Submarine” (1959) – In the near future, an alien flying saucer is under the sea in the Arctic, destroying cargo submarines. The USS Tigerfish submarine manages to destroy the evil alien saucer. Very scary scenes inside the saucer. Innovative electronic sound effects.

Center: “Earth vs. the Flying Saucers” (1956) – The title pretty much says it all. Impressive stop-motion animation of the flying saucers by the famous Ray Harryhausen. I could never quite figure out if the aliens only retaliated after being attacked by the Earth military, or if they were evil from the beginning.

Right: “Invaders from Mars” (1953) – A young boy thinks he sees a flying saucer land in the sand pit in a field behind his house. This movie scared the heck out of me and I had nightmares about the path up the hill for years. Brilliantly directed by William Cameron Menzies.


Posted in Machine Learning

Testing From-Scratch Pseudo-Inverse via SVD-Jacobi versus via QR-Householder Using Python

Computing the pseudo-inverse of a matrix is difficult. The pseudo-inverse is used in several machine learning algorithms, such as closed form training for linear models.

There are several types of pseudo-inverse. The most common is Moore-Penrose. There are several algorithms for Moore-Penrose. Two common techniques are via SVD and via QR. There are several algorithms for SVD (Jacobi, Golub-Reinsch, others), and there are several algorithms for QR (Householder, Gram-Schmidt, others). In short, there are dozens of algorithms and variations that can be used to compute a pseudo-inverse.
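
Whichever decomposition algorithm is used, the SVD route assembles the pseudo-inverse as pinv(A) = V * Sinv * Uh, where Sinv inverts the non-zero singular values. A minimal NumPy sketch using the built-in svd() rather than a from-scratch Jacobi implementation:

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])  # 3x2, full column rank
U, s, Vh = np.linalg.svd(A, full_matrices=False)
Apinv = Vh.T @ np.diag(1.0 / s) @ U.T          # V * Sinv * Uh
print(np.allclose(Apinv, np.linalg.pinv(A)))   # True
print(np.allclose(A @ Apinv @ A, A))           # Moore-Penrose check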

I implemented pseudo-inverse from scratch with Python using SVD-Jacobi and QR-Householder. Let me note that both algorithms are extremely complex, especially SVD-Jacobi. I ran some tests with the two from-scratch implementations, and compared them with the built-in numpy.linalg.pinv() function. The results:

Begin pseudo-inverse SVD vs. QR experiment

Start testing my_pinv_svd()
trial 40 FAIL (93, 88)
trial 84 FAIL (98, 94)
trial 104 FAIL (94, 82)
trial 145 FAIL (91, 90)
trial 154 FAIL (92, 92)
trial 178 FAIL (99, 93)
trial 180 FAIL (96, 87)
trial 185 FAIL (99, 96)
trial 191 FAIL (86, 85)
Num pass = 191
Num fail = 9

Start testing my_pinv_qr()
Num pass = 200
Num fail = 0

Start testing np.linalg.pinv()
Num pass = 200
Num fail = 0

End experiment

C:\Data\Junk\CSharp\PseudoInvSVDvsPseudoInvQR>

For each technique, I ran 200 tests. Each test was a matrix with a random size between 2×2 and 100×100, and with random values between -10.0 and +10.0. The from-scratch QR-Householder and built-in linalg.pinv() versions passed all 200 tests, but the SVD-Jacobi version failed for 9 large (about 90×90) matrices.

Bottom line: Use the built-in linalg.pinv() function if possible. If you must implement from scratch, the QR-Householder algorithm is preferable to the SVD-Jacobi algorithm. If you must implement from scratch using SVD-Jacobi, apply it only to small matrices (say 50×50 or smaller).



I’m not sure if I gained a love of matrix algebra from my love of chess, or vice versa. One of the ways I gauge AI image generation progress is by asking for images of people playing chess. The most recent images are much better than they were a few months ago, but AI still has a way to go.

Left: This was an interesting result. The chess board only has seven ranks but otherwise the set is pretty good. I’m puzzled by the meaning of the chess player’s welcoming posture.

Center: Not very good. All the pieces are black (no white pieces), and the pieces don’t make sense (two black queens, etc.). And I don’t like the fuzziness of the background.

Right: Apparently some sort of fusion of 1980s hairstyles, handcuff chess, and 1980s neon color schemes.


Demo program. Replace “lt” (less than), “gt”, “lte”, “gte” with comparison operator symbols. (My blog editor chokes on symbols).

# pinv_svd_vs_pinv_qr.py

import numpy as np

np.set_printoptions(precision=6, suppress=True, floatmode='fixed')

rnd = np.random.RandomState(1)

# -----------------------------------------------------------

def my_qr(A, mode):
  # mode = 'complete' or 'reduced'
  # this version defined only for m gte n
  m, n = A.shape
  k = np.minimum(m,n)
  R = A.copy()
  Q = np.eye(m)

  if m == n: end = n-1
  else: end = n
  for i in range(end):
    H = np.eye(m)

    # Householder routine follows
    a = R[i:, i]
    norm_a = np.copysign(np.linalg.norm(a), a[0])
    v = a / (a[0] + norm_a)
    v[0] = 1.0
    h = np.eye(len(a))
    V = np.matmul(v[:, None], v[None, :])
    h -= (2.0 / np.dot(v, v)) * V

    H[i:, i:] = h  # copy h into lower right
    Q = np.matmul(Q, H)
    R = np.matmul(H, R)

  # 'reduced' : returns Q, R with dims (m, k), (k, n)
  if mode == 'complete':
    return Q, R
  elif mode == 'reduced':
    return Q[0:m,0:k], R[0:k,0:n]

  return Q, R

# -----------------------------------------------------------

def mat_inv_upper_tri(U):
  n = len(U)
  result = np.eye(n)
  for k in range(n):
    for j in range(n):
      for i in range(k):
        result[j][k] -= result[j][i] * U[i][k]
      result[j][k] /= U[k][k]
  return result

# -----------------------------------------------------------

def my_pinv_qr(A):
  # A = Q*R, pinv(A) = inv(R) * trans(Q)
  Q, R = my_qr(A, 'reduced') 
  result = np.matmul(mat_inv_upper_tri(R), Q.T)
  return result

# -----------------------------------------------------------
# -----------------------------------------------------------

def my_svd(M):
  # Jacobi algorithm
  DBL_EPSILON = 1.0e-15  # approximately
  A = np.copy(M)  # working copy U
  m = len(A)
  n = len(A[0])

  Q = np.eye(n)  # working copy V
  t = np.zeros(n)  # working copy s

  # init counters
  count = 1
  sweep = 0
  sweep_max = max(5 * n, 12)  # heuristic

  tolerance = 10 * m * DBL_EPSILON  # heuristic
  # store the column error estimates in t
  for j in range(n):
    cj = A[:, j]  # get col j
    sj = np.linalg.norm(cj)
    t[j] = DBL_EPSILON * sj

  # orthogonalize A by plane rotations
  while (count "gt" 0 and sweep "lte" sweep_max):
    # initialize rotation counter
    count = n * (n - 1) / 2;
    for j in range(n-1):
      for k in range(j+1, n):
        cj = A[:, j]
        ck = A[:, k]
        p = 2 * np.dot(cj, ck)
        a = np.linalg.norm(cj)
        b = np.linalg.norm(ck)

        # test for columns j,k orthogonal,
        # or dominant errors 
        abserr_a = t[j]
        abserr_b = t[k]

        q = (a * a) - (b * b)
        v = np.sqrt(p**2 + q**2)  # hypot()
 
        sorted = (a "gte" b)
        orthog = (abs(p) "lte" tolerance * (a*b))
        noisya = (a "lt" abserr_a)
        noisyb = (b "lt" abserr_b)

        if sorted and (orthog or \
          noisya or noisyb):
          count -= 1
          continue

        # calculate rotation angles
        if v == 0 or sorted == False:
          cosine = 0.0
          sine = 1.0
        else:
          cosine = np.sqrt((v + q) / (2.0 * v))
          sine = p / (2.0 * v * cosine)

        # apply rotation to A (U)
        for i in range(m):
          Aik = A[i][k]
          Aij = A[i][j]
          A[i][j] = Aij * cosine + Aik * sine
          A[i][k] = -Aij * sine + Aik * cosine

        # update singular values
        t[j] = abs(cosine) * abserr_a + \
          abs(sine) * abserr_b
        t[k] = abs(sine) * abserr_a + \
          abs(cosine) * abserr_b

        # apply rotation to Q (V)
        for i in range(n):
          Qij = Q[i][j]
          Qik = Q[i][k]
          Q[i][j] = Qij * cosine + Qik * sine
          Q[i][k] = -Qij * sine + Qik * cosine

    sweep += 1
  # while

  # compute singular values
  prev_norm = -1.0
  for j in range(n):
    column = A[:, j]  # by ref
    norm = np.linalg.norm(column)
    # determine if singular value is zero
    if norm == 0.0 or prev_norm == 0.0 or \
      (j "gt" 0 and norm "lte" tolerance * prev_norm):
      t[j] = 0.0
      for i in range(len(column)):
        column[i] = 0.0  # updates A indirectly
      prev_norm = 0.0
    else:
      t[j] = norm
      for i in range(len(column)):
        column[i] = column[i] * (1.0 / norm)
      prev_norm = norm

  if count "gt" 0:
    print("Jacobi iterations no converge")

  U = A  # mxn
  s = t
  Vh = np.transpose(Q)

  if m "lt" n:
    U = U[:, 0:m]
    s = t[0:m]
    Vh = Vh[0:m, :]

  return U, s, Vh

# -----------------------------------------------------------

def my_pinv_svd(A):
  # U,s,Vh = svd(); pinv = V * Sinv * Uh
  U, s, Vh = my_svd(A)
  Sinv = np.diag(s)
  for i in range(len(s)):
    Sinv[i][i] = 1.0 / Sinv[i][i]
  V = Vh.T
  Uh = U.T
  return np.matmul(V, np.matmul(Sinv, Uh))

# -----------------------------------------------------------
# -----------------------------------------------------------

def make_matrix(max_rows, max_cols, min_x, max_x, rnd):
  nr = rnd.randint(2, max_rows)
  nc = rnd.randint(2, max_cols)
  if nr "lt" nc:
    (nr, nc) = (nc, nr)
  A = (max_x - min_x) * rnd.rand(nr,nc) + min_x
  return A

# -----------------------------------------------------------

print("\nBegin pseudo-inverse SVD vs. QR experiment ")

print("\nStart testing my_pinv_svd() ")
n_pass = 0; n_fail = 0
rnd.seed(0)
for i in range(200):
  A = make_matrix(100, 100, -10.0, 10.0, rnd)
  Apinv = my_pinv_svd(A)
  C = np.matmul(A, np.matmul(Apinv, A))
  if np.allclose(A, C) == True:
    n_pass += 1
  else:
    print("trial " + str(i), end="")
    print(" FAIL ", end="")
    print(A.shape)
    n_fail += 1
print("Num pass = " + str(n_pass))
print("Num fail = " + str(n_fail))

print("\nStart testing my_pinv_qr() ")
n_pass = 0; n_fail = 0
rnd.seed(0)
for i in range(200):
  A = make_matrix(100, 100, -10.0, 10.0, rnd)
  Apinv = my_pinv_qr(A)
  C = np.matmul(A, np.matmul(Apinv, A))
  if np.allclose(A, C) == True:
    n_pass += 1
  else:
    print("trial " + str(i), end="")
    print(" FAIL ", end="")
    print(A.shape)
    n_fail += 1
print("Num pass = " + str(n_pass))
print("Num fail = " + str(n_fail))

print("\nStart testing np.linalg.pinv() ")
n_pass = 0; n_fail = 0
rnd.seed(0)
for i in range(200):
  A = make_matrix(100, 100, -10.0, 10.0, rnd)
  Apinv = np.linalg.pinv(A)
  C = np.matmul(A, np.matmul(Apinv, A))
  if np.allclose(A, C) == True:
    n_pass += 1
  else:
    print("trial " + str(i), end="")
    print(" FAIL ", end="")
    print(A.shape)
    n_fail += 1
print("Num pass = " + str(n_pass))
print("Num fail = " + str(n_fail))

print("\nEnd experiment ")
Posted in Machine Learning

Linear Regression with Pseudo-Inverse (SVD-Jacobi) Training Using JavaScript

The goal of a machine learning regression problem is to predict a single numeric value. For example, you might want to predict the bank account balance of a person based on his annual income, age, years of education, and so on.

Linear regression is the simplest form of machine learning regression. The prediction equation has the form y’ = (w0 * x0) + (w1 * x1) + . . + b, where the wi are weight values, the xi are predictor values, and b is the bias. For example, y’ = (0.56 * income) + (-0.37 * age) + (0.04 * education) + 1.66.

Training is the process of finding the values of the weights and the bias so that predicted y values closely match the known correct y values in a set of training data.

There are four main linear regression training techniques (and dozens of less-common techniques): stochastic gradient descent, relaxed Moore-Penrose pseudo-inverse, closed-form pseudo-inverse via normal equations, L-BFGS. Each of these four techniques has dozens of significant variations. For example, MP pseudo-inverse training can use singular value decomposition (SVD — many algorithms such as Jacobi, GKR, etc.) or QR decomposition (many algorithms such as Householder, Gram-Schmidt, Givens, etc.) In short, there are a bewildering number of different ways to train a linear regression prediction model.

Each training technique has pros and cons related to ease/difficulty of implementation and customization, ability to deal with large or small datasets, accuracy, performance, sensitivity to data normalization, ability to handle categorical data, and more. If there was one best training technique, it would be the only technique used.

One evening, I was sitting in my living room, waiting for the rain to stop (it never did) so I could walk my two dogs. While I was waiting, I figured I’d implement linear regression, using relaxed MP pseudo-inverse (via SVD-Jacobi) training, using JavaScript.

For my demo, I used one of my standard synthetic datasets. The data looks like:

-0.1660,  0.4406, -0.9998, -0.3953, -0.7065,  0.4840
 0.0776, -0.1616,  0.3704, -0.5911,  0.7562,  0.1568
-0.9452,  0.3409, -0.1654,  0.1174, -0.7192,  0.8054
 0.9365, -0.3732,  0.3846,  0.7528,  0.7892,  0.1345
. . . 

The first five values on each line are the predictors (“features”) and the last value is the target y to predict. The data was generated by a 5-10-1 neural network with random weights and biases. There are 200 training items and 40 test items.

The equation for training is:

w = pinv(DX) * y

The w is a vector that holds the weights and the bias you want, pinv() is the relaxed Moore-Penrose pseudo-inverse function, DX is the design matrix derived from the X training data, and y is a vector of the known target values to predict.

The design matrix is the X predictor values with a leading column of 1.0s added. The leading column of 1.0s deals with the bias term.
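
Expressed in NumPy (rather than the from-scratch JavaScript below), the entire training step is just a few lines. A sketch using random stand-in data in place of the synthetic data file:

import numpy as np

rng = np.random.default_rng(0)
trainX = rng.uniform(-1.0, 1.0, size=(200, 5))  # stand-in predictors
trainY = rng.uniform(0.0, 1.0, size=200)        # stand-in targets

DX = np.column_stack([np.ones(len(trainX)), trainX])  # design matrix
wb = np.linalg.pinv(DX) @ trainY   # w = pinv(DX) * y
bias = wb[0]        # element [0] matches the leading column of 1.0s
weights = wb[1:]    # elements [1..5] are the per-predictor weights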

The output of my demo:

Linear regression with pseudo-inverse (SVD-Jacobi)
 training using JavaScript

Loading synthetic train (200) and 
 test (40) data from file

First three train X:
 -0.1660   0.4406  -0.9998  -0.3953  -0.7065
  0.0776  -0.1616   0.3704  -0.5911   0.7562
 -0.9452   0.3409  -0.1654   0.1174  -0.7192

First three train y:
   0.4840
   0.1568
   0.8054

Creating and training model
Done

Model weights/coefficients:
 -0.2656   0.0333  -0.0454   0.0358  -0.1146
Model bias/intercept: 0.3619

Evaluating model

Train acc (within 0.10) = 0.4600
Test acc (within 0.10) = 0.6500

Train MSE = 0.0026
Test MSE = 0.0020

Predicting for x =
  -0.1660   0.4406  -0.9998 -0.3953 -0.7065
Predicted y = 0.5329

End demo

The output matches the output from several other training techniques, validating the results are correct. Good fun on a rainy evening in the Pacific Northwest.



I’ve always been fascinated by pinball machines. The “Saratoga” (1948) by Williams featured the first electrically-powered “pop” bumpers. Before this innovation, bumpers were spring-powered and barely moved the ball when struck. Notice that the two flippers are reversed from the modern geometry (which appeared two years later in a pair of machines by Gottlieb). Also, there is no reel scoring, no in-lanes, and no drop targets; those features would appear in years to come.


Demo program. Replace “lt” (less than), “gt”, “lte”, “gte” with comparison operator symbols (my blog editor chokes on symbols).

// linear_regression_pinverse_svd.js
// linear regression, pseudo-inverse SVD-Jacobi training
// node.js

let FS = require("fs")  // for loadTxt()

// ----------------------------------------------------------

class LinearRegressor
{
  constructor(seed)
  {
    this.seed = seed + 0.5;  // virtual RNG, not used
    this.weights = null;     // allocated in train()
    this.bias = 0.0;         // supplied in train()
  }

  // --------------------------------------------------------
  // methods: predict(), train(), accuracy(), mse()
  // --------------------------------------------------------

  predict(x)
  {
    let sum = 0.0;
    for (let i = 0; i "lt" x.length; ++i) {
      sum += x[i] * this.weights[i];
    }
    sum += this.bias;
    return sum;
  }

  // --------------------------------------------------------

  train(trainX, trainY)
  {
    // w = pinv(DX) * y
    let dim = trainX[0].length;         // number predictors
    this.weights = this.vecMake(dim, 0.0);  // allocate wts

    let DX = this.matToDesign(trainX);  // design matrix
    let Xpinv = this.matPinv(DX);       // pinv of design X
    let biasAndWts = this.matVecProd(Xpinv, trainY);

    this.bias = biasAndWts[0];
    for (let i = 1; i "lt" biasAndWts.length; ++i)
      this.weights[i-1] = biasAndWts[i];
    return;
  }

  // --------------------------------------------------------

  accuracy(dataX, dataY, pctClose)
  {
    let nCorrect = 0; let nWrong = 0;
    let N = dataX.length;
    
    for (let i = 0; i "lt" N; ++i) {
      let x = dataX[i];
      let actualY = dataY[i];
      let predY = this.predict(x);
      if (Math.abs(predY - actualY) "lt" 
        Math.abs(pctClose * actualY)) {
        ++nCorrect;
      }
      else {
        ++nWrong;
      }
    }
    return (nCorrect * 1.0) / (nCorrect + nWrong);
  }

  // --------------------------------------------------------

  mse(dataX, dataY)
  {
    let N = dataX.length;
    let sum = 0.0;
    for (let i = 0; i "lt" N; ++i) {
      let x = dataX[i];
      let actualY = dataY[i];
      let predY = this.predict(x);
      sum += (actualY - predY) * (actualY - predY);
    }
    return sum / N;
  }

  // --------------------------------------------------------
  // primary helpers for train()
  // matVecProd, matPinv, matToDesign, svdJacobi
  // --------------------------------------------------------

  matVecProd(M, v)
  {
    // return a regular 1D vector
    let nRows = M.length;
    let nCols = M[0].length;
    let n = v.length;  // assume nCols = n

    let result = this.vecMake(nRows, 0.0);
    for (let i = 0; i "lt" nRows; ++i)
      for (let k = 0; k "lt" nCols; ++k)
        result[i] += M[i][k] * v[k];

    return result;
  }

  // --------------------------------------------------------

  matToDesign(M)
  {
    // add a leading column of 1.0s to M
    let nRows = M.length;
    let nCols = M[0].length;
    let result = this.matMake(nRows, nCols+1, 1.0);

    for (let i = 0; i "lt" nRows; ++i) {
      for (let j = 1; j "lt" nCols+1; ++j) {  // note 1s
        result[i][j] = M[i][j-1];
      }
    }
    return result;
  }

  // --------------------------------------------------------

  matPinv(M)  // pseudo-inverse of matrix M, using SVD-Jacobi
  {
    let decomp = this.svdJacobi(M); // SVD, Jacobi algorithm
    let U = decomp[0];
    let s = decomp[1]; // singular values
    let Vh = decomp[2];

    let Sinv = this.matMake(s.length, s.length, 0.0);
    for (let i = 0; i "lt" Sinv.length; ++i)
      Sinv[i][i] = 1.0 / s[i];  // SVD fails if any s is 0.0

    let V = this.matTranspose(Vh);
    let Uh = this.matTranspose(U);
    let tmp = this.matMult(Sinv, Uh);
    let result = this.matMult(V, tmp);
    return result;  
  }

  // --------------------------------------------------------

  svdJacobi(M)
  {
    // A = U * S * Vh
    // returns an array where result[0] = U matrix,
    //  result[1] = s vector, result[2] = Vh matrix

    // see github.com/ampl/gsl/blob/master/linalg/svd.c
    // numpy svd() full_matrices = False version

    let DBL_EPSILON = 1.0e-15;

    let A = this.matCopy(M); // working U
    let m = A.length;
    let n = A[0].length;
    let Q = this.matIdentity(n); // working V
    let t = this.vecMake(n, 0.0);  // working s

    // initialize the rotation and sweep counters
    let count = 1;
    let sweep = 0;

    let tolerance = 10 * m * DBL_EPSILON; // heuristic

    // always do at least 12 sweeps
    let sweepmax = Math.max(5 * n, 12); // heuristic

    // store the column error estimates in t, for use
    // during orthogonalization

    for (let j = 0; j "lt" n; ++j) {
      let cj = this.matGetColumn(A, j);
      let sj = this.vecNorm(cj);
      t[j] = DBL_EPSILON * sj;
    }

    // orthogonalize A by plane rotations
    while (count "gt" 0 && sweep "lte" sweepmax) {
      // initialize rotation counter
      count = n * (n - 1) / 2;

      for (let j = 0; j "lt" n - 1; ++j) {
        for (let k = j + 1; k "lt" n; ++k) {
          let cosine = 0.0; let sine = 0.0;

          let cj = this.matGetColumn(A, j);
          let ck = this.matGetColumn(A, k);

          let p = 2.0 * this.vecDot(cj, ck);
          let a = this.vecNorm(cj);
          let b = this.vecNorm(ck);

          let q = a * a - b * b;
          let v = this.hypot(p, q);

          // test for columns j,k orthogonal,
          // or dominant errors 
          let abserr_a = t[j];
          let abserr_b = t[k];

          let sorted = (a "gte" b);
          let orthog = (Math.abs(p) "lte" 
            tolerance * (a * b));
          let noisya = (a "lt" abserr_a);
          let noisyb = (b "lt" abserr_b);

          if (sorted && (orthog || noisya ||
            noisyb)) {
            --count;
            continue;
          }

          // calculate rotation angles
          if (v == 0 || !sorted) {
            cosine = 0.0;
            sine = 1.0;
          }
          else {
            cosine = Math.sqrt((v + q) / (2.0 * v));
            sine = p / (2.0 * v * cosine);
          }

          // apply rotation to A (U)
          for (let i = 0; i "lt" m; ++i) {
            let Aik = A[i][k];
            let Aij = A[i][j];
            A[i][j] = Aij * cosine + Aik * sine;
            A[i][k] = -Aij * sine + Aik * cosine;
          }

          // update singular values
          t[j] = Math.abs(cosine) * abserr_a +
            Math.abs(sine) * abserr_b;
          t[k] = Math.abs(sine) * abserr_a +
            Math.abs(cosine) * abserr_b;

          // apply rotation to Q (Vh)
          for (let i = 0; i "lt" n; ++i) {
            let Qij = Q[i][j];
            let Qik = Q[i][k];
            Q[i][j] = Qij * cosine + Qik * sine;
            Q[i][k] = -Qij * sine + Qik * cosine;
          } // i
        } // k
      } // j

      ++sweep;
    } // while

    //  compute singular values
    let prevNorm = -1.0;

    for (let j = 0; j "lt" n; ++j) {
      let column = this.matGetColumn(A, j);
      let norm = this.vecNorm(column);

      // determine if singular value is zero
      if (norm == 0.0 || prevNorm == 0.0
        || (j "gt" 0 && norm "lte" tolerance * prevNorm)) {
        t[j] = 0.0;
        for (let i = 0; i "lt" m; ++i)
          A[i][j] = 0.0;
        prevNorm = 0.0;
      }
      else {
        t[j] = norm;
        for (let i = 0; i "lt" m; ++i)
          A[i][j] = A[i][j] * 1.0 / norm;
        prevNorm = norm;
      }
    }

    if (count "gt" 0) {
      console.log("Jacobi iterations did not" +
        " converge");
    }

    let U = A;
    let s = t;
    let Vh = this.matTranspose(Q);

    if (m "lt" n) {
      U = this.matExtractFirstColumns(U, m);
      s = this.vecExtractFirst(s, m);
      Vh = this.matExtractFirstRows(Vh, m);
    } 

    let result = [];
    result[0] = U;
    result[1] = s;
    result[2] = Vh;

    return result;
  } // svdJacobi()

  // === minor helper functions =============================
  //
  // matMake, matCopy, matIdentity, matGetColumn,
  // matExtractFirstColumns, matExtractFirstRows,
  // matTranspose, matMult, vecMake, vecNorm, vecDot,
  // hypot, matShow, vecShow, matAreEqual, matLoad
  //
  // all of these could be declared 'static'
  // ========================================================

  matMake(nRows, nCols, val)
  {
    let result = [];
    for (let i = 0; i "lt" nRows; ++i) {
      result[i] = [];
      for (let j = 0; j "lt" nCols; ++j) {
        result[i][j] = val;
      }
    }
    return result;
  }

  // --------------------------------------------------------

  matCopy(M)
  {
    let nRows = M.length;
    let nCols = M[0].length;
    let result = this.matMake(nRows, nCols, 0.0);
    for (let i = 0; i "lt" nRows; ++i)
      for (let j = 0; j "lt" nCols; ++j)
        result[i][j] = M[i][j];
    return result;
  }

  // --------------------------------------------------------

  matIdentity(n)
  {
    let result = this.matMake(n, n, 0.0);
    for (let i = 0; i "lt" n; ++i)
      result[i][i] = 1.0;
    return result;
  }

  // --------------------------------------------------------

  matGetColumn(M, j)
  {
    let nRows = M.length;
    let result = this.vecMake(nRows, 0.0);
    for (let i = 0; i "lt" nRows; ++i)
      result[i] = M[i][j];
    return result;
  }

  // --------------------------------------------------------

  matExtractFirstColumns(M, nc)
  {
    let nRows = M.length;
    let result = this.matMake(nRows, nc, 0.0);
    for (let i = 0; i "lt" nRows; ++i)
      for (let j = 0; j "lt" nc; ++j)
        result[i][j] = M[i][j];
    return result;
  }

  // --------------------------------------------------------

  matExtractFirstRows(M, nr)
  {
    let nCols = M[0].length;
    let result = this.matMake(nr, nCols, 0.0);
    for (let i = 0; i "lt" nr; ++i)
      for (let j = 0; j "lt" nCols; ++j)
        result[i][j] = M[i][j];
    return result;
  }

  // --------------------------------------------------------

  matTranspose(M)
  {
    let nRows = M.length;
    let nCols = M[0].length;
    let result = this.matMake(nCols, nRows, 0.0);
    for (let i = 0; i "lt" nRows; ++i)
      for (let j = 0; j "lt" nCols; ++j)
        result[j][i] = M[i][j];
    return result;  
  }

  // --------------------------------------------------------

  matMult(matA, matB)
  {
    let aRows = matA.length;
    let aCols = matA[0].length;
    let bRows = matB.length;
    let bCols = matB[0].length;

    let result = this.matMake(aRows, bCols, 0.0);

    for (let i = 0; i "lt" aRows; ++i)
      for (let j = 0; j "lt" bCols; ++j)
        for (let k = 0; k "lt" aCols; ++k) 
          result[i][j] += matA[i][k] * matB[k][j];

    return result;
  }

  // --------------------------------------------------------

  vecMake(n, val)
  {
    let result = [];
    for (let i = 0; i "lt" n; ++i) {
      result[i] = val;
    }
    return result;
  }

  // --------------------------------------------------------

  vecNorm(vec)
  {
    let sum = 0.0;
    let n = vec.length;
    for (let i = 0; i "lt" n; ++i)
      sum += vec[i] * vec[i];
    return Math.sqrt(sum);
  }

  // --------------------------------------------------------

  vecDot(v1, v2)
  {
    let n = v1.length;  // assume len(v1) == len(v2)
    let sum = 0.0;
    for (let i = 0; i "lt" n; ++i)
      sum += v1[i] * v2[i];
    return sum;
  }

  // --------------------------------------------------------

  hypot(a, b)
  {
    // wacky sqrt(a^2 + b^2) used by numerical implementations
    let xabs = Math.abs(a); let yabs = Math.abs(b);
    let min, max;

    if (xabs "lt" yabs) {
      min = xabs; max = yabs;
    }
    else {
      min = yabs; max = xabs;
    }

    if (min == 0)
      return max;
    else {
      let u = min / max;
      return max * Math.sqrt(1 + u * u);
    }
  }

  // --------------------------------------------------------
  // helpers for debugging: matShow, vecShow
  // --------------------------------------------------------

  matShow(M, dec, wid) // for debugging
  {
    let small = 1.0 / Math.pow(10, dec);
    let nr = M.length;
    let nc = M[0].length;
    for (let i = 0; i "lt" nr; ++i) {
      for (let j = 0; j "lt" nc; ++j) {
        let x = M[i][j];
        if (Math.abs(x) "lt" small) x = 0.0;
        let xx = x.toFixed(dec);
        let s = xx.toString().padStart(wid, ' ');
        process.stdout.write(s);
        process.stdout.write(" ");
      }
      process.stdout.write("\n");
    }
  }

  // --------------------------------------------------------

  vecShow(vec, dec, wid)  // for debugging
  {
    let small = 1.0 / Math.pow(10, dec);
    for (let i = 0; i "lt" vec.length; ++i) {
      let x = vec[i];
      if (Math.abs(x) "lt" small) x = 0.0  // avoid -0.00
      let xx = x.toFixed(dec);
      let s = xx.toString().padStart(wid, ' ');
      process.stdout.write(s);
      process.stdout.write(" ");
    }
    process.stdout.write("\n");
  }

} // end class LinearRegressor

// ----------------------------------------------------------
// helper functions for main()
// matShow, vecShow, matLoad, matToVec
// ----------------------------------------------------------

function vecShow(vec, dec, wid, nl)
{
  let small = 1.0 / Math.pow(10, dec);
  for (let i = 0; i "lt" vec.length; ++i) {
    let x = vec[i];
    if (Math.abs(x) "lt" small) x = 0.0  // avoid -0.00
    let xx = x.toFixed(dec);
    let s = xx.toString().padStart(wid, ' ');
    process.stdout.write(s);
    process.stdout.write(" ");
  }

  if (nl == true)
    process.stdout.write("\n");
}

// ----------------------------------------------------------

function matShow(A, dec, wid)
{
  let small = 1.0 / Math.pow(10, dec);
  let nr = A.length;
  let nc = A[0].length;
  for (let i = 0; i "lt" nr; ++i) {
    for (let j = 0; j "lt" nc; ++j) {
      let x = A[i][j];
      if (Math.abs(x) "lt" small) x = 0.0;
      let xx = x.toFixed(dec);
      let s = xx.toString().padStart(wid, ' ');
      process.stdout.write(s);
      process.stdout.write(" ");
    }
    process.stdout.write("\n");
  }
}

// ----------------------------------------------------------

function matToVec(M)
{
  let nr = M.length;
  let nc = M[0].length;
  let result = [];
  for (let i = 0; i "lt" nr*nc; ++i) {  // vecMake(r*c, 0.0);
    result[i] = 0.0;
  }
  let k = 0;
  for (let i = 0; i "lt" nr; ++i) {
    for (let j = 0; j "lt" nc; ++j) {
      result[k++] = M[i][j];
    }
  }
  return result;
}

// ----------------------------------------------------------

function matLoad(fn, delimit, usecols, comment)
{
  // efficient but mildly complicated
  let all = FS.readFileSync(fn, "utf8");  // giant string
  all = all.trim();  // strip final crlf in file
  let lines = all.split("\n");  // array of lines

  // count number non-comment lines
  let nRows = 0;
  for (let i = 0; i "lt" lines.length; ++i) {
    if (!lines[i].startsWith(comment))
      ++nRows;
  }
  let nCols = usecols.length;
  // let result = matMake(nRows, nCols, 0.0); 
  let result = [];
  for (let i = 0; i "lt" nRows; ++i) {
    result[i] = [];
    for (let j = 0; j "lt" nCols; ++j) {
      result[i][j] = 0.0;
    }
  }
 
  let r = 0;  // into lines
  let i = 0;  // into result[][]
  while (r "lt" lines.length) {
    if (lines[r].startsWith(comment)) {
      ++r;  // next row
    }
    else {
      let tokens = lines[r].split(delimit);
      for (let j = 0; j "lt" nCols; ++j) {
        result[i][j] = parseFloat(tokens[usecols[j]]);
      }
      ++r;
      ++i;
    }
  }

  return result;
}

// ----------------------------------------------------------

function main()
{
  console.log("\nLinear regression with " +
    "pseudo-inverse (SVD-Jacobi) using JavaScript ");

  // 1. load data
  console.log("\nLoading synthetic train (200) and" +
    " test (40) data from file ");

  let trainFile = ".\\Data\\synthetic_train_200.txt";
  let trainX = matLoad(trainFile, ",", [0,1,2,3,4], "#");
  let trainY = matLoad(trainFile, ",", [5], "#");
  trainY = matToVec(trainY);
  
  let testFile = ".\\Data\\synthetic_test_40.txt";
  let testX = matLoad(testFile, ",", [0,1,2,3,4], "#");
  let testY = matLoad(testFile, ",", [5], "#");
  testY = matToVec(testY);

  console.log("\nFirst three train X: ");
  for (let i = 0; i "lt" 3; ++i)
    vecShow(trainX[i], 4, 8, true);

  console.log("\nFirst three train y: ");
  for (let i = 0; i "lt" 3; ++i)
    console.log(trainY[i].toFixed(4).toString().
    padStart(9, ' '));

  // 2. create and train linear regression model
  let seed = 0;  // not used this version
  console.log("\nCreating and training model ");
  let model = new LinearRegressor(seed);
  model.train(trainX, trainY);
  console.log("Done ");

  console.log("\nModel weights/coefficients: ");
  vecShow(model.weights, 4, 8, true);
  console.log("Model bias/intercept: " + 
    model.bias.toFixed(4).toString());

  // 3. evaluate
  console.log("\nEvaluating model ");
  let trainAcc = model.accuracy(trainX, trainY, 0.10);
  let testAcc = model.accuracy(testX, testY, 0.10);

  console.log("\nTrain acc (within 0.10) = " +
    trainAcc.toFixed(4).toString());
  console.log("Test acc (within 0.10) = " +
    testAcc.toFixed(4).toString());

  let trainMSE = model.mse(trainX, trainY);
  let testMSE = model.mse(testX, testY);

  console.log("\nTrain MSE = " +
    trainMSE.toFixed(4).toString());
  console.log("Test MSE = " +
    testMSE.toFixed(4).toString());

  // 4. use model
  let x = trainX[0];
  console.log("\nPredicting for x = ");
  vecShow(x, 4, 9, true);  // add newline

  let predY = model.predict(x);
  console.log("Predicted y = " + 
    predY.toFixed(4).toString());

  console.log("\nEnd demo");
}

main();

Training data:

# synthetic_train_200.txt
#
-0.1660,  0.4406, -0.9998, -0.3953, -0.7065,  0.4840
 0.0776, -0.1616,  0.3704, -0.5911,  0.7562,  0.1568
-0.9452,  0.3409, -0.1654,  0.1174, -0.7192,  0.8054
 0.9365, -0.3732,  0.3846,  0.7528,  0.7892,  0.1345
-0.8299, -0.9219, -0.6603,  0.7563, -0.8033,  0.7955
 0.0663,  0.3838, -0.3690,  0.3730,  0.6693,  0.3206
-0.9634,  0.5003,  0.9777,  0.4963, -0.4391,  0.7377
-0.1042,  0.8172, -0.4128, -0.4244, -0.7399,  0.4801
-0.9613,  0.3577, -0.5767, -0.4689, -0.0169,  0.6861
-0.7065,  0.1786,  0.3995, -0.7953, -0.1719,  0.5569
 0.3888, -0.1716, -0.9001,  0.0718,  0.3276,  0.2500
 0.1731,  0.8068, -0.7251, -0.7214,  0.6148,  0.3297
-0.2046, -0.6693,  0.8550, -0.3045,  0.5016,  0.2129
 0.2473,  0.5019, -0.3022, -0.4601,  0.7918,  0.2613
-0.1438,  0.9297,  0.3269,  0.2434, -0.7705,  0.5171
 0.1568, -0.1837, -0.5259,  0.8068,  0.1474,  0.3307
-0.9943,  0.2343, -0.3467,  0.0541,  0.7719,  0.5581
 0.2467, -0.9684,  0.8589,  0.3818,  0.9946,  0.1092
-0.6553, -0.7257,  0.8652,  0.3936, -0.8680,  0.7018
 0.8460,  0.4230, -0.7515, -0.9602, -0.9476,  0.1996
-0.9434, -0.5076,  0.7201,  0.0777,  0.1056,  0.5664
 0.9392,  0.1221, -0.9627,  0.6013, -0.5341,  0.1533
 0.6142, -0.2243,  0.7271,  0.4942,  0.1125,  0.1661
 0.4260,  0.1194, -0.9749, -0.8561,  0.9346,  0.2230
 0.1362, -0.5934, -0.4953,  0.4877, -0.6091,  0.3810
 0.6937, -0.5203, -0.0125,  0.2399,  0.6580,  0.1460
-0.6864, -0.9628, -0.8600, -0.0273,  0.2127,  0.5387
 0.9772,  0.1595, -0.2397,  0.1019,  0.4907,  0.1611
 0.3385, -0.4702, -0.8673, -0.2598,  0.2594,  0.2270
-0.8669, -0.4794,  0.6095, -0.6131,  0.2789,  0.4700
 0.0493,  0.8496, -0.4734, -0.8681,  0.4701,  0.3516
 0.8639, -0.9721, -0.5313,  0.2336,  0.8980,  0.1412
 0.9004,  0.1133,  0.8312,  0.2831, -0.2200,  0.1782
 0.0991,  0.8524,  0.8375, -0.2102,  0.9265,  0.2150
-0.6521, -0.7473, -0.7298,  0.0113, -0.9570,  0.7422
 0.6190, -0.3105,  0.8802,  0.1640,  0.7577,  0.1056
 0.6895,  0.8108, -0.0802,  0.0927,  0.5972,  0.2214
 0.1982, -0.9689,  0.1870, -0.1326,  0.6147,  0.1310
-0.3695,  0.7858,  0.1557, -0.6320,  0.5759,  0.3773
-0.1596,  0.3581,  0.8372, -0.9992,  0.9535,  0.2071
-0.2468,  0.9476,  0.2094,  0.6577,  0.1494,  0.4132
 0.1737,  0.5000,  0.7166,  0.5102,  0.3961,  0.2611
 0.7290, -0.3546,  0.3416, -0.0983, -0.2358,  0.1332
-0.3652,  0.2438, -0.1395,  0.9476,  0.3556,  0.4170
-0.6029, -0.1466, -0.3133,  0.5953,  0.7600,  0.4334
-0.4596, -0.4953,  0.7098,  0.0554,  0.6043,  0.2775
 0.1450,  0.4663,  0.0380,  0.5418,  0.1377,  0.2931
-0.8636, -0.2442, -0.8407,  0.9656, -0.6368,  0.7429
 0.6237,  0.7499,  0.3768,  0.1390, -0.6781,  0.2185
-0.5499,  0.1850, -0.3755,  0.8326,  0.8193,  0.4399
-0.4858, -0.7782, -0.6141, -0.0008,  0.4572,  0.4197
 0.7033, -0.1683,  0.2334, -0.5327, -0.7961,  0.1776
 0.0317, -0.0457, -0.6947,  0.2436,  0.0880,  0.3345
 0.5031, -0.5559,  0.0387,  0.5706, -0.9553,  0.3107
-0.3513,  0.7458,  0.6894,  0.0769,  0.7332,  0.3170
 0.2205,  0.5992, -0.9309,  0.5405,  0.4635,  0.3532
-0.4806, -0.4859,  0.2646, -0.3094,  0.5932,  0.3202
 0.9809, -0.3995, -0.7140,  0.8026,  0.0831,  0.1600
 0.9495,  0.2732,  0.9878,  0.0921,  0.0529,  0.1289
-0.9476, -0.6792,  0.4913, -0.9392, -0.2669,  0.5966
 0.7247,  0.3854,  0.3819, -0.6227, -0.1162,  0.1550
-0.5922, -0.5045, -0.4757,  0.5003, -0.0860,  0.5863
-0.8861,  0.0170, -0.5761,  0.5972, -0.4053,  0.7301
 0.6877, -0.2380,  0.4997,  0.0223,  0.0819,  0.1404
 0.9189,  0.6079, -0.9354,  0.4188, -0.0700,  0.1907
-0.1428, -0.7820,  0.2676,  0.6059,  0.3936,  0.2790
 0.5324, -0.3151,  0.6917, -0.1425,  0.6480,  0.1071
-0.8432, -0.9633, -0.8666, -0.0828, -0.7733,  0.7784
-0.9444,  0.5097, -0.2103,  0.4939, -0.0952,  0.6787
-0.0520,  0.6063, -0.1952,  0.8094, -0.9259,  0.4836
 0.5477, -0.7487,  0.2370, -0.9793,  0.0773,  0.1241
 0.2450,  0.8116,  0.9799,  0.4222,  0.4636,  0.2355
 0.8186, -0.1983, -0.5003, -0.6531, -0.7611,  0.1511
-0.4714,  0.6382, -0.3788,  0.9648, -0.4667,  0.5950
 0.0673, -0.3711,  0.8215, -0.2669, -0.1328,  0.2677
-0.9381,  0.4338,  0.7820, -0.9454,  0.0441,  0.5518
-0.3480,  0.7190,  0.1170,  0.3805, -0.0943,  0.4724
-0.9813,  0.1535, -0.3771,  0.0345,  0.8328,  0.5438
-0.1471, -0.5052, -0.2574,  0.8637,  0.8737,  0.3042
-0.5454, -0.3712, -0.6505,  0.2142, -0.1728,  0.5783
 0.6327, -0.6297,  0.4038, -0.5193,  0.1484,  0.1153
-0.5424,  0.3282, -0.0055,  0.0380, -0.6506,  0.6613
 0.1414,  0.9935,  0.6337,  0.1887,  0.9520,  0.2540
-0.9351, -0.8128, -0.8693, -0.0965, -0.2491,  0.7353
 0.9507, -0.6640,  0.9456,  0.5349,  0.6485,  0.1059
-0.0462, -0.9737, -0.2940, -0.0159,  0.4602,  0.2606
-0.0627, -0.0852, -0.7247, -0.9782,  0.5166,  0.2977
 0.0478,  0.5098, -0.0723, -0.7504, -0.3750,  0.3335
 0.0090,  0.3477,  0.5403, -0.7393, -0.9542,  0.4415
-0.9748,  0.3449,  0.3736, -0.1015,  0.8296,  0.4358
 0.2887, -0.9895, -0.0311,  0.7186,  0.6608,  0.2057
 0.1570, -0.4518,  0.1211,  0.3435, -0.2951,  0.3244
 0.7117, -0.6099,  0.4946, -0.4208,  0.5476,  0.1096
-0.2929, -0.5726,  0.5346, -0.3827,  0.4665,  0.2465
 0.4889, -0.5572, -0.5718, -0.6021, -0.7150,  0.2163
-0.7782,  0.3491,  0.5996, -0.8389, -0.5366,  0.6516
-0.5847,  0.8347,  0.4226,  0.1078, -0.3910,  0.6134
 0.8469,  0.4121, -0.0439, -0.7476,  0.9521,  0.1571
-0.6803, -0.5948, -0.1376, -0.1916, -0.7065,  0.7156
 0.2878,  0.5086, -0.5785,  0.2019,  0.4979,  0.2980
 0.2764,  0.1943, -0.4090,  0.4632,  0.8906,  0.2960
-0.8877,  0.6705, -0.6155, -0.2098, -0.3998,  0.7107
-0.8398,  0.8093, -0.2597,  0.0614, -0.0118,  0.6502
-0.8476,  0.0158, -0.4769, -0.2859, -0.7839,  0.7715
 0.5751, -0.7868,  0.9714, -0.6457,  0.1448,  0.1175
 0.4802, -0.7001,  0.1022, -0.5668,  0.5184,  0.1090
 0.4458, -0.6469,  0.7239, -0.9604,  0.7205,  0.0779
 0.5175,  0.4339,  0.9747, -0.4438, -0.9924,  0.2879
 0.8678,  0.7158,  0.4577,  0.0334,  0.4139,  0.1678
 0.5406,  0.5012,  0.2264, -0.1963,  0.3946,  0.2088
-0.9938,  0.5498,  0.7928, -0.5214, -0.7585,  0.7687
 0.7661,  0.0863, -0.4266, -0.7233, -0.4197,  0.1466
 0.2277, -0.3517, -0.0853, -0.1118,  0.6563,  0.1767
 0.3499, -0.5570, -0.0655, -0.3705,  0.2537,  0.1632
 0.7547, -0.1046,  0.5689, -0.0861,  0.3125,  0.1257
 0.8186,  0.2110,  0.5335,  0.0094, -0.0039,  0.1391
 0.6858, -0.8644,  0.1465,  0.8855,  0.0357,  0.1845
-0.4967,  0.4015,  0.0805,  0.8977,  0.2487,  0.4663
 0.6760, -0.9841,  0.9787, -0.8446, -0.3557,  0.1509
-0.1203, -0.4885,  0.6054, -0.0443, -0.7313,  0.4854
 0.8557,  0.7919, -0.0169,  0.7134, -0.1628,  0.2002
 0.0115, -0.6209,  0.9300, -0.4116, -0.7931,  0.4052
-0.7114, -0.9718,  0.4319,  0.1290,  0.5892,  0.3661
 0.3915,  0.5557, -0.1870,  0.2955, -0.6404,  0.2954
-0.3564, -0.6548, -0.1827, -0.5172, -0.1862,  0.4622
 0.2392, -0.4959,  0.5857, -0.1341, -0.2850,  0.2470
-0.3394,  0.3947, -0.4627,  0.6166, -0.4094,  0.5325
 0.7107,  0.7768, -0.6312,  0.1707,  0.7964,  0.2757
-0.1078,  0.8437, -0.4420,  0.2177,  0.3649,  0.4028
-0.3139,  0.5595, -0.6505, -0.3161, -0.7108,  0.5546
 0.4335,  0.3986,  0.3770, -0.4932,  0.3847,  0.1810
-0.2562, -0.2894, -0.8847,  0.2633,  0.4146,  0.4036
 0.2272,  0.2966, -0.6601, -0.7011,  0.0284,  0.2778
-0.0743, -0.1421, -0.0054, -0.6770, -0.3151,  0.3597
-0.4762,  0.6891,  0.6007, -0.1467,  0.2140,  0.4266
-0.4061,  0.7193,  0.3432,  0.2669, -0.7505,  0.6147
-0.0588,  0.9731,  0.8966,  0.2902, -0.6966,  0.4955
-0.0627, -0.1439,  0.1985,  0.6999,  0.5022,  0.3077
 0.1587,  0.8494, -0.8705,  0.9827, -0.8940,  0.4263
-0.7850,  0.2473, -0.9040, -0.4308, -0.8779,  0.7199
 0.4070,  0.3369, -0.2428, -0.6236,  0.4940,  0.2215
-0.0242,  0.0513, -0.9430,  0.2885, -0.2987,  0.3947
-0.5416, -0.1322, -0.2351, -0.0604,  0.9590,  0.3683
 0.1055,  0.7783, -0.2901, -0.5090,  0.8220,  0.2984
-0.9129,  0.9015,  0.1128, -0.2473,  0.9901,  0.4776
-0.9378,  0.1424, -0.6391,  0.2619,  0.9618,  0.5368
 0.7498, -0.0963,  0.4169,  0.5549, -0.0103,  0.1614
-0.2612, -0.7156,  0.4538, -0.0460, -0.1022,  0.3717
 0.7720,  0.0552, -0.1818, -0.4622, -0.8560,  0.1685
-0.4177,  0.0070,  0.9319, -0.7812,  0.3461,  0.3052
-0.0001,  0.5542, -0.7128, -0.8336, -0.2016,  0.3803
 0.5356, -0.4194, -0.5662, -0.9666, -0.2027,  0.1776
-0.2378,  0.3187, -0.8582, -0.6948, -0.9668,  0.5474
-0.1947, -0.3579,  0.1158,  0.9869,  0.6690,  0.2992
 0.3992,  0.8365, -0.9205, -0.8593, -0.0520,  0.3154
-0.0209,  0.0793,  0.7905, -0.1067,  0.7541,  0.1864
-0.4928, -0.4524, -0.3433,  0.0951, -0.5597,  0.6261
-0.8118,  0.7404, -0.5263, -0.2280,  0.1431,  0.6349
 0.0516, -0.8480,  0.7483,  0.9023,  0.6250,  0.1959
-0.3212,  0.1093,  0.9488, -0.3766,  0.3376,  0.2735
-0.3481,  0.5490, -0.3484,  0.7797,  0.5034,  0.4379
-0.5785, -0.9170, -0.3563, -0.9258,  0.3877,  0.4121
 0.3407, -0.1391,  0.5356,  0.0720, -0.9203,  0.3458
-0.3287, -0.8954,  0.2102,  0.0241,  0.2349,  0.3247
-0.1353,  0.6954, -0.0919, -0.9692,  0.7461,  0.3338
 0.9036, -0.8982, -0.5299, -0.8733, -0.1567,  0.1187
 0.7277, -0.8368, -0.0538, -0.7489,  0.5458,  0.0830
 0.9049,  0.8878,  0.2279,  0.9470, -0.3103,  0.2194
 0.7957, -0.1308, -0.5284,  0.8817,  0.3684,  0.2172
 0.4647, -0.4931,  0.2010,  0.6292, -0.8918,  0.3371
-0.7390,  0.6849,  0.2367,  0.0626, -0.5034,  0.7039
-0.1567, -0.8711,  0.7940, -0.5932,  0.6525,  0.1710
 0.7635, -0.0265,  0.1969,  0.0545,  0.2496,  0.1445
 0.7675,  0.1354, -0.7698, -0.5460,  0.1920,  0.1728
-0.5211, -0.7372, -0.6763,  0.6897,  0.2044,  0.5217
 0.1913,  0.1980,  0.2314, -0.8816,  0.5006,  0.1998
 0.8964,  0.0694, -0.6149,  0.5059, -0.9854,  0.1825
 0.1767,  0.7104,  0.2093,  0.6452,  0.7590,  0.2832
-0.3580, -0.7541,  0.4426, -0.1193, -0.7465,  0.5657
-0.5996,  0.5766, -0.9758, -0.3933, -0.9572,  0.6800
 0.9950,  0.1641, -0.4132,  0.8579,  0.0142,  0.2003
-0.4717, -0.3894, -0.2567, -0.5111,  0.1691,  0.4266
 0.3917, -0.8561,  0.9422,  0.5061,  0.6123,  0.1212
-0.0366, -0.1087,  0.3449, -0.1025,  0.4086,  0.2475
 0.3633,  0.3943,  0.2372, -0.6980,  0.5216,  0.1925
-0.5325, -0.6466, -0.2178, -0.3589,  0.6310,  0.3568
 0.2271,  0.5200, -0.1447, -0.8011, -0.7699,  0.3128
 0.6415,  0.1993,  0.3777, -0.0178, -0.8237,  0.2181
-0.5298, -0.0768, -0.6028, -0.9490,  0.4588,  0.4356
 0.6870, -0.1431,  0.7294,  0.3141,  0.1621,  0.1632
-0.5985,  0.0591,  0.7889, -0.3900,  0.7419,  0.2945
 0.3661,  0.7984, -0.8486,  0.7572, -0.6183,  0.3449
 0.6995,  0.3342, -0.3113, -0.6972,  0.2707,  0.1712
 0.2565,  0.9126,  0.1798, -0.6043, -0.1413,  0.2893
-0.3265,  0.9839, -0.2395,  0.9854,  0.0376,  0.4770
 0.2690, -0.1722,  0.9818,  0.8599, -0.7015,  0.3954
-0.2102, -0.0768,  0.1219,  0.5607, -0.0256,  0.3949
 0.8216, -0.9555,  0.6422, -0.6231,  0.3715,  0.0801
-0.2896,  0.9484, -0.7545, -0.6249,  0.7789,  0.4370
-0.9985, -0.5448, -0.7092, -0.5931,  0.7926,  0.5402

Test data:

# synthetic_test_40.txt
#
 0.7462,  0.4006, -0.0590,  0.6543, -0.0083,  0.1935
 0.8495, -0.2260, -0.0142, -0.4911,  0.7699,  0.1078
-0.2335, -0.4049,  0.4352, -0.6183, -0.7636,  0.5088
 0.1810, -0.5142,  0.2465,  0.2767, -0.3449,  0.3136
-0.8650,  0.7611, -0.0801,  0.5277, -0.4922,  0.7140
-0.2358, -0.7466, -0.5115, -0.8413, -0.3943,  0.4533
 0.4834,  0.2300,  0.3448, -0.9832,  0.3568,  0.1360
-0.6502, -0.6300,  0.6885,  0.9652,  0.8275,  0.3046
-0.3053,  0.5604,  0.0929,  0.6329, -0.0325,  0.4756
-0.7995,  0.0740, -0.2680,  0.2086,  0.9176,  0.4565
-0.2144, -0.2141,  0.5813,  0.2902, -0.2122,  0.4119
-0.7278, -0.0987, -0.3312, -0.5641,  0.8515,  0.4438
 0.3793,  0.1976,  0.4933,  0.0839,  0.4011,  0.1905
-0.8568,  0.9573, -0.5272,  0.3212, -0.8207,  0.7415
-0.5785,  0.0056, -0.7901, -0.2223,  0.0760,  0.5551
 0.0735, -0.2188,  0.3925,  0.3570,  0.3746,  0.2191
 0.1230, -0.2838,  0.2262,  0.8715,  0.1938,  0.2878
 0.4792, -0.9248,  0.5295,  0.0366, -0.9894,  0.3149
-0.4456,  0.0697,  0.5359, -0.8938,  0.0981,  0.3879
 0.8629, -0.8505, -0.4464,  0.8385,  0.5300,  0.1769
 0.1995,  0.6659,  0.7921,  0.9454,  0.9970,  0.2330
-0.0249, -0.3066, -0.2927, -0.4923,  0.8220,  0.2437
 0.4513, -0.9481, -0.0770, -0.4374, -0.9421,  0.2879
-0.3405,  0.5931, -0.3507, -0.3842,  0.8562,  0.3987
 0.9538,  0.0471,  0.9039,  0.7760,  0.0361,  0.1706
-0.0887,  0.2104,  0.9808,  0.5478, -0.3314,  0.4128
-0.8220, -0.6302,  0.0537, -0.1658,  0.6013,  0.4306
-0.4123, -0.2880,  0.9074, -0.0461, -0.4435,  0.5144
 0.0060,  0.2867, -0.7775,  0.5161,  0.7039,  0.3599
-0.7968, -0.5484,  0.9426, -0.4308,  0.8148,  0.2979
 0.7811,  0.8450, -0.6877,  0.7594,  0.2640,  0.2362
-0.6802, -0.1113, -0.8325, -0.6694, -0.6056,  0.6544
 0.3821,  0.1476,  0.7466, -0.5107,  0.2592,  0.1648
 0.7265,  0.9683, -0.9803, -0.4943, -0.5523,  0.2454
-0.9049, -0.9797, -0.0196, -0.9090, -0.4433,  0.6447
-0.4607,  0.1811, -0.2389,  0.4050, -0.0078,  0.5229
 0.2664, -0.2932, -0.4259, -0.7336,  0.8742,  0.1834
-0.4507,  0.1029, -0.6294, -0.1158, -0.6294,  0.6081
 0.8948, -0.0124,  0.9278,  0.2899, -0.0314,  0.1534
-0.1323, -0.8813, -0.0146, -0.0697,  0.6135,  0.2386
Posted in JavaScript, Machine Learning

“Quadratic Regression with SGD Training Using C#” in Visual Studio Magazine

I wrote an article titled “Quadratic Regression with SGD Training Using C#” in the January 2026 edition of Microsoft Visual Studio Magazine. See https://visualstudiomagazine.com/articles/2026/01/21/quadratic-regression-with-sgd-training-using-csharp.aspx.

The goal of a machine learning regression problem is to predict a single numeric value. For example, you might want to predict an employee’s salary based on age, height, high school grade point average, and so on. There are approximately a dozen common regression techniques. The most basic technique is called linear regression, or sometimes multiple linear regression, where the “multiple” indicates two or more predictor variables.

The form of a basic linear regression prediction model is y’ = (w0 * x0) + (w1 * x1) + . . + (wn * xn) + b, where y’ is the predicted value, the xi are predictor values, the wi are weights, and b is the bias. Quadratic regression extends linear regression. The form of a quadratic regression model is y’ = (w0 * x0) + . . + (wn * xn) + (wj * x0 * x0) + . . + (wk * x0 * x1) + . . . + b. There are derived predictors that are the square of each original predictor, and interaction terms that are the multiplication product of all possible pairs of original predictors.

Compared to basic linear regression, quadratic regression can handle more complex data. Compared to the most powerful regression techniques such as neural network regression, quadratic regression often has slightly worse prediction accuracy, but has much better model interpretability.

There are several ways to train a quadratic regression model, including stochastic gradient descent (SGD), pseudo-inverse training, closed form inverse training, L-BFGS optimization training and so on. The demo program uses SGD training, which is iterative and requires a learning rate and a maximum number of epochs. These two parameter values must be determined by trial and error.
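
The heart of SGD training is a per-item update that nudges each weight in the direction that reduces squared error. A rough Python sketch of the idea (not the article's C# code; the hypothetical train_sgd() below takes the already-expanded quadratic and interaction terms):

import numpy as np

def train_sgd(X, y, lrn_rate=0.001, max_epochs=1000, seed=0):
  # X: (n_items, n_terms) expanded predictors, y: (n_items,) targets
  rng = np.random.default_rng(seed)
  wts = np.zeros(X.shape[1])
  bias = 0.0
  for epoch in range(max_epochs):
    for i in rng.permutation(len(X)):   # visit items in scrambled order
      err = (np.dot(X[i], wts) + bias) - y[i]
      wts -= lrn_rate * err * X[i]      # gradient step for each weight
      bias -= lrn_rate * err            # gradient step for the bias
  return wts, bias

# usage sketch: wts, bias = train_sgd(expanded_train_X, train_y)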

The output of the demo program is:

Begin C# quadratic regression with SGD training

Loading synthetic train (200) and test (40) data
Done

First three train X:
 -0.1660  0.4406 -0.9998 -0.3953 -0.7065
  0.0776 -0.1616  0.3704 -0.5911  0.7562
 -0.9452  0.3409 -0.1654  0.1174 -0.7192

First three train y:
  0.4840
  0.1568
  0.8054

Creating quadratic regression model

Setting lrnRate = 0.001
Setting maxEpochs = 1000

Starting SGD training
epoch =     0  MSE =   0.0957
epoch =   200  MSE =   0.0003
epoch =   400  MSE =   0.0003
epoch =   600  MSE =   0.0003
epoch =   800  MSE =   0.0003
Done

Model base weights:
 -0.2630  0.0354 -0.0420  0.0341 -0.1124

Model quadratic weights:
  0.0655  0.0194  0.0051  0.0047  0.0243

Model interaction weights:
  0.0043  0.0249  0.0071  0.1081 -0.0012 -0.0093
  0.0362  0.0085 -0.0568  0.0016

Model bias/intercept:   0.3220

Evaluating model
Accuracy train (within 0.10) = 0.8850
Accuracy test (within 0.10) = 0.9250

MSE train = 0.0003
MSE test = 0.0005

Predicting for x =
  -0.1660   0.4406  -0.9998  -0.3953  -0.7065

Predicted y = 0.4843

End demo

Suppose, as in the demo data, there are five predictors, aka features, (x0, x1, x2, x3, x4). The prediction equation for basic linear regression is:

y' = (w0 * x0) + (w1 * x1) + (w2 * x2) + (w3 * x3) + (w4 * x4) + b

The wi are model weights (aka coefficients), and b is the model bias (aka intercept). The values of the weights and the bias must be determined by training, so that predicted y’ values are close to the known, correct y values in a set of training data.

Basic linear regression is simple but it can’t predict well for data that has an underlying non-linear structure, and basic linear regression can’t deal with data that has hidden interactions between the xi predictors.

The prediction equation for quadratic regression with five predictors is:

y' = (w0 * x0) + (w1 * x1) + (w2 * x2) + (w3 * x3) +
     (w4 * x4) + 

     (w5 * x0*x0) + (w6 * x1*x1) + (w7 * x2*x2) +
     (w8 * x3*x3) + (w9 * x4*x4) + 

     (w10 * x0*x1) + (w11 * x0*x2) + (w12 * x0*x3) +
     (w13 * x0*x4) + (w14 * x1*x2) + (w15 * x1*x3) +
     (w16 * x1*x4) + (w17 * x2*x3) + (w18 * x2*x4) + 
     (w19 * x3*x4)

     + b

The squared (aka “quadratic”) xi^2 terms handle non-linear structure. If there are n predictors, there are also n squared terms. The xi * xj terms between all possible pairs of original predictors handle interactions between predictors. If there are n predictors, there are (n * (n-1)) / 2 interaction terms.
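
To make the term counts concrete, here is a short sketch (not the article's C# code) that expands a raw input vector of n = 5 predictors into its 20 quadratic-regression terms:

import itertools
import numpy as np

def expand_quadratic(x):
  # x is a 1D array of n raw predictors; returns the n base terms,
  # the n squared terms, and the n*(n-1)/2 interaction terms
  x = np.asarray(x)
  squares = x * x
  interactions = np.array([x[i] * x[j]
    for i, j in itertools.combinations(range(len(x)), 2)])
  return np.concatenate([x, squares, interactions])

x = np.array([-0.1660, 0.4406, -0.9998, -0.3953, -0.7065])
print(expand_quadratic(x).shape)  # (20,) = 5 base + 5 squared + 10 interaction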

Quadratic regression has a nice balance of prediction power and interpretability. The model weights/coefficients are easy to interpret. If the predictor values have been normalized to the same scale, larger magnitudes mean larger effect, and the sign of the weights indicate the direction of the effect.



Quadratic regression is a classical machine learning technique that still has a lot of appeal. Classical science fiction magazines often featured covers with giant insects. Here are three with giant ants. Left: “Amazing Stories”, Fall 1928. Center: “Thrilling Wonder Stories”, December 1938. Right: The German “Utopia” was a series of magazines / short novels, published every other week, from 1953 to 1968. This is #192 from September 15, 1959.


Posted in Machine Learning

NFL 2025 Season Super Bowl LX Prediction – Zoltar Says the Seahawks will Beat the Patriots by 3 Points with a Score of 27-24

Zoltar is my NFL football prediction computer program. It uses a neural network and a type of reinforcement learning. Here is Zoltar’s prediction for week #22 (Super Bowl LX) of the 2025 season.

Zoltar:    seahawks  by    3  opp =    patriots    | Vegas:    seahawks  by  4.5

Zoltar’s prediction that the Seahawks will win by 3 points is very close to the Las Vegas point spread, which has the Seahawks listed as favorites by 4.5 points.

Zoltar theoretically suggests betting when the Vegas line is “significantly” different from Zoltar’s prediction. During the beginning and end of the season, I use a conservative threshold of 4.0 points difference. During the middle part of the season I use a more aggressive threshold of 3.0 points difference.

For Super Bowl LX (week #22), because Zoltar agrees to within 1.5 points of the Vegas point spread, Zoltar has no wager recommendation. But if forced to make a suggestion, Zoltar would advise betting on the underdog Patriots, thinking that the Seahawks will win but not by more than 4.5 points.

A bet on the Patriots will pay off if the Patriots win by any score, or if the Seahawks win by less than 4.5 points (i.e., 4 points or less).

Theoretically, if you must bet $110 to win $100 (typical in Vegas) then you’ll make money if you predict at 53% accuracy or better. But realistically, you need to predict at 60% accuracy or better, to take into account logistics and things like manual data entry errors.

In week #21, against the Vegas point spread, Zoltar went 2-0 because the Broncos lost to the Patriots but not by more than 5.5 points, and the Seahawks beat the Rams by more than 2.5 points.

For the season, against the spread, pending the result of the Super Bowl, Zoltar is 50-30 (~62% accuracy).

Just for fun, I track how well Zoltar does when trying to predict just which team will win a game. This isn’t useful except for parlay betting. In week #21, just predicting the winning team, Zoltar went 1-1. Vegas went 2-0 at just predicting the winning teams.



I’m old enough to have seen every Super Bowl. Here are three I remember well.

Left: I watched Super Bowl I in 1967 when the Green Bay Packers beat the Kansas City Chiefs 35-10. I was a student at Servite High School in Anaheim, Calif. At that time the Super Bowl wasn’t a big event like it is now, but the game still attracted a lot of attention. The game was televised in color, which was still relatively rare in 1967. My family had a 25″ black and white Zenith TV. We got a color TV the next year.

Center: I grew up in Southern California and was therefore a Los Angeles Rams fan in the 1960s. But I had to wait until 2000 to see the Rams beat the Tennessee Titans 23-16 in Super Bowl XXXIV. The game featured rookie Kurt Warner who was never even drafted, and a bunch of “.com” TV ads, and a final play where a Titan receiver was tackled on the one-yard line to save the game for the Rams.

Right: I saw Super Bowl XLII in 2008 when the huge underdog New York Giants beat the then-undefeated New England Patriots 17-14. I saw this game in Las Vegas on huge TV screens in an auditorium at the MGM Grand Hotel, with three work colleagues – Patrick W., Rob D., and Brett O. The game featured an unbelievable “helmet catch”. It was an incredibly exciting experience in every way.


Posted in Zoltar | Leave a comment

A Relatively Simple Approximation to Singular Value Decomposition Using QR Decomposition With Python

Before I start: The technique presented in this post is only an exploration of how to compute SVD. The algorithm is (possibly) not numerically stable, and is (probably) not suitable when high accuracy (beyond 8 decimals) is needed. And the algorithm is very slow compared to standard SVD algorithms. That said, for small matrices, the technique for SVD presented here is the simplest I’ve seen.

The starting point of the problem is singular value decomposition (SVD). If you have a matrix A and decompose it using SVD, you get a matrix U, a vector s, and a matrix Vh, where A = U * S * Vh. The S is a square matrix with the s values on the diagonal. SVD has many important uses in machine learning, for example, computing a pseudo-inverse for training a linear model.
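
As an aside, here is a minimal sketch (using the standard np.linalg.svd() function, not the demo code below, and assuming all singular values are non-zero) of how SVD gives a pseudo-inverse:

# pinv_via_svd_sketch.py
# pseudo-inverse of A is Vh.T * diag(1/s) * U.T

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
U, s, Vh = np.linalg.svd(A, full_matrices=False)
# invert the singular values; a robust implementation would
# zero-out values below a small tolerance instead
A_pinv = np.matmul(Vh.T, np.matmul(np.diag(1.0 / s), U.T))
print(np.allclose(A_pinv, np.linalg.pinv(A)))  # True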

Unfortunately, accurately computing SVD is mind-numbingly difficult. Two common SVD algorithms are Jacobi and Golub-Kahan-Reinsch. Both algorithms are very complex. But it’s possible to compute an approximation to SVD. Briefly, to approximate the SVD of a source matrix A, first compute the QR decomposition of A to get Q and R, and then compute the SVD of the R matrix.

This seems silly at first (using SVD to compute SVD), but the R result matrix of a QR decomposition is a square matrix, and it’s possible to find its SVD using QR in a relatively simple way. In Python code:

def my_svd(A):
  # simple approximation using QR for tall skinny A
  Q, R = my_qr(A, mode='reduced')
  U, s, Vh = my_svd_square(R)
  U = np.matmul(Q, U)
  return U, s, Vh

The my_svd() function isn’t trivial, but it’s vastly less complicated than computing SVD using Jacobi or any other algorithm. The my_svd_square() helper function for a square matrix is also much simpler than computing SVD for a non-square matrix. In essence, the idea for approximating SVD is to break it down into two much simpler (but by no means easy) sub-problems that can be solved using QR decomposition.

The first part of the output of a demo:

Simple SVD approximation using simple QR

Source matrix A:
[[ 1.000000  6.000000  8.000000  5.000000]
 [ 0.000000  2.000000  4.000000  6.000000]
 [ 9.000000  3.000000  7.000000  7.000000]
 [-1.000000 -3.000000 -5.000000 -7.000000]
 [ 7.000000  5.000000  3.000000  1.000000]]

=================================

Using simple approximation:
Exited at iteration 19

U =
[[ 0.490569  0.357946 -0.658897 -0.443928]
 [ 0.309393  0.418534  0.228973  0.339519]
 [ 0.618658 -0.402788  0.526177 -0.422092]
 [-0.402797 -0.389987 -0.198233 -0.465342]
 [ 0.344434 -0.618365 -0.444148  0.541249]]

s =
[21.095067  8.254694  4.718203  1.611445]

Vh =
[[ 0.420588  0.395767  0.594453  0.559554]
 [-0.872923 -0.017625  0.219636  0.435263]
 [ 0.247105 -0.750910 -0.214769  0.573539]
 [ 0.007029  0.528386 -0.743142  0.410485]]

Check =
[[ 1.000000  6.000000  8.000000  5.000000]
 [ 0.000000  2.000000  4.000000  6.000000]
 [ 9.000000  3.000000  7.000000  7.000000]
 [-1.000000 -3.000000 -5.000000 -7.000000]
 [ 7.000000  5.000000  3.000000  1.000000]]
Pass

The Check matrix shows that U * S * Vh equals the source matrix A.

Here’s the second part of the demo output that shows SVD using the built-in NumPy np.linalg.svd() function:

Using linalg.svd():

U =
[[-0.490569 -0.357946  0.658897 -0.443928]
 [-0.309393 -0.418534 -0.228973  0.339519]
 [-0.618658  0.402788 -0.526177 -0.422092]
 [ 0.402797  0.389987  0.198233 -0.465342]
 [-0.344434  0.618365  0.444148  0.541249]]

s =
[21.095067  8.254694  4.718203  1.611445]

Vh =
[[-0.420588 -0.395767 -0.594453 -0.559554]
 [ 0.872923  0.017625 -0.219636 -0.435263]
 [-0.247105  0.750910  0.214769 -0.573539]
 [ 0.007029  0.528386 -0.743142  0.410485]]

Check =
[[ 1.000000  6.000000  8.000000  5.000000]
 [-0.000000  2.000000  4.000000  6.000000]
 [ 9.000000  3.000000  7.000000  7.000000]
 [-1.000000 -3.000000 -5.000000 -7.000000]
 [ 7.000000  5.000000  3.000000  1.000000]]
Pass

End demo

The results are almost the same. One of the hugely annoying details of SVD is that the signs of the values in the columns of U, and in the corresponding rows of Vh, are arbitrary. Singular values, by convention, are always positive or zero.
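
Here is a minimal sketch (using np.linalg.svd(), not the demo code) of why the signs are arbitrary: flipping the sign of column k of U together with row k of Vh leaves the product U * S * Vh unchanged.

# svd_sign_flip_sketch.py

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 4))
U, s, Vh = np.linalg.svd(A, full_matrices=False)

k = 1  # flip the signs associated with singular value k
U2 = U.copy(); Vh2 = Vh.copy()
U2[:, k] *= -1.0
Vh2[k, :] *= -1.0

A1 = np.matmul(U,  np.matmul(np.diag(s), Vh))
A2 = np.matmul(U2, np.matmul(np.diag(s), Vh2))
print(np.allclose(A1, A2))  # True -- both reconstruct A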


Note: The standard way to compute the singular value decomposition (SVD) of a square matrix A using the QR decomposition is via an iterative process based on the QR algorithm, specifically the Golub-Kahan-Reinsch algorithm. The overall process involves two main phases:
1. Bidiagonalization: Reduce the original matrix A to a bidiagonal matrix B using Householder reflections.
2. SVD of Bidiagonal Matrix: Compute the SVD of the bidiagonal matrix B using an iterative QR-based method.
My demo technique skips the bidiagonalization step.


I tested the demo code, but not thoroughly: 1,000 matrices with shape 7×5, with random values between -10.0 and +10.0, and the code passed all 1,000 tests.

I ran some tests with random-size matrices, up to 100×100, and although the tests all passed, performance was very slow, making the approximation technique impractical for most large-matrix scenarios.

Well, all in all, this was a very challenging and very interesting exploration.



Actor Richard Carlson (1912-1977) starred in several of my favorite old science fiction movies. Old movies often had to approximate various special effects, but that doesn’t detract from their appeal to me. Here are three, all from the year 1953.

Left: In “The Magnetic Monster”, a rogue scientist creates a new element that feeds on energy and doubles its mass every 11 hours. Unless it’s stopped, it will literally consume the Earth. Carlson plays Dr. Jeffrey Stewart who devises a plan to stop the menace.

Center: In “The Maze”, Carlson plays Gerald MacTeam. He inherits a creepy old Scottish castle with weird happenings at night. It turns out that the hidden master of the castle is a human-sized frog creature whose development stalled as an embryo. What the heck?! The story has a happy ending (well, except for the frog-man).

Right: In “It Came From Outer Space”, Carlson plays astronomer John Putnam. He witnesses an object crashing into the desert. It turns out that it’s an alien spaceship. The aliens can morph and they impersonate townspeople to get supplies to repair their ship. The story ends well for everyone.


Demo program:

# svd_approx_using_qr.py

# approximating svd using qr
# possibly not numerically stable. for fun only

import numpy as np
np.set_printoptions(precision=6, suppress=True,
  floatmode='fixed')

# -----------------------------------------------------------

def my_qr(A, mode):
  # QR decomposition from scratch
  # mode = 'complete' or 'reduced'
  m, n = A.shape  # tall skinny A only!
  k = np.minimum(m,n)
  R = A.copy()
  Q = np.eye(m)

  if m == n: end = n-1
  else: end = n
  for i in range(end):
    H = np.eye(m)

    # Householder routine follows
    a = R[i:, i]
    norm_a = np.copysign(np.linalg.norm(a), a[0])
    v = a / (a[0] + norm_a)
    v[0] = 1.0
    h = np.eye(len(a))
    V = np.matmul(v[:, None], v[None, :])
    h -= (2.0 / np.dot(v, v)) * V

    H[i:, i:] = h  # copy h into lower right
    Q = np.matmul(Q, H)
    R = np.matmul(H, R)

  # 'reduced' : returns Q, R with dims (m, k), (k, n)
  if mode == 'complete':
    return Q, R
  elif mode == 'reduced':
    return Q[0:m,0:k], R[0:k,0:n]

# -----------------------------------------------------------

def my_svd_square(A):
  # helper for my_svd()
  # square matrix only (such as R from QR decomp reduced)
  max_iter = 1000
  R1 = np.copy(A)  # diag holds s
  U = np.eye(len(A))
  V = np.eye(len(A))
  for i in range(max_iter):
    # exit when off-diag cells of R1 are all close to 0
    # could check just every 100 iterations to improve perf
    if np.allclose(R1, np.diag(np.diag(R1)), \
      atol=1.0e-8) == True: 
      print("Exited at iteration " + str(i))
      break
    Q1, R1 = my_qr(R1, mode='reduced')
    Q2, R2 = my_qr(R1.T, mode='reduced')
    R1 = R2.T
    U = np.matmul(U, Q1)
    V = np.matmul(V, Q2)
  if i == max_iter-1:
    print("Warn: Exceeded max iterations ")
  # deal with singular values computed as negative
  s = np.copy(np.diag(R1))
  Vh = V.T
  for i in range(len(s)):
    if s[i] < 0.0:
      s[i] = -s[i]
      Vh[i,:] *= -1
  return U, s, Vh

# -----------------------------------------------------------
# -----------------------------------------------------------

def my_svd(A):
  # simple approximation using QR for tall skinny A
  Q, R = my_qr(A, mode='reduced')
  U, s, Vh = my_svd_square(R)
  U = np.matmul(Q, U)
  return U, s, Vh

# -----------------------------------------------------------

print("\nSimple SVD approximation using simple QR ")

A = np.array([
 [1, 6, 8, 5],
 [0, 2, 4, 6],
 [9, 3, 7, 7],
 [-1, -3, -5, -7],
 [7, 5, 3, 1]], dtype=np.float64)

print("\nSource matrix A: ")
print(A)

print("\n================================= ")

print("\nUsing simple approximation: ")
U, s, Vh = my_svd(A)

print("\nU = "); print(U)
print("\ns = "); print(s)
print("\nVh = "); print(Vh)

C = np.matmul(U, np.matmul(np.diag(s), Vh))
print("\nCheck = ")
print(C)
# if np.allclose(A, C) == True:
#   print("Pass ")

print("\n================================= ")

print("\nUsing linalg.svd(): ")
U, s, Vh = np.linalg.svd(A, full_matrices=False)
print("\nU = "); print(U)
print("\ns = "); print(s)
print("\nVh = "); print(Vh)

C = np.matmul(U, np.matmul(np.diag(s), Vh))
print("\nCheck = ")
print(C)
# if np.allclose(A, C) == True:
#   print("Pass ")

print("\nEnd demo ")
Posted in Machine Learning | Leave a comment

Decision Tree Regression From-Scratch C#, No Pointers, No Recursion

Decision tree regression is a machine learning technique that incorporates a set of if-then rules in a tree data structure to predict a single numeric value. For example, a decision tree regression model prediction might be, “If employee age is greater than 39.0 and age is less than or equal to 42.5 and years-experience is less than or equal to 8.0 and height is greater than 64.0 then bank account balance is $754.14.”

This blog post presents decision tree regression, implemented from scratch with C#, using list storage for tree nodes (no pointers/references), a LIFO stack to build the tree (no recursion), weighted variance minimization for the node split function, and storage of the associated row indices in each node.

The demo data looks like:

-0.1660,  0.4406, -0.9998, -0.3953, -0.7065,  0.4840
 0.0776, -0.1616,  0.3704, -0.5911,  0.7562,  0.1568
-0.9452,  0.3409, -0.1654,  0.1174, -0.7192,  0.8054
 0.9365, -0.3732,  0.3846,  0.7528,  0.7892,  0.1345
. . .

The first five values on each line are the x predictors. The last value on each line is the target y variable to predict. The data is synthetic and was generated by a 5-10-1 neural network with random weights and biases. Therefore, the data has an underlying structure that can be predicted. There are 200 training items and 40 test items.

The key parts of the output of the demo program (in slightly edited form to save space) are:

Begin decision tree regression

Loading synthetic train (200) and test (40) data
Done

First three train X:
 -0.1660  0.4406 -0.9998 -0.3953 -0.7065
  0.0776 -0.1616  0.3704 -0.5911  0.7562
 -0.9452  0.3409 -0.1654  0.1174 -0.7192

First three train y:
  0.4840
  0.1568
  0.8054

Setting maxDepth = 3
Setting minSamples = 2
Setting minLeaf = 18
Using default numSplitCols = -1

Creating and training tree
Done

Tree:
ID = 0  | col =  0 | -0.2102 | left =  1 | right =  2 | val = 0.3493 | F
ID = 1  | col =  4 |  0.1431 | left =  3 | right =  4 | val = 0.5345 | F
ID = 2  | col =  0 |  0.3915 | left =  5 | right =  6 | val = 0.2382 | F
ID = 3  | col =  0 | -0.6553 | left =  7 | right =  8 | val = 0.6358 | F
ID = 4  | col = -1 |  0.0000 | left =  9 | right = 10 | val = 0.4123 | T
ID = 5  | col =  4 | -0.2987 | left = 11 | right = 12 | val = 0.3032 | F
ID = 6  | col =  2 |  0.3777 | left = 13 | right = 14 | val = 0.1701 | F
ID = 7  | col = -1 |  0.0000 | left = 15 | right = 16 | val = 0.6952 | T
ID = 8  | col = -1 |  0.0000 | left = 17 | right = 18 | val = 0.5598 | T
ID = 11 | col = -1 |  0.0000 | left = 23 | right = 24 | val = 0.4101 | T
ID = 12 | col = -1 |  0.0000 | left = 25 | right = 26 | val = 0.2613 | T
ID = 13 | col = -1 |  0.0000 | left = 27 | right = 28 | val = 0.1882 | T
ID = 14 | col = -1 |  0.0000 | left = 29 | right = 30 | val = 0.1381 | T 

Evaluating model
Accuracy train (within 0.10) = 0.3750
Accuracy test (within 0.10) = 0.4750

MSE train = 0.0048
MSE test = 0.0054

Predicting for trainX[0] =
  -0.1660   0.4406  -0.9998  -0.3953  -0.7065
Predicted y = 0.4101

IF
column 0  >   -0.2102 AND
column 0  <=   0.3915 AND
column 4  <=  -0.2987 AND
THEN
predicted = 0.4101

End demo

If you compare this output with the visual representation, most of the key ideas of decision tree regression should become clear:


Click to enlarge.

The demo decision tree has maxDepth = 3. This creates a tree with 15 nodes (some of which could be empty). In general, if a tree has maxDepth = n, then it has 2^(n+1) – 1 nodes. These tree nodes could be stored as C# program-defined Node objects and connected via pointers/references. Instead, the architecture used by the demo decision tree is to create a List collection with size/count = 15, where each item in the List is a Node object.

With this design, if the root node is at index [0], then the left child is at [1] and the right child is at [2]. In general, if a node is at index [i], then the left child is at index [2*i+1] and the right child is at index [2*i+2].

The tree splitting part of constructing a decision tree regression system is perhaps best understood by looking at a concrete example. The root node of the demo decision tree is associated with all 200 rows of the training data. The splitting process selects some of the 200 rows to be assigned to the left child node, and the remaining rows to be assigned to the right child node.

For simplicity, suppose that instead of the demo training data with 200 rows and 5 columns of predictors, a tree node is associated with just 5 rows of data, and the data has just 3 columns of predictors:

     X                    y
[0] 0.99  0.22  0.77     3.0
[1] 0.55  0.44  0.00     9.0
[2] 0.88  0.66  0.88     7.0
[3] 0.11  0.33  0.22     1.0
[4] 0.55  0.88  0.33     5.0
     [0]   [1]   [2]

The split algorithm will select some of the five rows to go to the left child and the remaining rows to go to the right child. The idea is to select the two sets of rows so that, within each set, the associated y values are close together. The demo implementation does this by finding the split that minimizes the weighted variance of the target y values.

The splitting algorithm examines every possible x value in every column. For each possible candidate split value (also called the threshold), the weighted variance of the y values of the resulting split is computed. For example, for x = 0.33 in column [1], the rows where the values in column [1] are less than or equal to 0.33 are rows [3] and [0] with associated y values (1.0, 3.0). The other rows are [1], [2], [4] with associated y values (9.0, 7.0, 5.0).

The variance of a set of y values is the average of the sum of the squared differences between the values and their mean (average).

For this proposed split, the variance of y values (1.0, 3.0) is computed as 1.00. The mean of (1.0, 3.0) = (1.0 + 3.0) / 2 = 2.0 and so the variance is ((3.0 – 2.0)^2 + (1.0 – 2.0)^2) / 2 = 1.00.

The mean of y values (9.0, 7.0, 5.0) = (9.0 + 7.0 + 5.0) / 3 = 7.0 and the variance is ((9.0 – 7.0)^2 + (7.0 – 7.0)^2 + (5.0 – 7.0)^2) / 3 = 2.67.

Because the proposed split has different numbers of rows (two rows for the left child and three rows for the right child), the weighted variance of the proposed split is used and is ((2 * 1.00) + (3 * 2.67)) / 5 = 2.00.

This process is repeated for every x value in every column. The x value and its column that generate the smallest weighted variance define the split.
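
To double-check the arithmetic above, here is a minimal Python sketch (for illustration only; the demo program itself is C#) that computes the weighted variance for the proposed split on column [1] at threshold 0.33:

# weighted_variance_sketch.py

import numpy as np

left_y  = np.array([1.0, 3.0])        # y values for rows [3] and [0]
right_y = np.array([9.0, 7.0, 5.0])   # y values for rows [1], [2], [4]

left_var  = np.var(left_y)    # population variance = 1.00
right_var = np.var(right_y)   # population variance = 2.67 (approx)

n = len(left_y) + len(right_y)
weighted = (len(left_y) * left_var + len(right_y) * right_var) / n
print(left_var, right_var, weighted)   # 1.0  2.666...  2.0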

The demo does not use recursion to build the tree. Instead, the demo uses a stack data structure. In high-level pseudo-code:

create empty stack
push (root, depth = 0)
while stack not empty
  pop (currNode, currDepth)
  if at maxDepth or not enough data then
    node is a leaf, predicted = avg of associated y values
    continue
  compute split value and split column using the
    associated rows stored in currNode
  if unable to split then
    node is a leaf, predicted = avg of associated y values
    continue
  compute left-rows, right-rows
  create and push (leftNode, currDepth+1)
  create and push (rightNode, currDepth+1)
end-while

The build-tree algorithm is short but very tricky. The stack holds a tree node and its associated rows of the training data, and the current depth of the tree. The current depth of the tree is maintained so that the algorithm knows when maxDepth has been reached.

Decision trees are rarely used by themselves because they usually overfit the training data, and predict poorly on new, previously unseen data. Multiple decision trees are usually combined into an ensemble technique such as bagging trees (“bootstrap aggregation”), random forest regression, adaptive boosting regression, and gradient boosting regression.



Decision trees are not plants.

“Utopia Zukunftsromane” (and several spinoffs) is a German series of over 600 inexpensive science fiction novels that were published weekly between 1953 and 1968. Several of the novels featured bad plants.

Left: Issue #103 (October 14, 1957). Center: Issue #206 (January 12, 1960). Right: Issue #337 (July 10, 1962).

There is some contradictory information about the publication date of Issue #206 in online databases, so the date listed here might not be correct, but it’s certainly close.

The thing on the cover of Issue #337 might not be a plant, but it looks like one to me. I was unable to find an online version of the text of the novel, so I wasn’t able to find out.


Demo code. Replace “lt” (less than), “gt”, “lte”, “gte” with Boolean operator symbols (my blog editor chokes on symbols).

using System;
using System.IO;
using System.Collections.Generic;

// full (not-sparse) List storage with indexes (no pointers)
// stack-based construction (no recursion)
// the Nodes store associated row idxs 

namespace DecisionTreeRegression
{
  internal class DecisionTreeRegressionProgram
  {
    static void Main(string[] args)
    {
      Console.WriteLine("\nBegin decision tree regression ");

      // 1. load data
      Console.WriteLine("\nLoading synthetic train" +
        " (200) and test (40) data");
      string trainFile = "..\\..\\..\\Data\\" +
        "synthetic_train_200.txt";
      int[] colsX = new int[] { 0, 1, 2, 3, 4 };
      double[][] trainX =
        MatLoad(trainFile, colsX, ',', "#");
      double[] trainY =
        MatToVec(MatLoad(trainFile,
        new int[] { 5 }, ',', "#"));

      string testFile = "..\\..\\..\\Data\\" +
        "synthetic_test_40.txt";
      double[][] testX =
        MatLoad(testFile, colsX, ',', "#");
      double[] testY =
        MatToVec(MatLoad(testFile,
        new int[] { 5 }, ',', "#"));
      Console.WriteLine("Done ");

      Console.WriteLine("\nFirst three train X: ");
      for (int i = 0; i "lt" 3; ++i)
        VecShow(trainX[i], 4, 8);

      Console.WriteLine("\nFirst three train y: ");
      for (int i = 0; i "lt" 3; ++i)
        Console.WriteLine(trainY[i].ToString("F4").
          PadLeft(8));

      // 2. create and train/build tree
      int maxDepth = 3;
      int minSamples = 2;
      int minLeaf = 18;
      int numSplitCols = -1;  // all columns

      //int maxDepth = 3; //
      //int minSamples = 18;
      //int minLeaf = 1;
      //int numSplitCols = -1;  // all columns

      Console.WriteLine("\nSetting maxDepth = " +
        maxDepth);
      Console.WriteLine("Setting minSamples = " +
        minSamples);
      Console.WriteLine("Setting minLeaf = " +
        minLeaf);
      Console.WriteLine("Using default numSplitCols = -1 ");

      Console.WriteLine("\nCreating and training tree ");
      DecisionTreeRegressor dtr =
        new DecisionTreeRegressor(maxDepth, minSamples,
        minLeaf, numSplitCols, seed: 0);
      dtr.Train(trainX, trainY);
      Console.WriteLine("Done ");

      Console.WriteLine("\nTree: ");
      dtr.Display(showRows: false);
     
      // 3. evaluate model
      Console.WriteLine("\nEvaluating model ");
      double accTrain = dtr.Accuracy(trainX, trainY, 0.10);
      Console.WriteLine("Accuracy train (within 0.10) = " +
        accTrain.ToString("F4"));
      double accTest = dtr.Accuracy(testX, testY, 0.10);
      Console.WriteLine("Accuracy test (within 0.10) = " +
        accTest.ToString("F4"));

      double mseTrain = dtr.MSE(trainX, trainY);
      Console.WriteLine("\nMSE train = " +
        mseTrain.ToString("F4"));
      double mseTest = dtr.MSE(testX, testY);
      Console.WriteLine("MSE test = " +
        mseTest.ToString("F4"));

      // 4. use model
      Console.WriteLine("\nPredicting for trainX[0] = ");
      double[] x = trainX[0];
      VecShow(x, 4, 9);
      double predY = dtr.Predict(x);
      Console.WriteLine("Predicted y = " +
        predY.ToString("F4"));

      dtr.Explain(x);

      Console.WriteLine("\nEnd demo ");
      Console.ReadLine();

    } // Main()

    // ------------------------------------------------------
    // helpers for Main():
    //   MatLoad(), MatToVec(), VecShow().
    // ------------------------------------------------------

    static double[][] MatLoad(string fn, int[] usecols,
      char sep, string comment)
    {
      List"lt"double[]"gt" result =
        new List"lt"double[]"gt"();
      string line = "";
      FileStream ifs = new FileStream(fn, FileMode.Open);
      StreamReader sr = new StreamReader(ifs);
      while ((line = sr.ReadLine()) != null)
      {
        if (line.StartsWith(comment) == true)
          continue;
        string[] tokens = line.Split(sep);
        List"lt"double"gt" lst = new List"lt"double"gt"();
        for (int j = 0; j "lt" usecols.Length; ++j)
          lst.Add(double.Parse(tokens[usecols[j]]));
        double[] row = lst.ToArray();
        result.Add(row);
      }
      sr.Close(); ifs.Close();
      return result.ToArray();
    }

    static double[] MatToVec(double[][] mat)
    {
      int nRows = mat.Length;
      int nCols = mat[0].Length;
      double[] result = new double[nRows * nCols];
      int k = 0;
      for (int i = 0; i "lt" nRows; ++i)
        for (int j = 0; j "lt" nCols; ++j)
          result[k++] = mat[i][j];
      return result;
    }

    static void VecShow(double[] vec, int dec, int wid)
    {
      for (int i = 0; i "lt" vec.Length; ++i)
        Console.Write(vec[i].ToString("F" + dec).
          PadLeft(wid));
      Console.WriteLine("");
    }

  } // class Program

  // ========================================================

  public class DecisionTreeRegressor
  {
    public int maxDepth;
    public int minSamples;  // aka min_samples_split
    public int minLeaf;  // min number of values in a leaf
    public int numSplitCols; // mostly for random forest
    public List"lt"Node"gt" tree = new List"lt"Node"gt"();
    public Random rnd;  // order in which cols are searched

    public double[][] trainX;  // store data by ref
    public double[] trainY;

    // ------------------------------------------------------

    public class Node
    {
      public int id;
      public int colIdx;  // aka featureIdx
      public double thresh;
      public int left;  // index into List
      public int right;
      public double value;
      public bool isLeaf;
      public List"lt"int"gt" rows;  // assoc rows in trainX

      public Node(int id, int colIdx, double thresh,
        int left, int right, double value, bool isLeaf,
        List"lt"int"gt" rows)
      {
        this.id = id;
        this.colIdx = colIdx;
        this.thresh = thresh;
        this.left = left;
        this.right = right;
        this.value = value;
        this.isLeaf = isLeaf;

        this.rows = rows;
      }
    } // class Node

    // --------------------------------------------

    public class StackInfo  // used to build tree
    {
      // tree node + associated rows, not explicit data
      public Node node; // node holds associated rows
      public int depth;

      public StackInfo(Node n, int d)
      {
        this.node = n;
        this.depth = d;
      }
    }

    // --------------------------------------------

    public DecisionTreeRegressor(int maxDepth = 2,
      int minSamples = 2, int minLeaf = 1,
      int numSplitCols = -1, int seed = 0)
    {
      // if maxDepth = 0, tree has just a root node
      // if maxDepth = 1, at most 3 nodes (root, l, r)
      // if maxDepth = n, at most 2^(n+1) - 1 nodes
      this.maxDepth = maxDepth;
      this.minSamples = minSamples;
      this.minLeaf = minLeaf;
      this.numSplitCols = numSplitCols;  // for ran. forest

      // create full tree List with dummy nodes
      int numNodes = (int)Math.Pow(2, (maxDepth + 1)) - 1;
      for (int i = 0; i "lt" numNodes; ++i)
      {
        //this.tree.Add(null); // OK, but let's avoid ptrs
        Node n = new Node(i, -1, 0.0, -1, -1, 0.0,
          false, new List"lt"int"gt"());
        this.tree.Add(n); // dummy node
      }
      this.rnd = new Random(seed);
    }

    // ------------------------------------------------------
    // public: Train(), Predict(), Explain(), Accuracy(),
    //   MSE(), Display().
    // helpers: MakeTree(), BestSplit(), TreeTargetMean(),
    //   TreeTargetVariance().
    // ------------------------------------------------------

    public void Train(double[][] trainX, double[] trainY)
    {
      this.trainX = trainX;
      this.trainY = trainY;
      this.MakeTree();
    }

    // ------------------------------------------------------

    public double Predict(double[] x)
    {
      int p = 0;
      Node currNode = this.tree[p];
      while (currNode.isLeaf == false && 
        p "lt" this.tree.Count)
      {
        if (x[currNode.colIdx] "lte" currNode.thresh)
          p = currNode.left;
        else
          p = currNode.right;
        currNode = this.tree[p];
      }
      return this.tree[p].value;

    }

    // ------------------------------------------------------

    public void Explain(double[] x)
    {
      int p = 0;
      Console.WriteLine("\nIF ");
      Node currNode = this.tree[p];
      while (currNode.isLeaf == false)
      {
        Console.Write("column ");
        Console.Write(currNode.colIdx + " ");
        if (x[currNode.colIdx] "lte" currNode.thresh)
        {
          Console.Write(" "lte" ");
          Console.WriteLine(currNode.thresh.
            ToString("F4").PadLeft(8) + " AND ");
          p = currNode.left;
          currNode = this.tree[p];
        }
        else
        {
          Console.Write(" "gt"  ");
          Console.WriteLine(currNode.thresh.
            ToString("F4").PadLeft(8) + " AND ");
          p = currNode.right;
          currNode = this.tree[p];
        }
      }
      Console.WriteLine("THEN \npredicted = " +
        currNode.value.ToString("F4"));
    }

    // ------------------------------------------------------

    public double Accuracy(double[][] dataX, double[] dataY,
      double pctClose)
    {
      int numCorrect = 0; int numWrong = 0;
      for (int i = 0; i "lt" dataX.Length; ++i)
      {
        double actualY = dataY[i];
        double predY = this.Predict(dataX[i]);
        if (Math.Abs(predY - actualY) "lt"
          Math.Abs(pctClose * actualY))
          ++numCorrect;
        else
          ++numWrong;
      }
      return (numCorrect * 1.0) / (numWrong + numCorrect);
    }

    // ------------------------------------------------------

    public double MSE(double[][] dataX,
      double[] dataY)
    {
      // standard machine learning MSE
      int n = dataX.Length;
      double sum = 0.0;
      for (int i = 0; i "lt" n; ++i)
      {
        double actualY = dataY[i];
        double predY = this.Predict(dataX[i]);
        sum += (actualY - predY) * (actualY - predY);
      }
      return sum / n;
    }

    // ------------------------------------------------------

    public void Display(bool showRows)
    {
      for (int i = 0; i "lt" this.tree.Count; ++i)
      {
        Node n = this.tree[i];
        // check for empty nodes
        if (n == null) continue;
        if (n.colIdx == -1 && n.isLeaf == false) continue;

        string s1 = "ID " +
          n.id.ToString().PadRight(3) + " | ";
        string s2 = "idx " +
          n.colIdx.ToString().PadLeft(3) + " | ";
        string s3 = "v " +
          n.thresh.ToString("F4").PadLeft(8) + " | ";
        string s4 = "L " +
          n.left.ToString().PadLeft(3) + " | ";
        string s5 = "R " +
          n.right.ToString().PadLeft(3) + " | ";
        string s6 = "v " +
          n.value.ToString("F4").PadLeft(8) + " | ";
        string s7 = "leaf " + (n.isLeaf == true ? "T" : "F");
          //n.isLeaf.ToString().PadLeft(6);

        string s8;
        if (showRows == true)
        {
          s8 = "\nRows:";
          for (int j = 0; j "lt" n.rows.Count; ++j)
          {
            if (j "gt" 0 && j % 20 == 0)
              s8 += "\n";
            s8 += n.rows[j].ToString().PadLeft(4) + " ";
          }
        }
        else
        {
          s8 = "";
        }
        Console.WriteLine(s1 + s2 + s3 + s4 +
          s5 + s6 + s7 + s8);
        if (showRows == true) Console.WriteLine("\n");
      }
    } // Display()

    // ------------------------------------------------------

    // helpers: MakeTree(), BestSplit(),
    // TreeTargetMean(), TreeTargetVariance()

    private void MakeTree()
    {
      // no recursion, no pointers, List storage
      if (this.numSplitCols == -1) // use all cols
        this.numSplitCols = this.trainX[0].Length;

      List"lt"int"gt" allRows = new List"lt"int"gt"();
      for (int i = 0; i "lt" this.trainX.Length; ++i)
        allRows.Add(i);
      double grandMean = this.TreeTargetMean(allRows);

      Node root = new Node(0, -1, 0.0, 1, 2,
        grandMean, false, allRows);

      Stack"lt"StackInfo"gt" stack = 
        new Stack"lt"StackInfo"gt"();
      stack.Push(new StackInfo(root, 0));
      while (stack.Count "gt" 0)
      {
        StackInfo info = stack.Pop();
        Node currNode = info.node;
        this.tree[currNode.id] = currNode;

        List"lt"int"gt" currRows = info.node.rows;
        int currDepth = info.depth;

        if (currDepth == this.maxDepth ||
          currRows.Count "lt" this.minSamples)
        {
          currNode.value = this.TreeTargetMean(currRows);
          currNode.isLeaf = true;
          continue;
        }

        double[] splitInfo = this.BestSplit(currRows);
        int colIdx = (int)splitInfo[0];
        double thresh = splitInfo[1];

        if (colIdx == -1)  // unable to split
        {
          currNode.value = this.TreeTargetMean(currRows);
          currNode.isLeaf = true;
          continue;
        }

        // got good split so at internal, non-leaf node
        currNode.colIdx = colIdx;
        currNode.thresh = thresh;

        // make the data splits for children
        // find left rows and right rows
        List"lt"int"gt" leftIdxs = new List"lt"int"gt"();
        List"lt"int"gt" rightIdxs = new List"lt"int"gt"();

        for (int k = 0; k "lt" currRows.Count; ++k)
        {
          int r = currRows[k];
          if (this.trainX[r][colIdx] "lte" thresh)
            leftIdxs.Add(r);
          else
            rightIdxs.Add(r);
        }

        int leftID = currNode.id * 2 + 1;
        Node currNodeLeft = new Node(leftID, -1, 0.0,
          2 * leftID + 1, 2 * leftID + 2,
          this.TreeTargetMean(leftIdxs), false, leftIdxs);
        stack.Push(new StackInfo(currNodeLeft,
          currDepth + 1));

        int rightID = currNode.id * 2 + 2;
        Node currNodeRight = new Node(rightID, -1, 0.0,
          2 * rightID + 1, 2 * rightID + 2,
          this.TreeTargetMean(rightIdxs), false, rightIdxs);
        stack.Push(new StackInfo(currNodeRight,
          currDepth + 1));

      } // while

      return;

    } // MakeTree()

    // ------------------------------------------------------

    private double[] BestSplit(List"lt"int"gt" rows)
    {
      // implicit params: numSplitCols, minLeaf
      // result[0] = best col idx (as double)
      // result[1] = best split value
      rows.Sort();

      int bestColIdx = -1;  // indicates bad split
      double bestThresh = 0.0;
      double bestVar = double.MaxValue;  // smaller better

      int nRows = rows.Count;  // or dataY.Length
      int nCols = this.trainX[0].Length;

      if (nRows == 0)
      {
        throw new Exception("empty data in BestSplit()");
      }

      // process cols in scrambled order
      int[] colIndices = new int[nCols];
      for (int k = 0; k "lt" nCols; ++k)
        colIndices[k] = k;
      // shuffle, inline Fisher-Yates
      int n = colIndices.Length;
      for (int i = 0; i "lt" n; ++i)
      {
        int ri = rnd.Next(i, n);  // be careful
        int tmp = colIndices[i];
        colIndices[i] = colIndices[ri];
        colIndices[ri] = tmp;
      }

      // numSplitCols is usually all columns (-1)
      for (int j = 0; j "lt" this.numSplitCols; ++j)
      {
        int colIdx = colIndices[j];
        HashSet"lt"double"gt" examineds = 
          new HashSet"lt"double"gt"();

        for (int i = 0; i "lt" nRows; ++i) // each row
        {
          // if curr thresh been seen, skip it
          double thresh = this.trainX[rows[i]][colIdx];
          if (examineds.Contains(thresh)) continue;
          examineds.Add(thresh);

          // get row idxs where x is lte, gt thresh
          List"lt"int"gt" leftIdxs = new List"lt"int"gt"();
          List"lt"int"gt" rightIdxs = new List"lt"int"gt"();
          for (int k = 0; k "lt" nRows; ++k)
          {
            if (this.trainX[rows[k]][colIdx] "lte" thresh)
              leftIdxs.Add(rows[k]);
            else
              rightIdxs.Add(rows[k]);
          }

          // Check if proposed split has too few values
          if (leftIdxs.Count "lt" this.minLeaf ||
            rightIdxs.Count "lt" this.minLeaf)
            continue;  // to next row

          double leftVar = 
            this.TreeTargetVariance(leftIdxs);
          double rightVar = 
            this.TreeTargetVariance(rightIdxs);
          double weightedVar = (leftIdxs.Count * leftVar +
            rightIdxs.Count * rightVar) / nRows;

          if (weightedVar "lt" bestVar)
          {
            bestColIdx = colIdx;
            bestThresh = thresh;
            bestVar = weightedVar;
          }

        } // each row
      } // j each col

      double[] result = new double[2];  // out params ugly
      result[0] = 1.0 * bestColIdx;
      result[1] = bestThresh;
      return result;

    } // BestSplit()

    // ------------------------------------------------------

    private double TreeTargetMean(List"lt"int"gt" rows)
    {
      // mean of rows items in trainY: for node prediction
      double sum = 0.0;
      for (int i = 0; i "lt" rows.Count; ++i)
      {
        int r = rows[i];
        sum += this.trainY[r];
      }
      return sum / rows.Count;
    }

    // ------------------------------------------------------

    private double TreeTargetVariance(List"lt"int"gt" rows)
    {
      double mean = this.TreeTargetMean(rows);
      double sum = 0.0;
      for (int i = 0; i "lt" rows.Count; ++i)
      {
        int r = rows[i];
        sum += (this.trainY[r] - mean) *
          (this.trainY[r] - mean);
      }
      return sum / rows.Count;
    }

    // ------------------------------------------------------

  } // class DecisionTreeRegressor

} // ns

Training data:

# synthetic_train_200.txt
#
-0.1660,  0.4406, -0.9998, -0.3953, -0.7065,  0.4840
 0.0776, -0.1616,  0.3704, -0.5911,  0.7562,  0.1568
-0.9452,  0.3409, -0.1654,  0.1174, -0.7192,  0.8054
 0.9365, -0.3732,  0.3846,  0.7528,  0.7892,  0.1345
-0.8299, -0.9219, -0.6603,  0.7563, -0.8033,  0.7955
 0.0663,  0.3838, -0.3690,  0.3730,  0.6693,  0.3206
-0.9634,  0.5003,  0.9777,  0.4963, -0.4391,  0.7377
-0.1042,  0.8172, -0.4128, -0.4244, -0.7399,  0.4801
-0.9613,  0.3577, -0.5767, -0.4689, -0.0169,  0.6861
-0.7065,  0.1786,  0.3995, -0.7953, -0.1719,  0.5569
 0.3888, -0.1716, -0.9001,  0.0718,  0.3276,  0.2500
 0.1731,  0.8068, -0.7251, -0.7214,  0.6148,  0.3297
-0.2046, -0.6693,  0.8550, -0.3045,  0.5016,  0.2129
 0.2473,  0.5019, -0.3022, -0.4601,  0.7918,  0.2613
-0.1438,  0.9297,  0.3269,  0.2434, -0.7705,  0.5171
 0.1568, -0.1837, -0.5259,  0.8068,  0.1474,  0.3307
-0.9943,  0.2343, -0.3467,  0.0541,  0.7719,  0.5581
 0.2467, -0.9684,  0.8589,  0.3818,  0.9946,  0.1092
-0.6553, -0.7257,  0.8652,  0.3936, -0.8680,  0.7018
 0.8460,  0.4230, -0.7515, -0.9602, -0.9476,  0.1996
-0.9434, -0.5076,  0.7201,  0.0777,  0.1056,  0.5664
 0.9392,  0.1221, -0.9627,  0.6013, -0.5341,  0.1533
 0.6142, -0.2243,  0.7271,  0.4942,  0.1125,  0.1661
 0.4260,  0.1194, -0.9749, -0.8561,  0.9346,  0.2230
 0.1362, -0.5934, -0.4953,  0.4877, -0.6091,  0.3810
 0.6937, -0.5203, -0.0125,  0.2399,  0.6580,  0.1460
-0.6864, -0.9628, -0.8600, -0.0273,  0.2127,  0.5387
 0.9772,  0.1595, -0.2397,  0.1019,  0.4907,  0.1611
 0.3385, -0.4702, -0.8673, -0.2598,  0.2594,  0.2270
-0.8669, -0.4794,  0.6095, -0.6131,  0.2789,  0.4700
 0.0493,  0.8496, -0.4734, -0.8681,  0.4701,  0.3516
 0.8639, -0.9721, -0.5313,  0.2336,  0.8980,  0.1412
 0.9004,  0.1133,  0.8312,  0.2831, -0.2200,  0.1782
 0.0991,  0.8524,  0.8375, -0.2102,  0.9265,  0.2150
-0.6521, -0.7473, -0.7298,  0.0113, -0.9570,  0.7422
 0.6190, -0.3105,  0.8802,  0.1640,  0.7577,  0.1056
 0.6895,  0.8108, -0.0802,  0.0927,  0.5972,  0.2214
 0.1982, -0.9689,  0.1870, -0.1326,  0.6147,  0.1310
-0.3695,  0.7858,  0.1557, -0.6320,  0.5759,  0.3773
-0.1596,  0.3581,  0.8372, -0.9992,  0.9535,  0.2071
-0.2468,  0.9476,  0.2094,  0.6577,  0.1494,  0.4132
 0.1737,  0.5000,  0.7166,  0.5102,  0.3961,  0.2611
 0.7290, -0.3546,  0.3416, -0.0983, -0.2358,  0.1332
-0.3652,  0.2438, -0.1395,  0.9476,  0.3556,  0.4170
-0.6029, -0.1466, -0.3133,  0.5953,  0.7600,  0.4334
-0.4596, -0.4953,  0.7098,  0.0554,  0.6043,  0.2775
 0.1450,  0.4663,  0.0380,  0.5418,  0.1377,  0.2931
-0.8636, -0.2442, -0.8407,  0.9656, -0.6368,  0.7429
 0.6237,  0.7499,  0.3768,  0.1390, -0.6781,  0.2185
-0.5499,  0.1850, -0.3755,  0.8326,  0.8193,  0.4399
-0.4858, -0.7782, -0.6141, -0.0008,  0.4572,  0.4197
 0.7033, -0.1683,  0.2334, -0.5327, -0.7961,  0.1776
 0.0317, -0.0457, -0.6947,  0.2436,  0.0880,  0.3345
 0.5031, -0.5559,  0.0387,  0.5706, -0.9553,  0.3107
-0.3513,  0.7458,  0.6894,  0.0769,  0.7332,  0.3170
 0.2205,  0.5992, -0.9309,  0.5405,  0.4635,  0.3532
-0.4806, -0.4859,  0.2646, -0.3094,  0.5932,  0.3202
 0.9809, -0.3995, -0.7140,  0.8026,  0.0831,  0.1600
 0.9495,  0.2732,  0.9878,  0.0921,  0.0529,  0.1289
-0.9476, -0.6792,  0.4913, -0.9392, -0.2669,  0.5966
 0.7247,  0.3854,  0.3819, -0.6227, -0.1162,  0.1550
-0.5922, -0.5045, -0.4757,  0.5003, -0.0860,  0.5863
-0.8861,  0.0170, -0.5761,  0.5972, -0.4053,  0.7301
 0.6877, -0.2380,  0.4997,  0.0223,  0.0819,  0.1404
 0.9189,  0.6079, -0.9354,  0.4188, -0.0700,  0.1907
-0.1428, -0.7820,  0.2676,  0.6059,  0.3936,  0.2790
 0.5324, -0.3151,  0.6917, -0.1425,  0.6480,  0.1071
-0.8432, -0.9633, -0.8666, -0.0828, -0.7733,  0.7784
-0.9444,  0.5097, -0.2103,  0.4939, -0.0952,  0.6787
-0.0520,  0.6063, -0.1952,  0.8094, -0.9259,  0.4836
 0.5477, -0.7487,  0.2370, -0.9793,  0.0773,  0.1241
 0.2450,  0.8116,  0.9799,  0.4222,  0.4636,  0.2355
 0.8186, -0.1983, -0.5003, -0.6531, -0.7611,  0.1511
-0.4714,  0.6382, -0.3788,  0.9648, -0.4667,  0.5950
 0.0673, -0.3711,  0.8215, -0.2669, -0.1328,  0.2677
-0.9381,  0.4338,  0.7820, -0.9454,  0.0441,  0.5518
-0.3480,  0.7190,  0.1170,  0.3805, -0.0943,  0.4724
-0.9813,  0.1535, -0.3771,  0.0345,  0.8328,  0.5438
-0.1471, -0.5052, -0.2574,  0.8637,  0.8737,  0.3042
-0.5454, -0.3712, -0.6505,  0.2142, -0.1728,  0.5783
 0.6327, -0.6297,  0.4038, -0.5193,  0.1484,  0.1153
-0.5424,  0.3282, -0.0055,  0.0380, -0.6506,  0.6613
 0.1414,  0.9935,  0.6337,  0.1887,  0.9520,  0.2540
-0.9351, -0.8128, -0.8693, -0.0965, -0.2491,  0.7353
 0.9507, -0.6640,  0.9456,  0.5349,  0.6485,  0.1059
-0.0462, -0.9737, -0.2940, -0.0159,  0.4602,  0.2606
-0.0627, -0.0852, -0.7247, -0.9782,  0.5166,  0.2977
 0.0478,  0.5098, -0.0723, -0.7504, -0.3750,  0.3335
 0.0090,  0.3477,  0.5403, -0.7393, -0.9542,  0.4415
-0.9748,  0.3449,  0.3736, -0.1015,  0.8296,  0.4358
 0.2887, -0.9895, -0.0311,  0.7186,  0.6608,  0.2057
 0.1570, -0.4518,  0.1211,  0.3435, -0.2951,  0.3244
 0.7117, -0.6099,  0.4946, -0.4208,  0.5476,  0.1096
-0.2929, -0.5726,  0.5346, -0.3827,  0.4665,  0.2465
 0.4889, -0.5572, -0.5718, -0.6021, -0.7150,  0.2163
-0.7782,  0.3491,  0.5996, -0.8389, -0.5366,  0.6516
-0.5847,  0.8347,  0.4226,  0.1078, -0.3910,  0.6134
 0.8469,  0.4121, -0.0439, -0.7476,  0.9521,  0.1571
-0.6803, -0.5948, -0.1376, -0.1916, -0.7065,  0.7156
 0.2878,  0.5086, -0.5785,  0.2019,  0.4979,  0.2980
 0.2764,  0.1943, -0.4090,  0.4632,  0.8906,  0.2960
-0.8877,  0.6705, -0.6155, -0.2098, -0.3998,  0.7107
-0.8398,  0.8093, -0.2597,  0.0614, -0.0118,  0.6502
-0.8476,  0.0158, -0.4769, -0.2859, -0.7839,  0.7715
 0.5751, -0.7868,  0.9714, -0.6457,  0.1448,  0.1175
 0.4802, -0.7001,  0.1022, -0.5668,  0.5184,  0.1090
 0.4458, -0.6469,  0.7239, -0.9604,  0.7205,  0.0779
 0.5175,  0.4339,  0.9747, -0.4438, -0.9924,  0.2879
 0.8678,  0.7158,  0.4577,  0.0334,  0.4139,  0.1678
 0.5406,  0.5012,  0.2264, -0.1963,  0.3946,  0.2088
-0.9938,  0.5498,  0.7928, -0.5214, -0.7585,  0.7687
 0.7661,  0.0863, -0.4266, -0.7233, -0.4197,  0.1466
 0.2277, -0.3517, -0.0853, -0.1118,  0.6563,  0.1767
 0.3499, -0.5570, -0.0655, -0.3705,  0.2537,  0.1632
 0.7547, -0.1046,  0.5689, -0.0861,  0.3125,  0.1257
 0.8186,  0.2110,  0.5335,  0.0094, -0.0039,  0.1391
 0.6858, -0.8644,  0.1465,  0.8855,  0.0357,  0.1845
-0.4967,  0.4015,  0.0805,  0.8977,  0.2487,  0.4663
 0.6760, -0.9841,  0.9787, -0.8446, -0.3557,  0.1509
-0.1203, -0.4885,  0.6054, -0.0443, -0.7313,  0.4854
 0.8557,  0.7919, -0.0169,  0.7134, -0.1628,  0.2002
 0.0115, -0.6209,  0.9300, -0.4116, -0.7931,  0.4052
-0.7114, -0.9718,  0.4319,  0.1290,  0.5892,  0.3661
 0.3915,  0.5557, -0.1870,  0.2955, -0.6404,  0.2954
-0.3564, -0.6548, -0.1827, -0.5172, -0.1862,  0.4622
 0.2392, -0.4959,  0.5857, -0.1341, -0.2850,  0.2470
-0.3394,  0.3947, -0.4627,  0.6166, -0.4094,  0.5325
 0.7107,  0.7768, -0.6312,  0.1707,  0.7964,  0.2757
-0.1078,  0.8437, -0.4420,  0.2177,  0.3649,  0.4028
-0.3139,  0.5595, -0.6505, -0.3161, -0.7108,  0.5546
 0.4335,  0.3986,  0.3770, -0.4932,  0.3847,  0.1810
-0.2562, -0.2894, -0.8847,  0.2633,  0.4146,  0.4036
 0.2272,  0.2966, -0.6601, -0.7011,  0.0284,  0.2778
-0.0743, -0.1421, -0.0054, -0.6770, -0.3151,  0.3597
-0.4762,  0.6891,  0.6007, -0.1467,  0.2140,  0.4266
-0.4061,  0.7193,  0.3432,  0.2669, -0.7505,  0.6147
-0.0588,  0.9731,  0.8966,  0.2902, -0.6966,  0.4955
-0.0627, -0.1439,  0.1985,  0.6999,  0.5022,  0.3077
 0.1587,  0.8494, -0.8705,  0.9827, -0.8940,  0.4263
-0.7850,  0.2473, -0.9040, -0.4308, -0.8779,  0.7199
 0.4070,  0.3369, -0.2428, -0.6236,  0.4940,  0.2215
-0.0242,  0.0513, -0.9430,  0.2885, -0.2987,  0.3947
-0.5416, -0.1322, -0.2351, -0.0604,  0.9590,  0.3683
 0.1055,  0.7783, -0.2901, -0.5090,  0.8220,  0.2984
-0.9129,  0.9015,  0.1128, -0.2473,  0.9901,  0.4776
-0.9378,  0.1424, -0.6391,  0.2619,  0.9618,  0.5368
 0.7498, -0.0963,  0.4169,  0.5549, -0.0103,  0.1614
-0.2612, -0.7156,  0.4538, -0.0460, -0.1022,  0.3717
 0.7720,  0.0552, -0.1818, -0.4622, -0.8560,  0.1685
-0.4177,  0.0070,  0.9319, -0.7812,  0.3461,  0.3052
-0.0001,  0.5542, -0.7128, -0.8336, -0.2016,  0.3803
 0.5356, -0.4194, -0.5662, -0.9666, -0.2027,  0.1776
-0.2378,  0.3187, -0.8582, -0.6948, -0.9668,  0.5474
-0.1947, -0.3579,  0.1158,  0.9869,  0.6690,  0.2992
 0.3992,  0.8365, -0.9205, -0.8593, -0.0520,  0.3154
-0.0209,  0.0793,  0.7905, -0.1067,  0.7541,  0.1864
-0.4928, -0.4524, -0.3433,  0.0951, -0.5597,  0.6261
-0.8118,  0.7404, -0.5263, -0.2280,  0.1431,  0.6349
 0.0516, -0.8480,  0.7483,  0.9023,  0.6250,  0.1959
-0.3212,  0.1093,  0.9488, -0.3766,  0.3376,  0.2735
-0.3481,  0.5490, -0.3484,  0.7797,  0.5034,  0.4379
-0.5785, -0.9170, -0.3563, -0.9258,  0.3877,  0.4121
 0.3407, -0.1391,  0.5356,  0.0720, -0.9203,  0.3458
-0.3287, -0.8954,  0.2102,  0.0241,  0.2349,  0.3247
-0.1353,  0.6954, -0.0919, -0.9692,  0.7461,  0.3338
 0.9036, -0.8982, -0.5299, -0.8733, -0.1567,  0.1187
 0.7277, -0.8368, -0.0538, -0.7489,  0.5458,  0.0830
 0.9049,  0.8878,  0.2279,  0.9470, -0.3103,  0.2194
 0.7957, -0.1308, -0.5284,  0.8817,  0.3684,  0.2172
 0.4647, -0.4931,  0.2010,  0.6292, -0.8918,  0.3371
-0.7390,  0.6849,  0.2367,  0.0626, -0.5034,  0.7039
-0.1567, -0.8711,  0.7940, -0.5932,  0.6525,  0.1710
 0.7635, -0.0265,  0.1969,  0.0545,  0.2496,  0.1445
 0.7675,  0.1354, -0.7698, -0.5460,  0.1920,  0.1728
-0.5211, -0.7372, -0.6763,  0.6897,  0.2044,  0.5217
 0.1913,  0.1980,  0.2314, -0.8816,  0.5006,  0.1998
 0.8964,  0.0694, -0.6149,  0.5059, -0.9854,  0.1825
 0.1767,  0.7104,  0.2093,  0.6452,  0.7590,  0.2832
-0.3580, -0.7541,  0.4426, -0.1193, -0.7465,  0.5657
-0.5996,  0.5766, -0.9758, -0.3933, -0.9572,  0.6800
 0.9950,  0.1641, -0.4132,  0.8579,  0.0142,  0.2003
-0.4717, -0.3894, -0.2567, -0.5111,  0.1691,  0.4266
 0.3917, -0.8561,  0.9422,  0.5061,  0.6123,  0.1212
-0.0366, -0.1087,  0.3449, -0.1025,  0.4086,  0.2475
 0.3633,  0.3943,  0.2372, -0.6980,  0.5216,  0.1925
-0.5325, -0.6466, -0.2178, -0.3589,  0.6310,  0.3568
 0.2271,  0.5200, -0.1447, -0.8011, -0.7699,  0.3128
 0.6415,  0.1993,  0.3777, -0.0178, -0.8237,  0.2181
-0.5298, -0.0768, -0.6028, -0.9490,  0.4588,  0.4356
 0.6870, -0.1431,  0.7294,  0.3141,  0.1621,  0.1632
-0.5985,  0.0591,  0.7889, -0.3900,  0.7419,  0.2945
 0.3661,  0.7984, -0.8486,  0.7572, -0.6183,  0.3449
 0.6995,  0.3342, -0.3113, -0.6972,  0.2707,  0.1712
 0.2565,  0.9126,  0.1798, -0.6043, -0.1413,  0.2893
-0.3265,  0.9839, -0.2395,  0.9854,  0.0376,  0.4770
 0.2690, -0.1722,  0.9818,  0.8599, -0.7015,  0.3954
-0.2102, -0.0768,  0.1219,  0.5607, -0.0256,  0.3949
 0.8216, -0.9555,  0.6422, -0.6231,  0.3715,  0.0801
-0.2896,  0.9484, -0.7545, -0.6249,  0.7789,  0.4370
-0.9985, -0.5448, -0.7092, -0.5931,  0.7926,  0.5402

Test data:

# synthetic_test_40.txt
#
 0.7462,  0.4006, -0.0590,  0.6543, -0.0083,  0.1935
 0.8495, -0.2260, -0.0142, -0.4911,  0.7699,  0.1078
-0.2335, -0.4049,  0.4352, -0.6183, -0.7636,  0.5088
 0.1810, -0.5142,  0.2465,  0.2767, -0.3449,  0.3136
-0.8650,  0.7611, -0.0801,  0.5277, -0.4922,  0.7140
-0.2358, -0.7466, -0.5115, -0.8413, -0.3943,  0.4533
 0.4834,  0.2300,  0.3448, -0.9832,  0.3568,  0.1360
-0.6502, -0.6300,  0.6885,  0.9652,  0.8275,  0.3046
-0.3053,  0.5604,  0.0929,  0.6329, -0.0325,  0.4756
-0.7995,  0.0740, -0.2680,  0.2086,  0.9176,  0.4565
-0.2144, -0.2141,  0.5813,  0.2902, -0.2122,  0.4119
-0.7278, -0.0987, -0.3312, -0.5641,  0.8515,  0.4438
 0.3793,  0.1976,  0.4933,  0.0839,  0.4011,  0.1905
-0.8568,  0.9573, -0.5272,  0.3212, -0.8207,  0.7415
-0.5785,  0.0056, -0.7901, -0.2223,  0.0760,  0.5551
 0.0735, -0.2188,  0.3925,  0.3570,  0.3746,  0.2191
 0.1230, -0.2838,  0.2262,  0.8715,  0.1938,  0.2878
 0.4792, -0.9248,  0.5295,  0.0366, -0.9894,  0.3149
-0.4456,  0.0697,  0.5359, -0.8938,  0.0981,  0.3879
 0.8629, -0.8505, -0.4464,  0.8385,  0.5300,  0.1769
 0.1995,  0.6659,  0.7921,  0.9454,  0.9970,  0.2330
-0.0249, -0.3066, -0.2927, -0.4923,  0.8220,  0.2437
 0.4513, -0.9481, -0.0770, -0.4374, -0.9421,  0.2879
-0.3405,  0.5931, -0.3507, -0.3842,  0.8562,  0.3987
 0.9538,  0.0471,  0.9039,  0.7760,  0.0361,  0.1706
-0.0887,  0.2104,  0.9808,  0.5478, -0.3314,  0.4128
-0.8220, -0.6302,  0.0537, -0.1658,  0.6013,  0.4306
-0.4123, -0.2880,  0.9074, -0.0461, -0.4435,  0.5144
 0.0060,  0.2867, -0.7775,  0.5161,  0.7039,  0.3599
-0.7968, -0.5484,  0.9426, -0.4308,  0.8148,  0.2979
 0.7811,  0.8450, -0.6877,  0.7594,  0.2640,  0.2362
-0.6802, -0.1113, -0.8325, -0.6694, -0.6056,  0.6544
 0.3821,  0.1476,  0.7466, -0.5107,  0.2592,  0.1648
 0.7265,  0.9683, -0.9803, -0.4943, -0.5523,  0.2454
-0.9049, -0.9797, -0.0196, -0.9090, -0.4433,  0.6447
-0.4607,  0.1811, -0.2389,  0.4050, -0.0078,  0.5229
 0.2664, -0.2932, -0.4259, -0.7336,  0.8742,  0.1834
-0.4507,  0.1029, -0.6294, -0.1158, -0.6294,  0.6081
 0.8948, -0.0124,  0.9278,  0.2899, -0.0314,  0.1534
-0.1323, -0.8813, -0.0146, -0.0697,  0.6135,  0.2386
Posted in Machine Learning | Leave a comment

I Ask ChatGPT to Generate Gradient Boosting Regression C# Code from Scratch

Bottom line: I asked ChatGPT to generate code for a gradient boosting regression model, from scratch, using the C# language, without any external machine learning library code, such as ML.NET. I was impressed: the resulting C# code only needed a few tweaks to run, and was essentially correct.

Gradient boosting regression (GBR) is a very complicated technique to predict a single numeric value. The essence of GBR is a collection of relatively small decision trees. The trees are constructed sequentially: each new tree is fit to the errors (aka residuals, aka pseudo-gradients) of the model built so far. The idea is very subtle.
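
To make the idea concrete, here is a minimal Python sketch (for illustration only; the ChatGPT-generated C# code appears below) of the boosting loop, using depth-1 stumps fit to squared-loss pseudo-residuals on made-up synthetic data:

# gbr_stump_sketch.py

import numpy as np

def fit_stump(X, r):
  # best (column, threshold, left mean, right mean) by sum of squared error
  best = None; best_err = np.inf
  for j in range(X.shape[1]):
    for thr in np.unique(X[:, j])[:-1]:   # skip max so right side is non-empty
      left = r[X[:, j] <= thr]; right = r[X[:, j] > thr]
      err = np.sum((left - left.mean())**2) + np.sum((right - right.mean())**2)
      if err < best_err:
        best_err = err; best = (j, thr, left.mean(), right.mean())
  return best

def stump_predict(stump, X):
  j, thr, lv, rv = stump
  return np.where(X[:, j] <= thr, lv, rv)

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 5))        # synthetic predictors
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]          # synthetic target

lr = 0.10
pred = np.full(len(y), y.mean())             # initial prediction = mean of y
stumps = []
for m in range(150):
  resid = y - pred                           # pseudo-residuals (squared loss)
  stump = fit_stump(X, resid)
  stumps.append(stump)
  pred += lr * stump_predict(stump, X)       # shrink and add

print("train MSE =", np.mean((y - pred)**2))

The 150 stumps and 0.10 learning rate mirror the parameters used by the AI-generated code.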



Left: The AI-generated code for a C# from-scratch implementation of GBR worked after a few tweaks, but it’s minimal. Right: One of my old from-scratch implementations has a lot more useful information.


I have implemented GBR, from scratch, using C#, and trust me, the code is very complicated. I asked ChatGPT to generate from-scratch C# GBR code for a dataset. The resulting code wasn’t perfect, but it was far better than I expected.

The most significant downside to the AI-generated code was that it used a decision stump — a minimal decision tree that has depth = 1: only a root node, a single left child, and a single right child. A more sophisticated version of GBR would use decision trees with a specified depth such as 2 or 3.

A secondary downside to the AI-generated code is that the model parameters — number of stumps = 150, learning rate = 0.10 — did not give a very good model in terms of model accuracy.

I made a few minor changes to the AI-generated code:

I added a few WriteLine statements for clarity.
I had to remove comments from the training data file.
I added an Accuracy() method.

The AI-generated C# for GBR is a good starting point, but that starting code needs quite a bit of additional code to be useful in practice. And adding that additional code requires some fairly deep knowledge of how GBR works.

A very interesting experiment.



The evolution of AI-generated code is advancing with incredible speed. I grew up during the 1950s and 1960s and was fascinated by the blazingly fast evolution of jet aircraft. I built scale models of all of these six aircraft.

During the Cold War of the 1950s and 1960s, interceptor jets were designed to stop incoming Soviet bomber planes. The interceptor role went away in the mid-1970s as ICBMs became the primary threat.

Top Left: The North American F-86D Sabre Dog was derived from the F-86A fighter jet. Over 2800 were built. They flew from 1952 to 1965. Top speed about 700 mph.

Top Right: The Northrop F-89D Scorpion had 104 rockets in wingtip pods. Over 1000 were built. It flew from 1951 to 1968. Top speed 640 mph.

Middle Left: The Lockheed F-94C Starfire flew from 1951 to 1958. Approximately 850 were built. Top speed 640 mph.

Middle Right: The two-man McDonnell F-101B Voodoo was derived from the one-man F-101A fighter version. About 800 were built. It flew from 1958 to 1972. Top speed 1500 mph.

Bottom Left: The Convair F-102A Delta Dagger flew from 1957 through the early 1970s. About 1000 were built. It was the first operational supersonic interceptor (825 mph).

Bottom Right: The Convair F-106A Delta Dart was derived from the F-102, but was quite a bit different and had a top speed of 1500 mph. It flew from 1960 through the mid 1970s when the interceptor role was phased out. About 340 were built.


The AI-generated code with a few modifications. Replace “lt” (less than), “gt”, “lte”, “gte” with Boolean operator symbols (my blog editor chokes on symbols).

using System;
using System.Collections.Generic;
using System.Linq;

namespace OpenAIGradientBoost
{
  internal class Program
  {
    static void Main(string[] args)
    {
      Console.WriteLine("\nBegin gradient boost regression" +
        " scratch C# from ChatGPT "); // added

      // Load your dataset
      Console.WriteLine("\nLoading train data "); // added
      var lines = 
        System.IO.File.ReadAllLines("..\\..\\..\\Data\\" +
        "synthetic_train_200.txt"); // had to remove '#'s
      double[][] X = new double[lines.Length][];
      double[] y = new double[lines.Length];

      for (int i = 0; i "lt" lines.Length; i++)
      {
        var parts = lines[i].Split(',',
          StringSplitOptions.RemoveEmptyEntries);
        X[i] = parts.Take(5).Select(double.Parse).ToArray();
        y[i] = double.Parse(parts[5]);
      }

      // Train Gradient Boosting Regressor
      Console.WriteLine("\nCreating and " +
        "training GBR model "); // added
      var gbr = new GradientBoostingRegressor(
          nEstimators: 150,
          learningRate: 0.1
      );
      gbr.Fit(X, y);

      double accTrain = gbr.Accuracy(X, y, 0.10);
      Console.WriteLine("\nTrain accuracy (within 0.10) = " +
        accTrain.ToString("F4")); // added

      // Predict example
      //double[] sample = { 0.0, 0.1, -0.2, 0.3, 0.4 };
      double[] sample = X[0]; // added
      double prediction = gbr.Predict(sample);
      Console.WriteLine("\nPredicting for X[0] "); // added
      Console.WriteLine("\nPrediction: " +
        prediction.ToString("F4"));

      Console.WriteLine("\nEnd demo ");
      Console.ReadLine(); // added

    } // Main()


  } // Program

  public class DecisionStump
  {
    public int FeatureIndex;
    public double Threshold;
    public double LeftValue;
    public double RightValue;

    // Predict single sample
    public double Predict(double[] x)
    {
      return x[FeatureIndex] "lte" 
        Threshold ? LeftValue : RightValue;
    }
  }

  public class GradientBoostingRegressor
  {
    public int NEstimators = 100;
    public double LearningRate = 0.1;

    private List"lt"DecisionStump"gt" _stumps;
    private double _initialPrediction;

    public double Accuracy(double[][] dataX, double[] dataY,
      double pctClose) // added
    {
      int numCorrect = 0; int numWrong = 0;
      for (int i = 0; i "lt" dataX.Length; ++i)
      {
        double actualY = dataY[i];
        double predY = this.Predict(dataX[i]);
        if (Math.Abs(predY - actualY) "lt"
          (pctClose * actualY))
          ++numCorrect;
        else
          ++numWrong;
      }
      return (numCorrect * 1.0) / (numWrong + numCorrect);
    }

    public GradientBoostingRegressor(int nEstimators = 100,
      double learningRate = 0.1)
    {
      NEstimators = nEstimators;
      LearningRate = learningRate;
      _stumps = new List"lt"DecisionStump"gt"();
    }

    //
    // Fit model on training data
    //
    public void Fit(double[][] X, double[] y)
    {
      int nSamples = y.Length;

      // Initial prediction = mean of targets
      _initialPrediction = y.Average();

      // Current model predictions
      double[] pred = 
        Enumerable.Repeat(_initialPrediction,
        nSamples).ToArray();

      // Boosting loop
      for (int m = 0; m "lt" NEstimators; m++)
      {
        // Compute pseudo-residuals = gradient of squared loss
        double[] residuals = new double[nSamples];
        for (int i = 0; i "lt" nSamples; i++)
          residuals[i] = y[i] - pred[i];

        // Fit a stump to residuals
        DecisionStump stump = FitStump(X, residuals);

        // Save stump
        _stumps.Add(stump);

        // Update predictions
        for (int i = 0; i < nSamples; i++)
          pred[i] += LearningRate * stump.Predict(X[i]);
      }
    }

    //
    // Make predictions
    //
    public double Predict(double[] x)
    {
      double result = _initialPrediction;
      foreach (var stump in _stumps)
        result += LearningRate * stump.Predict(x);
      return result;
    }

    //
    // Fit a decision stump to residuals
    //
    private DecisionStump FitStump(double[][] X,
      double[] residuals)
    {
      int nSamples = X.Length;
      int nFeatures = X[0].Length;

      double bestError = double.MaxValue;
      DecisionStump bestStump = null;

      // Try each feature and several threshold candidates
      for (int feature = 0; feature < nFeatures; feature++)
      {
        // Sort by feature
        var sorted = X
            .Select((row, idx) =>
            new { Value = row[feature],
              Resid = residuals[idx] })
            .OrderBy(e => e.Value)
            .ToArray();

        // Try all possible thresholds (midpoints)
        for (int i = 1; i < nSamples; i++)
        {
          double thr = 
            (sorted[i - 1].Value + sorted[i].Value) / 2.0;

          // Left: <= threshold, Right: > threshold
          var leftRes =
            sorted.Where(e => e.Value <= thr).
            Select(e => e.Resid).ToArray();
          var rightRes =
            sorted.Where(e => e.Value > thr).
            Select(e => e.Resid).ToArray();

          if (leftRes.Length == 0 || rightRes.Length == 0)
            continue;

          double leftMean = leftRes.Average();
          double rightMean = rightRes.Average();

          // Compute squared error
          double error =
              leftRes.Sum(r => Math.Pow(r - leftMean, 2)) +
              rightRes.Sum(r => Math.Pow(r - rightMean, 2));

          if (error < bestError)
          {
            bestError = error;
            bestStump = new DecisionStump
            {
              FeatureIndex = feature,
              Threshold = thr,
              LeftValue = leftMean,
              RightValue = rightMean
            };
          }
        }
      }

      return bestStump;
    }
  }
} // ns

Training data:

-0.1660,  0.4406, -0.9998, -0.3953, -0.7065,  0.4840
 0.0776, -0.1616,  0.3704, -0.5911,  0.7562,  0.1568
-0.9452,  0.3409, -0.1654,  0.1174, -0.7192,  0.8054
 0.9365, -0.3732,  0.3846,  0.7528,  0.7892,  0.1345
-0.8299, -0.9219, -0.6603,  0.7563, -0.8033,  0.7955
 0.0663,  0.3838, -0.3690,  0.3730,  0.6693,  0.3206
-0.9634,  0.5003,  0.9777,  0.4963, -0.4391,  0.7377
-0.1042,  0.8172, -0.4128, -0.4244, -0.7399,  0.4801
-0.9613,  0.3577, -0.5767, -0.4689, -0.0169,  0.6861
-0.7065,  0.1786,  0.3995, -0.7953, -0.1719,  0.5569
 0.3888, -0.1716, -0.9001,  0.0718,  0.3276,  0.2500
 0.1731,  0.8068, -0.7251, -0.7214,  0.6148,  0.3297
-0.2046, -0.6693,  0.8550, -0.3045,  0.5016,  0.2129
 0.2473,  0.5019, -0.3022, -0.4601,  0.7918,  0.2613
-0.1438,  0.9297,  0.3269,  0.2434, -0.7705,  0.5171
 0.1568, -0.1837, -0.5259,  0.8068,  0.1474,  0.3307
-0.9943,  0.2343, -0.3467,  0.0541,  0.7719,  0.5581
 0.2467, -0.9684,  0.8589,  0.3818,  0.9946,  0.1092
-0.6553, -0.7257,  0.8652,  0.3936, -0.8680,  0.7018
 0.8460,  0.4230, -0.7515, -0.9602, -0.9476,  0.1996
-0.9434, -0.5076,  0.7201,  0.0777,  0.1056,  0.5664
 0.9392,  0.1221, -0.9627,  0.6013, -0.5341,  0.1533
 0.6142, -0.2243,  0.7271,  0.4942,  0.1125,  0.1661
 0.4260,  0.1194, -0.9749, -0.8561,  0.9346,  0.2230
 0.1362, -0.5934, -0.4953,  0.4877, -0.6091,  0.3810
 0.6937, -0.5203, -0.0125,  0.2399,  0.6580,  0.1460
-0.6864, -0.9628, -0.8600, -0.0273,  0.2127,  0.5387
 0.9772,  0.1595, -0.2397,  0.1019,  0.4907,  0.1611
 0.3385, -0.4702, -0.8673, -0.2598,  0.2594,  0.2270
-0.8669, -0.4794,  0.6095, -0.6131,  0.2789,  0.4700
 0.0493,  0.8496, -0.4734, -0.8681,  0.4701,  0.3516
 0.8639, -0.9721, -0.5313,  0.2336,  0.8980,  0.1412
 0.9004,  0.1133,  0.8312,  0.2831, -0.2200,  0.1782
 0.0991,  0.8524,  0.8375, -0.2102,  0.9265,  0.2150
-0.6521, -0.7473, -0.7298,  0.0113, -0.9570,  0.7422
 0.6190, -0.3105,  0.8802,  0.1640,  0.7577,  0.1056
 0.6895,  0.8108, -0.0802,  0.0927,  0.5972,  0.2214
 0.1982, -0.9689,  0.1870, -0.1326,  0.6147,  0.1310
-0.3695,  0.7858,  0.1557, -0.6320,  0.5759,  0.3773
-0.1596,  0.3581,  0.8372, -0.9992,  0.9535,  0.2071
-0.2468,  0.9476,  0.2094,  0.6577,  0.1494,  0.4132
 0.1737,  0.5000,  0.7166,  0.5102,  0.3961,  0.2611
 0.7290, -0.3546,  0.3416, -0.0983, -0.2358,  0.1332
-0.3652,  0.2438, -0.1395,  0.9476,  0.3556,  0.4170
-0.6029, -0.1466, -0.3133,  0.5953,  0.7600,  0.4334
-0.4596, -0.4953,  0.7098,  0.0554,  0.6043,  0.2775
 0.1450,  0.4663,  0.0380,  0.5418,  0.1377,  0.2931
-0.8636, -0.2442, -0.8407,  0.9656, -0.6368,  0.7429
 0.6237,  0.7499,  0.3768,  0.1390, -0.6781,  0.2185
-0.5499,  0.1850, -0.3755,  0.8326,  0.8193,  0.4399
-0.4858, -0.7782, -0.6141, -0.0008,  0.4572,  0.4197
 0.7033, -0.1683,  0.2334, -0.5327, -0.7961,  0.1776
 0.0317, -0.0457, -0.6947,  0.2436,  0.0880,  0.3345
 0.5031, -0.5559,  0.0387,  0.5706, -0.9553,  0.3107
-0.3513,  0.7458,  0.6894,  0.0769,  0.7332,  0.3170
 0.2205,  0.5992, -0.9309,  0.5405,  0.4635,  0.3532
-0.4806, -0.4859,  0.2646, -0.3094,  0.5932,  0.3202
 0.9809, -0.3995, -0.7140,  0.8026,  0.0831,  0.1600
 0.9495,  0.2732,  0.9878,  0.0921,  0.0529,  0.1289
-0.9476, -0.6792,  0.4913, -0.9392, -0.2669,  0.5966
 0.7247,  0.3854,  0.3819, -0.6227, -0.1162,  0.1550
-0.5922, -0.5045, -0.4757,  0.5003, -0.0860,  0.5863
-0.8861,  0.0170, -0.5761,  0.5972, -0.4053,  0.7301
 0.6877, -0.2380,  0.4997,  0.0223,  0.0819,  0.1404
 0.9189,  0.6079, -0.9354,  0.4188, -0.0700,  0.1907
-0.1428, -0.7820,  0.2676,  0.6059,  0.3936,  0.2790
 0.5324, -0.3151,  0.6917, -0.1425,  0.6480,  0.1071
-0.8432, -0.9633, -0.8666, -0.0828, -0.7733,  0.7784
-0.9444,  0.5097, -0.2103,  0.4939, -0.0952,  0.6787
-0.0520,  0.6063, -0.1952,  0.8094, -0.9259,  0.4836
 0.5477, -0.7487,  0.2370, -0.9793,  0.0773,  0.1241
 0.2450,  0.8116,  0.9799,  0.4222,  0.4636,  0.2355
 0.8186, -0.1983, -0.5003, -0.6531, -0.7611,  0.1511
-0.4714,  0.6382, -0.3788,  0.9648, -0.4667,  0.5950
 0.0673, -0.3711,  0.8215, -0.2669, -0.1328,  0.2677
-0.9381,  0.4338,  0.7820, -0.9454,  0.0441,  0.5518
-0.3480,  0.7190,  0.1170,  0.3805, -0.0943,  0.4724
-0.9813,  0.1535, -0.3771,  0.0345,  0.8328,  0.5438
-0.1471, -0.5052, -0.2574,  0.8637,  0.8737,  0.3042
-0.5454, -0.3712, -0.6505,  0.2142, -0.1728,  0.5783
 0.6327, -0.6297,  0.4038, -0.5193,  0.1484,  0.1153
-0.5424,  0.3282, -0.0055,  0.0380, -0.6506,  0.6613
 0.1414,  0.9935,  0.6337,  0.1887,  0.9520,  0.2540
-0.9351, -0.8128, -0.8693, -0.0965, -0.2491,  0.7353
 0.9507, -0.6640,  0.9456,  0.5349,  0.6485,  0.1059
-0.0462, -0.9737, -0.2940, -0.0159,  0.4602,  0.2606
-0.0627, -0.0852, -0.7247, -0.9782,  0.5166,  0.2977
 0.0478,  0.5098, -0.0723, -0.7504, -0.3750,  0.3335
 0.0090,  0.3477,  0.5403, -0.7393, -0.9542,  0.4415
-0.9748,  0.3449,  0.3736, -0.1015,  0.8296,  0.4358
 0.2887, -0.9895, -0.0311,  0.7186,  0.6608,  0.2057
 0.1570, -0.4518,  0.1211,  0.3435, -0.2951,  0.3244
 0.7117, -0.6099,  0.4946, -0.4208,  0.5476,  0.1096
-0.2929, -0.5726,  0.5346, -0.3827,  0.4665,  0.2465
 0.4889, -0.5572, -0.5718, -0.6021, -0.7150,  0.2163
-0.7782,  0.3491,  0.5996, -0.8389, -0.5366,  0.6516
-0.5847,  0.8347,  0.4226,  0.1078, -0.3910,  0.6134
 0.8469,  0.4121, -0.0439, -0.7476,  0.9521,  0.1571
-0.6803, -0.5948, -0.1376, -0.1916, -0.7065,  0.7156
 0.2878,  0.5086, -0.5785,  0.2019,  0.4979,  0.2980
 0.2764,  0.1943, -0.4090,  0.4632,  0.8906,  0.2960
-0.8877,  0.6705, -0.6155, -0.2098, -0.3998,  0.7107
-0.8398,  0.8093, -0.2597,  0.0614, -0.0118,  0.6502
-0.8476,  0.0158, -0.4769, -0.2859, -0.7839,  0.7715
 0.5751, -0.7868,  0.9714, -0.6457,  0.1448,  0.1175
 0.4802, -0.7001,  0.1022, -0.5668,  0.5184,  0.1090
 0.4458, -0.6469,  0.7239, -0.9604,  0.7205,  0.0779
 0.5175,  0.4339,  0.9747, -0.4438, -0.9924,  0.2879
 0.8678,  0.7158,  0.4577,  0.0334,  0.4139,  0.1678
 0.5406,  0.5012,  0.2264, -0.1963,  0.3946,  0.2088
-0.9938,  0.5498,  0.7928, -0.5214, -0.7585,  0.7687
 0.7661,  0.0863, -0.4266, -0.7233, -0.4197,  0.1466
 0.2277, -0.3517, -0.0853, -0.1118,  0.6563,  0.1767
 0.3499, -0.5570, -0.0655, -0.3705,  0.2537,  0.1632
 0.7547, -0.1046,  0.5689, -0.0861,  0.3125,  0.1257
 0.8186,  0.2110,  0.5335,  0.0094, -0.0039,  0.1391
 0.6858, -0.8644,  0.1465,  0.8855,  0.0357,  0.1845
-0.4967,  0.4015,  0.0805,  0.8977,  0.2487,  0.4663
 0.6760, -0.9841,  0.9787, -0.8446, -0.3557,  0.1509
-0.1203, -0.4885,  0.6054, -0.0443, -0.7313,  0.4854
 0.8557,  0.7919, -0.0169,  0.7134, -0.1628,  0.2002
 0.0115, -0.6209,  0.9300, -0.4116, -0.7931,  0.4052
-0.7114, -0.9718,  0.4319,  0.1290,  0.5892,  0.3661
 0.3915,  0.5557, -0.1870,  0.2955, -0.6404,  0.2954
-0.3564, -0.6548, -0.1827, -0.5172, -0.1862,  0.4622
 0.2392, -0.4959,  0.5857, -0.1341, -0.2850,  0.2470
-0.3394,  0.3947, -0.4627,  0.6166, -0.4094,  0.5325
 0.7107,  0.7768, -0.6312,  0.1707,  0.7964,  0.2757
-0.1078,  0.8437, -0.4420,  0.2177,  0.3649,  0.4028
-0.3139,  0.5595, -0.6505, -0.3161, -0.7108,  0.5546
 0.4335,  0.3986,  0.3770, -0.4932,  0.3847,  0.1810
-0.2562, -0.2894, -0.8847,  0.2633,  0.4146,  0.4036
 0.2272,  0.2966, -0.6601, -0.7011,  0.0284,  0.2778
-0.0743, -0.1421, -0.0054, -0.6770, -0.3151,  0.3597
-0.4762,  0.6891,  0.6007, -0.1467,  0.2140,  0.4266
-0.4061,  0.7193,  0.3432,  0.2669, -0.7505,  0.6147
-0.0588,  0.9731,  0.8966,  0.2902, -0.6966,  0.4955
-0.0627, -0.1439,  0.1985,  0.6999,  0.5022,  0.3077
 0.1587,  0.8494, -0.8705,  0.9827, -0.8940,  0.4263
-0.7850,  0.2473, -0.9040, -0.4308, -0.8779,  0.7199
 0.4070,  0.3369, -0.2428, -0.6236,  0.4940,  0.2215
-0.0242,  0.0513, -0.9430,  0.2885, -0.2987,  0.3947
-0.5416, -0.1322, -0.2351, -0.0604,  0.9590,  0.3683
 0.1055,  0.7783, -0.2901, -0.5090,  0.8220,  0.2984
-0.9129,  0.9015,  0.1128, -0.2473,  0.9901,  0.4776
-0.9378,  0.1424, -0.6391,  0.2619,  0.9618,  0.5368
 0.7498, -0.0963,  0.4169,  0.5549, -0.0103,  0.1614
-0.2612, -0.7156,  0.4538, -0.0460, -0.1022,  0.3717
 0.7720,  0.0552, -0.1818, -0.4622, -0.8560,  0.1685
-0.4177,  0.0070,  0.9319, -0.7812,  0.3461,  0.3052
-0.0001,  0.5542, -0.7128, -0.8336, -0.2016,  0.3803
 0.5356, -0.4194, -0.5662, -0.9666, -0.2027,  0.1776
-0.2378,  0.3187, -0.8582, -0.6948, -0.9668,  0.5474
-0.1947, -0.3579,  0.1158,  0.9869,  0.6690,  0.2992
 0.3992,  0.8365, -0.9205, -0.8593, -0.0520,  0.3154
-0.0209,  0.0793,  0.7905, -0.1067,  0.7541,  0.1864
-0.4928, -0.4524, -0.3433,  0.0951, -0.5597,  0.6261
-0.8118,  0.7404, -0.5263, -0.2280,  0.1431,  0.6349
 0.0516, -0.8480,  0.7483,  0.9023,  0.6250,  0.1959
-0.3212,  0.1093,  0.9488, -0.3766,  0.3376,  0.2735
-0.3481,  0.5490, -0.3484,  0.7797,  0.5034,  0.4379
-0.5785, -0.9170, -0.3563, -0.9258,  0.3877,  0.4121
 0.3407, -0.1391,  0.5356,  0.0720, -0.9203,  0.3458
-0.3287, -0.8954,  0.2102,  0.0241,  0.2349,  0.3247
-0.1353,  0.6954, -0.0919, -0.9692,  0.7461,  0.3338
 0.9036, -0.8982, -0.5299, -0.8733, -0.1567,  0.1187
 0.7277, -0.8368, -0.0538, -0.7489,  0.5458,  0.0830
 0.9049,  0.8878,  0.2279,  0.9470, -0.3103,  0.2194
 0.7957, -0.1308, -0.5284,  0.8817,  0.3684,  0.2172
 0.4647, -0.4931,  0.2010,  0.6292, -0.8918,  0.3371
-0.7390,  0.6849,  0.2367,  0.0626, -0.5034,  0.7039
-0.1567, -0.8711,  0.7940, -0.5932,  0.6525,  0.1710
 0.7635, -0.0265,  0.1969,  0.0545,  0.2496,  0.1445
 0.7675,  0.1354, -0.7698, -0.5460,  0.1920,  0.1728
-0.5211, -0.7372, -0.6763,  0.6897,  0.2044,  0.5217
 0.1913,  0.1980,  0.2314, -0.8816,  0.5006,  0.1998
 0.8964,  0.0694, -0.6149,  0.5059, -0.9854,  0.1825
 0.1767,  0.7104,  0.2093,  0.6452,  0.7590,  0.2832
-0.3580, -0.7541,  0.4426, -0.1193, -0.7465,  0.5657
-0.5996,  0.5766, -0.9758, -0.3933, -0.9572,  0.6800
 0.9950,  0.1641, -0.4132,  0.8579,  0.0142,  0.2003
-0.4717, -0.3894, -0.2567, -0.5111,  0.1691,  0.4266
 0.3917, -0.8561,  0.9422,  0.5061,  0.6123,  0.1212
-0.0366, -0.1087,  0.3449, -0.1025,  0.4086,  0.2475
 0.3633,  0.3943,  0.2372, -0.6980,  0.5216,  0.1925
-0.5325, -0.6466, -0.2178, -0.3589,  0.6310,  0.3568
 0.2271,  0.5200, -0.1447, -0.8011, -0.7699,  0.3128
 0.6415,  0.1993,  0.3777, -0.0178, -0.8237,  0.2181
-0.5298, -0.0768, -0.6028, -0.9490,  0.4588,  0.4356
 0.6870, -0.1431,  0.7294,  0.3141,  0.1621,  0.1632
-0.5985,  0.0591,  0.7889, -0.3900,  0.7419,  0.2945
 0.3661,  0.7984, -0.8486,  0.7572, -0.6183,  0.3449
 0.6995,  0.3342, -0.3113, -0.6972,  0.2707,  0.1712
 0.2565,  0.9126,  0.1798, -0.6043, -0.1413,  0.2893
-0.3265,  0.9839, -0.2395,  0.9854,  0.0376,  0.4770
 0.2690, -0.1722,  0.9818,  0.8599, -0.7015,  0.3954
-0.2102, -0.0768,  0.1219,  0.5607, -0.0256,  0.3949
 0.8216, -0.9555,  0.6422, -0.6231,  0.3715,  0.0801
-0.2896,  0.9484, -0.7545, -0.6249,  0.7789,  0.4370
-0.9985, -0.5448, -0.7092, -0.5931,  0.7926,  0.5402
Posted in Machine Learning | Leave a comment

QR Decomposition From Scratch Using Python and NumPy

I was going through my old list of to-do items and noticed that I made a note to implement QR decomposition, from scratch, using Python. If you have a matrix A and apply QR decomposition, you get matrices Q and R where Q * R = A. The Q matrix has orthonormal columns (Q-transpose times Q is the identity) and R is upper triangular.

There are three main QR decomposition algorithms: Householder reflections, (modified) Gram-Schmidt, and Givens rotations. Comparisons are difficult, but in general Householder and Givens have very good numerical stability, while Gram-Schmidt is the simplest to implement but can lose orthogonality. I decided to use Householder.

One of the many tricky details about QR decomposition is that there are two versions. For an m-by-n source matrix with m >= n, the "complete" version returns an m-by-m Q and an m-by-n R, while the "reduced" version returns an m-by-n Q and an n-by-n R.
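
To make the difference concrete, here is a quick illustration (not part of my demo program below) that uses the built-in np.linalg.qr() function on a tall 5-by-3 matrix. Both versions reconstruct the source matrix:

# qr_shapes.py -- complete vs. reduced QR shapes (illustration only)
import numpy as np

A = np.random.randn(5, 3)   # tall: 5 rows, 3 columns

Qc, Rc = np.linalg.qr(A, mode='complete')
print(Qc.shape, Rc.shape)   # (5, 5) (5, 3)

Qr, Rr = np.linalg.qr(A, mode='reduced')
print(Qr.shape, Rr.shape)   # (5, 3) (3, 3)

print(np.allclose(Qc @ Rc, A))   # True
print(np.allclose(Qr @ Rr, A))   # True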

After a bit of searching the Internet for resources, I got a nice demo up and running of QR decomposition using the Householder algorithm:

C:\VSM\MatrixPseudoInverseQR: python qr_demo.py

Begin QR decomposition using NumPy

Source A:
[[4.0000 7.0000 1.0000]
 [6.0000 0.0000 3.0000]
 [8.0000 1.0000 9.0000]
 [2.0000 5.0000 6.0000]
 [1.0000 5.0000 4.0000]]

===================

from-scratch Q (complete):
[[-0.3636  0.5998  0.6426 -0.1606 -0.2634]
 [-0.5455 -0.2854  0.2952  0.5362  0.4964]
 [-0.7273 -0.2677 -0.3760 -0.4394 -0.2549]
 [-0.1818  0.4692 -0.5091  0.6279 -0.3055]
 [-0.0909  0.5167 -0.3153 -0.3152  0.7252]]

from-scratch R (complete):
[[-11.0000  -4.6364 -10.0000]
 [ -0.0000   8.8603   2.2162]
 [ -0.0000   0.0000  -6.1716]
 [ -0.0000  -0.0000   0.0000]
 [ -0.0000  -0.0000  -0.0000]]

-------------------

from-scratch Q (reduced):
[[-0.3636  0.5998  0.6426]
 [-0.5455 -0.2854  0.2952]
 [-0.7273 -0.2677 -0.3760]
 [-0.1818  0.4692 -0.5091]
 [-0.0909  0.5167 -0.3153]]

from-scratch R (reduced):
[[-11.0000  -4.6364 -10.0000]
 [ -0.0000   8.8603   2.2162]
 [ -0.0000   0.0000  -6.1716]]

===================

. . . using np.linalg.qr() gives same results

End QR demo

I validated my from-scratch QR decomposition function by comparing with the NumPy np.linalg.qr() function to make sure I got the same results.

My motivation for all of this is to eventually implement Moore-Penrose pseudo-inverse using QR (as opposed to my usual technique of using SVD). But that’s an exploration for another day. Good fun.
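
As a preview of that idea: for a tall matrix A with full column rank, the pseudo-inverse can be computed from the reduced QR factorization as pinv(A) = inverse(R) times Q-transpose. Here is a minimal sketch, using the built-in np.linalg.qr() rather than my from-scratch function, and assuming full column rank:

# pinv_via_qr_sketch.py
# sketch only: pseudo-inverse from reduced QR, assumes A has full column rank
import numpy as np

A = np.array([
  [4, 7, 1],
  [6, 0, 3],
  [8, 1, 9],
  [2, 5, 6],
  [1, 5, 4]], dtype=np.float64)

Q, R = np.linalg.qr(A, mode='reduced')  # Q is 5x3, R is 3x3 upper triangular
A_pinv = np.linalg.solve(R, Q.T)        # X = inv(R) @ Q.T without forming inv(R)

print(np.allclose(A_pinv, np.linalg.pinv(A)))  # True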



I enjoy implementing functions from scratch because it gives me full visibility into the inner workings of the system.

I’ve always been fascinated by cutaway illustrations. Here’s a beautiful 1937 image of a Pan Am Clipper plane by artist Kenneth W. Thompson. I love art that provides information.



Demo program.

# qr_demo.py
# see rosettacode.org/wiki/QR_decomposition

import numpy as np

np.set_printoptions(precision=4, suppress=True,
  floatmode='fixed')

print("\nBegin QR decomposition from scratch Python demo ")

# -----------------------------------------------------------

def my_qr(A, mode):
  # mode = 'complete' or 'reduced'
  # this version defined only for n_rows >= n_cols
  m, n = A.shape
  k = np.minimum(m,n)  # always k == n
  R = A.copy()
  Q = np.eye(m)

  if m == n: end = n-1
  else: end = n
  for i in range(end):
    H = np.eye(m)

    # Householder routine follows
    a = R[i:, i]
    norm_a = np.copysign(np.linalg.norm(a), a[0])
    v = a / (a[0] + norm_a)
    v[0] = 1.0
    h = np.eye(len(a))
    V = np.matmul(v[:, None], v[None, :])
    h -= (2.0 / np.dot(v, v)) * V

    H[i:, i:] = h  # copy h into lower right
    Q = np.matmul(Q, H)
    R = np.matmul(H, R)

  # 'reduced' : returns Q, R with dims (m, k), (k, n)
  if mode == 'complete':
    return Q, R
  elif mode == 'reduced':
    return Q[0:m,0:k], R[0:k,0:n]

  return Q, R

# -----------------------------------------------------------

# 5x3
A = np.array([
[4, 7, 1],
[6, 0, 3],
[8, 1, 9],
[2, 5, 6],
[1, 5, 4]], dtype=np.float64)

print("\nSource A: ")
print(A)

print("\n=================== ")

Q,R = my_qr(A, 'complete')
print("\nfrom-scratch Q (complete): ")
print(Q)

print("\nfrom-scratch R (complete): ")
print(R)

print("\n------------------- ")

Q,R = my_qr(A, 'reduced')
print("\nfrom-scratch Q (reduced): ")
print(Q)

print("\nfrom-scratch R (reduced): ")
print(R)

print("\n=================== ")

Q, R = np.linalg.qr(A, mode='complete')
print("\nnumpy Q (complete): ")
print(Q)

print("\nnumpy R (complete): ")
print(R)

print("\n------------------- ")

Q, R = np.linalg.qr(A, mode='reduced')
print("\nnumpy Q (reduced): ")
print(Q)

print("\nnumpy R (reduced): ")
print(R)

print("\n=================== ")

print("\nEnd QR demo ")
Posted in Machine Learning | Leave a comment

NFL 2025 Week 21 (Conference Championships) Predictions – Zoltar Likes the Seahawks and the Broncos

Zoltar is my NFL football prediction computer program. It uses a neural network and a type of reinforcement learning. Here are Zoltar’s predictions for week #21 (conference championship games) of the 2025 season.

Zoltar:     broncos  by    6  opp =    patriots    | Vegas:    patriots  by  5.5
Zoltar:    seahawks  by    6  opp =        rams    | Vegas:    seahawks  by  2.5

These predictions do not take into account the season-ending injury to the Broncos quarterback Bo Nix.

Zoltar theoretically suggests betting when the Vegas line is “significantly” different from Zoltar’s prediction. I normally use a conservative threshold of 4 points difference at the beginning of the season and a more aggressive threshold of 3 points in the middle of the season. For these final two weeks of the season, I’m using the 3-point threshold, and so:

patriots     at      broncos: Bet on Vegas underdog broncos
rams         at     seahawks: Bet on Vegas favorite seahawks
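
The advice rule itself is simple. Here is a rough sketch in Python (the function and variable names are hypothetical; the actual Zoltar system is considerably more complicated):

# advice_sketch.py -- rough sketch of the point-spread advice rule
# (hypothetical names; not the actual Zoltar code)

def advice(zoltar_home_margin, vegas_home_margin, threshold=3.0):
  # margins = points by which the home team is predicted to win
  # (negative means the home team is the Vegas underdog)
  diff = zoltar_home_margin - vegas_home_margin
  if diff >= threshold:
    return "bet on the home team"       # Zoltar likes home team more than Vegas does
  elif diff <= -threshold:
    return "bet on the visiting team"   # Zoltar likes visitor more than Vegas does
  else:
    return "no suggested bet"           # difference is not significant

# patriots at broncos: Zoltar has broncos by 6, Vegas has patriots by 5.5
print(advice(6.0, -5.5))   # bet on the home team (broncos)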

Theoretically, if you must bet $110 to win $100 (typical in Vegas), then you’ll make money if you predict at 53% accuracy or better (the break-even point is 110 / 210, or about 52.4%). But realistically, you need to predict at 60% accuracy or better to account for logistics and things like manual data entry errors.

In week #20, against the Vegas point spread, Zoltar went 1-0 using 3.0 points as the advice threshold. Zoltar liked the underdog Bears (+4 points) against the favorite Rams (-4) and the Bears lost by only 3 points, 34-31.

For the season, against the spread, Zoltar is 48-30 (~61% accuracy).

Just for fun, I track how well Zoltar does when predicting only which team will win a game. This isn’t useful except for parlay betting. In week #20, just predicting the winning team, Zoltar went 2-0, with 2 games too close to call. Vegas was a perfect 4-0 at predicting the winning team.



My prediction system is named after the Zoltar fortune teller machine you can find in arcades. I am not embarrassed to admit that I like old animated cartoons, at least those that are aimed at an adult audience as well as children.

Left: This is a scene from “Snoopy Come Home” (1972). Charlie Brown and Peppermint Patty consult a fortune teller.

Center: A scene from “An Ounce of Pink” (1965). The Pink Panther comes across an annoying talking fortune teller machine that convinces the Panther to take him home.

Right: A scene from “The Weather Lady” (1963) episode of the Rocky and Bullwinkle show. The citizens of Frostbite Falls buy a fortune teller to predict the weather. The machine is stolen by Boris Badenov and Natasha Fatale.


Posted in Zoltar | Leave a comment