Approximations to the LogGamma Function Using Python

I was sitting at home one Saturday morning, waiting for the rain to stop so I could walk my dogs. For reasons that I don’t remember now, I decided to look at some old ANOVA (analysis of variance test) code.

In ANOVA, you must compute the area under the curve of the F-distribution. To do that, you must compute an incomplete regularized beta function. To do that, you must compute a LogBeta function. To do that, you must compute a LogGamma function. Whew!
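Just to make the chain concrete, here is a small sketch (my own illustration, separate from the demo program below) that uses SciPy built-ins rather than from-scratch code: it computes LogBeta from LogGamma, and then the area under the F-distribution curve from the regularized incomplete beta function. The d1, d2, and f values are arbitrary.

# f_dist_chain.py
# sanity check of the chain: F-dist area under curve comes from the
# regularized incomplete beta, which uses LogBeta, which uses LogGamma

from scipy import special

d1, d2, f = 4, 20, 2.87   # arbitrary degrees of freedom and F value
a, b = d1 / 2.0, d2 / 2.0

# LogBeta from LogGamma: log B(a,b) = lgamma(a) + lgamma(b) - lgamma(a+b)
log_beta = special.gammaln(a) + special.gammaln(b) - \
  special.gammaln(a + b)
print("LogBeta(%0.1f, %0.1f) = %0.6f" % (a, b, log_beta))  # same as special.betaln(a, b)

# area under the F(d1, d2) curve up to f, via regularized incomplete beta
x = (d1 * f) / (d1 * f + d2)
p = special.betainc(a, b, x)
print("area under F curve up to %0.2f = %0.6f" % (f, p))  # same as scipy.stats.f.cdf(f, d1, d2)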

The LogGamma function cannot be computed exactly; it must be estimated. There are many algorithms to estimate the LogGamma function. I implemented several approximations using Python. Here’s the output of a demo run:

Approximations to the LogGamma(z) function

Setting z = 0.5

Using scipy loggamma()       : 0.572364942925
Using my old Lanczos(5,7)    : 0.572364942923
Using Wikipedia Lanczos(5,7) : 0.572364942923
Using Nemes approx           : 0.567071441276
Using A&S Stirling(6)        : 0.512589326855

End demo

The most popular algorithm is called the Lanczos approximation. There are different versions of Lanczos. I used the g=5, n=7 version. While I was reviewing my code, I looked up the Wikipedia article on the Lanczos approximation and noticed that the implementation in the article is slightly different from my old implementation that I’ve been using for many years.

The output shows that the Nemes approximation is significantly less accurate than Lanczos, and the Stirling approximation with 6 terms is quite poor (especially for small z values).

Good fun.



The LogGamma function is old but still interesting (to me anyway). Here is a fascinating old automaton made by Henry Phalibois in approximately 1920. The magician points to the door of the closet and it opens to reveal his assistant. He waves and the door closes. He hits the gong and the door opens to reveal the assistant has disappeared. He gongs again and the door to the small red cube opens to reveal the assistant sitting cross-legged.


Demo program. Replace “lt” (less than) with the Boolean operator symbol.

# log_gamma_approximations.py

import numpy as np
import scipy as sp
import scipy.special  # make sure sp.special is available
import math

# -----------------------------------------------------------

def logGammaWikiLanczos(z):
  # Lanczos approximation g=5, n=7, Wikipedia
  coefs = [ 1.0000000001900148240,
    76.180091729471463483, -86.505320329416767652,
    24.014098240830910490, -1.2317395724501553875,
    0.0012086509738661785061, -5.3952393849531283785e-6 ]

  if z "lt" 0.5:
    y = np.pi / (np.sin(np.pi * z) * logGammaWikiLanczos(1-z))
  else:
    z -= 1.0
    x = coefs[0]
    for i in range(1, len(coefs)):
      x += coefs[i] / (z + i)
      t = z + 5 + 0.5
      y = np.sqrt(2 * np.pi) * (t**(z + 0.5)) * np.exp(-t) * x
  return np.log(y)

# -----------------------------------------------------------

def logGammaMyOldLanczos(z):
  # Lanczos approximation g=5, n=7, my old version
  coefs = [ 1.000000000190015,
    76.18009172947146, -86.50532032941677,
    24.01409824083091, -1.231739572450155,
    0.1208650973866179e-2, -0.5395239384953e-5 ]

  logSqrtTwoPi = 0.91893853320467274178
  if z "lt" 0.5:
    return np.log(np.pi / np.sin(np.pi * z)) * \
      logGammaMyLanczos(1.0 - z);
  else:
    zz = z - 1.0
    b = zz + 5.5
    sum = coefs[0]

    for i in range(1,len(coefs)):
      sum += coefs[i] / (zz + i)

    return (logSqrtTwoPi + np.log(sum) - b) + \
      (np.log(b) * (zz + 0.5))

# -----------------------------------------------------------

def logGammaNemes(z):
  # Nemes: LogGamma(z) ~ 0.5*(log(2*pi) - log(z)) +
  #   z * (log(z + 1/(12z - 1/(10z))) - 1)
  a = 0.5 * (np.log(2 * np.pi) - np.log(z))
  b = (12 * z) - (1.0 / (10 * z))
  c = np.log(z + (1 / b))
  d = z * (c - 1)
  return a + d

# -----------------------------------------------------------

def logGammaStirling(z):
  # A&S 6.1.41 (Stirling's approximation)
  x1 = (z - 0.5) * np.log(z)
  x3 = 0.5 * np.log(2 * np.pi)
  x4 = 1 / (12 * z)
  x5 = 1 / (360 * z * z * z)
  x6 = 1 / (1260 * z * z * z * z * z)
  x7 = 1 / (1680 * z * z * z * z * z * z * z)
  # more terms possible
  return x1 - z + x3 + x4 - x5 + x6 - x7

# -----------------------------------------------------------

print("\nApproximations to the LogGamma(z) function ")

z = 0.5
print("\nSetting z = %0.1f \n" % z)

result = sp.special.loggamma(z)
print("Using scipy loggamma()       : %0.12f " % result)

result = logGammaMyOldLanczos(z)
print("Using my old Lanczos(5,7 )   : %0.12f " % result)

result = logGammaWikiLanczos(z)
print("Using Wikipedia Lanczos(5,7) : %0.12f " % result)

result = logGammaNemes(z)
print("Using Nemes approx           : %0.12f " % result)

result = logGammaStirling(z)
print("Using A&S Stirling(6)        : %0.12f " % result)

print("\nEnd demo ")

“What Is Model Context Protocol (MCP) and Why Should You Care?” on the Pure AI Web Site

I contributed technical content and opinions to an article titled “What Is Model Context Protocol (MCP) and Why Should You Care?” in the January 2026 edition of the Pure AI web site. See https://pureai.com/articles/2026/01/07/what-is-model-context-protocol-mcp-and-why-should-you-care.aspx.

Model Context Protocol (MCP) is an open-source standard that defines a set of rules for how AI assistants and agents can connect to data sources and tools. The idea is illustrated in the diagram below:

In the diagram, AI Application 1 could be something like a ChatGPT system (from OpenAI). The application needs data from a SQL database, a document store, and some sort of custom processing tool. AI Application 2 could be something like a Llama agent (from Meta) that needs to access some documents and a database of images.

The two applications could use custom code to access the needed resources. Instead, the applications access their resources through a set of standardized rules defined by the Model Context Protocol.

MCP defines how AI applications can access resources, but MCP is not a protocol for how different AI applications can communicate with each other (indicated by the dashed red arrow in the diagram). Designing an application-to-application protocol is a separate ongoing effort in AI research.

I provided some comments.

McCaffrey observed, “One business possibility is a company that creates a platform that offers a curated library of pre-built, enterprise-grade MCP servers for popular business tools (Salesforce, SAP, Smartsheet, etc.). This could appeal to companies that can’t afford deep in-house AI technical expertise.”

McCaffrey speculated, “It’s possible that MCP will evolve into a marketplace ecosystem where pre-built, certified MCP servers for popular enterprise tools become commoditized plug-and-play solutions.”



Many years ago, when my son was just a few years old, he had an interesting set of tracks in the shape of letters of the alphabet. The tracks were cleverly designed with a connection protocol so they could be hooked together in all kinds of interesting ways. My son and I spent many hours playing with this toy.



NFL 2025 Week 20 (Division Championships) Predictions – Zoltar Agrees with Las Vegas but if Forced Would Pick Underdog Bears Against Rams

Zoltar is my NFL football prediction computer program. It uses a neural network and a type of reinforcement learning. Here are Zoltar’s predictions for week #20 (division championship games) of the 2025 season.

Zoltar:     broncos  by    4  opp =       bills    | Vegas:     broncos  by    1
Zoltar:    seahawks  by    6  opp = fortyniners    | Vegas:    seahawks  by  7.5
Zoltar:      texans  by    0  opp =    patriots    | Vegas:    patriots  by  2.5
Zoltar:        rams  by    0  opp =       bears    | Vegas:        rams  by    4

Zoltar theoretically suggests betting when the Vegas line is “significantly” different from Zoltar’s prediction. For this season I’ve been using a conservative threshold of 4 points difference in the early and late parts of the season, and a more aggressive threshold of 3 points difference in the middle of the season.

For week #20, Zoltar agrees closely with the Las Vegas point spreads, meaning all of Zoltar’s predictions are within 4 points of the Vegas point spread. But if forced to make a recommendation, Zoltar would suggest a bet on the Vegas underdog Chicago Bears against the Los Angeles Rams.

Zoltar thinks the game is dead even, but Vegas favors the Rams by 4 points. A bet on the underdog Bears will pay if the Bears win by any score, or if the Rams win but by less than 4.0 points (i.e., 3 points or less). If the Rams win by exactly 4 points, all bets are a push.

Theoretically, if you must bet $110 to win $100 (typical in Vegas), then you’ll make money if you predict at 53% accuracy or better. But realistically, you need to predict at 60% accuracy or better to account for logistics and things like data entry errors.
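For what it’s worth, here is the arithmetic behind that 53% figure, as a tiny sketch of my own (assuming the standard risk-$110-to-win-$100 odds mentioned above):

# break_even.py
# at $110-to-win-$100 odds, the break-even win rate p satisfies
# (100 * p) - (110 * (1 - p)) = 0
win = 100.0
risk = 110.0
p = risk / (win + risk)   # 110 / 210
print("break-even win rate = %0.4f" % p)   # 0.5238, roughly 53 percent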

In week #19, against the Vegas point spread, Zoltar went 1-0 using 4.0 points as the advice threshold. Zoltar liked underdog Panthers against the Rams. The Rams won the game 34-31 but they did not cover the 10.0 point spread.

For the season, against the spread, Zoltar is 47-30 (~61% accuracy).

Just for fun, I track how well Zoltar does when trying to predict just which team will win a game. This isn’t useful except for parlay betting. In the week #19 wildcard games, just predicting the winning team, Zoltar went 3-1 with 2 games too close to call. Vegas had an excellent week, going 5-1 at just predicting the winning team.

I speculate that in the playoffs, bettors start wagering with their hearts instead of the rational part of their brains.



My system is named after the Zoltar fortune teller machine you can find in arcades. I’ve always been fascinated by mechanical fortune tellers. Artist Thomas Kuntz makes incredible automata, including this “L’Oracle du Mort”. The user inserts a small card, with a question like “How rich will I be?” into a tray at the front of the machine. The magician / oracle figure comes to life and consults his crystal ball, lights a flame, and opens the circular door held by the skeleton to reveal the answer. See the machine in action at youtube.com/watch?v=GI1LnFUSejU. The machine is currently owned by Oscar-winning director Guillermo del Toro.



Linear Regression with Closed Form Pseudo-Inverse Training Via SVD Using C#

The form of a linear regression prediction equation is y’ = (w0 * x0) + (w1 * x1) + . . . + (wn * xn) + b, where y’ is the predicted value, the xi are predictor values, the wi are constants called model weights, and b is a constant called the bias. For example, y’ = predicted balance = (-0.54 * age) + (0.38 * height) + (0.11 * experience) + 0.72.

Training the model is the process of finding the values of the weights and bias so that predicted y values are close to known correct target y values in a set of training data.

There are several ways to train a linear regression model. Two of the most common techniques are stochastic gradient descent (SGD) training, and closed form training using matrix pseudo-inverse. This blog post demonstrates how to implement the closed form training technique, using the singular value decomposition (SVD) algorithm, implemented using the Jacobi technique.

Briefly, the math equation is:

w = pinv(DX) * y

Here w is a vector of the weights and the bias that you want. The pinv() function is the Moore-Penrose pseudo-inverse of a matrix. DX is a design matrix of the X predictors of the training data. The * is matrix-to-vector multiplication, and y is a vector of target y values of the training data.

A design matrix takes the bias into account by adding a leading column of 1.0 values. For example, if a set of training predictors stored in a matrix X is:

0.50  0.77  0.32
0.92  0.41  0.84
0.64  0.73  0.43

The associated design matrix is:

1.0  0.50  0.77  0.32
1.0  0.92  0.41  0.84
1.0  0.64  0.73  0.43
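As a quick illustration of the training equation, here is a tiny NumPy sketch of my own (in Python rather than C#, just to keep it short) that applies w = pinv(DX) * y to the little matrix above; the three y target values are made up, and there are only three rows, so this just shows the mechanics of the equation.

# pinv_training_sketch.py
import numpy as np

X = np.array([[0.50, 0.77, 0.32],
              [0.92, 0.41, 0.84],
              [0.64, 0.73, 0.43]])
y = np.array([0.30, 0.60, 0.40])   # arbitrary targets

DX = np.column_stack([np.ones(len(X)), X])   # design matrix, leading 1s
wts = np.linalg.pinv(DX) @ y   # wts[0] is the bias, wts[1:] are the weights
print(wts)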

For a demo, I used one of my standard sets of synthetic data that looks like:

-0.1660,  0.4406, -0.9998, -0.3953, -0.7065,  0.4840
 0.0776, -0.1616,  0.3704, -0.5911,  0.7562,  0.1568
-0.9452,  0.3409, -0.1654,  0.1174, -0.7192,  0.8054
. . . 

The first five values on each line are the predictors (sometimes called features). The last value on each line is the target y value to predict. There are 200 training items and 40 test items.

The output of my demo is:

Begin C# linear regression using pseudo-inverse
closed-form training

Loading synthetic train (200) and test (40) data
Done

First three train X:
 -0.1660  0.4406 -0.9998 -0.3953 -0.7065
  0.0776 -0.1616  0.3704 -0.5911  0.7562
 -0.9452  0.3409 -0.1654  0.1174 -0.7192

First three train y:
  0.4840
  0.1568
  0.8054

Creating and training Linear Regression model
using p-inverse
Done

Coefficients/weights:
-0.2656  0.0333  -0.0454  0.0358  -0.1146
Bias/constant: 0.3619

Evaluating model

Accuracy train (within 0.10) = 0.4600
Accuracy test (within 0.10) = 0.6500

MSE train = 0.0026
MSE test = 0.0020

Predicting for x =
  -0.1660   0.4406  -0.9998  -0.3953  -0.7065

Predicted y = 0.5329

End demo

The accuracy of the trained model is poor because the synthetic data has complex, non-linear relationships. Linear regression doesn’t always work well, but it’s usually a good place to start because at the very least it gives you a baseline result for comparison with more sophisticated techniques such as neural network regression.

Implementing a pseudo-inverse function is very difficult. Mathematicians worked on the problem for many years and came up with several solutions. The demo program computes the pseudo-inverse of a matrix using singular value decomposition (SVD) via the Jacobi algorithm. Other pseudo-inverse techniques include SVD with QR, direct QR via Householder, direct QR via modified Gram-Schmidt, and direct QR via Givens.
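The SVD recipe is: decompose M = U * S * Vh, invert the non-zero singular values, and recombine. Here is a minimal NumPy sketch of my own (not the demo code, which does the same thing in its C# MatPinv() method):

# pinv_via_svd_sketch.py
import numpy as np

def my_pinv(M):
  U, s, Vh = np.linalg.svd(M, full_matrices=False)
  s_inv = 1.0 / (s + 1.0e-12)   # guard against division by zero
  return Vh.T @ np.diag(s_inv) @ U.T   # pinv = V * Sinv * Uh

M = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
print(np.allclose(my_pinv(M), np.linalg.pinv(M)))   # True for this M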

Note: instead of using explicit pseudo-inverse training, it’s possible to mimic a pseudo-inverse function using regular matrix inverse. That form is w = inv(Xt * X) * Xt * y, where X is the design matrix, Xt is the transpose of X, inv() is regular matrix inverse, * is matrix multiplication, and y is a vector of target values from the training data. This technique is sometimes called pseudo-inverse via normal equations. Because Xt * X is square, symmetric, and (assuming the design matrix has full column rank) positive definite, its inverse can be computed using the special Cholesky decomposition.
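A minimal NumPy sketch of my own of that normal equations form (the five data rows and target values are made up for illustration):

# normal_equations_sketch.py
import numpy as np

X = np.array([[0.5, 0.8], [0.9, 0.4], [0.6, 0.7],
              [0.2, 0.1], [0.8, 0.3]])   # made-up predictors
y = np.array([0.30, 0.60, 0.40, 0.10, 0.50])   # made-up targets

DX = np.column_stack([np.ones(len(X)), X])   # design matrix
w = np.linalg.inv(DX.T @ DX) @ DX.T @ y   # w[0] is bias, w[1:] are weights
print(w)
print(np.linalg.pinv(DX) @ y)   # pseudo-inverse gives the same result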



Machine learning is a combination of math techniques that are hundreds of years old, and new ideas that have been developed recently. Here’s a somewhat-random image of a past+future steampunk automobile — very nice.


Demo program. Replace “lt” (less than), “gt”, “lte”, “gte” with Boolean operator symbols — my blog editor chokes on symbols.

using System;
using System.IO;
using System.Collections.Generic;

namespace LinearRegressionPseudoInverse
{
  internal class LinearRegressionProgram
  {
    static void Main(string[] args)
    {
      Console.WriteLine("\nBegin C# linear regression" +
        " using pseudo-inverse (SVD-Jacobi) closed-form" +
        " training ");

      // 1. load data
      Console.WriteLine("\nLoading synthetic train" +
        " (200) and test (40) data");
      string trainFile =
        "..\\..\\..\\Data\\synthetic_train_200.txt";
      int[] colsX = new int[] { 0, 1, 2, 3, 4 };
      double[][] trainX =
        MatLoad(trainFile, colsX, ',', "#");
      double[] trainY =
        MatToVec(MatLoad(trainFile,
        new int[] { 5 }, ',', "#"));

      string testFile =
        "..\\..\\..\\Data\\synthetic_test_40.txt";
      double[][] testX =
         MatLoad(testFile, colsX, ',', "#");
      double[] testY =
        MatToVec(MatLoad(testFile,
        new int[] { 5 }, ',', "#"));
      Console.WriteLine("Done ");

      Console.WriteLine("\nFirst three train X: ");
      for (int i = 0; i "lt" 3; ++i)
        VecShow(trainX[i], 4, 8);

      Console.WriteLine("\nFirst three train y: ");
      for (int i = 0; i "lt" 3; ++i)
        Console.WriteLine(trainY[i].ToString("F4").
          PadLeft(8));

      // 2. create and train model using pseudo-inverse
      Console.WriteLine("\nCreating and training" +
        " Linear Regression model using p-inverse ");
      LinearRegressor model = new LinearRegressor();
      model.Train(trainX, trainY);
      Console.WriteLine("Done ");

      // 2b. show model parameters
      Console.WriteLine("\nCoefficients/weights: ");
      for (int i = 0; i "lt" model.weights.Length; ++i)
        Console.Write(model.weights[i].ToString("F4") + "  ");
      Console.WriteLine("\nBias/constant: " +
        model.bias.ToString("F4"));

      // 3. evaluate model
      Console.WriteLine("\nEvaluating model ");
      double accTrain = model.Accuracy(trainX, trainY, 0.10);
      Console.WriteLine("\nAccuracy train (within 0.10) = " +
        accTrain.ToString("F4"));
      double accTest = model.Accuracy(testX, testY, 0.10);
      Console.WriteLine("Accuracy test (within 0.10) = " +
        accTest.ToString("F4"));

      double mseTrain = model.MSE(trainX, trainY);
      Console.WriteLine("\nMSE train = " +
        mseTrain.ToString("F4"));
      double mseTest = model.MSE(testX, testY);
      Console.WriteLine("MSE test = " +
        mseTest.ToString("F4"));

      // 4. use model to predict first training item
      double[] x = trainX[0];
      Console.WriteLine("\nPredicting for x = ");
      VecShow(x, 4, 9);
      double predY = model.Predict(x);
      Console.WriteLine("\nPredicted y = " +
        predY.ToString("F4"));

      Console.WriteLine("\nEnd demo ");
      Console.ReadLine();
    } // Main()

    // ------------------------------------------------------
    // helpers for Main(): MatLoad(), MatToVec(), VecShow()
    // ------------------------------------------------------

    static double[][] MatLoad(string fn, int[] usecols,
      char sep, string comment)
    {
      List"lt"double[]"gt" result = 
        new List"lt"double[]"gt"();
      string line = "";
      FileStream ifs = new FileStream(fn, FileMode.Open);
      StreamReader sr = new StreamReader(ifs);
      while ((line = sr.ReadLine()) != null)
      {
        if (line.StartsWith(comment) == true)
          continue;
        string[] tokens = line.Split(sep);
        List"lt"double"gt" lst = new List"lt"double"gt"();
        for (int j = 0; j "lt" usecols.Length; ++j)
          lst.Add(double.Parse(tokens[usecols[j]]));
        double[] row = lst.ToArray();
        result.Add(row);
      }
      sr.Close(); ifs.Close();
      return result.ToArray();
    }

    static double[] MatToVec(double[][] mat)
    {
      int nRows = mat.Length;
      int nCols = mat[0].Length;
      double[] result = new double[nRows * nCols];
      int k = 0;
      for (int i = 0; i "lt" nRows; ++i)
        for (int j = 0; j "lt" nCols; ++j)
          result[k++] = mat[i][j];
      return result;
    }

    static void VecShow(double[] vec, int dec, int wid)
    {
      for (int i = 0; i "lt" vec.Length; ++i)
        Console.Write(vec[i].ToString("F" + dec).
          PadLeft(wid));
      Console.WriteLine("");
    }

  } // class Program

  // ========================================================

  public class LinearRegressor
  {
    public double[] weights;
    public double bias;
    private Random rnd;

    // ------------------------------------------------------

    public LinearRegressor(int seed = 0)  // ctor
    {
      this.weights = new double[0];
      this.bias = 0;
      this.rnd = new Random(seed); // no need this version
    }

    // ------------------------------------------------------

    public double Predict(double[] x)
    {
      double result = 0.0;
      for (int j = 0; j "lt" x.Length; ++j)
        result += x[j] * this.weights[j];
      result += this.bias;
      return result;
    }

    // ------------------------------------------------------

    public void Train(double[][] trainX, double[] trainY)
    {
      // closed-form w = pinv(designX) * y
      int dim = trainX[0].Length;
      this.weights = new double[dim];

      double[][] X = MatToDesign(trainX);  // design X
      double[][] Xpinv = MatPinv(X);
      double[] biasAndWts = MatVecProd(Xpinv, trainY);
      this.bias = biasAndWts[0];
      for (int i = 1; i "lt" biasAndWts.Length; ++i)
        this.weights[i - 1] = biasAndWts[i];
      return;
    }

    // ------------------------------------------------------

    public double Accuracy(double[][] dataX, double[] dataY,
      double pctClose)
    {
      int numCorrect = 0; int numWrong = 0;
      for (int i = 0; i "lt" dataX.Length; ++i)
      {
        double actualY = dataY[i];
        double predY = this.Predict(dataX[i]);
        if (Math.Abs(predY - actualY) "lt"
          Math.Abs(pctClose * actualY))
          ++numCorrect;
        else
          ++numWrong;
      }
      return (numCorrect * 1.0) / (numWrong + numCorrect);
    }

    // ------------------------------------------------------

    public double MSE(double[][] dataX, double[] dataY)
    {
      int n = dataX.Length;
      double sum = 0.0;
      for (int i = 0; i "lt" n; ++i)
      {
        double actualY = dataY[i];
        double predY = this.Predict(dataX[i]);
        sum += (actualY - predY) * (actualY - predY);
      }
      return sum / n;
    }

    // ------------------------------------------------------
    //
    // helpers: MatPinv, MatMake, MatTranspose, MatProduct
    // MatDecompSVD, MatVecProd, MatToDesign
    // 
    // ------------------------------------------------------

    private static double[][] MatPinv(double[][] M)
    {
      // Moore-Penrose pseudo-inverse from SVD (Jacobi)
      double[][] U; double[] s; double[][] Vh;

      MatDecompSVD(M, out U, out s, out Vh);

      double[][] Sinv = MatMake(s.Length, s.Length);
      for (int i = 0; i "lt" Sinv.Length; ++i)
        Sinv[i][i] = 1.0 / (s[i] + 1.0e-12); // avoid 0

      // pinv = V * Sinv * Uh
      double[][] V = MatTranspose(Vh);
      double[][] Uh = MatTranspose(U);
      double[][] tmp = MatProduct(Sinv, Uh);
      double[][] result = MatProduct(V, tmp);

      return result;
    } // MatPinv()

    // ------------------------------------------------------

    private static void MatDecompSVD(double[][] M,
      out double[][] U, out double[] s, out double[][] Vh)
    {
      // Jacobi algorithm
      // see github.com/ampl/gsl/blob/master/linalg/svd.c 
      // "full_matrices=False" version
      double DBL_EPSILON = 1.0e-15;
      int m = M.Length;  // nRows
      int n = M[0].Length;  // nCols

      // A = MatCopy(M); 
      double[][] A = new double[m][];  // working U
      for (int i = 0; i "lt" m; ++i)
        A[i] = new double[n];
      for (int i = 0; i "lt" m; ++i)
        for (int j = 0; j "lt" n; ++j)
          A[i][j] = M[i][j];

      // Q = MatIdentity(n);
      double[][] Q = new double[n][];  // working V
      for (int i = 0; i "lt" n; ++i)
        Q[i] = new double[n];
      for (int i = 0; i "lt" n; ++i)
        Q[i][i] = 1.0;
      //
      double[] t = new double[n];  // working s

      // initialize counters
      int count = 1;
      int sweep = 0;
      double tol = 10 * m * DBL_EPSILON; // heuristic

      // Always do at least 12 sweeps
      int sweepmax = Math.Max(5 * n, 12); // heuristic

      // store the column error estimates in St for use
      // during orthogonalization

      for (int j = 0; j "lt" n; ++j)
      {
        // double[] cj = MatGetColumn(A, j);
        double[] cj = new double[m];
        for (int i = 0; i "lt" m; ++i)
          cj[i] = A[i][j];

        // double sj = VecNorm(cj);
        double sj = 0.0;
        for (int i = 0; i "lt" cj.Length; ++i)
          sj += cj[i] * cj[i];
        sj = Math.Sqrt(sj);

        t[j] = DBL_EPSILON * sj;
      } // j

      // orthogonalize A by plane rotations
      while (count "gt" 0 && sweep "lte" sweepmax)
      {
        // initialize rotation counter
        count = n * (n - 1) / 2;

        for (int j = 0; j "lt" n - 1; ++j)
        {
          for (int k = j + 1; k "lt" n; ++k)
          {
            double cosine, sine;

            // double[] cj = MatGetColumn(A, j);
            double[] cj = new double[m];
            for (int i = 0; i "lt" m; ++i)
              cj[i] = A[i][j];

            // double[] ck = MatGetColumn(A, k);
            double[] ck = new double[m];
            for (int i = 0; i "lt" m; ++i)
              ck[i] = A[i][k];

            // double p = 2.0 * VecDot(cj, ck);
            double p = 0.0;
            for (int i = 0; i "lt" cj.Length; ++i)
              p += cj[i] * ck[i];
            p *= 2.0;

            // double a = VecNorm(cj);
            double a = 0.0;
            for (int i = 0; i "lt" cj.Length; ++i)
              a += cj[i] * cj[i];
            a = Math.Sqrt(a);

            //double b = VecNorm(ck);
            double b = 0.0;
            for (int i = 0; i "lt" ck.Length; ++i)
              b += ck[i] * ck[i];
            b = Math.Sqrt(b);

            double q = a * a - b * b;
            // double v = Hypot(p, q); // sqrt(x^2 + y^2)
            double v = Math.Sqrt((p * p) + (q * q));

            // test for columns j,k orthogonal,
            // or dominant errors 
            double abserr_a = t[j];
            double abserr_b = t[k];

            bool sorted = (a "gte" b);
            bool orthog = (Math.Abs(p) "lte"
              tol * (a * b));
            bool noisya = (a "lt" abserr_a);
            bool noisyb = (b "lt" abserr_b);

            if (sorted && (orthog ||
              noisya || noisyb))
            {
              --count; continue;
            }

            // calculate rotation angles
            if (v == 0 || !sorted)
            {
              cosine = 0.0; sine = 1.0;
            }
            else
            {
              cosine = Math.Sqrt((v + q) / (2.0 * v));
              sine = p / (2.0 * v * cosine);
            }

            // apply rotation to A (U)
            for (int i = 0; i "lt" m; ++i)
            {
              double Aik = A[i][k];
              double Aij = A[i][j];
              A[i][j] = Aij * cosine + Aik * sine;
              A[i][k] = -Aij * sine + Aik * cosine;
            }

            // update singular values
            t[j] = Math.Abs(cosine) * abserr_a +
              Math.Abs(sine) * abserr_b;
            t[k] = Math.Abs(sine) * abserr_a +
              Math.Abs(cosine) * abserr_b;

            // apply rotation to Q (V)
            for (int i = 0; i "lt" n; ++i)
            {
              double Qij = Q[i][j];
              double Qik = Q[i][k];
              Q[i][j] = Qij * cosine + Qik * sine;
              Q[i][k] = -Qij * sine + Qik * cosine;
            } // i
          } // k
        } // j

        ++sweep;
      } // while

      //  compute singular values
      double prevNorm = -1.0;

      for (int j = 0; j "lt" n; ++j)
      {
        // double[] column = MatGetColumn(A, j);
        double[] column = new double[m];
        for (int i = 0; i "lt" m; ++i)
          column[i] = A[i][j];

        // double norm = VecNorm(column);
        double norm = 0.0;
        for (int i = 0; i "lt" column.Length; ++i)
          norm += column[i] * column[i];
        norm = Math.Sqrt(norm);

        // determine if singular value is zero
        if (norm == 0.0 || prevNorm == 0.0
          || (j "gt" 0 &&
            norm "lte" tol * prevNorm))
        {
          t[j] = 0.0;
          for (int i = 0; i "lt" m; ++i)
            A[i][j] = 0.0;
          prevNorm = 0.0;
        }
        else
        {
          t[j] = norm;
          for (int i = 0; i "lt" m; ++i)
            A[i][j] = A[i][j] * 1.0 / norm;
          prevNorm = norm;
        }
      }

      if (count "gt" 0)
      {
        Console.WriteLine("Jacobi iterations did not" +
          " converge");
      }

      // U = A;
      U = new double[m][];
      for (int i = 0; i "lt" m; ++i)
        U[i] = new double[n];
      for (int i = 0; i "lt" m; ++i)
        for (int j = 0; j "lt" n; ++j)
          U[i][j] = A[i][j];

      // s = t;
      s = new double[n];
      for (int i = 0; i "lt" n; ++i)
        s[i] = t[i];

      // Vh = MatTranspose(Q);  // Q is nxn
      Vh = new double[n][];
      for (int i = 0; i "lt" n; ++i)
        Vh[i] = new double[n];
      for (int i = 0; i "lt" n; ++i)
        for (int j = 0; j "lt" n; ++j)
          Vh[i][j] = Q[j][i];

      // to sync with np.linalg.svd() shapes
      // for full_matrices=False (not the default)
      // if m "lt" n,
      //   extract 1st m columns of U
      //   extract 1st m values of s
      //   extract 1st m rows of Vh

      if (m "lt" n)
      {
        // U = MatExtractFirstColumns(U, m);
        double[][] newU = new double[U.Length][];  // all rows
        for (int i = 0; i "lt" U.Length; ++i)
          newU[i] = new double[m]; // just m cols
        for (int i = 0; i "lt" m; ++i)
          for (int j = 0; j "lt" m; ++j)  // m cols
            newU[i][j] = U[i][j];
        U = newU;

        // s = VecExtractFirst(s, m);
        double[] newS = new double[m];  // just m vals
        for (int i = 0; i "lt" m; ++i)
          newS[i] = s[i];
        s = newS;

        // Vh = MatExtractFirstRows(Vh, m);
        double[][] newVh = new double[m][]; // just m rows
        for (int i = 0; i "lt" m; ++i)
          newVh[i] = new double[Vh[0].Length]; // all cols
        for (int i = 0; i "lt" m; ++i)
          for (int j = 0; j "lt" Vh[0].Length; ++j)
            newVh[i][j] = Vh[i][j];
        Vh = newVh;
      } // m "lt" n

    } // MatDecompSVD()

    private static double[] MatVecProd(double[][] M,
      double[] v)
    {
      // return a regular vector
      int nRows = M.Length;
      int nCols = M[0].Length;
      int n = v.Length;
      if (nCols != n)
        throw new Exception("non-comform in MatVecProd");

      double[] result = new double[nRows];
      for (int i = 0; i "lt" nRows; ++i)
        for (int k = 0; k "lt" nCols; ++k)
          result[i] += M[i][k] * v[k];

      return result;
    }

    private static double[][] MatToDesign(double[][] M)
    {
      // add a column of 1s
      int nRows = M.Length;
      int nCols = M[0].Length;
      double[][] result = new double[M.Length][];
      for (int i = 0; i "lt" nRows; ++i)
        result[i] = new double[nCols + 1];

      for (int i = 0; i "lt" nRows; ++i)
      {
        result[i][0] = 1.0;
        for (int j = 1; j "lt" nCols + 1; ++j)
          result[i][j] = M[i][j - 1];
      }

      return result;
    }

    private static double[][] MatMake(int nRows, int nCols)
    {
      double[][] result = new double[nRows][];
      for (int i = 0; i "lt" nRows; ++i)
        result[i] = new double[nCols];
      return result;
    }

    private static double[][] MatTranspose(double[][] M)
    {
      int nRows = M.Length;
      int nCols = M[0].Length;
      double[][] result = MatMake(nCols, nRows);
      for (int i = 0; i "lt" nRows; ++i)
        for (int j = 0; j "lt" nCols; ++j)
          result[j][i] = M[i][j];
      return result;
    }

    private static double[][] MatProduct(double[][] A,
      double[][] B)
    {
      int aRows = A.Length;
      int aCols = A[0].Length;
      int bRows = B.Length;
      int bCols = B[0].Length;
      if (aCols != bRows)
        throw new Exception("Non-conformable matrices");

      double[][] result = MatMake(aRows, bCols);

      for (int i = 0; i "lt" aRows; ++i)
        for (int j = 0; j "lt" bCols; ++j)
          for (int k = 0; k "lt" aCols; ++k)
            result[i][j] += A[i][k] * B[k][j];

      return result;
    }
  } // class LinearRegressor

} // ns

Training data:

# synthetic_train_200.txt
#
-0.1660,  0.4406, -0.9998, -0.3953, -0.7065,  0.4840
 0.0776, -0.1616,  0.3704, -0.5911,  0.7562,  0.1568
-0.9452,  0.3409, -0.1654,  0.1174, -0.7192,  0.8054
 0.9365, -0.3732,  0.3846,  0.7528,  0.7892,  0.1345
-0.8299, -0.9219, -0.6603,  0.7563, -0.8033,  0.7955
 0.0663,  0.3838, -0.3690,  0.3730,  0.6693,  0.3206
-0.9634,  0.5003,  0.9777,  0.4963, -0.4391,  0.7377
-0.1042,  0.8172, -0.4128, -0.4244, -0.7399,  0.4801
-0.9613,  0.3577, -0.5767, -0.4689, -0.0169,  0.6861
-0.7065,  0.1786,  0.3995, -0.7953, -0.1719,  0.5569
 0.3888, -0.1716, -0.9001,  0.0718,  0.3276,  0.2500
 0.1731,  0.8068, -0.7251, -0.7214,  0.6148,  0.3297
-0.2046, -0.6693,  0.8550, -0.3045,  0.5016,  0.2129
 0.2473,  0.5019, -0.3022, -0.4601,  0.7918,  0.2613
-0.1438,  0.9297,  0.3269,  0.2434, -0.7705,  0.5171
 0.1568, -0.1837, -0.5259,  0.8068,  0.1474,  0.3307
-0.9943,  0.2343, -0.3467,  0.0541,  0.7719,  0.5581
 0.2467, -0.9684,  0.8589,  0.3818,  0.9946,  0.1092
-0.6553, -0.7257,  0.8652,  0.3936, -0.8680,  0.7018
 0.8460,  0.4230, -0.7515, -0.9602, -0.9476,  0.1996
-0.9434, -0.5076,  0.7201,  0.0777,  0.1056,  0.5664
 0.9392,  0.1221, -0.9627,  0.6013, -0.5341,  0.1533
 0.6142, -0.2243,  0.7271,  0.4942,  0.1125,  0.1661
 0.4260,  0.1194, -0.9749, -0.8561,  0.9346,  0.2230
 0.1362, -0.5934, -0.4953,  0.4877, -0.6091,  0.3810
 0.6937, -0.5203, -0.0125,  0.2399,  0.6580,  0.1460
-0.6864, -0.9628, -0.8600, -0.0273,  0.2127,  0.5387
 0.9772,  0.1595, -0.2397,  0.1019,  0.4907,  0.1611
 0.3385, -0.4702, -0.8673, -0.2598,  0.2594,  0.2270
-0.8669, -0.4794,  0.6095, -0.6131,  0.2789,  0.4700
 0.0493,  0.8496, -0.4734, -0.8681,  0.4701,  0.3516
 0.8639, -0.9721, -0.5313,  0.2336,  0.8980,  0.1412
 0.9004,  0.1133,  0.8312,  0.2831, -0.2200,  0.1782
 0.0991,  0.8524,  0.8375, -0.2102,  0.9265,  0.2150
-0.6521, -0.7473, -0.7298,  0.0113, -0.9570,  0.7422
 0.6190, -0.3105,  0.8802,  0.1640,  0.7577,  0.1056
 0.6895,  0.8108, -0.0802,  0.0927,  0.5972,  0.2214
 0.1982, -0.9689,  0.1870, -0.1326,  0.6147,  0.1310
-0.3695,  0.7858,  0.1557, -0.6320,  0.5759,  0.3773
-0.1596,  0.3581,  0.8372, -0.9992,  0.9535,  0.2071
-0.2468,  0.9476,  0.2094,  0.6577,  0.1494,  0.4132
 0.1737,  0.5000,  0.7166,  0.5102,  0.3961,  0.2611
 0.7290, -0.3546,  0.3416, -0.0983, -0.2358,  0.1332
-0.3652,  0.2438, -0.1395,  0.9476,  0.3556,  0.4170
-0.6029, -0.1466, -0.3133,  0.5953,  0.7600,  0.4334
-0.4596, -0.4953,  0.7098,  0.0554,  0.6043,  0.2775
 0.1450,  0.4663,  0.0380,  0.5418,  0.1377,  0.2931
-0.8636, -0.2442, -0.8407,  0.9656, -0.6368,  0.7429
 0.6237,  0.7499,  0.3768,  0.1390, -0.6781,  0.2185
-0.5499,  0.1850, -0.3755,  0.8326,  0.8193,  0.4399
-0.4858, -0.7782, -0.6141, -0.0008,  0.4572,  0.4197
 0.7033, -0.1683,  0.2334, -0.5327, -0.7961,  0.1776
 0.0317, -0.0457, -0.6947,  0.2436,  0.0880,  0.3345
 0.5031, -0.5559,  0.0387,  0.5706, -0.9553,  0.3107
-0.3513,  0.7458,  0.6894,  0.0769,  0.7332,  0.3170
 0.2205,  0.5992, -0.9309,  0.5405,  0.4635,  0.3532
-0.4806, -0.4859,  0.2646, -0.3094,  0.5932,  0.3202
 0.9809, -0.3995, -0.7140,  0.8026,  0.0831,  0.1600
 0.9495,  0.2732,  0.9878,  0.0921,  0.0529,  0.1289
-0.9476, -0.6792,  0.4913, -0.9392, -0.2669,  0.5966
 0.7247,  0.3854,  0.3819, -0.6227, -0.1162,  0.1550
-0.5922, -0.5045, -0.4757,  0.5003, -0.0860,  0.5863
-0.8861,  0.0170, -0.5761,  0.5972, -0.4053,  0.7301
 0.6877, -0.2380,  0.4997,  0.0223,  0.0819,  0.1404
 0.9189,  0.6079, -0.9354,  0.4188, -0.0700,  0.1907
-0.1428, -0.7820,  0.2676,  0.6059,  0.3936,  0.2790
 0.5324, -0.3151,  0.6917, -0.1425,  0.6480,  0.1071
-0.8432, -0.9633, -0.8666, -0.0828, -0.7733,  0.7784
-0.9444,  0.5097, -0.2103,  0.4939, -0.0952,  0.6787
-0.0520,  0.6063, -0.1952,  0.8094, -0.9259,  0.4836
 0.5477, -0.7487,  0.2370, -0.9793,  0.0773,  0.1241
 0.2450,  0.8116,  0.9799,  0.4222,  0.4636,  0.2355
 0.8186, -0.1983, -0.5003, -0.6531, -0.7611,  0.1511
-0.4714,  0.6382, -0.3788,  0.9648, -0.4667,  0.5950
 0.0673, -0.3711,  0.8215, -0.2669, -0.1328,  0.2677
-0.9381,  0.4338,  0.7820, -0.9454,  0.0441,  0.5518
-0.3480,  0.7190,  0.1170,  0.3805, -0.0943,  0.4724
-0.9813,  0.1535, -0.3771,  0.0345,  0.8328,  0.5438
-0.1471, -0.5052, -0.2574,  0.8637,  0.8737,  0.3042
-0.5454, -0.3712, -0.6505,  0.2142, -0.1728,  0.5783
 0.6327, -0.6297,  0.4038, -0.5193,  0.1484,  0.1153
-0.5424,  0.3282, -0.0055,  0.0380, -0.6506,  0.6613
 0.1414,  0.9935,  0.6337,  0.1887,  0.9520,  0.2540
-0.9351, -0.8128, -0.8693, -0.0965, -0.2491,  0.7353
 0.9507, -0.6640,  0.9456,  0.5349,  0.6485,  0.1059
-0.0462, -0.9737, -0.2940, -0.0159,  0.4602,  0.2606
-0.0627, -0.0852, -0.7247, -0.9782,  0.5166,  0.2977
 0.0478,  0.5098, -0.0723, -0.7504, -0.3750,  0.3335
 0.0090,  0.3477,  0.5403, -0.7393, -0.9542,  0.4415
-0.9748,  0.3449,  0.3736, -0.1015,  0.8296,  0.4358
 0.2887, -0.9895, -0.0311,  0.7186,  0.6608,  0.2057
 0.1570, -0.4518,  0.1211,  0.3435, -0.2951,  0.3244
 0.7117, -0.6099,  0.4946, -0.4208,  0.5476,  0.1096
-0.2929, -0.5726,  0.5346, -0.3827,  0.4665,  0.2465
 0.4889, -0.5572, -0.5718, -0.6021, -0.7150,  0.2163
-0.7782,  0.3491,  0.5996, -0.8389, -0.5366,  0.6516
-0.5847,  0.8347,  0.4226,  0.1078, -0.3910,  0.6134
 0.8469,  0.4121, -0.0439, -0.7476,  0.9521,  0.1571
-0.6803, -0.5948, -0.1376, -0.1916, -0.7065,  0.7156
 0.2878,  0.5086, -0.5785,  0.2019,  0.4979,  0.2980
 0.2764,  0.1943, -0.4090,  0.4632,  0.8906,  0.2960
-0.8877,  0.6705, -0.6155, -0.2098, -0.3998,  0.7107
-0.8398,  0.8093, -0.2597,  0.0614, -0.0118,  0.6502
-0.8476,  0.0158, -0.4769, -0.2859, -0.7839,  0.7715
 0.5751, -0.7868,  0.9714, -0.6457,  0.1448,  0.1175
 0.4802, -0.7001,  0.1022, -0.5668,  0.5184,  0.1090
 0.4458, -0.6469,  0.7239, -0.9604,  0.7205,  0.0779
 0.5175,  0.4339,  0.9747, -0.4438, -0.9924,  0.2879
 0.8678,  0.7158,  0.4577,  0.0334,  0.4139,  0.1678
 0.5406,  0.5012,  0.2264, -0.1963,  0.3946,  0.2088
-0.9938,  0.5498,  0.7928, -0.5214, -0.7585,  0.7687
 0.7661,  0.0863, -0.4266, -0.7233, -0.4197,  0.1466
 0.2277, -0.3517, -0.0853, -0.1118,  0.6563,  0.1767
 0.3499, -0.5570, -0.0655, -0.3705,  0.2537,  0.1632
 0.7547, -0.1046,  0.5689, -0.0861,  0.3125,  0.1257
 0.8186,  0.2110,  0.5335,  0.0094, -0.0039,  0.1391
 0.6858, -0.8644,  0.1465,  0.8855,  0.0357,  0.1845
-0.4967,  0.4015,  0.0805,  0.8977,  0.2487,  0.4663
 0.6760, -0.9841,  0.9787, -0.8446, -0.3557,  0.1509
-0.1203, -0.4885,  0.6054, -0.0443, -0.7313,  0.4854
 0.8557,  0.7919, -0.0169,  0.7134, -0.1628,  0.2002
 0.0115, -0.6209,  0.9300, -0.4116, -0.7931,  0.4052
-0.7114, -0.9718,  0.4319,  0.1290,  0.5892,  0.3661
 0.3915,  0.5557, -0.1870,  0.2955, -0.6404,  0.2954
-0.3564, -0.6548, -0.1827, -0.5172, -0.1862,  0.4622
 0.2392, -0.4959,  0.5857, -0.1341, -0.2850,  0.2470
-0.3394,  0.3947, -0.4627,  0.6166, -0.4094,  0.5325
 0.7107,  0.7768, -0.6312,  0.1707,  0.7964,  0.2757
-0.1078,  0.8437, -0.4420,  0.2177,  0.3649,  0.4028
-0.3139,  0.5595, -0.6505, -0.3161, -0.7108,  0.5546
 0.4335,  0.3986,  0.3770, -0.4932,  0.3847,  0.1810
-0.2562, -0.2894, -0.8847,  0.2633,  0.4146,  0.4036
 0.2272,  0.2966, -0.6601, -0.7011,  0.0284,  0.2778
-0.0743, -0.1421, -0.0054, -0.6770, -0.3151,  0.3597
-0.4762,  0.6891,  0.6007, -0.1467,  0.2140,  0.4266
-0.4061,  0.7193,  0.3432,  0.2669, -0.7505,  0.6147
-0.0588,  0.9731,  0.8966,  0.2902, -0.6966,  0.4955
-0.0627, -0.1439,  0.1985,  0.6999,  0.5022,  0.3077
 0.1587,  0.8494, -0.8705,  0.9827, -0.8940,  0.4263
-0.7850,  0.2473, -0.9040, -0.4308, -0.8779,  0.7199
 0.4070,  0.3369, -0.2428, -0.6236,  0.4940,  0.2215
-0.0242,  0.0513, -0.9430,  0.2885, -0.2987,  0.3947
-0.5416, -0.1322, -0.2351, -0.0604,  0.9590,  0.3683
 0.1055,  0.7783, -0.2901, -0.5090,  0.8220,  0.2984
-0.9129,  0.9015,  0.1128, -0.2473,  0.9901,  0.4776
-0.9378,  0.1424, -0.6391,  0.2619,  0.9618,  0.5368
 0.7498, -0.0963,  0.4169,  0.5549, -0.0103,  0.1614
-0.2612, -0.7156,  0.4538, -0.0460, -0.1022,  0.3717
 0.7720,  0.0552, -0.1818, -0.4622, -0.8560,  0.1685
-0.4177,  0.0070,  0.9319, -0.7812,  0.3461,  0.3052
-0.0001,  0.5542, -0.7128, -0.8336, -0.2016,  0.3803
 0.5356, -0.4194, -0.5662, -0.9666, -0.2027,  0.1776
-0.2378,  0.3187, -0.8582, -0.6948, -0.9668,  0.5474
-0.1947, -0.3579,  0.1158,  0.9869,  0.6690,  0.2992
 0.3992,  0.8365, -0.9205, -0.8593, -0.0520,  0.3154
-0.0209,  0.0793,  0.7905, -0.1067,  0.7541,  0.1864
-0.4928, -0.4524, -0.3433,  0.0951, -0.5597,  0.6261
-0.8118,  0.7404, -0.5263, -0.2280,  0.1431,  0.6349
 0.0516, -0.8480,  0.7483,  0.9023,  0.6250,  0.1959
-0.3212,  0.1093,  0.9488, -0.3766,  0.3376,  0.2735
-0.3481,  0.5490, -0.3484,  0.7797,  0.5034,  0.4379
-0.5785, -0.9170, -0.3563, -0.9258,  0.3877,  0.4121
 0.3407, -0.1391,  0.5356,  0.0720, -0.9203,  0.3458
-0.3287, -0.8954,  0.2102,  0.0241,  0.2349,  0.3247
-0.1353,  0.6954, -0.0919, -0.9692,  0.7461,  0.3338
 0.9036, -0.8982, -0.5299, -0.8733, -0.1567,  0.1187
 0.7277, -0.8368, -0.0538, -0.7489,  0.5458,  0.0830
 0.9049,  0.8878,  0.2279,  0.9470, -0.3103,  0.2194
 0.7957, -0.1308, -0.5284,  0.8817,  0.3684,  0.2172
 0.4647, -0.4931,  0.2010,  0.6292, -0.8918,  0.3371
-0.7390,  0.6849,  0.2367,  0.0626, -0.5034,  0.7039
-0.1567, -0.8711,  0.7940, -0.5932,  0.6525,  0.1710
 0.7635, -0.0265,  0.1969,  0.0545,  0.2496,  0.1445
 0.7675,  0.1354, -0.7698, -0.5460,  0.1920,  0.1728
-0.5211, -0.7372, -0.6763,  0.6897,  0.2044,  0.5217
 0.1913,  0.1980,  0.2314, -0.8816,  0.5006,  0.1998
 0.8964,  0.0694, -0.6149,  0.5059, -0.9854,  0.1825
 0.1767,  0.7104,  0.2093,  0.6452,  0.7590,  0.2832
-0.3580, -0.7541,  0.4426, -0.1193, -0.7465,  0.5657
-0.5996,  0.5766, -0.9758, -0.3933, -0.9572,  0.6800
 0.9950,  0.1641, -0.4132,  0.8579,  0.0142,  0.2003
-0.4717, -0.3894, -0.2567, -0.5111,  0.1691,  0.4266
 0.3917, -0.8561,  0.9422,  0.5061,  0.6123,  0.1212
-0.0366, -0.1087,  0.3449, -0.1025,  0.4086,  0.2475
 0.3633,  0.3943,  0.2372, -0.6980,  0.5216,  0.1925
-0.5325, -0.6466, -0.2178, -0.3589,  0.6310,  0.3568
 0.2271,  0.5200, -0.1447, -0.8011, -0.7699,  0.3128
 0.6415,  0.1993,  0.3777, -0.0178, -0.8237,  0.2181
-0.5298, -0.0768, -0.6028, -0.9490,  0.4588,  0.4356
 0.6870, -0.1431,  0.7294,  0.3141,  0.1621,  0.1632
-0.5985,  0.0591,  0.7889, -0.3900,  0.7419,  0.2945
 0.3661,  0.7984, -0.8486,  0.7572, -0.6183,  0.3449
 0.6995,  0.3342, -0.3113, -0.6972,  0.2707,  0.1712
 0.2565,  0.9126,  0.1798, -0.6043, -0.1413,  0.2893
-0.3265,  0.9839, -0.2395,  0.9854,  0.0376,  0.4770
 0.2690, -0.1722,  0.9818,  0.8599, -0.7015,  0.3954
-0.2102, -0.0768,  0.1219,  0.5607, -0.0256,  0.3949
 0.8216, -0.9555,  0.6422, -0.6231,  0.3715,  0.0801
-0.2896,  0.9484, -0.7545, -0.6249,  0.7789,  0.4370
-0.9985, -0.5448, -0.7092, -0.5931,  0.7926,  0.5402

Test data:

# synthetic_test_40.txt
#
 0.7462,  0.4006, -0.0590,  0.6543, -0.0083,  0.1935
 0.8495, -0.2260, -0.0142, -0.4911,  0.7699,  0.1078
-0.2335, -0.4049,  0.4352, -0.6183, -0.7636,  0.5088
 0.1810, -0.5142,  0.2465,  0.2767, -0.3449,  0.3136
-0.8650,  0.7611, -0.0801,  0.5277, -0.4922,  0.7140
-0.2358, -0.7466, -0.5115, -0.8413, -0.3943,  0.4533
 0.4834,  0.2300,  0.3448, -0.9832,  0.3568,  0.1360
-0.6502, -0.6300,  0.6885,  0.9652,  0.8275,  0.3046
-0.3053,  0.5604,  0.0929,  0.6329, -0.0325,  0.4756
-0.7995,  0.0740, -0.2680,  0.2086,  0.9176,  0.4565
-0.2144, -0.2141,  0.5813,  0.2902, -0.2122,  0.4119
-0.7278, -0.0987, -0.3312, -0.5641,  0.8515,  0.4438
 0.3793,  0.1976,  0.4933,  0.0839,  0.4011,  0.1905
-0.8568,  0.9573, -0.5272,  0.3212, -0.8207,  0.7415
-0.5785,  0.0056, -0.7901, -0.2223,  0.0760,  0.5551
 0.0735, -0.2188,  0.3925,  0.3570,  0.3746,  0.2191
 0.1230, -0.2838,  0.2262,  0.8715,  0.1938,  0.2878
 0.4792, -0.9248,  0.5295,  0.0366, -0.9894,  0.3149
-0.4456,  0.0697,  0.5359, -0.8938,  0.0981,  0.3879
 0.8629, -0.8505, -0.4464,  0.8385,  0.5300,  0.1769
 0.1995,  0.6659,  0.7921,  0.9454,  0.9970,  0.2330
-0.0249, -0.3066, -0.2927, -0.4923,  0.8220,  0.2437
 0.4513, -0.9481, -0.0770, -0.4374, -0.9421,  0.2879
-0.3405,  0.5931, -0.3507, -0.3842,  0.8562,  0.3987
 0.9538,  0.0471,  0.9039,  0.7760,  0.0361,  0.1706
-0.0887,  0.2104,  0.9808,  0.5478, -0.3314,  0.4128
-0.8220, -0.6302,  0.0537, -0.1658,  0.6013,  0.4306
-0.4123, -0.2880,  0.9074, -0.0461, -0.4435,  0.5144
 0.0060,  0.2867, -0.7775,  0.5161,  0.7039,  0.3599
-0.7968, -0.5484,  0.9426, -0.4308,  0.8148,  0.2979
 0.7811,  0.8450, -0.6877,  0.7594,  0.2640,  0.2362
-0.6802, -0.1113, -0.8325, -0.6694, -0.6056,  0.6544
 0.3821,  0.1476,  0.7466, -0.5107,  0.2592,  0.1648
 0.7265,  0.9683, -0.9803, -0.4943, -0.5523,  0.2454
-0.9049, -0.9797, -0.0196, -0.9090, -0.4433,  0.6447
-0.4607,  0.1811, -0.2389,  0.4050, -0.0078,  0.5229
 0.2664, -0.2932, -0.4259, -0.7336,  0.8742,  0.1834
-0.4507,  0.1029, -0.6294, -0.1158, -0.6294,  0.6081
 0.8948, -0.0124,  0.9278,  0.2899, -0.0314,  0.1534
-0.1323, -0.8813, -0.0146, -0.0697,  0.6135,  0.2386

Gradient Boosting Regression From Scratch Using Python

Gradient boosting regression (GBR) is a common machine learning technique to predict a single numeric value. The technique combines many small decision trees (typically 100 or 200) to produce a final prediction. One morning before work, I figured I’d put together a quick demo of GBR from scratch using Python. By “scratch” I mean that both the top-level MyGradientBoostingRegressor class and the helper MyDecisionTreeRegressor class are implemented without using any machine learning library (just NumPy).

My implementation is based mostly on some Google search AI-generated code. That AI code appears to be closely related to an old blog post by a guy named Tomonori Masui, and his blog post appears to be closely related to a now-vanished blog post by a guy named Matt Bowers.

Gradient boosting is simple but very subtle. As each tree in the collection is constructed, it is fit to the residuals of the current combined prediction, where a residual is the difference between the actual y and the predicted y. Mathematically, for squared error loss, a residual is (the negative of) a gradient, hence the name of the technique. For a given input, the prediction starts out as the average of the target y values in the training data. The residuals predicted by each tree, some negative and some positive, are scaled by the learning rate and accumulated, and the final sum is the predicted y value.
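Here is a minimal sketch of that boosting loop, my own illustration rather than the demo code; it uses the scikit-learn DecisionTreeRegressor as the base tree instead of the from-scratch tree class in the full demo program below, and the parameter defaults are arbitrary.

# boosting_loop_sketch.py
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost_fit(X, y, n_trees=100, lrn_rate=0.05, max_depth=3):
  pred_0 = y.mean()   # initial prediction is the mean target
  preds = np.full(len(y), pred_0)
  trees = []
  for _ in range(n_trees):
    residuals = y - preds   # what the current ensemble gets wrong
    t = DecisionTreeRegressor(max_depth=max_depth)
    t.fit(X, residuals)   # each tree is fit to the residuals
    preds += lrn_rate * t.predict(X)   # accumulate scaled corrections
    trees.append(t)
  return pred_0, trees

def boost_predict(pred_0, trees, X, lrn_rate=0.05):
  preds = np.full(len(X), pred_0)
  for t in trees:
    preds += lrn_rate * t.predict(X)
  return preds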

For my demo, I used a set of synthetic data that I generated using a neural network with random weights and biases. The data looks like:

-0.1660,  0.4406, -0.9998, -0.3953, -0.7065, 0.4840
 0.0776, -0.1616,  0.3704, -0.5911,  0.7562, 0.1568
-0.9452,  0.3409, -0.1654,  0.1174, -0.7192, 0.8054
. . .

The first five values on each line are the predictors. The sixth value is the target to predict. All predictor values are between -1.0 and 1.0. Normalizing the predictor values is not necessary but is helpful when using the data with other regression techniques that require normalization (such as k-nearest neighbors regression). There are 200 items in the training data and 40 items in the test data.

The output of the from-scratch gradient boosting regression demo program is:

Begin gradient boost regression from scratch

Loading synthetic train (200), test (40) data
Done

First three X predictors:
[-0.1660  0.4406 -0.9998 -0.3953 -0.7065]
[ 0.0776 -0.1616  0.3704 -0.5911  0.7562]
[-0.9452  0.3409 -0.1654  0.1174 -0.7192]

First three y targets:
0.4840
0.1568
0.8054

Setting n_estimators = 200
Setting max_depth = 3
Setting min_samples = 2
Setting lrn_rate = 0.0500

Training model
Done

Accuracy train (within 0.10): 0.9850
Accuracy test (within 0.10): 0.7000

MSE train: 0.0001
MSE test: 0.0011

Predicting for:
[[-0.1660  0.4406 -0.9998 -0.3953 -0.7065]]
Predicted y = 0.4862
==============================

Using scikit GradientBoostingRegressor:

Accuracy train (within 0.10): 0.9850
Accuracy test (within 0.10): 0.6500

MSE train: 0.0001
MSE test: 0.0012
Predicting for:
[[-0.1660  0.4406 -0.9998 -0.3953 -0.7065]]
Predicted y = 0.4862

End demo

I implemented an accuracy() function that scores a prediction as correct if it’s within a specified percentage of the true target y value. I specified 10% as an arbitrary percentage to use.

I ran the synthetic data through the built-in scikit GradientBoostingRegressor module. Using the same model parameters — n_estimators = 200, max_depth = 3, min_samples = 2, lrn_rate = 0.0500 — the built-in scikit module produced very similar results. (The difference is due to the way the underlying decision trees work when computing splits).

There are several advanced variations of gradient boosting regression libraries available. XGBoost (“Extreme Gradient Boost”) and LightGBM are two I’ve used.

Good fun and a good way to start my day.



I’ve always been fascinated by mathematical predictive models. I stumbled across a person who made a fantastic 1/12 scale model of the 221B Baker Street home of Sherlock Holmes. Absolutely beautiful. These images don’t do justice to the incredible detail.


Demo program. Replace “lt” (less than), “gt”, “lte”, “gte” with Boolean operator symbols. (My blog editor chokes on symbols).

# gradient_boost_regression_scratch.py

import numpy as np
# from sklearn.tree import DecisionTreeRegressor  # validation

# ===========================================================

class MyGradientBoostingRegressor:
  def __init__(self, lrn_rate, n_estimators,
    max_depth=2, min_samples=2, seed=0):
    self.lrn_rate = lrn_rate
    self.n_estimators = n_estimators
    self.max_depth = max_depth
    self.min_samples = min_samples
    self.trees = []
    self.pred_0 = 0.0
    self.seed = seed  # passed to base decision tree
        
  def fit(self, X, y):
    self.pred_0 = y.mean()  # initial prediction(s)
    preds = np.full(len(X), self.pred_0)
    for t in range(self.n_estimators):  # each tree
      gradients = preds - y

      # using scikit trees
      # dtr = DecisionTreeRegressor(max_depth=self.max_depth,
      #   min_samples_split=self.min_samples, random_state=0)

      # using from-scratch trees
      dtr = MyDecisionTreeRegressor(max_depth=self.max_depth,
        min_samples=self.min_samples, seed=self.seed)

      dtr.fit(X, gradients)
      gammas = dtr.predict(X)  # update vals
      preds -= self.lrn_rate * gammas
      self.trees.append(dtr)
            
  def predict(self, X):
    results = np.full(len(X), self.pred_0)
    for i in range(self.n_estimators):
      results -= self.lrn_rate * self.trees[i].predict(X)
    return results  # sum of predictions on residuals

# ===========================================================
# ===========================================================

class MyDecisionTreeRegressor:  # avoid scikit name collision
  # if max_depth = n, tree has at most 2^(n+1) - 1 nodes.

  def __init__(self, max_depth=3, min_samples=2,
    n_split_cols=-1, seed=0):
    self.max_depth = max_depth
    self.min_samples = min_samples # aka min_samples_split
    self.n_split_cols = n_split_cols  # mostly random forest
    self.root = None
    self.rnd = np.random.RandomState(seed) # split col order

  # ===============================================

  class Node:
    def __init__(self, id=0, col_idx=-1, thresh=0.0,
        left=None, right=None, value=0.0, is_leaf=False):
      self.id = id  # useful for debugging
      self.col_idx = col_idx
      self.thresh = thresh
      self.left = left
      self.right = right
      self.value = value
      self.is_leaf = is_leaf  # False for an in-node

  # ===============================================

  def best_split(self, X, y):
    best_col_idx = -1  # indicates a bad split
    best_thresh = 0.0
    best_mse = np.inf  # smaller is better
    n_rows, n_cols = X.shape

    rnd_cols = np.arange(n_cols)
    self.rnd.shuffle(rnd_cols)
    if self.n_split_cols != -1:  # just use some cols
      rnd_cols = rnd_cols[0:self.n_split_cols]

    for j in range(len(rnd_cols)):
      col_idx = rnd_cols[j]
      examined_threshs = set()
      for i in range(n_rows):
        thresh = X[i][col_idx]  # candidate threshold value

        if thresh in examined_threshs:  # already tried this threshold
          continue
        examined_threshs.add(thresh)

        # get rows where x is lte, gt thresh
        left_idxs = np.where(X[:,col_idx] "lte" thresh)[0]
        right_idxs = np.where(X[:,col_idx] "gt" thresh)[0]

        # check proposed split
        if len(left_idxs) == 0 or \
          len(right_idxs) == 0:
          continue

        # get left and right y values
        left_y_vals = y[left_idxs]  # not empty
        right_y_vals = y[right_idxs]  # not empty

        # compute proposed split MSE
        mse_left = self.vector_mse(left_y_vals)
        mse_right = self.vector_mse(right_y_vals)
        split_mse = (len(left_y_vals) * mse_left + \
          len(right_y_vals) * mse_right) / n_rows

        if split_mse "lt" best_mse:
          best_col_idx = col_idx
          best_thresh = thresh
          best_mse = split_mse          

    return best_col_idx, best_thresh  # -1 is bad/no split

  # ---------------------------------------------------------

  def vector_mse(self, y):  # variance but called MSE
    if len(y) == 0:
      return 0.0  # should never get here
    return np.var(y)

  # ---------------------------------------------------------

  def make_tree(self, X, y):
    root = self.Node()  # is_leaf is False
    stack = [(root, X, y, 0)]  # curr depth = 0

    while (len(stack) "gt" 0):
      curr_node, curr_X, curr_y, curr_depth = stack.pop()

      if curr_depth == self.max_depth or \
        len(curr_y) "lt" self.min_samples:
        curr_node.value = np.mean(curr_y)
        curr_node.is_leaf = True
        continue

      col_idx, thresh = self.best_split(curr_X, curr_y) 

      if col_idx == -1:  # cannot split
        curr_node.value = np.mean(curr_y)
        curr_node.is_leaf = True
        continue
      
      # got a good split so at an internal, non-leaf node
      curr_node.col_idx = col_idx
      curr_node.thresh = thresh

      # create and attach child nodes
      left_idxs = np.where(curr_X[:,col_idx] "lte" thresh)[0]
      right_idxs = np.where(curr_X[:,col_idx] "gt" thresh)[0]

      left_X = curr_X[left_idxs,:]
      left_y = curr_y[left_idxs]
      right_X = curr_X[right_idxs,:]
      right_y = curr_y[right_idxs]

      curr_node.left = self.Node(id=2*curr_node.id+1)
      stack.append((curr_node.left,
        left_X, left_y, curr_depth+1))

      curr_node.right = self.Node(id=2*curr_node.id+2)
      stack.append((curr_node.right,
        right_X, right_y, curr_depth+1))
      
    return root

  # ---------------------------------------------------------

  def fit(self, X, y):
    self.root = self.make_tree(X, y)

  # ---------------------------------------------------------

  def predict_one(self, x):
    curr = self.root
    while curr.is_leaf == False:
      if x[curr.col_idx] "lte" curr.thresh:
        curr = curr.left
      else:
        curr = curr.right
    return curr.value

  def predict(self, X):  # scikit always uses a matrix input
    result = np.zeros(len(X), dtype=np.float64)
    for i in range(len(X)):
      result[i] = self.predict_one(X[i])
    return result

  # ---------------------------------------------------------

# ===========================================================
# ===========================================================

# -----------------------------------------------------------

def accuracy(model, data_X, data_y, pct_close):
  n = len(data_X)
  n_correct = 0; n_wrong = 0
  for i in range(n):
    x = data_X[i].reshape(1,-1)
    y = data_y[i]
    y_pred = model.predict(x)

    if np.abs(y - y_pred) "lt" np.abs(y * pct_close):
      n_correct += 1
    else: 
      n_wrong += 1
  # print("Correct = " + str(n_correct))
  # print("Wrong   = " + str(n_wrong))
  return n_correct / (n_correct + n_wrong)

# -----------------------------------------------------------

def MSE(model, data_X, data_y):
  n = len(data_X)
  sum = 0.0
  for i in range(n):
    x = data_X[i].reshape(1,-1)
    y = data_y[i]
    y_pred = model.predict(x)
    sum += (y - y_pred) * (y - y_pred)

  return sum / n

# -----------------------------------------------------------

def main():
  print("\nBegin gradient boost regression from scratch ")
  np.set_printoptions(precision=4, suppress=True,
    floatmode='fixed')
  np.random.seed(0)  # not used this version

  # 1. load data
  print("\nLoading synthetic train (200), test (40) data ")
  train_file = ".\\Data\\synthetic_train_200.txt"
  # -0.1660,0.4406,-0.9998,-0.3953,-0.7065,0.4840
  #  0.0776,-0.1616,0.3704,-0.5911,0.7562,0.1568
  # -0.9452,0.3409,-0.1654,0.1174,-0.7192,0.8054
  # . . .

  train_X = np.loadtxt(train_file, comments="#",
    usecols=[0,1,2,3,4],
    delimiter=",",  dtype=np.float64)
  train_y = np.loadtxt(train_file, comments="#", usecols=5,
    delimiter=",",  dtype=np.float64)

  test_file = ".\\Data\\synthetic_test_40.txt"
  test_X = np.loadtxt(test_file, comments="#",
    usecols=[0,1,2,3,4],
    delimiter=",",  dtype=np.float64)
  test_y = np.loadtxt(test_file, comments="#", usecols=5,
    delimiter=",",  dtype=np.float64)
  print("Done ")

  print("\nFirst three X predictors: ")
  for i in range(3):
    print(train_X[i])
  print("\nFirst three y targets: ")
  for i in range(3):
    print("%0.4f" % train_y[i])

  # 2. create and train model
  n_estimators = 200
  max_depth = 3
  min_samples = 2
  lrn_rate = 0.05

  print("\nSetting n_estimators = " + str(n_estimators))
  print("Setting max_depth = " + str(max_depth))
  print("Setting min_samples = " + str(min_samples))
  print("Setting lrn_rate = %0.4f " % lrn_rate)
  print("\nTraining model ")

  model = MyGradientBoostingRegressor(lrn_rate,
    n_estimators, max_depth, min_samples) 
  model.fit(train_X, train_y)
  print("Done ")

  # 3. evaluate model
  acc_train = accuracy(model, train_X, train_y, 0.10)
  print("\nAccuracy train (within 0.10): %0.4f " % acc_train)
  acc_test = accuracy(model, test_X, test_y, 0.10)
  print("Accuracy test (within 0.10): %0.4f " % acc_test)

  mse_train = MSE(model, train_X, train_y)
  print("\nMSE train: %0.4f " % mse_train)
  mse_test = MSE(model, test_X, test_y)
  print("MSE test: %0.4f " % mse_test)

  # 4. use model
  x = train_X[0].reshape(1,-1)
  print("\nPredicting for: ")
  print(x)
  y_pred = model.predict(x)
  print("Predicted y = %0.4f " % y_pred)

  print("\n============================== ")

  print("\nUsing scikit GradientBoostingRegressor: ")
  from sklearn.ensemble import GradientBoostingRegressor
  sk_gbm = GradientBoostingRegressor(n_estimators=200, 
    learning_rate=0.05, max_depth=max_depth,
    min_samples_split=min_samples, random_state=0)
  sk_gbm.fit(train_X, train_y)

  acc_train = accuracy(sk_gbm, train_X, train_y, 0.10)
  print("\nAccuracy train (within 0.10): %0.4f " % acc_train)
  acc_test = accuracy(sk_gbm, test_X, test_y, 0.10)
  print("Accuracy test (within 0.10): %0.4f " % acc_test)

  mse_train = MSE(sk_gbm, train_X, train_y)
  print("\nMSE train: %0.4f " % mse_train)
  mse_test = MSE(sk_gbm, test_X, test_y)
  print("MSE test: %0.4f " % mse_test)

  print("\nPredicting for: ")
  print(x)
  y_pred = sk_gbm.predict(x)
  print("Predicted y = %0.4f " % y_pred)

  print("\nEnd demo ")

if __name__ == "__main__":
  main()

Training data:

# synthetic_train_200.txt
#
-0.1660,  0.4406, -0.9998, -0.3953, -0.7065,  0.4840
 0.0776, -0.1616,  0.3704, -0.5911,  0.7562,  0.1568
-0.9452,  0.3409, -0.1654,  0.1174, -0.7192,  0.8054
 0.9365, -0.3732,  0.3846,  0.7528,  0.7892,  0.1345
-0.8299, -0.9219, -0.6603,  0.7563, -0.8033,  0.7955
 0.0663,  0.3838, -0.3690,  0.3730,  0.6693,  0.3206
-0.9634,  0.5003,  0.9777,  0.4963, -0.4391,  0.7377
-0.1042,  0.8172, -0.4128, -0.4244, -0.7399,  0.4801
-0.9613,  0.3577, -0.5767, -0.4689, -0.0169,  0.6861
-0.7065,  0.1786,  0.3995, -0.7953, -0.1719,  0.5569
 0.3888, -0.1716, -0.9001,  0.0718,  0.3276,  0.2500
 0.1731,  0.8068, -0.7251, -0.7214,  0.6148,  0.3297
-0.2046, -0.6693,  0.8550, -0.3045,  0.5016,  0.2129
 0.2473,  0.5019, -0.3022, -0.4601,  0.7918,  0.2613
-0.1438,  0.9297,  0.3269,  0.2434, -0.7705,  0.5171
 0.1568, -0.1837, -0.5259,  0.8068,  0.1474,  0.3307
-0.9943,  0.2343, -0.3467,  0.0541,  0.7719,  0.5581
 0.2467, -0.9684,  0.8589,  0.3818,  0.9946,  0.1092
-0.6553, -0.7257,  0.8652,  0.3936, -0.8680,  0.7018
 0.8460,  0.4230, -0.7515, -0.9602, -0.9476,  0.1996
-0.9434, -0.5076,  0.7201,  0.0777,  0.1056,  0.5664
 0.9392,  0.1221, -0.9627,  0.6013, -0.5341,  0.1533
 0.6142, -0.2243,  0.7271,  0.4942,  0.1125,  0.1661
 0.4260,  0.1194, -0.9749, -0.8561,  0.9346,  0.2230
 0.1362, -0.5934, -0.4953,  0.4877, -0.6091,  0.3810
 0.6937, -0.5203, -0.0125,  0.2399,  0.6580,  0.1460
-0.6864, -0.9628, -0.8600, -0.0273,  0.2127,  0.5387
 0.9772,  0.1595, -0.2397,  0.1019,  0.4907,  0.1611
 0.3385, -0.4702, -0.8673, -0.2598,  0.2594,  0.2270
-0.8669, -0.4794,  0.6095, -0.6131,  0.2789,  0.4700
 0.0493,  0.8496, -0.4734, -0.8681,  0.4701,  0.3516
 0.8639, -0.9721, -0.5313,  0.2336,  0.8980,  0.1412
 0.9004,  0.1133,  0.8312,  0.2831, -0.2200,  0.1782
 0.0991,  0.8524,  0.8375, -0.2102,  0.9265,  0.2150
-0.6521, -0.7473, -0.7298,  0.0113, -0.9570,  0.7422
 0.6190, -0.3105,  0.8802,  0.1640,  0.7577,  0.1056
 0.6895,  0.8108, -0.0802,  0.0927,  0.5972,  0.2214
 0.1982, -0.9689,  0.1870, -0.1326,  0.6147,  0.1310
-0.3695,  0.7858,  0.1557, -0.6320,  0.5759,  0.3773
-0.1596,  0.3581,  0.8372, -0.9992,  0.9535,  0.2071
-0.2468,  0.9476,  0.2094,  0.6577,  0.1494,  0.4132
 0.1737,  0.5000,  0.7166,  0.5102,  0.3961,  0.2611
 0.7290, -0.3546,  0.3416, -0.0983, -0.2358,  0.1332
-0.3652,  0.2438, -0.1395,  0.9476,  0.3556,  0.4170
-0.6029, -0.1466, -0.3133,  0.5953,  0.7600,  0.4334
-0.4596, -0.4953,  0.7098,  0.0554,  0.6043,  0.2775
 0.1450,  0.4663,  0.0380,  0.5418,  0.1377,  0.2931
-0.8636, -0.2442, -0.8407,  0.9656, -0.6368,  0.7429
 0.6237,  0.7499,  0.3768,  0.1390, -0.6781,  0.2185
-0.5499,  0.1850, -0.3755,  0.8326,  0.8193,  0.4399
-0.4858, -0.7782, -0.6141, -0.0008,  0.4572,  0.4197
 0.7033, -0.1683,  0.2334, -0.5327, -0.7961,  0.1776
 0.0317, -0.0457, -0.6947,  0.2436,  0.0880,  0.3345
 0.5031, -0.5559,  0.0387,  0.5706, -0.9553,  0.3107
-0.3513,  0.7458,  0.6894,  0.0769,  0.7332,  0.3170
 0.2205,  0.5992, -0.9309,  0.5405,  0.4635,  0.3532
-0.4806, -0.4859,  0.2646, -0.3094,  0.5932,  0.3202
 0.9809, -0.3995, -0.7140,  0.8026,  0.0831,  0.1600
 0.9495,  0.2732,  0.9878,  0.0921,  0.0529,  0.1289
-0.9476, -0.6792,  0.4913, -0.9392, -0.2669,  0.5966
 0.7247,  0.3854,  0.3819, -0.6227, -0.1162,  0.1550
-0.5922, -0.5045, -0.4757,  0.5003, -0.0860,  0.5863
-0.8861,  0.0170, -0.5761,  0.5972, -0.4053,  0.7301
 0.6877, -0.2380,  0.4997,  0.0223,  0.0819,  0.1404
 0.9189,  0.6079, -0.9354,  0.4188, -0.0700,  0.1907
-0.1428, -0.7820,  0.2676,  0.6059,  0.3936,  0.2790
 0.5324, -0.3151,  0.6917, -0.1425,  0.6480,  0.1071
-0.8432, -0.9633, -0.8666, -0.0828, -0.7733,  0.7784
-0.9444,  0.5097, -0.2103,  0.4939, -0.0952,  0.6787
-0.0520,  0.6063, -0.1952,  0.8094, -0.9259,  0.4836
 0.5477, -0.7487,  0.2370, -0.9793,  0.0773,  0.1241
 0.2450,  0.8116,  0.9799,  0.4222,  0.4636,  0.2355
 0.8186, -0.1983, -0.5003, -0.6531, -0.7611,  0.1511
-0.4714,  0.6382, -0.3788,  0.9648, -0.4667,  0.5950
 0.0673, -0.3711,  0.8215, -0.2669, -0.1328,  0.2677
-0.9381,  0.4338,  0.7820, -0.9454,  0.0441,  0.5518
-0.3480,  0.7190,  0.1170,  0.3805, -0.0943,  0.4724
-0.9813,  0.1535, -0.3771,  0.0345,  0.8328,  0.5438
-0.1471, -0.5052, -0.2574,  0.8637,  0.8737,  0.3042
-0.5454, -0.3712, -0.6505,  0.2142, -0.1728,  0.5783
 0.6327, -0.6297,  0.4038, -0.5193,  0.1484,  0.1153
-0.5424,  0.3282, -0.0055,  0.0380, -0.6506,  0.6613
 0.1414,  0.9935,  0.6337,  0.1887,  0.9520,  0.2540
-0.9351, -0.8128, -0.8693, -0.0965, -0.2491,  0.7353
 0.9507, -0.6640,  0.9456,  0.5349,  0.6485,  0.1059
-0.0462, -0.9737, -0.2940, -0.0159,  0.4602,  0.2606
-0.0627, -0.0852, -0.7247, -0.9782,  0.5166,  0.2977
 0.0478,  0.5098, -0.0723, -0.7504, -0.3750,  0.3335
 0.0090,  0.3477,  0.5403, -0.7393, -0.9542,  0.4415
-0.9748,  0.3449,  0.3736, -0.1015,  0.8296,  0.4358
 0.2887, -0.9895, -0.0311,  0.7186,  0.6608,  0.2057
 0.1570, -0.4518,  0.1211,  0.3435, -0.2951,  0.3244
 0.7117, -0.6099,  0.4946, -0.4208,  0.5476,  0.1096
-0.2929, -0.5726,  0.5346, -0.3827,  0.4665,  0.2465
 0.4889, -0.5572, -0.5718, -0.6021, -0.7150,  0.2163
-0.7782,  0.3491,  0.5996, -0.8389, -0.5366,  0.6516
-0.5847,  0.8347,  0.4226,  0.1078, -0.3910,  0.6134
 0.8469,  0.4121, -0.0439, -0.7476,  0.9521,  0.1571
-0.6803, -0.5948, -0.1376, -0.1916, -0.7065,  0.7156
 0.2878,  0.5086, -0.5785,  0.2019,  0.4979,  0.2980
 0.2764,  0.1943, -0.4090,  0.4632,  0.8906,  0.2960
-0.8877,  0.6705, -0.6155, -0.2098, -0.3998,  0.7107
-0.8398,  0.8093, -0.2597,  0.0614, -0.0118,  0.6502
-0.8476,  0.0158, -0.4769, -0.2859, -0.7839,  0.7715
 0.5751, -0.7868,  0.9714, -0.6457,  0.1448,  0.1175
 0.4802, -0.7001,  0.1022, -0.5668,  0.5184,  0.1090
 0.4458, -0.6469,  0.7239, -0.9604,  0.7205,  0.0779
 0.5175,  0.4339,  0.9747, -0.4438, -0.9924,  0.2879
 0.8678,  0.7158,  0.4577,  0.0334,  0.4139,  0.1678
 0.5406,  0.5012,  0.2264, -0.1963,  0.3946,  0.2088
-0.9938,  0.5498,  0.7928, -0.5214, -0.7585,  0.7687
 0.7661,  0.0863, -0.4266, -0.7233, -0.4197,  0.1466
 0.2277, -0.3517, -0.0853, -0.1118,  0.6563,  0.1767
 0.3499, -0.5570, -0.0655, -0.3705,  0.2537,  0.1632
 0.7547, -0.1046,  0.5689, -0.0861,  0.3125,  0.1257
 0.8186,  0.2110,  0.5335,  0.0094, -0.0039,  0.1391
 0.6858, -0.8644,  0.1465,  0.8855,  0.0357,  0.1845
-0.4967,  0.4015,  0.0805,  0.8977,  0.2487,  0.4663
 0.6760, -0.9841,  0.9787, -0.8446, -0.3557,  0.1509
-0.1203, -0.4885,  0.6054, -0.0443, -0.7313,  0.4854
 0.8557,  0.7919, -0.0169,  0.7134, -0.1628,  0.2002
 0.0115, -0.6209,  0.9300, -0.4116, -0.7931,  0.4052
-0.7114, -0.9718,  0.4319,  0.1290,  0.5892,  0.3661
 0.3915,  0.5557, -0.1870,  0.2955, -0.6404,  0.2954
-0.3564, -0.6548, -0.1827, -0.5172, -0.1862,  0.4622
 0.2392, -0.4959,  0.5857, -0.1341, -0.2850,  0.2470
-0.3394,  0.3947, -0.4627,  0.6166, -0.4094,  0.5325
 0.7107,  0.7768, -0.6312,  0.1707,  0.7964,  0.2757
-0.1078,  0.8437, -0.4420,  0.2177,  0.3649,  0.4028
-0.3139,  0.5595, -0.6505, -0.3161, -0.7108,  0.5546
 0.4335,  0.3986,  0.3770, -0.4932,  0.3847,  0.1810
-0.2562, -0.2894, -0.8847,  0.2633,  0.4146,  0.4036
 0.2272,  0.2966, -0.6601, -0.7011,  0.0284,  0.2778
-0.0743, -0.1421, -0.0054, -0.6770, -0.3151,  0.3597
-0.4762,  0.6891,  0.6007, -0.1467,  0.2140,  0.4266
-0.4061,  0.7193,  0.3432,  0.2669, -0.7505,  0.6147
-0.0588,  0.9731,  0.8966,  0.2902, -0.6966,  0.4955
-0.0627, -0.1439,  0.1985,  0.6999,  0.5022,  0.3077
 0.1587,  0.8494, -0.8705,  0.9827, -0.8940,  0.4263
-0.7850,  0.2473, -0.9040, -0.4308, -0.8779,  0.7199
 0.4070,  0.3369, -0.2428, -0.6236,  0.4940,  0.2215
-0.0242,  0.0513, -0.9430,  0.2885, -0.2987,  0.3947
-0.5416, -0.1322, -0.2351, -0.0604,  0.9590,  0.3683
 0.1055,  0.7783, -0.2901, -0.5090,  0.8220,  0.2984
-0.9129,  0.9015,  0.1128, -0.2473,  0.9901,  0.4776
-0.9378,  0.1424, -0.6391,  0.2619,  0.9618,  0.5368
 0.7498, -0.0963,  0.4169,  0.5549, -0.0103,  0.1614
-0.2612, -0.7156,  0.4538, -0.0460, -0.1022,  0.3717
 0.7720,  0.0552, -0.1818, -0.4622, -0.8560,  0.1685
-0.4177,  0.0070,  0.9319, -0.7812,  0.3461,  0.3052
-0.0001,  0.5542, -0.7128, -0.8336, -0.2016,  0.3803
 0.5356, -0.4194, -0.5662, -0.9666, -0.2027,  0.1776
-0.2378,  0.3187, -0.8582, -0.6948, -0.9668,  0.5474
-0.1947, -0.3579,  0.1158,  0.9869,  0.6690,  0.2992
 0.3992,  0.8365, -0.9205, -0.8593, -0.0520,  0.3154
-0.0209,  0.0793,  0.7905, -0.1067,  0.7541,  0.1864
-0.4928, -0.4524, -0.3433,  0.0951, -0.5597,  0.6261
-0.8118,  0.7404, -0.5263, -0.2280,  0.1431,  0.6349
 0.0516, -0.8480,  0.7483,  0.9023,  0.6250,  0.1959
-0.3212,  0.1093,  0.9488, -0.3766,  0.3376,  0.2735
-0.3481,  0.5490, -0.3484,  0.7797,  0.5034,  0.4379
-0.5785, -0.9170, -0.3563, -0.9258,  0.3877,  0.4121
 0.3407, -0.1391,  0.5356,  0.0720, -0.9203,  0.3458
-0.3287, -0.8954,  0.2102,  0.0241,  0.2349,  0.3247
-0.1353,  0.6954, -0.0919, -0.9692,  0.7461,  0.3338
 0.9036, -0.8982, -0.5299, -0.8733, -0.1567,  0.1187
 0.7277, -0.8368, -0.0538, -0.7489,  0.5458,  0.0830
 0.9049,  0.8878,  0.2279,  0.9470, -0.3103,  0.2194
 0.7957, -0.1308, -0.5284,  0.8817,  0.3684,  0.2172
 0.4647, -0.4931,  0.2010,  0.6292, -0.8918,  0.3371
-0.7390,  0.6849,  0.2367,  0.0626, -0.5034,  0.7039
-0.1567, -0.8711,  0.7940, -0.5932,  0.6525,  0.1710
 0.7635, -0.0265,  0.1969,  0.0545,  0.2496,  0.1445
 0.7675,  0.1354, -0.7698, -0.5460,  0.1920,  0.1728
-0.5211, -0.7372, -0.6763,  0.6897,  0.2044,  0.5217
 0.1913,  0.1980,  0.2314, -0.8816,  0.5006,  0.1998
 0.8964,  0.0694, -0.6149,  0.5059, -0.9854,  0.1825
 0.1767,  0.7104,  0.2093,  0.6452,  0.7590,  0.2832
-0.3580, -0.7541,  0.4426, -0.1193, -0.7465,  0.5657
-0.5996,  0.5766, -0.9758, -0.3933, -0.9572,  0.6800
 0.9950,  0.1641, -0.4132,  0.8579,  0.0142,  0.2003
-0.4717, -0.3894, -0.2567, -0.5111,  0.1691,  0.4266
 0.3917, -0.8561,  0.9422,  0.5061,  0.6123,  0.1212
-0.0366, -0.1087,  0.3449, -0.1025,  0.4086,  0.2475
 0.3633,  0.3943,  0.2372, -0.6980,  0.5216,  0.1925
-0.5325, -0.6466, -0.2178, -0.3589,  0.6310,  0.3568
 0.2271,  0.5200, -0.1447, -0.8011, -0.7699,  0.3128
 0.6415,  0.1993,  0.3777, -0.0178, -0.8237,  0.2181
-0.5298, -0.0768, -0.6028, -0.9490,  0.4588,  0.4356
 0.6870, -0.1431,  0.7294,  0.3141,  0.1621,  0.1632
-0.5985,  0.0591,  0.7889, -0.3900,  0.7419,  0.2945
 0.3661,  0.7984, -0.8486,  0.7572, -0.6183,  0.3449
 0.6995,  0.3342, -0.3113, -0.6972,  0.2707,  0.1712
 0.2565,  0.9126,  0.1798, -0.6043, -0.1413,  0.2893
-0.3265,  0.9839, -0.2395,  0.9854,  0.0376,  0.4770
 0.2690, -0.1722,  0.9818,  0.8599, -0.7015,  0.3954
-0.2102, -0.0768,  0.1219,  0.5607, -0.0256,  0.3949
 0.8216, -0.9555,  0.6422, -0.6231,  0.3715,  0.0801
-0.2896,  0.9484, -0.7545, -0.6249,  0.7789,  0.4370
-0.9985, -0.5448, -0.7092, -0.5931,  0.7926,  0.5402

Test data:

# synthetic_test_40.txt
#
 0.7462,  0.4006, -0.0590,  0.6543, -0.0083,  0.1935
 0.8495, -0.2260, -0.0142, -0.4911,  0.7699,  0.1078
-0.2335, -0.4049,  0.4352, -0.6183, -0.7636,  0.5088
 0.1810, -0.5142,  0.2465,  0.2767, -0.3449,  0.3136
-0.8650,  0.7611, -0.0801,  0.5277, -0.4922,  0.7140
-0.2358, -0.7466, -0.5115, -0.8413, -0.3943,  0.4533
 0.4834,  0.2300,  0.3448, -0.9832,  0.3568,  0.1360
-0.6502, -0.6300,  0.6885,  0.9652,  0.8275,  0.3046
-0.3053,  0.5604,  0.0929,  0.6329, -0.0325,  0.4756
-0.7995,  0.0740, -0.2680,  0.2086,  0.9176,  0.4565
-0.2144, -0.2141,  0.5813,  0.2902, -0.2122,  0.4119
-0.7278, -0.0987, -0.3312, -0.5641,  0.8515,  0.4438
 0.3793,  0.1976,  0.4933,  0.0839,  0.4011,  0.1905
-0.8568,  0.9573, -0.5272,  0.3212, -0.8207,  0.7415
-0.5785,  0.0056, -0.7901, -0.2223,  0.0760,  0.5551
 0.0735, -0.2188,  0.3925,  0.3570,  0.3746,  0.2191
 0.1230, -0.2838,  0.2262,  0.8715,  0.1938,  0.2878
 0.4792, -0.9248,  0.5295,  0.0366, -0.9894,  0.3149
-0.4456,  0.0697,  0.5359, -0.8938,  0.0981,  0.3879
 0.8629, -0.8505, -0.4464,  0.8385,  0.5300,  0.1769
 0.1995,  0.6659,  0.7921,  0.9454,  0.9970,  0.2330
-0.0249, -0.3066, -0.2927, -0.4923,  0.8220,  0.2437
 0.4513, -0.9481, -0.0770, -0.4374, -0.9421,  0.2879
-0.3405,  0.5931, -0.3507, -0.3842,  0.8562,  0.3987
 0.9538,  0.0471,  0.9039,  0.7760,  0.0361,  0.1706
-0.0887,  0.2104,  0.9808,  0.5478, -0.3314,  0.4128
-0.8220, -0.6302,  0.0537, -0.1658,  0.6013,  0.4306
-0.4123, -0.2880,  0.9074, -0.0461, -0.4435,  0.5144
 0.0060,  0.2867, -0.7775,  0.5161,  0.7039,  0.3599
-0.7968, -0.5484,  0.9426, -0.4308,  0.8148,  0.2979
 0.7811,  0.8450, -0.6877,  0.7594,  0.2640,  0.2362
-0.6802, -0.1113, -0.8325, -0.6694, -0.6056,  0.6544
 0.3821,  0.1476,  0.7466, -0.5107,  0.2592,  0.1648
 0.7265,  0.9683, -0.9803, -0.4943, -0.5523,  0.2454
-0.9049, -0.9797, -0.0196, -0.9090, -0.4433,  0.6447
-0.4607,  0.1811, -0.2389,  0.4050, -0.0078,  0.5229
 0.2664, -0.2932, -0.4259, -0.7336,  0.8742,  0.1834
-0.4507,  0.1029, -0.6294, -0.1158, -0.6294,  0.6081
 0.8948, -0.0124,  0.9278,  0.2899, -0.0314,  0.1534
-0.1323, -0.8813, -0.0146, -0.0697,  0.6135,  0.2386
Posted in Machine Learning | Leave a comment

Computing Moore-Penrose Pseudo-Inverse Using Three Techniques: Standard Inverse, SVD Decomposition, QR Decomposition

I was sitting in my living room chair at 3:30 AM, staring at the ceiling, mulling the meaning of life. I didn’t reach any conclusions so I figured I’d put together a Python language demo of computing the Moore-Penrose pseudo-inverse using three different techniques. If A is the source matrix, then

1. pinv = inv(At * A) * At
2. pinv = V * Sinv * Ut (from SVD)
3. pinv = Rinv * Qt (from QR)

In the first technique, At is the transpose of A, and inv() is any standard matrix inverse function (there are many, such as LUP inverse or Cholesky inverse). This technique is not a true Moore-Penrose pseudo-inverse, which has many rigorous math conditions. This technique is sometimes called the pseudo-inverse using normal equations.

Because At * A is symmetric and positive definite (if you add a small conditioning constant to the diagonal), its inverse can be computed using the relatively simple Cholesky decomposition. NumPy does not have a dedicated Cholesky inverse function, so for my demo I used the np.linalg.inv() function, which I believe uses SVD, but I'm not entirely sure because the documentation is a bit vague.

In the second technique, U, s, Vh is the result of SVD decomposition, and V is the transpose of Vh, Sinv is vector s converted to a diagonal matrix with elements 1/s, and Ut (or Uh) is the transpose of U.

In the third technique, Q, R is the result of QR decomposition, Rinv is the standard matrix inverse of R (which has a nice shortcut technique because R is upper triangular), and Qt is the transpose of Q (which acts as the inverse of Q because Q has orthonormal columns).
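
The shortcut for computing Rinv relies on the fact that R is upper triangular, so its inverse can be computed column by column with back-substitution. Here is a minimal sketch of that idea (my demo program below just calls np.linalg.inv(R) instead; the function name is mine):

import numpy as np

def upper_tri_inv(R):
  # inverse of an upper triangular matrix via back-substitution
  n = R.shape[0]
  Rinv = np.zeros((n, n))
  for j in range(n):  # compute column j of Rinv
    Rinv[j, j] = 1.0 / R[j, j]
    for i in range(j - 1, -1, -1):  # rows above the diagonal
      s = 0.0
      for k in range(i + 1, j + 1):
        s += R[i, k] * Rinv[k, j]
      Rinv[i, j] = -s / R[i, i]
  return Rinv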

To validate the three techniques, I started by computing the pseudo-inverse using the built-in np.linalg.pinv() function.

The output of my demo is:

A =
[[1.0000 4.0000 2.0000]
 [6.0000 0.0000 3.0000]
 [7.0000 2.0000 1.0000]
 [5.0000 9.0000 8.0000]]

np.linalg pinv(A) =
[[ 0.0047  0.0370  0.1331 -0.0317]
 [ 0.1306 -0.1946  0.1310  0.0239]
 [-0.1158  0.2113 -0.2393  0.1046]]

scratch inv(At * A) * At =
[[ 0.0047  0.0370  0.1331 -0.0317]
 [ 0.1306 -0.1946  0.1310  0.0239]
 [-0.1158  0.2113 -0.2393  0.1046]]

scratch V * Sinv * Ut =
[[ 0.0047  0.0370  0.1331 -0.0317]
 [ 0.1306 -0.1946  0.1310  0.0239]
 [-0.1158  0.2113 -0.2393  0.1046]]

scratch Rinv * Qt =
[[ 0.0047  0.0370  0.1331 -0.0317]
 [ 0.1306 -0.1946  0.1310  0.0239]
 [-0.1158  0.2113 -0.2393  0.1046]]

End pseudo-inverse demo

Each of the three from-scratch techniques has pros and cons — it would take far too long to explain them. Briefly, the standard inverse is good for small matrices. For large matrices the SVD technique is most stable in theory (but I’m skeptical about that claim). For medium size matrices the QR technique is good.

Note that for the first inv(At * A) * At technique, you should add a small constant to the diagonal elements of At * A so that its inverse will always exist.
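
For example, the my_pinv1() function in the demo program below could be conditioned like this sketch (the 1.0e-8 value is just an illustrative guess, and the function name is mine; numpy is assumed to be imported as np, as in the demo):

def my_pinv1_conditioned(A, eps=1.0e-8):
  # inv(At * A + eps * I) * At
  At = np.transpose(A)
  tmp1 = np.matmul(At, A) + eps * np.eye(A.shape[1])
  tmp2 = np.linalg.inv(tmp1)
  return np.matmul(tmp2, At)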

Good fun at 3:30 AM in my living room (except it’s 4:30 AM now).



To supplement this blog post, I googled about for inverse color photography and found some attractive images. Left and Center by Julia Kuzmenko. Right by Geoffrey Jones.

I could never work in the Arts. In machine learning, success and failure are usually objective and well defined. But in the Arts, everything is subjective.


Demo program.

# pinv_explore.py
# three techniques to compute Moore-Penrose pinv

import numpy as np

np.set_printoptions(precision=4, suppress=True,
  floatmode='fixed')

A = np.array([
 [1.0, 4.0, 2.0],
 [6.0, 0.0, 3.0],
 [7.0, 2.0, 1.0],
 [5.0, 9.0, 8.0]])

print("\nA = "); print(A)

def my_pinv1(A):
  # inv(At * A) * At
  At = np.transpose(A)
  tmp1 = np.matmul(At, A)
  tmp2 = np.linalg.inv(tmp1)
  tmp3 = np.matmul(tmp2, At)
  return tmp3

def my_pinv2(A):
  # V * Sinv * Ut
  U, s, Vh = np.linalg.svd(A,
    full_matrices=False)
  Sinv = np.diag(s)
  for i in range(len(s)):
    Sinv[i,i] = 1.0 / Sinv[i,i]
  Ut = np.transpose(U)
  V = np.transpose(Vh)
  tmp = np.matmul(V, Sinv)
  return np.matmul(tmp, Ut)

def my_pinv3(A):
  # Rinv * Qt
  Q, R = np.linalg.qr(A)
  Qt = np.transpose(Q)
  Rinv = np.linalg.inv(R)
  return np.matmul(Rinv, Qt)

Apinv = np.linalg.pinv(A)
print("\nnp.linalg pinv(A) = "); print(Apinv)

B = my_pinv1(A)
print("\nscratch inv(At * A) * At = "); print(B)

C = my_pinv2(A)
print("\nscratch V * Sinv * Ut = "); print(C)

D = my_pinv3(A)
print("\nscratch Rinv * Qt = "); print(D)

print("\nEnd pseudo-inverse demo ")
Posted in Machine Learning | Leave a comment

“Kernel Ridge Regression with Cholesky Inverse Training Using JavaScript” in Visual Studio Magazine

I wrote an article titled “Kernel Ridge Regression with Cholesky Inverse Training Using JavaScript” in the January 2026 edition of Microsoft Visual Studio Magazine. See https://visualstudiomagazine.com/articles/2026/01/06/kernel-ridge-regression-with-cholesky-inverse-training-using-javascript.aspx.

There are approximately a dozen common regression techniques. Examples include linear regression, nearest neighbors regression, quadratic regression, decision tree regression (several types, such as random forest and gradient boost), and neural network regression.

Kernel ridge regression uses a kernel function that computes a measure of similarity between two data items, and a ridge regularization technique to discourage model overfitting. Model overfitting occurs when a model predicts well on the training data, but predicts poorly on new, previously unseen data. Ridge regularization is also called L2 regularization.

My article presents a demo of kernel ridge regression, implemented from scratch, using the JavaScript language. The output of the demo is:

Begin Kernel Ridge Regression with Cholesky matrix inverse

Loading train (200) and test (40) from file

First three train X:
 -0.1660   0.4406  -0.9998  -0.3953  -0.7065
  0.0776  -0.1616   0.3704  -0.5911   0.7562
 -0.9452   0.3409  -0.1654   0.1174  -0.7192

First three train y:
   0.4840
   0.1568
   0.8054

Setting RBF gamma = 0.3
Setting alpha noise = 0.005

Creating and training KRR model using Cholesky inverse
Done

Model weights:
  -2.0218   -1.1406    0.0758   -0.6265    0.5722
  -0.9905    0.6912    0.4807    0.6496   -0.7364 
. . .
  -0.2014   -1.6270   -0.5825   -0.0487    1.2897

Computing model accuracy

Train acc (within 0.10) = 0.9950
Test acc (within 0.10) = 0.9500

Train MSE = 0.0000
Test MSE = 0.0002

Predicting for x =
  -0.1660    0.4406   -0.9998   -0.3953   -0.7065
Predicted y = 0.4941

End demo

The demo program uses the radial basis function (RBF) kernel function to measure the similarity between two data items. RBF values range from 1.0 (identical items) down toward 0.0 (increasing dissimilarity).
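
The article's demo code is in JavaScript, but the idea in Python looks roughly like this sketch (the rbf() name and the gamma parameterization are my shorthand, not code from the article; gamma = 0.3 matches the demo setting):

import numpy as np

def rbf(x1, x2, gamma=0.3):
  # similarity between two vectors: 1.0 when identical,
  # approaching 0.0 as the vectors become more different
  d = x1 - x2
  return np.exp(-gamma * np.sum(d * d))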

There are two main ways to train a kernel ridge regression model. The first technique, and the one used by the demo program presented in this article, involves creating an n-by-n kernel matrix that compares all the training data items with each other. Then ridge regularization is applied by adding a small constant, usually named alpha, to the diagonal elements of the kernel matrix. Then the matrix inverse of the kernel matrix is computed. The inverse of the kernel matrix is multiplied by the vector of training y values, which gives the model weights.
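
In rough Python, using the rbf() sketch above, that training technique looks something like this (a sketch of the idea only, not the article's JavaScript implementation):

def train_krr(train_X, train_y, gamma=0.3, alpha=0.005):
  n = len(train_X)
  K = np.zeros((n, n))  # kernel matrix
  for i in range(n):
    for j in range(n):
      K[i, j] = rbf(train_X[i], train_X[j], gamma)
  K += alpha * np.eye(n)  # ridge regularization
  wts = np.matmul(np.linalg.inv(K), train_y)  # model weights
  return wts

def predict_krr(x, train_X, wts, gamma=0.3):
  # prediction is a weighted sum of similarities to training items
  sims = np.array([rbf(x, xi, gamma) for xi in train_X])
  return np.dot(sims, wts)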

The second technique to train a kernel ridge regression model uses stochastic gradient descent (SGD). SGD is an iterative process that loops through the training data multiple times, adjusting the model weights slowly so that the model reduces its error between predicted y values and target y values.
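
A bare-bones sketch of the SGD idea, assuming a precomputed n-by-n RBF kernel matrix K (without the alpha added to the diagonal). To keep the sketch simple I use a plain L2 penalty on the weights, which is a common shortcut rather than the exact kernel-weighted ridge penalty:

def train_krr_sgd(K, train_y, alpha=0.005, lrn_rate=0.01,
    max_epochs=500, seed=0):
  n = len(train_y)
  wts = np.zeros(n)
  indices = np.arange(n)
  rnd = np.random.RandomState(seed)
  for epoch in range(max_epochs):
    rnd.shuffle(indices)
    for i in indices:
      err = np.dot(K[i], wts) - train_y[i]  # predicted minus target
      wts -= lrn_rate * (err * K[i] + alpha * wts)
  return wts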

Computing a matrix inverse is one of the most challenging problems in numerical programming. There are over a dozen algorithms to compute a matrix inverse, and each algorithm has several variations, and each variation has multiple implementation designs.

As it turns out, because the kernel matrix for kernel ridge regression is symmetric and, after the alpha constant is added to its diagonal, positive definite, it's possible to use a specialized matrix inverse algorithm called Cholesky decomposition. A Cholesky-based inverse is simpler than general-purpose inverse algorithms.
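
A minimal sketch of the Cholesky approach, where np.linalg.solve stands in for the forward and back substitution steps that a fully from-scratch version would implement explicitly:

def solve_via_cholesky(K, y):
  # K = kernel matrix plus alpha on the diagonal;
  # K must be symmetric and positive definite
  L = np.linalg.cholesky(K)  # K = L * Lt, L is lower triangular
  z = np.linalg.solve(L, y)  # forward substitution step
  w = np.linalg.solve(L.T, z)  # back substitution step
  return w  # equivalent to inv(K) * y (the model weights)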

My article concludes with a recap:

* Kernel ridge regression (KRR) is a machine learning technique to predict a numeric value.

* Kernel ridge regression requires a kernel function that computes a measure of similarity between two training items.

* The most common kernel function is the radial basis function (RBF).

* There are two ways to train a KRR model, kernel matrix inverse and stochastic gradient descent (SGD).

* Both training techniques require an alpha constant for ridge (aka L2) regularization to discourage model overfitting.

* For KRR matrix inverse training, you must compute the inverse of a kernel matrix of RBF applied to all pairs of training items.

* There are many techniques to compute a matrix inverse. Cholesky decomposition is a specialized, relatively simple technique that can be used for kernel matrices.



Inexplicably, one of my favorite comedy movie bits is the “bat in the house” bit. There are a surprisingly large number of movies that feature this idea.

Left: The first example that I’m aware of is “The Laurel-Hardy Murder Case” (1930). The boys are in a creepy old mansion. A bat gets in the house and under a bed sheet. The boys think it’s a ghost as it chases them around the house. Very funny movie.

Right: Another old example is “Spooks” (1953) featuring The Three Stooges. The boys are in a haunted house. A hilariously fake bat chases them around. A funny movie.


Posted in JavaScript, Machine Learning | Leave a comment

NFL 2025 Week 19 (Wildcard) Predictions – Zoltar Likes the Underdog Panthers Against the Rams

Zoltar is my NFL football prediction computer program. It uses a neural network and a type of reinforcement learning. Here are Zoltar's predictions for week #19 (wildcard playoff games) of the 2025 season.

Zoltar:        rams  by    4  opp =    panthers    | Vegas:        rams  by   10
Zoltar:       bears  by    2  opp =     packers    | Vegas:       bears  by  1.5
Zoltar:       bills  by    0  opp =     jaguars    | Vegas:       bills  by  1.5
Zoltar:      eagles  by    6  opp = fortyniners    | Vegas:      eagles  by  3.5
Zoltar:    patriots  by    2  opp =    chargers    | Vegas:    patriots  by  3.5
Zoltar:      texans  by    0  opp =    steelers    | Vegas:      texans  by  3.5

Zoltar theoretically suggests betting when the Vegas line is “significantly” different from Zoltar’s prediction. For the beginning and end of the season, I use a relatively conservative threshold of 4.0 points difference. For the middle of the season, I use a relatively aggressive threshold of 3.0 points difference.

For week #19, five of Zoltar's six predictions are less than 4.0 points difference from the Vegas point spreads and so Zoltar has no wagering recommendations on those games. But there's a big discrepancy between Zoltar and Las Vegas for the Rams at Panthers game. Las Vegas picks the Rams to win by 10 points, but Zoltar thinks the Rams will win by only 4 points, a 6-point difference. Therefore, Zoltar suggests a wager on the underdog Panthers.

A hypothetical wager on the Panthers would win if the Panthers win by any score, or if the favored Rams win but by less than 10.0 points. If the Rams win by exactly 10 points, the bet is a push.

The annoying possibility of a push is why point spreads often have a "point five", such as 3.5, because then a push isn't possible.

Theoretically, if you must bet $110 to win $100 (typical in Vegas), then the break-even point is 110 / (110 + 100) = 0.524, so you'll make money if you predict at about 53% accuracy or better. But realistically, you need to predict at 60% accuracy or better to account for overhead and things like data entry errors.

In week #18, the final regular season week, against the Vegas point spread, Zoltar went 2-0 using 4.0 points as the advice threshold.

For the regular season, against the spread, Zoltar finished with a record of 46-30 (~60% accuracy). Not bad, but not great.

Just for fun, I track how well Zoltar does when trying to predict just which team will win a game. This isn't useful except for parlay betting. In week #18, just predicting the winning team, Zoltar went 10-4, with two games too close for Zoltar to make a prediction. Vegas went 10-6 in week #18 at just predicting the winning team.

For the season, just predicting the winning team, Zoltar finished 135-71 (with 66 games too close to call) for just over 65% accuracy. Vegas finished 164-107 (~60% accuracy).



My prediction system is named after the Zoltar fortune teller machine you can find in arcades. Fortune telling with a crystal ball has been around for over 2,000 years. It was described by the ancient Roman writer Pliny the Elder as "crystallum orbis". Almost all cultures have some form of fortune teller. Here are three results of Internet image searches, but I doubt that the costumes are historically accurate. Left: "Russian fortune teller". Center: "Chinese fortune teller". Right: "Egyptian fortune teller".


Posted in Zoltar | Leave a comment

An Example of Gradient Boosting Regression Using the scikit Library

The goal of a regression problem is to predict a single numeric value. Techniques include linear regression, nearest neighbors regression, kernel ridge regression, neural network regression, and decision tree regression.

Decision tree regression almost always overfits, leading to a model that predicts the training data well, but predicts new, previously unseen data poorly.

There are several ensemble techniques that use a collection of simple decision trees to try to limit overfitting. In order of complexity they are: bagging (“bootstrap aggregation”), random forest, adaptive boosting, gradient boosting (several variations including extreme gradient boosting).

I put together a demo of gradient boosting regression using the Python language scikit-learn library. There are several variations of gradient boosting, such as XGBoost ("extreme gradient boosting") and LightGBM ("light gradient boosting machine"). The scikit library version of gradient boosting regression doesn't have a specific name.

The gradient boosting regression algorithm creates a collection of decision trees, where each tree is trained to predict the residuals (differences between actual target y and predicted target y) of the previous trees. If you can predict the difference, you can predict the correct value. Each tree will predict a bit better than the previous trees did. A very subtle idea.
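
In very rough Python, the training idea looks like the following sketch. This is just to illustrate the residual-fitting loop; the actual scikit implementation has many refinements (the friedman_mse split criterion, optional subsampling, and so on), and the helper names here are mine:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost_fit_sketch(train_X, train_y, n_estimators=90, lrn_rate=0.05):
  base_pred = np.mean(train_y)  # initial prediction for every item
  trees = []
  residuals = train_y - base_pred  # what remains to be predicted
  for k in range(n_estimators):
    tree = DecisionTreeRegressor(max_depth=3)  # small "weak" learner
    tree.fit(train_X, residuals)  # each tree fits current residuals
    trees.append(tree)
    residuals = residuals - lrn_rate * tree.predict(train_X)
  return base_pred, trees

def boost_predict_sketch(x, base_pred, trees, lrn_rate=0.05):
  x = x.reshape(1, -1)  # predict() expects a 2D input
  return base_pred + lrn_rate * sum(t.predict(x)[0] for t in trees)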

For my demo, I used a set of synthetic data that I generated using a neural network with random weights and biases. The data looks like:

-0.1660,  0.4406, -0.9998, -0.3953, -0.7065, 0.4840
 0.0776, -0.1616,  0.3704, -0.5911,  0.7562, 0.1568
-0.9452,  0.3409, -0.1654,  0.1174, -0.7192, 0.8054
. . .

The first five values on each line are the predictors. The sixth value is the target to predict. All predictor values are between -1.0 and 1.0. Normalizing the predictor values is not necessary but is helpful when using the data with other regression techniques that require normalization (such as k-nearest neighbors regression). There are 200 items in the training data and 40 items in the test data.

The output of the scikit gradient boosting regression demo program is:

Begin scikit Gradient Boost Regression demo

Loading synthetic train (200), test (40) data
Done

First three X predictors:
[[-0.1660  0.4406 -0.9998 -0.3953 -0.7065]
 [ 0.0776 -0.1616  0.3704 -0.5911  0.7562]
 [-0.9452  0.3409 -0.1654  0.1174 -0.7192]]

First three y targets:
0.4840
0.1568
0.8054

Setting n_estimators = 90
Setting max_depth = 5
Setting min_samples_split = 2
Setting lrn_rate = 0.0500

Training Gradient Boost Regression model
Done

Evaluating model

Accuracy train (within 0.10): 0.9900
Accuracy test (within 0.10): 0.6000

MSE train: 0.0000
MSE test: 0.0019

Predicting for x =
[-0.1660  0.4406 -0.9998 -0.3953 -0.7065]
Predicted y = 0.4852

End demo

I implemented an accuracy() function and a mean squared error function. Unfortunately, these results show the key weakness of gradient boost regression and all tree-based techniques — the model overfits significantly.

A practical weakness of the scikit GradientBoostingRegressor module is the large number of parameters:

  GradientBoostingRegressor(*, loss='squared_error',
    learning_rate=0.1, n_estimators=100, subsample=1.0,
    criterion='friedman_mse', min_samples_split=2,
    min_samples_leaf=1, min_weight_fraction_leaf=0.0,
    max_depth=3, min_impurity_decrease=0.0, init=None,
    random_state=None, max_features=None, alpha=0.9,
    verbose=0, max_leaf_nodes=None, warm_start=False,
    validation_fraction=0.1, n_iter_no_change=None,
    tol=0.0001, ccp_alpha=0.0)

There are 21 parameters! Far too many. Sometimes the designers of a library make an API too flexible. True, the parameters have sensible default values, but still, it’s a real chore to wade through the documentation and figure out which parameters must be tuned and which can be left alone.



Computer science trees are generally well-behaved. But some comic book trees and plants are not so well-behaved.

“The House of Mystery” was a popular comic book title for many years. The first issue was in December, 1950. For the next 10 years, the stories were about the supernatural and horror.

From 1960 to 1968, most of the stories were about aliens or little known super heroes. Starting in June 1968, the stories switched back to horror.

Left: Issue #204, July 1972. Center: Issue #217, September 1973. Right: Issue #240, April 1976.


Demo program. Replace the “lt” (less than) with Boolean operator symbol. (My blog editor chokes on symbols).

# scikit_gradient_boost_regression.py

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# -----------------------------------------------------------

def accuracy(model, data_X, data_y, pct_close):
  # assumes model has a predict(X)
  n = len(data_X)
  n_correct = 0; n_wrong = 0
  for i in range(n):
    x = data_X[i].reshape(1,-1)  # make it a matrix
    y = data_y[i]
    y_pred = model.predict(x)  # predict() expects 2D

    if np.abs(y - y_pred) "lt" np.abs(y * pct_close):
      n_correct += 1
    else: 
      n_wrong += 1
  # print("Correct = " + str(n_correct))
  # print("Wrong   = " + str(n_wrong))
  return n_correct / (n_correct + n_wrong)

# -----------------------------------------------------------

def MSE(model, data_X, data_y):
  n = len(data_X)
  sum = 0.0
  for i in range(n):
    x = data_X[i].reshape(1,-1)
    y = data_y[i]
    y_pred = model.predict(x)
    sum += (y - y_pred) * (y - y_pred)

  return sum / n

# -----------------------------------------------------------

def main():
  print("\nBegin scikit Gradient Boost Regression demo ")

  np.set_printoptions(precision=4, suppress=True,
    floatmode='fixed')
  np.random.seed(0)  # not used this version

  # 1. load data
  print("\nLoading synthetic train (200), test (40) data ")
  train_file = ".\\Data\\synthetic_train_200.txt"
  # -0.1660,0.4406,-0.9998,-0.3953,-0.7065,0.4840
  #  0.0776,-0.1616,0.3704,-0.5911,0.7562,0.1568
  # -0.9452,0.3409,-0.1654,0.1174,-0.7192,0.8054
  # . . .

  train_X = np.loadtxt(train_file, comments="#",
    usecols=[0,1,2,3,4],
    delimiter=",",  dtype=np.float64)
  train_y = np.loadtxt(train_file, comments="#", usecols=5,
    delimiter=",",  dtype=np.float64)

  test_file = ".\\Data\\synthetic_test_40.txt"
  test_X = np.loadtxt(test_file, comments="#",
    usecols=[0,1,2,3,4],
    delimiter=",",  dtype=np.float64)
  test_y = np.loadtxt(test_file, comments="#", usecols=5,
    delimiter=",",  dtype=np.float64)
  print("Done ")

  print("\nFirst three X predictors: ")
  print(train_X[0:3,:])
  print("\nFirst three y targets: ")
  for i in range(3):
    print("%0.4f" % train_y[i])

  n_estimators = 90
  max_depth = 5
  min_samples_split = 2
  lrn_rate = 0.05

  print("\nSetting n_estimators = " + \
    str(n_estimators))
  print("Setting max_depth = " + \
    str(max_depth))
  print("Setting min_samples_split = " + \
    str(min_samples_split))
  print("Setting lrn_rate = %0.4f " % lrn_rate)

  # GradientBoostingRegressor(*, loss='squared_error',
  # learning_rate=0.1, n_estimators=100, subsample=1.0,
  # criterion='friedman_mse', min_samples_split=2,
  # min_samples_leaf=1, min_weight_fraction_leaf=0.0,
  # max_depth=3, min_impurity_decrease=0.0, init=None,
  # random_state=None, max_features=None, alpha=0.9,
  # verbose=0, max_leaf_nodes=None, warm_start=False,
  # validation_fraction=0.1, n_iter_no_change=None,
  # tol=0.0001, ccp_alpha=0.0)

  model = \
    GradientBoostingRegressor(max_depth=max_depth, 
    min_samples_split=min_samples_split, n_estimators=90,
    learning_rate=lrn_rate, random_state=1)

  print("\nTraining Gradient Boost Regression model ")
  model.fit(train_X, train_y)
  print("Done " )

  print("\nEvaluating model ")
  acc_train = accuracy(model, train_X, train_y, 0.10)
  print("\nAccuracy train (within 0.10): %0.4f " % acc_train)
  acc_test = accuracy(model, test_X, test_y, 0.10)
  print("Accuracy test (within 0.10): %0.4f " % acc_test)

  mse_train = MSE(model, train_X, train_y)
  print("\nMSE train: %0.4f " % mse_train)
  mse_test = MSE(model, test_X, test_y)
  print("MSE test: %0.4f " % mse_test)

  print("\nPredicting for x = ")
  print(train_X[0])
  pred_y = model.predict(train_X[0].reshape(1,-1))
  print("Predicted y = %0.4f " % pred_y)

  print("\nEnd demo ")

if __name__ == "__main__":
  main()

Training data:

# synthetic_train_200.txt
#
-0.1660,  0.4406, -0.9998, -0.3953, -0.7065,  0.4840
 0.0776, -0.1616,  0.3704, -0.5911,  0.7562,  0.1568
-0.9452,  0.3409, -0.1654,  0.1174, -0.7192,  0.8054
 0.9365, -0.3732,  0.3846,  0.7528,  0.7892,  0.1345
-0.8299, -0.9219, -0.6603,  0.7563, -0.8033,  0.7955
 0.0663,  0.3838, -0.3690,  0.3730,  0.6693,  0.3206
-0.9634,  0.5003,  0.9777,  0.4963, -0.4391,  0.7377
-0.1042,  0.8172, -0.4128, -0.4244, -0.7399,  0.4801
-0.9613,  0.3577, -0.5767, -0.4689, -0.0169,  0.6861
-0.7065,  0.1786,  0.3995, -0.7953, -0.1719,  0.5569
 0.3888, -0.1716, -0.9001,  0.0718,  0.3276,  0.2500
 0.1731,  0.8068, -0.7251, -0.7214,  0.6148,  0.3297
-0.2046, -0.6693,  0.8550, -0.3045,  0.5016,  0.2129
 0.2473,  0.5019, -0.3022, -0.4601,  0.7918,  0.2613
-0.1438,  0.9297,  0.3269,  0.2434, -0.7705,  0.5171
 0.1568, -0.1837, -0.5259,  0.8068,  0.1474,  0.3307
-0.9943,  0.2343, -0.3467,  0.0541,  0.7719,  0.5581
 0.2467, -0.9684,  0.8589,  0.3818,  0.9946,  0.1092
-0.6553, -0.7257,  0.8652,  0.3936, -0.8680,  0.7018
 0.8460,  0.4230, -0.7515, -0.9602, -0.9476,  0.1996
-0.9434, -0.5076,  0.7201,  0.0777,  0.1056,  0.5664
 0.9392,  0.1221, -0.9627,  0.6013, -0.5341,  0.1533
 0.6142, -0.2243,  0.7271,  0.4942,  0.1125,  0.1661
 0.4260,  0.1194, -0.9749, -0.8561,  0.9346,  0.2230
 0.1362, -0.5934, -0.4953,  0.4877, -0.6091,  0.3810
 0.6937, -0.5203, -0.0125,  0.2399,  0.6580,  0.1460
-0.6864, -0.9628, -0.8600, -0.0273,  0.2127,  0.5387
 0.9772,  0.1595, -0.2397,  0.1019,  0.4907,  0.1611
 0.3385, -0.4702, -0.8673, -0.2598,  0.2594,  0.2270
-0.8669, -0.4794,  0.6095, -0.6131,  0.2789,  0.4700
 0.0493,  0.8496, -0.4734, -0.8681,  0.4701,  0.3516
 0.8639, -0.9721, -0.5313,  0.2336,  0.8980,  0.1412
 0.9004,  0.1133,  0.8312,  0.2831, -0.2200,  0.1782
 0.0991,  0.8524,  0.8375, -0.2102,  0.9265,  0.2150
-0.6521, -0.7473, -0.7298,  0.0113, -0.9570,  0.7422
 0.6190, -0.3105,  0.8802,  0.1640,  0.7577,  0.1056
 0.6895,  0.8108, -0.0802,  0.0927,  0.5972,  0.2214
 0.1982, -0.9689,  0.1870, -0.1326,  0.6147,  0.1310
-0.3695,  0.7858,  0.1557, -0.6320,  0.5759,  0.3773
-0.1596,  0.3581,  0.8372, -0.9992,  0.9535,  0.2071
-0.2468,  0.9476,  0.2094,  0.6577,  0.1494,  0.4132
 0.1737,  0.5000,  0.7166,  0.5102,  0.3961,  0.2611
 0.7290, -0.3546,  0.3416, -0.0983, -0.2358,  0.1332
-0.3652,  0.2438, -0.1395,  0.9476,  0.3556,  0.4170
-0.6029, -0.1466, -0.3133,  0.5953,  0.7600,  0.4334
-0.4596, -0.4953,  0.7098,  0.0554,  0.6043,  0.2775
 0.1450,  0.4663,  0.0380,  0.5418,  0.1377,  0.2931
-0.8636, -0.2442, -0.8407,  0.9656, -0.6368,  0.7429
 0.6237,  0.7499,  0.3768,  0.1390, -0.6781,  0.2185
-0.5499,  0.1850, -0.3755,  0.8326,  0.8193,  0.4399
-0.4858, -0.7782, -0.6141, -0.0008,  0.4572,  0.4197
 0.7033, -0.1683,  0.2334, -0.5327, -0.7961,  0.1776
 0.0317, -0.0457, -0.6947,  0.2436,  0.0880,  0.3345
 0.5031, -0.5559,  0.0387,  0.5706, -0.9553,  0.3107
-0.3513,  0.7458,  0.6894,  0.0769,  0.7332,  0.3170
 0.2205,  0.5992, -0.9309,  0.5405,  0.4635,  0.3532
-0.4806, -0.4859,  0.2646, -0.3094,  0.5932,  0.3202
 0.9809, -0.3995, -0.7140,  0.8026,  0.0831,  0.1600
 0.9495,  0.2732,  0.9878,  0.0921,  0.0529,  0.1289
-0.9476, -0.6792,  0.4913, -0.9392, -0.2669,  0.5966
 0.7247,  0.3854,  0.3819, -0.6227, -0.1162,  0.1550
-0.5922, -0.5045, -0.4757,  0.5003, -0.0860,  0.5863
-0.8861,  0.0170, -0.5761,  0.5972, -0.4053,  0.7301
 0.6877, -0.2380,  0.4997,  0.0223,  0.0819,  0.1404
 0.9189,  0.6079, -0.9354,  0.4188, -0.0700,  0.1907
-0.1428, -0.7820,  0.2676,  0.6059,  0.3936,  0.2790
 0.5324, -0.3151,  0.6917, -0.1425,  0.6480,  0.1071
-0.8432, -0.9633, -0.8666, -0.0828, -0.7733,  0.7784
-0.9444,  0.5097, -0.2103,  0.4939, -0.0952,  0.6787
-0.0520,  0.6063, -0.1952,  0.8094, -0.9259,  0.4836
 0.5477, -0.7487,  0.2370, -0.9793,  0.0773,  0.1241
 0.2450,  0.8116,  0.9799,  0.4222,  0.4636,  0.2355
 0.8186, -0.1983, -0.5003, -0.6531, -0.7611,  0.1511
-0.4714,  0.6382, -0.3788,  0.9648, -0.4667,  0.5950
 0.0673, -0.3711,  0.8215, -0.2669, -0.1328,  0.2677
-0.9381,  0.4338,  0.7820, -0.9454,  0.0441,  0.5518
-0.3480,  0.7190,  0.1170,  0.3805, -0.0943,  0.4724
-0.9813,  0.1535, -0.3771,  0.0345,  0.8328,  0.5438
-0.1471, -0.5052, -0.2574,  0.8637,  0.8737,  0.3042
-0.5454, -0.3712, -0.6505,  0.2142, -0.1728,  0.5783
 0.6327, -0.6297,  0.4038, -0.5193,  0.1484,  0.1153
-0.5424,  0.3282, -0.0055,  0.0380, -0.6506,  0.6613
 0.1414,  0.9935,  0.6337,  0.1887,  0.9520,  0.2540
-0.9351, -0.8128, -0.8693, -0.0965, -0.2491,  0.7353
 0.9507, -0.6640,  0.9456,  0.5349,  0.6485,  0.1059
-0.0462, -0.9737, -0.2940, -0.0159,  0.4602,  0.2606
-0.0627, -0.0852, -0.7247, -0.9782,  0.5166,  0.2977
 0.0478,  0.5098, -0.0723, -0.7504, -0.3750,  0.3335
 0.0090,  0.3477,  0.5403, -0.7393, -0.9542,  0.4415
-0.9748,  0.3449,  0.3736, -0.1015,  0.8296,  0.4358
 0.2887, -0.9895, -0.0311,  0.7186,  0.6608,  0.2057
 0.1570, -0.4518,  0.1211,  0.3435, -0.2951,  0.3244
 0.7117, -0.6099,  0.4946, -0.4208,  0.5476,  0.1096
-0.2929, -0.5726,  0.5346, -0.3827,  0.4665,  0.2465
 0.4889, -0.5572, -0.5718, -0.6021, -0.7150,  0.2163
-0.7782,  0.3491,  0.5996, -0.8389, -0.5366,  0.6516
-0.5847,  0.8347,  0.4226,  0.1078, -0.3910,  0.6134
 0.8469,  0.4121, -0.0439, -0.7476,  0.9521,  0.1571
-0.6803, -0.5948, -0.1376, -0.1916, -0.7065,  0.7156
 0.2878,  0.5086, -0.5785,  0.2019,  0.4979,  0.2980
 0.2764,  0.1943, -0.4090,  0.4632,  0.8906,  0.2960
-0.8877,  0.6705, -0.6155, -0.2098, -0.3998,  0.7107
-0.8398,  0.8093, -0.2597,  0.0614, -0.0118,  0.6502
-0.8476,  0.0158, -0.4769, -0.2859, -0.7839,  0.7715
 0.5751, -0.7868,  0.9714, -0.6457,  0.1448,  0.1175
 0.4802, -0.7001,  0.1022, -0.5668,  0.5184,  0.1090
 0.4458, -0.6469,  0.7239, -0.9604,  0.7205,  0.0779
 0.5175,  0.4339,  0.9747, -0.4438, -0.9924,  0.2879
 0.8678,  0.7158,  0.4577,  0.0334,  0.4139,  0.1678
 0.5406,  0.5012,  0.2264, -0.1963,  0.3946,  0.2088
-0.9938,  0.5498,  0.7928, -0.5214, -0.7585,  0.7687
 0.7661,  0.0863, -0.4266, -0.7233, -0.4197,  0.1466
 0.2277, -0.3517, -0.0853, -0.1118,  0.6563,  0.1767
 0.3499, -0.5570, -0.0655, -0.3705,  0.2537,  0.1632
 0.7547, -0.1046,  0.5689, -0.0861,  0.3125,  0.1257
 0.8186,  0.2110,  0.5335,  0.0094, -0.0039,  0.1391
 0.6858, -0.8644,  0.1465,  0.8855,  0.0357,  0.1845
-0.4967,  0.4015,  0.0805,  0.8977,  0.2487,  0.4663
 0.6760, -0.9841,  0.9787, -0.8446, -0.3557,  0.1509
-0.1203, -0.4885,  0.6054, -0.0443, -0.7313,  0.4854
 0.8557,  0.7919, -0.0169,  0.7134, -0.1628,  0.2002
 0.0115, -0.6209,  0.9300, -0.4116, -0.7931,  0.4052
-0.7114, -0.9718,  0.4319,  0.1290,  0.5892,  0.3661
 0.3915,  0.5557, -0.1870,  0.2955, -0.6404,  0.2954
-0.3564, -0.6548, -0.1827, -0.5172, -0.1862,  0.4622
 0.2392, -0.4959,  0.5857, -0.1341, -0.2850,  0.2470
-0.3394,  0.3947, -0.4627,  0.6166, -0.4094,  0.5325
 0.7107,  0.7768, -0.6312,  0.1707,  0.7964,  0.2757
-0.1078,  0.8437, -0.4420,  0.2177,  0.3649,  0.4028
-0.3139,  0.5595, -0.6505, -0.3161, -0.7108,  0.5546
 0.4335,  0.3986,  0.3770, -0.4932,  0.3847,  0.1810
-0.2562, -0.2894, -0.8847,  0.2633,  0.4146,  0.4036
 0.2272,  0.2966, -0.6601, -0.7011,  0.0284,  0.2778
-0.0743, -0.1421, -0.0054, -0.6770, -0.3151,  0.3597
-0.4762,  0.6891,  0.6007, -0.1467,  0.2140,  0.4266
-0.4061,  0.7193,  0.3432,  0.2669, -0.7505,  0.6147
-0.0588,  0.9731,  0.8966,  0.2902, -0.6966,  0.4955
-0.0627, -0.1439,  0.1985,  0.6999,  0.5022,  0.3077
 0.1587,  0.8494, -0.8705,  0.9827, -0.8940,  0.4263
-0.7850,  0.2473, -0.9040, -0.4308, -0.8779,  0.7199
 0.4070,  0.3369, -0.2428, -0.6236,  0.4940,  0.2215
-0.0242,  0.0513, -0.9430,  0.2885, -0.2987,  0.3947
-0.5416, -0.1322, -0.2351, -0.0604,  0.9590,  0.3683
 0.1055,  0.7783, -0.2901, -0.5090,  0.8220,  0.2984
-0.9129,  0.9015,  0.1128, -0.2473,  0.9901,  0.4776
-0.9378,  0.1424, -0.6391,  0.2619,  0.9618,  0.5368
 0.7498, -0.0963,  0.4169,  0.5549, -0.0103,  0.1614
-0.2612, -0.7156,  0.4538, -0.0460, -0.1022,  0.3717
 0.7720,  0.0552, -0.1818, -0.4622, -0.8560,  0.1685
-0.4177,  0.0070,  0.9319, -0.7812,  0.3461,  0.3052
-0.0001,  0.5542, -0.7128, -0.8336, -0.2016,  0.3803
 0.5356, -0.4194, -0.5662, -0.9666, -0.2027,  0.1776
-0.2378,  0.3187, -0.8582, -0.6948, -0.9668,  0.5474
-0.1947, -0.3579,  0.1158,  0.9869,  0.6690,  0.2992
 0.3992,  0.8365, -0.9205, -0.8593, -0.0520,  0.3154
-0.0209,  0.0793,  0.7905, -0.1067,  0.7541,  0.1864
-0.4928, -0.4524, -0.3433,  0.0951, -0.5597,  0.6261
-0.8118,  0.7404, -0.5263, -0.2280,  0.1431,  0.6349
 0.0516, -0.8480,  0.7483,  0.9023,  0.6250,  0.1959
-0.3212,  0.1093,  0.9488, -0.3766,  0.3376,  0.2735
-0.3481,  0.5490, -0.3484,  0.7797,  0.5034,  0.4379
-0.5785, -0.9170, -0.3563, -0.9258,  0.3877,  0.4121
 0.3407, -0.1391,  0.5356,  0.0720, -0.9203,  0.3458
-0.3287, -0.8954,  0.2102,  0.0241,  0.2349,  0.3247
-0.1353,  0.6954, -0.0919, -0.9692,  0.7461,  0.3338
 0.9036, -0.8982, -0.5299, -0.8733, -0.1567,  0.1187
 0.7277, -0.8368, -0.0538, -0.7489,  0.5458,  0.0830
 0.9049,  0.8878,  0.2279,  0.9470, -0.3103,  0.2194
 0.7957, -0.1308, -0.5284,  0.8817,  0.3684,  0.2172
 0.4647, -0.4931,  0.2010,  0.6292, -0.8918,  0.3371
-0.7390,  0.6849,  0.2367,  0.0626, -0.5034,  0.7039
-0.1567, -0.8711,  0.7940, -0.5932,  0.6525,  0.1710
 0.7635, -0.0265,  0.1969,  0.0545,  0.2496,  0.1445
 0.7675,  0.1354, -0.7698, -0.5460,  0.1920,  0.1728
-0.5211, -0.7372, -0.6763,  0.6897,  0.2044,  0.5217
 0.1913,  0.1980,  0.2314, -0.8816,  0.5006,  0.1998
 0.8964,  0.0694, -0.6149,  0.5059, -0.9854,  0.1825
 0.1767,  0.7104,  0.2093,  0.6452,  0.7590,  0.2832
-0.3580, -0.7541,  0.4426, -0.1193, -0.7465,  0.5657
-0.5996,  0.5766, -0.9758, -0.3933, -0.9572,  0.6800
 0.9950,  0.1641, -0.4132,  0.8579,  0.0142,  0.2003
-0.4717, -0.3894, -0.2567, -0.5111,  0.1691,  0.4266
 0.3917, -0.8561,  0.9422,  0.5061,  0.6123,  0.1212
-0.0366, -0.1087,  0.3449, -0.1025,  0.4086,  0.2475
 0.3633,  0.3943,  0.2372, -0.6980,  0.5216,  0.1925
-0.5325, -0.6466, -0.2178, -0.3589,  0.6310,  0.3568
 0.2271,  0.5200, -0.1447, -0.8011, -0.7699,  0.3128
 0.6415,  0.1993,  0.3777, -0.0178, -0.8237,  0.2181
-0.5298, -0.0768, -0.6028, -0.9490,  0.4588,  0.4356
 0.6870, -0.1431,  0.7294,  0.3141,  0.1621,  0.1632
-0.5985,  0.0591,  0.7889, -0.3900,  0.7419,  0.2945
 0.3661,  0.7984, -0.8486,  0.7572, -0.6183,  0.3449
 0.6995,  0.3342, -0.3113, -0.6972,  0.2707,  0.1712
 0.2565,  0.9126,  0.1798, -0.6043, -0.1413,  0.2893
-0.3265,  0.9839, -0.2395,  0.9854,  0.0376,  0.4770
 0.2690, -0.1722,  0.9818,  0.8599, -0.7015,  0.3954
-0.2102, -0.0768,  0.1219,  0.5607, -0.0256,  0.3949
 0.8216, -0.9555,  0.6422, -0.6231,  0.3715,  0.0801
-0.2896,  0.9484, -0.7545, -0.6249,  0.7789,  0.4370
-0.9985, -0.5448, -0.7092, -0.5931,  0.7926,  0.5402

Test data:

# synthetic_test_40.txt
#
 0.7462,  0.4006, -0.0590,  0.6543, -0.0083,  0.1935
 0.8495, -0.2260, -0.0142, -0.4911,  0.7699,  0.1078
-0.2335, -0.4049,  0.4352, -0.6183, -0.7636,  0.5088
 0.1810, -0.5142,  0.2465,  0.2767, -0.3449,  0.3136
-0.8650,  0.7611, -0.0801,  0.5277, -0.4922,  0.7140
-0.2358, -0.7466, -0.5115, -0.8413, -0.3943,  0.4533
 0.4834,  0.2300,  0.3448, -0.9832,  0.3568,  0.1360
-0.6502, -0.6300,  0.6885,  0.9652,  0.8275,  0.3046
-0.3053,  0.5604,  0.0929,  0.6329, -0.0325,  0.4756
-0.7995,  0.0740, -0.2680,  0.2086,  0.9176,  0.4565
-0.2144, -0.2141,  0.5813,  0.2902, -0.2122,  0.4119
-0.7278, -0.0987, -0.3312, -0.5641,  0.8515,  0.4438
 0.3793,  0.1976,  0.4933,  0.0839,  0.4011,  0.1905
-0.8568,  0.9573, -0.5272,  0.3212, -0.8207,  0.7415
-0.5785,  0.0056, -0.7901, -0.2223,  0.0760,  0.5551
 0.0735, -0.2188,  0.3925,  0.3570,  0.3746,  0.2191
 0.1230, -0.2838,  0.2262,  0.8715,  0.1938,  0.2878
 0.4792, -0.9248,  0.5295,  0.0366, -0.9894,  0.3149
-0.4456,  0.0697,  0.5359, -0.8938,  0.0981,  0.3879
 0.8629, -0.8505, -0.4464,  0.8385,  0.5300,  0.1769
 0.1995,  0.6659,  0.7921,  0.9454,  0.9970,  0.2330
-0.0249, -0.3066, -0.2927, -0.4923,  0.8220,  0.2437
 0.4513, -0.9481, -0.0770, -0.4374, -0.9421,  0.2879
-0.3405,  0.5931, -0.3507, -0.3842,  0.8562,  0.3987
 0.9538,  0.0471,  0.9039,  0.7760,  0.0361,  0.1706
-0.0887,  0.2104,  0.9808,  0.5478, -0.3314,  0.4128
-0.8220, -0.6302,  0.0537, -0.1658,  0.6013,  0.4306
-0.4123, -0.2880,  0.9074, -0.0461, -0.4435,  0.5144
 0.0060,  0.2867, -0.7775,  0.5161,  0.7039,  0.3599
-0.7968, -0.5484,  0.9426, -0.4308,  0.8148,  0.2979
 0.7811,  0.8450, -0.6877,  0.7594,  0.2640,  0.2362
-0.6802, -0.1113, -0.8325, -0.6694, -0.6056,  0.6544
 0.3821,  0.1476,  0.7466, -0.5107,  0.2592,  0.1648
 0.7265,  0.9683, -0.9803, -0.4943, -0.5523,  0.2454
-0.9049, -0.9797, -0.0196, -0.9090, -0.4433,  0.6447
-0.4607,  0.1811, -0.2389,  0.4050, -0.0078,  0.5229
 0.2664, -0.2932, -0.4259, -0.7336,  0.8742,  0.1834
-0.4507,  0.1029, -0.6294, -0.1158, -0.6294,  0.6081
 0.8948, -0.0124,  0.9278,  0.2899, -0.0314,  0.1534
-0.1323, -0.8813, -0.0146, -0.0697,  0.6135,  0.2386
Posted in Machine Learning | Leave a comment

Decision Tree Regression (Without Recursion) From Scratch Using Python

A few days ago, I refactored an old implementation of decision tree regression from scratch using Python. But I wasn't completely happy with it because it used recursion when building the tree. This is the standard approach, but I don't like to use recursion because it's error-prone, difficult to debug, and difficult to modify. So I made a new version of a from-scratch Python decision tree regression demo that doesn't use recursion; instead, the tree building code uses a stack.

For my updated implementation, I used a nested Node class which is defined inside an outer MyDecisionTreeRegressor class. I prepended "My" to the name so there wouldn't be a name clash with the scikit-learn DecisionTreeRegressor module:

class MyDecisionTreeRegressor:  # avoid scikit name collision
  # if max_depth = 0, tree has just a root node with avg y
  # if max_depth = 1, tree has at most 3 nodes (root, l, r)
  # if max_depth = n, tree has at most 2^(n+1) - 1 nodes.

  def __init__(self, max_depth=3, min_samples=2,
    n_split_cols=-1, seed=0):
    self.max_depth = max_depth
    self.min_samples = min_samples # aka min_samples_split
    self.n_split_cols = n_split_cols  # mostly random forest
    self.root = None
    self.rnd = np.random.RandomState(seed) # split col order

  # ===============================================

  class Node:
    def __init__(self, id=0, col_idx=-1,
        thresh=0.0, left=None, right=None,
        value=0.0, is_leaf=False):
      self.id = id  # useful for debugging
      self.col_idx = col_idx
      self.thresh = thresh
      self.left = left
      self.right = right
      self.value = value
      self.is_leaf = is_leaf  # False for in-node

    def show(self): . . .

  # ===============================================

  def best_split(self, X, y): . . . 
  def vector_mse(self, y): . . .
  def make_tree(self, X, y, depth=0): . . .
  def fit(self, X, y): . . .
  def predict_one(self, x): . . .
  def predict(self, X): . . .
  def explain(self, x): . . .
  def display(self): . . .

In addition to getting rid of recursion, I changed the split algorithm slightly to minimize the weighted mean squared error defined on a single vector (which is the same thing as its variance), instead of maximizing variance reduction, in order to sync with the scikit DecisionTreeRegressor implementation. Weirdly, in machine learning, MSE normally compares predicted values with actual target values, but in decision tree terminology, the MSE of a node is equivalent to the statistical variance of the node's target values. This is actually a very deep topic, but one for another day.

For my demo, I used a set of synthetic data that I generated using a neural network with random weights and biases. The data looks like:

-0.1660,  0.4406, -0.9998, -0.3953, -0.7065, 0.4840
 0.0776, -0.1616,  0.3704, -0.5911,  0.7562, 0.1568
-0.9452,  0.3409, -0.1654,  0.1174, -0.7192, 0.8054
. . .

The first five values on each line are the predictors. The sixth value is the target to predict. All predictor values are between -1.0 and 1.0. Normalizing the predictor values is not necessary but is helpful when using the data with other regression techniques that require normalization (such as k-nearest neighbors regression). There are 200 items in the training data and 40 items in the test data.

The output of the from-scratch no-recursion decision tree regression demo program is:

Begin decision tree regression scratch Python

Loading synthetic train (200), test (40) data
Done

First three X predictors:
[[-0.1660  0.4406 -0.9998 -0.3953 -0.7065]
 [ 0.0776 -0.1616  0.3704 -0.5911  0.7562]
 [-0.9452  0.3409 -0.1654  0.1174 -0.7192]]

First three y targets:
0.4840
0.1568
0.8054

Creating tree model max_depth=6 min_samples=2

Accuracy train (within 0.10): 0.8150
Accuracy test (within 0.10): 0.4500

MSE train: 0.0004
MSE test: 0.0054

Predicting train_X[0] using scratch tree:
[0.4909]

IF
column 0  >   -0.2102 AND
column 0  <=   0.3915 AND
column 4  <=  -0.2987 AND
column 0  <=   0.0090 AND
column 4  <=  -0.6966 AND
column 3  >   -0.7393 AND
THEN
predicted = 0.4909

End demo

=================================

Using scikit:

Accuracy train (within 0.10): 0.8100
Accuracy test (within 0.10): 0.4000

MSE train: 0.0004
MSE test: 0.0057

Predicting train_X[0] using scikit:
[0.4909]

The outputs of my from-scratch Python version and the scikit library version are very close but not exactly the same, because both my version and the scikit version shuffle the order in which columns are searched when finding the split values. This prevents the first few columns from always being used when there are multiple candidate splits that have the same mean squared error, but it introduces minor randomness into the tree models.


Sidebar: Variance Reduction vs. Mean Squared Error

The variance of a set of numbers is the average of the squared deviations from the mean of the numbers. For example, if v = (2.0, 7.0, 9.0) the mean is 6.0 and the variance is ((2.0 - 6)^2 + (7.0 - 6)^2 + (9.0 - 6)^2) / 3 = (16.0 + 1.0 + 9.0) / 3 = 26.0 / 3 = 8.67.

The mean squared error (MSE) of a set of predicted values and target values is the average of the squared deviations between the predicted values and the targets. For example, if some system predicts (4.0, 8.0, 5.0) and the associated three true targets are (6, 9, 7), then the mean squared error is ((4.0 - 6)^2 + (8.0 - 9)^2 + (5.0 - 7)^2) / 3 = (4.0 + 1.0 + 4.0) / 3 = 9.0 / 3 = 3.0.

For reasons which aren’t at all clear to me, decision tree terminology, notably scikit documentation, uses the terms MSE (which applies to a set of predicteds and true targets) and variance reduction(which applies to a single set of numbers) interchangeably. The idea here is that variance reduction is the same as MSE if you assume there is a set of hidden true target values, that are all the same, and are equal to the mean of the set of numbers.


I implemented an accuracy() function that scores a predicted y value as correct if it’s within 0.10 of the true value. The accuracy of the decision tree model on the training data is good (81.50% — 163 out of 200 correct) but poor on the test data (45.00% — just 18 out of 40 correct). This is typical for decision trees because they usually overfit on the training data. And this is the reason why decision trees are almost never used by themselves. It’s far more common to combine many decision trees into an ensemble technique such as Bagging Tree, Random Forest, AdaBoost, or Gradient Boost.

For my updated demo, I added an explain() method that shows how an output is computed. Sometimes an explanation helps interpretability, but sometimes it doesn’t really help.

I also added a display() method to show a tree. For example, with max_depth = 3 and min_samples = 2, the display output is:

Creating tree model max_depth=3 min_samples=2

Tree:
id =    0   0  -0.2102    not_None    not_None   0.0000   False
id =    1   4   0.1431    not_None    not_None   0.0000   False
id =    2   0   0.3915    not_None    not_None   0.0000   False
id =    3   0  -0.4061    not_None    not_None   0.0000   False
id =    4   2   0.1128    not_None    not_None   0.0000   False
id =    5   4  -0.2987    not_None    not_None   0.0000   False
id =    6   1  -0.6099    not_None    not_None   0.0000   False
id =    7  -1   0.0000    None        None       0.6777   True
id =    8  -1   0.0000    None        None       0.4865   True
id =    9  -1   0.0000    None        None       0.4586   True
id =   10  -1   0.0000    None        None       0.3463   True
id =   11  -1   0.0000    None        None       0.4101   True
id =   12  -1   0.0000    None        None       0.2613   True
id =   13  -1   0.0000    None        None       0.1171   True
id =   14  -1   0.0000    None        None       0.1859   True

Because max_depth was set to n = 3, and the tree is full, there are 2^(n+1)-1 = 2^4 – 1 = 16 – 1 = 15 nodes. The fields on each line are ID, split column, split value, left child None or not, right child None or not, node predicted y value, is-leaf-node or not.
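
A small sketch of the node bookkeeping: a full tree with max_depth = n has 2^(n+1) - 1 nodes, and because child IDs are assigned as 2*id+1 and 2*id+2, the children of node 0 are nodes 1 and 2, the children of node 1 are nodes 3 and 4, and so on:

n = 3                        # max_depth
n_nodes = 2**(n + 1) - 1     # 15, matching the display above
for node_id in range(7):     # the 7 internal nodes have IDs 0 through 6
  left_id = 2 * node_id + 1
  right_id = 2 * node_id + 2
  print(node_id, left_id, right_id)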

Satisfying to get rid of recursion. I feel better now.



I love all kinds of mathematical models. I also love railroad train models, and watch videos of them on the Internet. Some of the models are astonishingly detailed and beautiful.

Top: A screenshot of a video of the Central Florida Railroad Modelers club in Orlando, Florida.

Bottom: A screenshot of a video of the Colorado Model Railroad Museum in Greeley, Colorado (about 50 miles north of Denver).


Demo program. Replace “lt” (less than), “gt”, “lte”, “gte” with Boolean operator symbols. (My blog editor chokes on symbols).

# decision_tree_regression_scratch.py

# uses pointers (for nodes) but no recursion (during build)
# uses MSE for splitting (not explicit variance reduction)

import numpy as np

# ===========================================================

class MyDecisionTreeRegressor:  # avoid scikit collision

  # if max_depth = 0, tree has just a root node with avg y
  # if max_depth = 1, tree has at most 3 nodes (root, l, r)
  # if max_depth = n, tree has at most 2^(n+1) - 1 nodes.

  def __init__(self, max_depth=3, min_samples=2,
    n_split_cols=-1, seed=0):
    self.max_depth = max_depth
    self.min_samples = min_samples # aka min_samples_split
    self.n_split_cols = n_split_cols  # for random forest
    self.root = None
    self.rnd = np.random.RandomState(seed)

  # ===============================================

  class Node:
    def __init__(self, id=0, col_idx=-1, thresh=0.0,
        left=None, right=None,
        value=0.0, is_leaf=False):
      self.id = id  # useful for debugging
      self.col_idx = col_idx
      self.thresh = thresh
      self.left = left
      self.right = right
      self.value = value
      self.is_leaf = is_leaf  # False for in-node

    def show(self):
      s1 = "id = %4d " % self.id
      s2 = "%3d " % self.col_idx
      s3 = "%8.4f " % self.thresh
      s4 = "   None     "
      if self.left is not None: s4 = "   not_None "
      s5 = "   None     "
      if self.right is not None: s5 = "   not_None "
      s6 = "%8.4f " % self.value
      s7 = "  " + str(self.is_leaf)
      print(s1+s2+s3+s4+s5+s6+s7)

  # ===============================================

  def best_split(self, X, y):
    # best split using semi MSE
    best_col_idx = -1  # indicates a bad split
    best_thresh = 0.0
    best_mse = np.inf  # smaller is better
    n_rows, n_cols = X.shape
    # n_rows should never be 0

    rnd_cols = np.arange(n_cols)
    self.rnd.shuffle(rnd_cols)
    if self.n_split_cols != -1:  # just use some cols
      rnd_cols = rnd_cols[0:self.n_split_cols]

    for j in range(len(rnd_cols)):
      col_idx = rnd_cols[j]
      examined_threshs = set()
      for i in range(n_rows):
        thresh = X[i][col_idx]  # candidate threshold value

        # if thresh value has already been checked, skip it
        if thresh in examined_threshs:
          continue

        examined_threshs.add(thresh)

        # get rows where x is lte, gt thresh

        # a. non-Python way:
        # left_idxs = []
        # right_idxs = []
        # for r in range(len(X)):
        #   if X[r][col_idx] "lte" thresh:
        #    left_idxs.append(r)
        #   else:
        #     right_idxs.append(r)

        # b. Python way using a boolean mask, wrong here because
        #    len(mask) is always n_rows, not the number of selected rows:
        # left_idxs = X[:,col_idx] "lte" thresh
        # right_idxs = ~left_idxs

        # c. Python way using np.where:
        left_idxs = np.where(X[:,col_idx] "lte" thresh)[0]
        right_idxs = np.where(X[:,col_idx] "gt" thresh)[0]

        # check if proposed split would lead to an empty
        # node, which would cause np.mean() to fail
        # when building the tree.
        # But allow a node with a single value.
        # Equivalent to scikit min_samples_leaf = 1.

        if len(left_idxs) == 0 or \
          len(right_idxs) == 0:
          continue  # skip this candidate split

        # get left and right y values

        # a. non-Python way:
        # left_y_vals = []
        # right_y_vals = []
        # for k in range(len(left_idxs)):
        #   left_y_vals.append(y[left_idxs[k]])
        # for k in range(len(right_idxs)):
        #   right_y_vals.append(y[right_idxs[k]])

        # b. Python way:
        left_y_vals = y[left_idxs]  # not empty
        right_y_vals = y[right_idxs]  # not empty

        # compute proposed split MSE

        # a. use a helper function for MSEs
        mse_left = self.vector_mse(left_y_vals)
        mse_right = self.vector_mse(right_y_vals)

        # b. compute MSEs directly
        # if len(left_y_vals) == 0:
        #   mse_left = 0.0
        # else:
        #   mse_left = \
        #   np.mean((left_y_vals - np.mean(left_y_vals))**2)
        # if len(right_y_vals) == 0:
        #   mse_right = 0.0
        # else:
        #   mse_right = \
        #   np.mean((right_y_vals - np.mean(right_y_vals))**2)

        # overall weighted MSE of proposed split
        split_mse = (len(left_y_vals) * mse_left + \
          len(right_y_vals) * mse_right) / n_rows

        if split_mse "lt" best_mse:
          best_col_idx = col_idx
          best_thresh = thresh
          best_mse = split_mse          

    return best_col_idx, best_thresh  # -1 is bad/no split

  # ---------------------------------------------------------

  def vector_mse(self, y):  # variance but called MSE
    if len(y) == 0: return 0.0  # should never get here
    return np.var(y)

  # ---------------------------------------------------------

  def make_tree(self, X, y):
    root = self.Node()  # is_leaf is False
    stack = [(root, X, y, 0)]  # curr depth = 0
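    # the explicit stack of (node, X subset, y subset, depth) tuples
    # replaces the usual recursive build: pop an item, then either mark
    # the node as a leaf or split it and push its two children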

    while (len(stack) "gt" 0):
      curr_node, curr_X, curr_y, curr_depth = stack.pop()

      if curr_depth == self.max_depth or \
        len(curr_y) "lt" self.min_samples:
        curr_node.value = np.mean(curr_y)
        curr_node.is_leaf = True
        continue

      col_idx, thresh = self.best_split(curr_X, curr_y) 

      if col_idx == -1:  # cannot split
        curr_node.value = np.mean(curr_y)
        curr_node.is_leaf = True
        continue
      
      # got a good split so at an internal, non-leaf node
      curr_node.col_idx = col_idx
      curr_node.thresh = thresh

      # create and attach child nodes
      # Python fancy syntax
      # left_idxs = curr_X[:, col_idx] "lte" thresh
      # right_idxs = ~left_idxs  # OK because no need len
      left_idxs = np.where(curr_X[:,col_idx] "lte" thresh)[0]
      right_idxs = np.where(curr_X[:,col_idx] "gt" thresh)[0]

      left_X = curr_X[left_idxs,:]
      left_y = curr_y[left_idxs]
      right_X = curr_X[right_idxs,:]
      right_y = curr_y[right_idxs]

      curr_node.left = self.Node(id=2*curr_node.id+1)
      stack.append((curr_node.left,
        left_X, left_y, curr_depth+1))

      curr_node.right = self.Node(id=2*curr_node.id+2)
      stack.append((curr_node.right,
        right_X, right_y, curr_depth+1))
      
    return root

  # ---------------------------------------------------------

  def fit(self, X, y):
    self.root = self.make_tree(X, y)

  # ---------------------------------------------------------

  def predict_one(self, x):
    curr = self.root
    while curr.is_leaf == False:
      if x[curr.col_idx] "lte" curr.thresh:
        curr = curr.left
      else:
        curr = curr.right
    return curr.value

  def predict(self, X):  # scikit always uses a matrix input
    result = np.zeros(len(X), dtype=np.float64)
    for i in range(len(X)):
      result[i] = self.predict_one(X[i])
    return result

  # ---------------------------------------------------------

  def explain(self, x):
    print("\nIF ")
    curr = self.root
    while curr.is_leaf == False:
      print("column ", end="")
      print(str(curr.col_idx) + " ", end="")
      if x[curr.col_idx] "lte" curr.thresh:
        print(" "lte" ", end="")
        print("%8.4f AND " % curr.thresh)
        curr = curr.left
      else:
        print(" "gt"  ", end="")
        print("%8.4f AND " % curr.thresh)
        curr = curr.right

    print("THEN \npredicted = %0.4f " % curr.value)

  # ---------------------------------------------------------

  def display(self):
    queue = []
    queue.append(self.root)
    while len(queue) "gt" 0:
      node = queue.pop(0)
      node.show()
      if node.left is not None:
        queue.append(node.left)
      if node.right is not None:
        queue.append(node.right)

# ===========================================================

# -----------------------------------------------------------

def accuracy(model, data_X, data_y, pct_close):
  n = len(data_X)
  n_correct = 0; n_wrong = 0
  for i in range(n):
    x = data_X[i].reshape(1,-1)
    y = data_y[i]
    y_pred = model.predict(x)

    if np.abs(y - y_pred) "lt" np.abs(y * pct_close):
      n_correct += 1
    else: 
      n_wrong += 1
  # print("Correct = " + str(n_correct))
  # print("Wrong   = " + str(n_wrong))
  return n_correct / (n_correct + n_wrong)

# -----------------------------------------------------------

def MSE(model, data_X, data_y):
  n = len(data_X)
  sum = 0.0
  for i in range(n):
    x = data_X[i].reshape(1,-1)
    y = data_y[i]
    y_pred = model.predict(x)
    sum += (y - y_pred) * (y - y_pred)

  return sum / n

# -----------------------------------------------------------

def main():
  print("\nBegin decision tree regression " + \
    "(no recursion) scratch Python ")

  np.set_printoptions(precision=4, suppress=True,
    floatmode='fixed')
  np.random.seed(0)  # not used this version

  # 1. load data
  print("\nLoading synthetic train (200), test (40) data ")
  train_file = ".\\Data\\synthetic_train_200.txt"
  # -0.1660,  0.4406, -0.9998, -0.3953, -0.7065, 0.4840
  #  0.0776, -0.1616,  0.3704, -0.5911,  0.7562, 0.1568
  # -0.9452,  0.3409, -0.1654,  0.1174, -0.7192,  0.8054
  # . . .

  train_X = np.loadtxt(train_file, comments="#",
    usecols=[0,1,2,3,4],
    delimiter=",",  dtype=np.float64)
  train_y = np.loadtxt(train_file, comments="#", usecols=5,
    delimiter=",",  dtype=np.float64)

  test_file = ".\\Data\\synthetic_test_40.txt"
  test_X = np.loadtxt(test_file, comments="#",
    usecols=[0,1,2,3,4],
    delimiter=",",  dtype=np.float64)
  test_y = np.loadtxt(test_file, comments="#", usecols=5,
    delimiter=",",  dtype=np.float64)
  print("Done ")

  print("\nFirst three X predictors: ")
  print(train_X[0:3,:])
  print("\nFirst three y targets: ")
  for i in range(3):
    print("%0.4f" % train_y[i])

  md = 6  # max_depth
  ms = 2  # min_samples to consider a split

  print("\nCreating tree model max_depth=" + str(md) + \
    " min_samples=" + str(ms))
  tree = MyDecisionTreeRegressor(max_depth=md, 
    min_samples=ms, seed=0)
  tree.fit(train_X, train_y)

  # print("\nTree: ")
  # tree.display()

  acc_train = accuracy(tree, train_X, train_y, 0.10)
  print("\nAccuracy train (within 0.10): %0.4f " % acc_train)
  acc_test = accuracy(tree, test_X, test_y, 0.10)
  print("Accuracy test (within 0.10): %0.4f " % acc_test)

  mse_train = MSE(tree, train_X, train_y)
  print("\nMSE train: %0.4f " % mse_train)
  mse_test = MSE(tree, test_X, test_y)
  print("MSE test: %0.4f " % mse_test)

  print("\nPredicting train_X[0] using scratch tree: ")
  x = train_X[0].reshape(1,-1)
  y_pred = tree.predict(x)  # predict() wants a matrix
  print(y_pred)

  tree.explain(train_X[0])  # explain() wants a vector

  print("\nEnd demo ")

  print("\n================================= ")

  print("\nUsing scikit: ")
  from sklearn.tree import DecisionTreeRegressor
  sci_tree = DecisionTreeRegressor(max_depth=md,
    min_samples_split=ms, random_state=0)
  sci_tree.fit(train_X, train_y)

  acc_train = accuracy(sci_tree, train_X, train_y, 0.10)
  print("\nAccuracy train (within 0.10): %0.4f " % acc_train)
  acc_test = accuracy(sci_tree, test_X, test_y, 0.10)
  print("Accuracy test (within 0.10): %0.4f " % acc_test)

  mse_train = MSE(sci_tree, train_X, train_y)
  print("\nMSE train: %0.4f " % mse_train)
  mse_test = MSE(sci_tree, test_X, test_y)
  print("MSE test: %0.4f " % mse_test)

  print("\nPredicting train_X[0] using scikit: ")
  y_pred = sci_tree.predict(x)
  print(y_pred)

if __name__ == "__main__":
  main()

Training data:

# synthetic_train_200.txt
#
-0.1660,  0.4406, -0.9998, -0.3953, -0.7065,  0.4840
 0.0776, -0.1616,  0.3704, -0.5911,  0.7562,  0.1568
-0.9452,  0.3409, -0.1654,  0.1174, -0.7192,  0.8054
 0.9365, -0.3732,  0.3846,  0.7528,  0.7892,  0.1345
-0.8299, -0.9219, -0.6603,  0.7563, -0.8033,  0.7955
 0.0663,  0.3838, -0.3690,  0.3730,  0.6693,  0.3206
-0.9634,  0.5003,  0.9777,  0.4963, -0.4391,  0.7377
-0.1042,  0.8172, -0.4128, -0.4244, -0.7399,  0.4801
-0.9613,  0.3577, -0.5767, -0.4689, -0.0169,  0.6861
-0.7065,  0.1786,  0.3995, -0.7953, -0.1719,  0.5569
 0.3888, -0.1716, -0.9001,  0.0718,  0.3276,  0.2500
 0.1731,  0.8068, -0.7251, -0.7214,  0.6148,  0.3297
-0.2046, -0.6693,  0.8550, -0.3045,  0.5016,  0.2129
 0.2473,  0.5019, -0.3022, -0.4601,  0.7918,  0.2613
-0.1438,  0.9297,  0.3269,  0.2434, -0.7705,  0.5171
 0.1568, -0.1837, -0.5259,  0.8068,  0.1474,  0.3307
-0.9943,  0.2343, -0.3467,  0.0541,  0.7719,  0.5581
 0.2467, -0.9684,  0.8589,  0.3818,  0.9946,  0.1092
-0.6553, -0.7257,  0.8652,  0.3936, -0.8680,  0.7018
 0.8460,  0.4230, -0.7515, -0.9602, -0.9476,  0.1996
-0.9434, -0.5076,  0.7201,  0.0777,  0.1056,  0.5664
 0.9392,  0.1221, -0.9627,  0.6013, -0.5341,  0.1533
 0.6142, -0.2243,  0.7271,  0.4942,  0.1125,  0.1661
 0.4260,  0.1194, -0.9749, -0.8561,  0.9346,  0.2230
 0.1362, -0.5934, -0.4953,  0.4877, -0.6091,  0.3810
 0.6937, -0.5203, -0.0125,  0.2399,  0.6580,  0.1460
-0.6864, -0.9628, -0.8600, -0.0273,  0.2127,  0.5387
 0.9772,  0.1595, -0.2397,  0.1019,  0.4907,  0.1611
 0.3385, -0.4702, -0.8673, -0.2598,  0.2594,  0.2270
-0.8669, -0.4794,  0.6095, -0.6131,  0.2789,  0.4700
 0.0493,  0.8496, -0.4734, -0.8681,  0.4701,  0.3516
 0.8639, -0.9721, -0.5313,  0.2336,  0.8980,  0.1412
 0.9004,  0.1133,  0.8312,  0.2831, -0.2200,  0.1782
 0.0991,  0.8524,  0.8375, -0.2102,  0.9265,  0.2150
-0.6521, -0.7473, -0.7298,  0.0113, -0.9570,  0.7422
 0.6190, -0.3105,  0.8802,  0.1640,  0.7577,  0.1056
 0.6895,  0.8108, -0.0802,  0.0927,  0.5972,  0.2214
 0.1982, -0.9689,  0.1870, -0.1326,  0.6147,  0.1310
-0.3695,  0.7858,  0.1557, -0.6320,  0.5759,  0.3773
-0.1596,  0.3581,  0.8372, -0.9992,  0.9535,  0.2071
-0.2468,  0.9476,  0.2094,  0.6577,  0.1494,  0.4132
 0.1737,  0.5000,  0.7166,  0.5102,  0.3961,  0.2611
 0.7290, -0.3546,  0.3416, -0.0983, -0.2358,  0.1332
-0.3652,  0.2438, -0.1395,  0.9476,  0.3556,  0.4170
-0.6029, -0.1466, -0.3133,  0.5953,  0.7600,  0.4334
-0.4596, -0.4953,  0.7098,  0.0554,  0.6043,  0.2775
 0.1450,  0.4663,  0.0380,  0.5418,  0.1377,  0.2931
-0.8636, -0.2442, -0.8407,  0.9656, -0.6368,  0.7429
 0.6237,  0.7499,  0.3768,  0.1390, -0.6781,  0.2185
-0.5499,  0.1850, -0.3755,  0.8326,  0.8193,  0.4399
-0.4858, -0.7782, -0.6141, -0.0008,  0.4572,  0.4197
 0.7033, -0.1683,  0.2334, -0.5327, -0.7961,  0.1776
 0.0317, -0.0457, -0.6947,  0.2436,  0.0880,  0.3345
 0.5031, -0.5559,  0.0387,  0.5706, -0.9553,  0.3107
-0.3513,  0.7458,  0.6894,  0.0769,  0.7332,  0.3170
 0.2205,  0.5992, -0.9309,  0.5405,  0.4635,  0.3532
-0.4806, -0.4859,  0.2646, -0.3094,  0.5932,  0.3202
 0.9809, -0.3995, -0.7140,  0.8026,  0.0831,  0.1600
 0.9495,  0.2732,  0.9878,  0.0921,  0.0529,  0.1289
-0.9476, -0.6792,  0.4913, -0.9392, -0.2669,  0.5966
 0.7247,  0.3854,  0.3819, -0.6227, -0.1162,  0.1550
-0.5922, -0.5045, -0.4757,  0.5003, -0.0860,  0.5863
-0.8861,  0.0170, -0.5761,  0.5972, -0.4053,  0.7301
 0.6877, -0.2380,  0.4997,  0.0223,  0.0819,  0.1404
 0.9189,  0.6079, -0.9354,  0.4188, -0.0700,  0.1907
-0.1428, -0.7820,  0.2676,  0.6059,  0.3936,  0.2790
 0.5324, -0.3151,  0.6917, -0.1425,  0.6480,  0.1071
-0.8432, -0.9633, -0.8666, -0.0828, -0.7733,  0.7784
-0.9444,  0.5097, -0.2103,  0.4939, -0.0952,  0.6787
-0.0520,  0.6063, -0.1952,  0.8094, -0.9259,  0.4836
 0.5477, -0.7487,  0.2370, -0.9793,  0.0773,  0.1241
 0.2450,  0.8116,  0.9799,  0.4222,  0.4636,  0.2355
 0.8186, -0.1983, -0.5003, -0.6531, -0.7611,  0.1511
-0.4714,  0.6382, -0.3788,  0.9648, -0.4667,  0.5950
 0.0673, -0.3711,  0.8215, -0.2669, -0.1328,  0.2677
-0.9381,  0.4338,  0.7820, -0.9454,  0.0441,  0.5518
-0.3480,  0.7190,  0.1170,  0.3805, -0.0943,  0.4724
-0.9813,  0.1535, -0.3771,  0.0345,  0.8328,  0.5438
-0.1471, -0.5052, -0.2574,  0.8637,  0.8737,  0.3042
-0.5454, -0.3712, -0.6505,  0.2142, -0.1728,  0.5783
 0.6327, -0.6297,  0.4038, -0.5193,  0.1484,  0.1153
-0.5424,  0.3282, -0.0055,  0.0380, -0.6506,  0.6613
 0.1414,  0.9935,  0.6337,  0.1887,  0.9520,  0.2540
-0.9351, -0.8128, -0.8693, -0.0965, -0.2491,  0.7353
 0.9507, -0.6640,  0.9456,  0.5349,  0.6485,  0.1059
-0.0462, -0.9737, -0.2940, -0.0159,  0.4602,  0.2606
-0.0627, -0.0852, -0.7247, -0.9782,  0.5166,  0.2977
 0.0478,  0.5098, -0.0723, -0.7504, -0.3750,  0.3335
 0.0090,  0.3477,  0.5403, -0.7393, -0.9542,  0.4415
-0.9748,  0.3449,  0.3736, -0.1015,  0.8296,  0.4358
 0.2887, -0.9895, -0.0311,  0.7186,  0.6608,  0.2057
 0.1570, -0.4518,  0.1211,  0.3435, -0.2951,  0.3244
 0.7117, -0.6099,  0.4946, -0.4208,  0.5476,  0.1096
-0.2929, -0.5726,  0.5346, -0.3827,  0.4665,  0.2465
 0.4889, -0.5572, -0.5718, -0.6021, -0.7150,  0.2163
-0.7782,  0.3491,  0.5996, -0.8389, -0.5366,  0.6516
-0.5847,  0.8347,  0.4226,  0.1078, -0.3910,  0.6134
 0.8469,  0.4121, -0.0439, -0.7476,  0.9521,  0.1571
-0.6803, -0.5948, -0.1376, -0.1916, -0.7065,  0.7156
 0.2878,  0.5086, -0.5785,  0.2019,  0.4979,  0.2980
 0.2764,  0.1943, -0.4090,  0.4632,  0.8906,  0.2960
-0.8877,  0.6705, -0.6155, -0.2098, -0.3998,  0.7107
-0.8398,  0.8093, -0.2597,  0.0614, -0.0118,  0.6502
-0.8476,  0.0158, -0.4769, -0.2859, -0.7839,  0.7715
 0.5751, -0.7868,  0.9714, -0.6457,  0.1448,  0.1175
 0.4802, -0.7001,  0.1022, -0.5668,  0.5184,  0.1090
 0.4458, -0.6469,  0.7239, -0.9604,  0.7205,  0.0779
 0.5175,  0.4339,  0.9747, -0.4438, -0.9924,  0.2879
 0.8678,  0.7158,  0.4577,  0.0334,  0.4139,  0.1678
 0.5406,  0.5012,  0.2264, -0.1963,  0.3946,  0.2088
-0.9938,  0.5498,  0.7928, -0.5214, -0.7585,  0.7687
 0.7661,  0.0863, -0.4266, -0.7233, -0.4197,  0.1466
 0.2277, -0.3517, -0.0853, -0.1118,  0.6563,  0.1767
 0.3499, -0.5570, -0.0655, -0.3705,  0.2537,  0.1632
 0.7547, -0.1046,  0.5689, -0.0861,  0.3125,  0.1257
 0.8186,  0.2110,  0.5335,  0.0094, -0.0039,  0.1391
 0.6858, -0.8644,  0.1465,  0.8855,  0.0357,  0.1845
-0.4967,  0.4015,  0.0805,  0.8977,  0.2487,  0.4663
 0.6760, -0.9841,  0.9787, -0.8446, -0.3557,  0.1509
-0.1203, -0.4885,  0.6054, -0.0443, -0.7313,  0.4854
 0.8557,  0.7919, -0.0169,  0.7134, -0.1628,  0.2002
 0.0115, -0.6209,  0.9300, -0.4116, -0.7931,  0.4052
-0.7114, -0.9718,  0.4319,  0.1290,  0.5892,  0.3661
 0.3915,  0.5557, -0.1870,  0.2955, -0.6404,  0.2954
-0.3564, -0.6548, -0.1827, -0.5172, -0.1862,  0.4622
 0.2392, -0.4959,  0.5857, -0.1341, -0.2850,  0.2470
-0.3394,  0.3947, -0.4627,  0.6166, -0.4094,  0.5325
 0.7107,  0.7768, -0.6312,  0.1707,  0.7964,  0.2757
-0.1078,  0.8437, -0.4420,  0.2177,  0.3649,  0.4028
-0.3139,  0.5595, -0.6505, -0.3161, -0.7108,  0.5546
 0.4335,  0.3986,  0.3770, -0.4932,  0.3847,  0.1810
-0.2562, -0.2894, -0.8847,  0.2633,  0.4146,  0.4036
 0.2272,  0.2966, -0.6601, -0.7011,  0.0284,  0.2778
-0.0743, -0.1421, -0.0054, -0.6770, -0.3151,  0.3597
-0.4762,  0.6891,  0.6007, -0.1467,  0.2140,  0.4266
-0.4061,  0.7193,  0.3432,  0.2669, -0.7505,  0.6147
-0.0588,  0.9731,  0.8966,  0.2902, -0.6966,  0.4955
-0.0627, -0.1439,  0.1985,  0.6999,  0.5022,  0.3077
 0.1587,  0.8494, -0.8705,  0.9827, -0.8940,  0.4263
-0.7850,  0.2473, -0.9040, -0.4308, -0.8779,  0.7199
 0.4070,  0.3369, -0.2428, -0.6236,  0.4940,  0.2215
-0.0242,  0.0513, -0.9430,  0.2885, -0.2987,  0.3947
-0.5416, -0.1322, -0.2351, -0.0604,  0.9590,  0.3683
 0.1055,  0.7783, -0.2901, -0.5090,  0.8220,  0.2984
-0.9129,  0.9015,  0.1128, -0.2473,  0.9901,  0.4776
-0.9378,  0.1424, -0.6391,  0.2619,  0.9618,  0.5368
 0.7498, -0.0963,  0.4169,  0.5549, -0.0103,  0.1614
-0.2612, -0.7156,  0.4538, -0.0460, -0.1022,  0.3717
 0.7720,  0.0552, -0.1818, -0.4622, -0.8560,  0.1685
-0.4177,  0.0070,  0.9319, -0.7812,  0.3461,  0.3052
-0.0001,  0.5542, -0.7128, -0.8336, -0.2016,  0.3803
 0.5356, -0.4194, -0.5662, -0.9666, -0.2027,  0.1776
-0.2378,  0.3187, -0.8582, -0.6948, -0.9668,  0.5474
-0.1947, -0.3579,  0.1158,  0.9869,  0.6690,  0.2992
 0.3992,  0.8365, -0.9205, -0.8593, -0.0520,  0.3154
-0.0209,  0.0793,  0.7905, -0.1067,  0.7541,  0.1864
-0.4928, -0.4524, -0.3433,  0.0951, -0.5597,  0.6261
-0.8118,  0.7404, -0.5263, -0.2280,  0.1431,  0.6349
 0.0516, -0.8480,  0.7483,  0.9023,  0.6250,  0.1959
-0.3212,  0.1093,  0.9488, -0.3766,  0.3376,  0.2735
-0.3481,  0.5490, -0.3484,  0.7797,  0.5034,  0.4379
-0.5785, -0.9170, -0.3563, -0.9258,  0.3877,  0.4121
 0.3407, -0.1391,  0.5356,  0.0720, -0.9203,  0.3458
-0.3287, -0.8954,  0.2102,  0.0241,  0.2349,  0.3247
-0.1353,  0.6954, -0.0919, -0.9692,  0.7461,  0.3338
 0.9036, -0.8982, -0.5299, -0.8733, -0.1567,  0.1187
 0.7277, -0.8368, -0.0538, -0.7489,  0.5458,  0.0830
 0.9049,  0.8878,  0.2279,  0.9470, -0.3103,  0.2194
 0.7957, -0.1308, -0.5284,  0.8817,  0.3684,  0.2172
 0.4647, -0.4931,  0.2010,  0.6292, -0.8918,  0.3371
-0.7390,  0.6849,  0.2367,  0.0626, -0.5034,  0.7039
-0.1567, -0.8711,  0.7940, -0.5932,  0.6525,  0.1710
 0.7635, -0.0265,  0.1969,  0.0545,  0.2496,  0.1445
 0.7675,  0.1354, -0.7698, -0.5460,  0.1920,  0.1728
-0.5211, -0.7372, -0.6763,  0.6897,  0.2044,  0.5217
 0.1913,  0.1980,  0.2314, -0.8816,  0.5006,  0.1998
 0.8964,  0.0694, -0.6149,  0.5059, -0.9854,  0.1825
 0.1767,  0.7104,  0.2093,  0.6452,  0.7590,  0.2832
-0.3580, -0.7541,  0.4426, -0.1193, -0.7465,  0.5657
-0.5996,  0.5766, -0.9758, -0.3933, -0.9572,  0.6800
 0.9950,  0.1641, -0.4132,  0.8579,  0.0142,  0.2003
-0.4717, -0.3894, -0.2567, -0.5111,  0.1691,  0.4266
 0.3917, -0.8561,  0.9422,  0.5061,  0.6123,  0.1212
-0.0366, -0.1087,  0.3449, -0.1025,  0.4086,  0.2475
 0.3633,  0.3943,  0.2372, -0.6980,  0.5216,  0.1925
-0.5325, -0.6466, -0.2178, -0.3589,  0.6310,  0.3568
 0.2271,  0.5200, -0.1447, -0.8011, -0.7699,  0.3128
 0.6415,  0.1993,  0.3777, -0.0178, -0.8237,  0.2181
-0.5298, -0.0768, -0.6028, -0.9490,  0.4588,  0.4356
 0.6870, -0.1431,  0.7294,  0.3141,  0.1621,  0.1632
-0.5985,  0.0591,  0.7889, -0.3900,  0.7419,  0.2945
 0.3661,  0.7984, -0.8486,  0.7572, -0.6183,  0.3449
 0.6995,  0.3342, -0.3113, -0.6972,  0.2707,  0.1712
 0.2565,  0.9126,  0.1798, -0.6043, -0.1413,  0.2893
-0.3265,  0.9839, -0.2395,  0.9854,  0.0376,  0.4770
 0.2690, -0.1722,  0.9818,  0.8599, -0.7015,  0.3954
-0.2102, -0.0768,  0.1219,  0.5607, -0.0256,  0.3949
 0.8216, -0.9555,  0.6422, -0.6231,  0.3715,  0.0801
-0.2896,  0.9484, -0.7545, -0.6249,  0.7789,  0.4370
-0.9985, -0.5448, -0.7092, -0.5931,  0.7926,  0.5402

Test data:

# synthetic_test_40.txt
#
 0.7462,  0.4006, -0.0590,  0.6543, -0.0083,  0.1935
 0.8495, -0.2260, -0.0142, -0.4911,  0.7699,  0.1078
-0.2335, -0.4049,  0.4352, -0.6183, -0.7636,  0.5088
 0.1810, -0.5142,  0.2465,  0.2767, -0.3449,  0.3136
-0.8650,  0.7611, -0.0801,  0.5277, -0.4922,  0.7140
-0.2358, -0.7466, -0.5115, -0.8413, -0.3943,  0.4533
 0.4834,  0.2300,  0.3448, -0.9832,  0.3568,  0.1360
-0.6502, -0.6300,  0.6885,  0.9652,  0.8275,  0.3046
-0.3053,  0.5604,  0.0929,  0.6329, -0.0325,  0.4756
-0.7995,  0.0740, -0.2680,  0.2086,  0.9176,  0.4565
-0.2144, -0.2141,  0.5813,  0.2902, -0.2122,  0.4119
-0.7278, -0.0987, -0.3312, -0.5641,  0.8515,  0.4438
 0.3793,  0.1976,  0.4933,  0.0839,  0.4011,  0.1905
-0.8568,  0.9573, -0.5272,  0.3212, -0.8207,  0.7415
-0.5785,  0.0056, -0.7901, -0.2223,  0.0760,  0.5551
 0.0735, -0.2188,  0.3925,  0.3570,  0.3746,  0.2191
 0.1230, -0.2838,  0.2262,  0.8715,  0.1938,  0.2878
 0.4792, -0.9248,  0.5295,  0.0366, -0.9894,  0.3149
-0.4456,  0.0697,  0.5359, -0.8938,  0.0981,  0.3879
 0.8629, -0.8505, -0.4464,  0.8385,  0.5300,  0.1769
 0.1995,  0.6659,  0.7921,  0.9454,  0.9970,  0.2330
-0.0249, -0.3066, -0.2927, -0.4923,  0.8220,  0.2437
 0.4513, -0.9481, -0.0770, -0.4374, -0.9421,  0.2879
-0.3405,  0.5931, -0.3507, -0.3842,  0.8562,  0.3987
 0.9538,  0.0471,  0.9039,  0.7760,  0.0361,  0.1706
-0.0887,  0.2104,  0.9808,  0.5478, -0.3314,  0.4128
-0.8220, -0.6302,  0.0537, -0.1658,  0.6013,  0.4306
-0.4123, -0.2880,  0.9074, -0.0461, -0.4435,  0.5144
 0.0060,  0.2867, -0.7775,  0.5161,  0.7039,  0.3599
-0.7968, -0.5484,  0.9426, -0.4308,  0.8148,  0.2979
 0.7811,  0.8450, -0.6877,  0.7594,  0.2640,  0.2362
-0.6802, -0.1113, -0.8325, -0.6694, -0.6056,  0.6544
 0.3821,  0.1476,  0.7466, -0.5107,  0.2592,  0.1648
 0.7265,  0.9683, -0.9803, -0.4943, -0.5523,  0.2454
-0.9049, -0.9797, -0.0196, -0.9090, -0.4433,  0.6447
-0.4607,  0.1811, -0.2389,  0.4050, -0.0078,  0.5229
 0.2664, -0.2932, -0.4259, -0.7336,  0.8742,  0.1834
-0.4507,  0.1029, -0.6294, -0.1158, -0.6294,  0.6081
 0.8948, -0.0124,  0.9278,  0.2899, -0.0314,  0.1534
-0.1323, -0.8813, -0.0146, -0.0697,  0.6135,  0.2386