A Canonical Example of Machine Learning Regression Using C#

I’ve been working on a book tentatively titled “Machine Learning Regression Using Classical Techniques With C#”. I intend to cover five classical (non-tree-based) regression techniques: linear regression, nearest neighbors regression, quadratic regression, kernel ridge regression, and neural network regression.

Note: I’m explicitly ignoring support vector regression which, in my opinion, is a solution-in-search-of-a-problem intellectual abomination. But that’s another story.

For my introductory chapter, I want to present a canonical example of regression. I scanned my brain for the simplest possible regression technique to use as the example. With decades of machine learning experience, I figured there must be some obscure, super-simple regression technique that still illustrates all, or at least most, of the key ideas of machine learning regression using the C# language.

I was surprised when I couldn’t find anything simpler than standard linear regression or nearest neighbors regression. So, I put together what I consider a canonical demo using linear regression.

The demo uses training data that has known correct input predictors (sometimes called features) and target y values. The 12-item raw training data is:

age income   debt      balance
-------------------------------
24,  medium, $2325.00, $1972.00
33,  high,   $3140.00, $2324.00
44,  low,    $1425.00, $2284.00
38,  medium, $3000.00, $2606.00
41,  high,   $2833.00, $3193.00
57,  high,   $3875.00, $2709.00
25,  low,    $1974.00, $2109.00
29,  high,   $3900.00, $3209.00
57,  medium, $3642.00, $2799.00
64,  low,    $1525.00, $2708.00
26,  high,   $3625.00, $2406.00
38,  high,   $3400.00, $2634.00

The goal is to predict bank account balance from age, income and debt.

The numeric predictors (age, debt) are normalized so they're all between 0.0 and 1.0: age is divided by 100 and debt is divided by 10,000. The target balance values are also divided by 10,000. The categorical predictor (income) is encoded to equal-interval numeric values: low = 0.25, medium = 0.50, high = 0.75. The 12-item dataset is randomly split into an 8-item training set and a 4-item test set:

The 8-item training data:

age   income  debt  balance
---------------------------
0.24, 0.50, 0.2325, 0.1972
0.57, 0.75, 0.3875, 0.2709
0.38, 0.50, 0.3000, 0.2606
0.29, 0.75, 0.3900, 0.3209
0.44, 0.25, 0.1425, 0.2284
0.64, 0.25, 0.1525, 0.2708
0.26, 0.75, 0.3625, 0.2406
0.38, 0.75, 0.3400, 0.2634

The 4-item test data:

age   income  debt  balance
---------------------------
0.33, 0.75, 0.3140, 0.2324
0.41, 0.75, 0.2833, 0.3193
0.25, 0.25, 0.1974, 0.2109
0.57, 0.50, 0.3642, 0.2799
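For concreteness, the normalization here is simple divide-by-constant scaling (age by 100, dollar amounts by 10,000), and the income categories get equal-interval values: low = 0.25, medium = 0.50, high = 0.75. Here's a minimal sketch of that preprocessing for the first raw data item (the EncodeIncome helper name is mine, not part of the demo):

```csharp
using System;

class NormalizeDemo
{
  // encode an income category to an equal-interval numeric value
  static double EncodeIncome(string income)
  {
    if (income == "low") return 0.25;
    if (income == "medium") return 0.50;
    return 0.75; // "high"
  }

  static void Main()
  {
    // raw item: age 24, medium income, $2325.00 debt, $1972.00 balance
    double age = 24.0 / 100.0;            // 0.24
    double inc = EncodeIncome("medium");  // 0.50
    double debt = 2325.00 / 10000.0;      // 0.2325
    double bal = 1972.00 / 10000.0;       // 0.1972
    Console.WriteLine(age + " " + inc + " " + debt + " " + bal);
  }
}
```

Dividing by a fixed constant (rather than min-max or z-score normalization) makes it trivial to un-normalize a prediction later: just multiply by the same constant.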

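The random 8/4 split can be sketched with a seeded Fisher-Yates shuffle of the row indices (a sketch only; this is not necessarily the exact split used to produce the tables above):

```csharp
using System;

class SplitDemo
{
  static void Main()
  {
    int n = 12; int nTrain = 8;
    int[] indices = new int[n];
    for (int i = 0; i < n; ++i) indices[i] = i;

    // Fisher-Yates shuffle; a fixed seed makes the split reproducible
    Random rnd = new Random(0);
    for (int i = 0; i < n; ++i)
    {
      int ri = rnd.Next(i, n);
      int tmp = indices[i];
      indices[i] = indices[ri]; indices[ri] = tmp;
    }

    Console.Write("train rows: ");
    for (int i = 0; i < nTrain; ++i)
      Console.Write(indices[i] + " ");
    Console.Write("\ntest rows: ");
    for (int i = nTrain; i < n; ++i)
      Console.Write(indices[i] + " ");
    Console.WriteLine();
  }
}
```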
The output of the demo regression program is:

Begin predict bank balance from age, income, debt

Setting up train and test data
Done

First two train X:
   0.2400   0.5000   0.2325
   0.5700   0.7500   0.3875

First two train y:
  0.1972
  0.2709

Creating linear regression model
Done

Setting lrnRate = 0.20, maxEpochs = 1000
Starting training using SGD
epoch =     0  MSE =   0.0011
epoch =   200  MSE =   0.0007
epoch =   400  MSE =   0.0007
epoch =   600  MSE =   0.0007
epoch =   800  MSE =   0.0007
Done

Weights/coefficients:
0.0999  -0.0708  0.3717
Bias/constant: 0.1487

Evaluating model
Accuracy train (within 0.15) = 0.8750
Accuracy test (within 0.15) = 0.7500

Predicting for 35  medium  $1,900
predicted y = $2188.64

End demo

The demo output illustrates nine key ideas for machine learning regression using C#. They are: 1.) vectors and matrices, 2.) training and test data, 3.) data normalization and encoding, 4.) different regression techniques, 5.) different training algorithms, 6.) model and training hyperparameters, 7.) model interpretability, 8.) model evaluation, and 9.) using a trained regression model to make a prediction.
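Ideas 7 and 9 are closely related for linear regression: because the model is fully interpretable, a prediction is just the weighted sum of the inputs plus the bias. The demo's final prediction can be checked by hand using the rounded weights and bias shown in the output (the small difference from the demo's printed value comes from weight rounding):

```csharp
using System;

class PredictCheck
{
  static void Main()
  {
    // rounded weights and bias from the demo output
    double[] w = { 0.0999, -0.0708, 0.3717 };
    double b = 0.1487;
    // normalized input for the demo's final prediction
    double[] x = { 0.35, 0.50, 0.1900 };

    double y = b;
    for (int j = 0; j < x.Length; ++j)
      y += w[j] * x[j];

    // un-normalize: multiply by 10,000 to recover dollars
    Console.WriteLine("predicted balance = $" +
      (y * 10000).ToString("F2")); // $2188.88
  }
}
```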

A good thought experiment.



I find great beauty in mathematics and machine learning. I’m not so good at recognizing beauty in art. But here are three photographs by Valeria Lokinskaya that combine math and art that I think are beautiful. (I’m not sure if the Asian characters in the middle photo are numbers or not, but they look nice to me.)


Demo program.

using System;
using System.IO;

namespace IntroductionToRegression
{
  internal class IntroductionToRegressionProgram
  {
    static void Main(string[] args)
    {
      Console.WriteLine("\nBegin predict bank balance" +
        " from age, income, debt ");

      Console.WriteLine("\nSetting up train and test data ");
      double[][] trainX = new double[8][];
      trainX[0] = new double[] { 0.24, 0.50, 0.2325 };
      trainX[1] = new double[] { 0.57, 0.75, 0.3875 };
      trainX[2] = new double[] { 0.38, 0.50, 0.3000 };
      trainX[3] = new double[] { 0.29, 0.75, 0.3900 };
      trainX[4] = new double[] { 0.44, 0.25, 0.1425 };
      trainX[5] = new double[] { 0.64, 0.25, 0.1525 };
      trainX[6] = new double[] { 0.26, 0.75, 0.3625 };
      trainX[7] = new double[] { 0.38, 0.75, 0.3400 };

      double[] trainY = new double[] { 0.1972, 0.2709,
        0.2606, 0.3209, 0.2284, 0.2708, 0.2406, 0.2634 };

      double[][] testX = new double[4][];
      testX[0] = new double[] { 0.33, 0.75, 0.3140 };
      testX[1] = new double[] { 0.41, 0.75, 0.2833 };
      testX[2] = new double[] { 0.25, 0.25, 0.1974 };
      testX[3] = new double[] { 0.57, 0.50, 0.3642 };

      double[] testY = new double[] { 0.2324, 0.3193,
        0.2109, 0.2799 };
      Console.WriteLine("Done ");

      Console.WriteLine("\nFirst two train X: ");
      for (int i = 0; i < 2; ++i)
      {
        for (int j = 0; j < trainX[0].Length; ++j)
          Console.Write(trainX[i][j].ToString("F4").
            PadLeft(9));
        Console.WriteLine("");
      }

      Console.WriteLine("\nFirst two train y: ");
      for (int i = 0; i < 2; ++i)
        Console.WriteLine(trainY[i].ToString("F4").
          PadLeft(8));

      Console.WriteLine("\nCreating linear regression model ");
      LinearRegressor model = new LinearRegressor(seed: 0);
      Console.WriteLine("Done ");

      Console.WriteLine("\nSetting lrnRate = 0.20," +
        " maxEpochs = 1000 ");
      Console.WriteLine("Starting training using SGD ");
      model.TrainSGD(trainX, trainY, 0.20, 1000);
      Console.WriteLine("Done ");

      Console.WriteLine("\nWeights/coefficients: ");
      for (int i = 0; i < model.weights.Length; ++i)
        Console.Write(model.weights[i].ToString("F4") + "  ");
      Console.WriteLine("\nBias/constant: " +
        model.bias.ToString("F4"));

      Console.WriteLine("\nEvaluating model ");
      double accTrain = model.Accuracy(trainX, trainY, 0.15);
      Console.WriteLine("Accuracy train (within 0.15) = " +
        accTrain.ToString("F4"));
      double accTest = model.Accuracy(testX, testY, 0.15);
      Console.WriteLine("Accuracy test (within 0.15) = " +
        accTest.ToString("F4"));

      Console.WriteLine("\nPredicting for 35" +
        "  medium  $1,900 ");
      double[] x = new double[] { 0.35, 0.50, 0.1900 };
      double predY = model.Predict(x);
      Console.WriteLine("predicted y = $" + 
        (predY * 10000).ToString("F2"));

      Console.WriteLine("\nEnd demo ");
      Console.ReadLine();

    } // Main

    public class LinearRegressor
    {
      public double[] weights;
      public double bias;
      private Random rnd;

      public LinearRegressor(int seed = 0)
      {
        this.weights = new double[0]; // quasi-null
        this.bias = 0.0;
        this.rnd = new Random(seed);
      }

      // ------------------------------------------------------

      public void TrainSGD(double[][] trainX, double[] trainY,
        double lrnRate, int maxEpochs)
      {
        int n = trainX.Length;
        int dim = trainX[0].Length;
        this.weights = new double[dim];
        double lo = -0.01; double hi = 0.01;
        for (int i = 0; i < dim; ++i)
          this.weights[i] = (hi - lo) *
            this.rnd.NextDouble() + lo;
        this.bias = (hi - lo) *
            this.rnd.NextDouble() + lo;

        int[] indices = new int[n];
        for (int i = 0; i < n; ++i)
          indices[i] = i;

        for (int epoch = 0; epoch < maxEpochs; ++epoch)
        {
          // shuffle indices
          for (int i = 0; i < n; ++i)
          {
            int ri = rnd.Next(i, n);
            int tmp = indices[i];
            indices[i] = indices[ri];
            indices[ri] = tmp;
          }
          for (int i = 0; i < n; ++i)
          {
            int ii = indices[i];
            double[] x = trainX[ii];
            double actualY = trainY[ii];
            double predY = this.Predict(x);

            for (int j = 0; j < dim; ++j)
              this.weights[j] -= lrnRate *
                (predY - actualY) * x[j];
            this.bias -= lrnRate * (predY - actualY);
          }
          if (epoch % (int)(maxEpochs / 5) == 0)
          {
            double mse = this.MSE(trainX, trainY);
            string s1 = "epoch = " + epoch.ToString().
              PadLeft(5);
            string s2 = "  MSE = " + mse.ToString("F4").
              PadLeft(8);
            Console.WriteLine(s1 + s2);
          }
        }
      }

      // ------------------------------------------------------

      public double Predict(double[] x)
      {
        double result = 0.0;
        for (int j = 0; j < x.Length; ++j)
          result += x[j] * this.weights[j];
        result += this.bias;
        return result;
      }

      // ------------------------------------------------------

      public double Accuracy(double[][] dataX, double[] dataY,
        double pctClose)
      {
        int numCorrect = 0; int numWrong = 0;
        for (int i = 0; i < dataX.Length; ++i)
        {
          double actualY = dataY[i];
          double predY = this.Predict(dataX[i]);
          if (Math.Abs(predY - actualY) <
            Math.Abs(pctClose * actualY))
          {
            ++numCorrect;
          }
          else
          {
            ++numWrong;
          }
        }
        return (numCorrect * 1.0) / (numWrong + numCorrect);
      }

      // ------------------------------------------------------

      public double MSE(double[][] dataX, double[] dataY)
      {
        int n = dataX.Length;
        double sum = 0.0;
        for (int i = 0; i < n; ++i)
        {
          double actualY = dataY[i];
          double predY = this.Predict(dataX[i]);
          sum += (actualY - predY) * (actualY - predY);
        }
        return sum / n;
      }

      // ------------------------------------------------------

    } // class LinearRegressor
  } // Program
} // ns