Exploring Linear Ridge Regression Using C#

A simple linear regression problem predicts a numeric y value from a single numeric x value. For example, you might want to predict a person’s weight from their height. A multiple linear regression problem predicts a numeric y value from two or more numeric x values. For example, you might want to predict y = weight from x1 = height, x2 = daily calories, x3 = daily exercise. The term linear regression typically means multiple linear regression.

Linear ridge regression (LRR) is a minor variation of standard linear regression. Briefly, LRR adds a small noise value to the diagonal of the behind-the-scenes matrix, which 1.) prevents the matrix inverse operation from failing when two or more predictor columns are linearly correlated, and 2.) introduces L2 regularization, which deters model overfitting.

The prediction equation is y = b0 + b1*x1 + b2*x2 + . . . The bi values are usually called the coefficients or weights. The b0 term is sometimes called the constant, or intercept, or bias.

If you have a set of training data, it is possible to compute the values of the coefficients exactly (so that the sum of the squared errors is minimized) using matrix algebra. This is called a closed-form solution. It is also possible to estimate the values of the coefficients using an iterated gradient descent approach.
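To show the idea behind the gradient descent alternative, here is a minimal sketch that fits a one-predictor model. The tiny dataset and the learning rate are illustrative assumptions, not the demo data used below:

```csharp
using System;

// Gradient descent estimate of simple linear regression
// coefficients. The data and learning rate are illustrative
// assumptions, not the demo data used elsewhere.
class GradientDescentDemo
{
  static void Main()
  {
    // tiny dataset where y = 2x + 1 exactly
    double[] xs = { 0.0, 1.0, 2.0, 3.0 };
    double[] ys = { 1.0, 3.0, 5.0, 7.0 };

    double b0 = 0.0, b1 = 0.0;  // coefficients start at 0
    double lr = 0.05;           // illustrative learning rate
    for (int epoch = 0; epoch < 2000; ++epoch)
    {
      double g0 = 0.0, g1 = 0.0;  // gradients of squared error
      for (int i = 0; i < xs.Length; ++i)
      {
        double err = (b0 + b1 * xs[i]) - ys[i];
        g0 += err;
        g1 += err * xs[i];
      }
      b0 -= lr * g0 / xs.Length;
      b1 -= lr * g1 / xs.Length;
    }
    // converges close to b0 = 1.00, b1 = 2.00
    Console.WriteLine(b0.ToString("F2") + " " + b1.ToString("F2"));
  }
}
```

Unlike the closed-form approach, gradient descent only approximates the coefficients, but it scales to problems where the matrix operations are impractical.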

I have been working with C# matrix functions lately, so one morning I decided to put together a demo of linear regression using C# matrix algebra. I created a 40-item synthetic dataset with three predictor variables and a target y variable.

 x1      x2       x3      y
--------------------------------
-0.1660, 0.4406, -0.9998, 0.4487
-0.8153, -0.6275, -0.3089, 0.2757
-0.2065, 0.0776, -0.1616, 0.3096
0.7562, -0.9452, 0.3409, 0.2839
-0.1654, 0.1174, -0.7192, 0.4079
0.9365, -0.3732, 0.3846, 0.2787
0.7528, 0.7892, -0.8299, 0.4489
0.0663, 0.3838, -0.3690, 0.3740
0.3730, 0.6693, -0.9634, 0.4525
0.5786, -0.7935, -0.1042, 0.3348
0.8172, -0.4128, -0.4244, 0.3928
-0.4689, -0.0169, -0.8933, 0.4036
0.1482, -0.7065, 0.1786, 0.2809
-0.1719, 0.3888, -0.1716, 0.3385
-0.9001, 0.0718, 0.3276, 0.2309
0.1731, 0.8068, -0.7251, 0.4236
-0.7214, 0.6148, -0.2046, 0.3311
0.4520, 0.7666, 0.2473, 0.3204
0.5019, -0.3022, -0.4601, 0.3885
0.9297, 0.3269, 0.2434, 0.3353
-0.7705, 0.8990, -0.1002, 0.2941
-0.5259, 0.8068, 0.1474, 0.3022
-0.9943, 0.2343, -0.3467, 0.3144
-0.2855, 0.8171, 0.2467, 0.2934
-0.9684, 0.8589, 0.3818, 0.2229
0.5109, 0.5078, 0.8460, 0.2719
0.4230, -0.7515, -0.9602, 0.4135
0.0777, 0.1056, 0.6841, 0.2428
-0.7517, -0.4416, 0.1715, 0.2525
-0.9627, 0.6013, -0.5341, 0.3701
0.6142, -0.2243, 0.7271, 0.2510
-0.7271, -0.8802, -0.7573, 0.3173
-0.9109, -0.7850, -0.5486, 0.2943
-0.9749, -0.8561, 0.9346, 0.2487
0.1362, -0.5934, -0.4953, 0.3394
0.1627, 0.9400, 0.6937, 0.3020
-0.5203, -0.0125, 0.2399, 0.2501
-0.9628, -0.8600, -0.0273, 0.2715
0.2127, 0.1377, -0.3653, 0.3758
-0.2397, 0.1019, 0.4907, 0.2496

The closed form solution for the values of the coefficients is coeffs = inv(Xt * X) * Xt * Y. The X is the “design matrix”: the matrix of x values augmented by a leading column of all 1s that act as dummy inputs for the b0 constant term. The Xt is the transpose of X. The * operation is matrix multiplication. The Y is a one-column matrix that holds the actual y values.

Here is a brief derivation of the closed form solution for the coefficients, where DX is the design matrix and w is the column vector of coefficients:


1. y = DX * w  | DX is not square, so it has no inverse.

2. DXt * y = (DXt * DX) * w  | pre-multiply both sides by DXt.

3. inv(DXt * DX) * DXt * y = inv(DXt * DX) * (DXt * DX) * w  | pre-multiply both sides by inv(DXt * DX).

4. inv(DXt * DX) * DXt * y = w  | because inv(A) * A = I.

Note that if inv(DXt * DX) doesn’t exist, this version of the closed form solution fails. A common cause is that two or more predictor columns are linearly dependent, which makes DXt * DX singular. A solution to this problem is to add a small value, called alpha or noise, to the diagonal of the DXt * DX matrix. This is called linear ridge regression.
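The ridge adjustment itself is just one loop that adds alpha to each diagonal element before inverting. A minimal sketch, using a deliberately singular 2x2 matrix and an illustrative alpha of 0.05:

```csharp
using System;

// The ridge trick: add alpha to the diagonal of a matrix
// before inverting. The matrix and alpha are illustrative.
class RidgeConditionDemo
{
  static void Main()
  {
    // 2x2 matrix with linearly dependent rows:
    // determinant is 0, so no inverse exists
    double[][] m = new double[2][];
    m[0] = new double[] { 2.0, 4.0 };
    m[1] = new double[] { 1.0, 2.0 };

    double alpha = 0.05;  // small illustrative noise value
    for (int i = 0; i < m.Length; ++i)
      m[i][i] += alpha;

    // determinant of the conditioned matrix is now non-zero
    double det = m[0][0] * m[1][1] - m[0][1] * m[1][0];
    Console.WriteLine(det.ToString("F4"));  // 0.2025
  }
}
```

The larger the alpha, the better the conditioning, but the more the computed coefficients are shrunk away from the least-squares solution.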

For the demo data, the resulting coefficients are b0 = 0.3162, b1 = 0.0400, b2 = 0.0246, b3 = -0.1017.

Therefore, the predicted y value for the first x value of (-0.17, 0.44, -1.00) is yPred = 0.3162 + (0.0400 * -0.17) + (0.0246 * 0.44) + (-0.1017 * -1.00) = 0.42.
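That arithmetic can be verified with a few lines of C# (the un-rounded result is 0.4219):

```csharp
using System;

// Verify the worked prediction for the first data item
class CheckPrediction
{
  static void Main()
  {
    // b0 first, then b1, b2, b3
    double[] coeffs = { 0.3162, 0.0400, 0.0246, -0.1017 };
    double[] x = { -0.17, 0.44, -1.00 };  // rounded first item

    double yPred = coeffs[0];  // the b0 constant term
    for (int i = 0; i < x.Length; ++i)
      yPred += coeffs[i + 1] * x[i];

    Console.WriteLine(yPred.ToString("F4"));  // 0.4219
  }
}
```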

Conceptually, linear ridge regression is very easy, but the implementation details are moderately tricky, mostly because matrix inversion is one of the most difficult algorithms to implement in all of numerical programming.

Anyway, looking at linear (ridge) regression from scratch using C# was a good way to start my morning.



Linear regression is one of the most primitive forms of machine learning because it assumes that data fit a geometric straight line (or hyperplane). Here are three early arcade games that are based on linear geometry.

Left: Tempest (1981) by Atari. You are the yellow object on the edge of the cube. Things fly out of the center towards you and you must shoot them and avoid them.

Center: Asteroids (1979) by Atari. You operate a small spaceship. You shoot asteroids to break them down into smaller pieces. You must avoid being shot by the flying saucer or hit by asteroid fragments. I spent far too many 25-cent quarters on this game when I was a college student at UC Irvine, living on Balboa Island with roommates Ed Koolish, Rex Burrows, and Randy Mullins. There was an arcade just a short ferry ride across the bay on the Newport peninsula.

Right: Qix (1981) by Taito. You must draw squares to carve out territory. You must avoid the undulating qix thing and fuses that run along the borders.


Complete demo code:

using System;
using System.IO;
namespace LinearRidgeRegressExplore
{
  internal class LinearRidgeRegressExploreProgram
  {
    static void Main(string[] args)
    {
      Console.WriteLine("\nLinear ridge regression explore ");

      string fn = "..\\..\\..\\Data\\synthetic_train.txt";
      Console.WriteLine("\nLoading data into design matrix ");
      double[][] rawX = 
        Utils.MatLoad(fn, new int[] { 0, 1, 2 }, ',', "#");
      double[][] X = Utils.MatMakeDesign(rawX);

      double[][] Y = Utils.MatLoad(fn,
        new int[] { 3 }, ',', "#");
      Console.WriteLine("Done ");

      // coeffs = inv(Xt * X) * Xt * Y 
      Console.WriteLine("\nComputing coefficients:" +
        " inv(Xt * X) * Xt * Y ");
      double[][] a = Utils.MatTranspose(X);   // Xt
      double[][] b = Utils.MatProduct(a, X);  // Xt * X

      // condition b before inv() = ridge regression
      // (alpha is 0.0 here, so no conditioning is applied;
      // use a small positive alpha for true ridge)
      double alpha = 0.0;
      for (int i = 0; i < b.Length; ++i)
        b[i][i] += alpha;

      double[][] c = Utils.MatInverse(b);     // inv(Xt * X)
      double[][] d = Utils.MatProduct(c, a);  // inv(Xt * X) * Xt
      double[][] e = Utils.MatProduct(d, Y);  // inv(Xt * X) * Xt * Y

      double[] coeffs = Utils.MatToVec(e);

      Console.Write("Done. Coefficients: ");
      Utils.VecShow(coeffs, 4, 8, true);

      double[] y;
      Console.Write("\n   X                ");
      Console.WriteLine("          Actual y  " +
        "Predicted y");
      for (int i = 0; i < 10; ++i)
      {
        double[] x = rawX[i];
        y = Y[i];
        double yPred = Predict(x, coeffs);
        Utils.VecShow(x, 2, 8, false);
        Console.Write(y[0].ToString("F2").PadLeft(10));
        Console.WriteLine(yPred.ToString("F2").PadLeft(10));
      }
      Console.WriteLine(". . .");

      y = Utils.MatToVec(Y);
      double acc = Accuracy(rawX, y, coeffs, 0.10);
      Console.WriteLine("\nAccuracy (0.10) = " +
        acc.ToString("F4"));

      Console.WriteLine("\nEnd demo ");
      Console.ReadLine();
    } // Main

    static double Predict(double[] x, double[] coeffs)
    {
      // constant at coeffs[0]
      double sum = 0.0;
      int n = x.Length;  // number predictors
      for (int i = 0; i < n; ++i)
        sum += x[i] * coeffs[i + 1];
      sum += coeffs[0];
      return sum;
    }

    static double Accuracy(double[][] X, double[] y,
      double[] coeffs, double pctClose)
    {
      int numCorrect = 0; int numWrong = 0;
      int n = X.Length;
      for (int i = 0; i < n; ++i)
      {
        double[] x = X[i];
        double yActual = y[i];
        double yPred = Predict(x, coeffs);
        if (Math.Abs(yActual - yPred) <
          Math.Abs(pctClose * yActual))
          ++numCorrect;
        else
          ++numWrong;
      }
      return (numCorrect * 1.0) / n;
    }

  } // Program

  public class Utils
  {
    public static double[][] VecToMat(double[] vec,
      int nRows, int nCols)
    {
      double[][] result = MatCreate(nRows, nCols);
      int k = 0;
      for (int i = 0; i < nRows; ++i)
        for (int j = 0; j < nCols; ++j)
          result[i][j] = vec[k++];
      return result;
    }

    public static double[][] MatCreate(int rows,
      int cols)
    {
      double[][] result = new double[rows][];
      for (int i = 0; i < rows; ++i)
        result[i] = new double[cols];
      return result;
    }

    public static double[] MatToVec(double[][] m)
    {
      int rows = m.Length;
      int cols = m[0].Length;
      double[] result = new double[rows * cols];
      int k = 0;
      for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j)
        {
          result[k++] = m[i][j];
        }
      return result;
    }

    public static double[][] MatProduct(double[][] matA,
      double[][] matB)
    {
      int aRows = matA.Length;
      int aCols = matA[0].Length;
      int bRows = matB.Length;
      int bCols = matB[0].Length;
      if (aCols != bRows)
        throw new Exception("Non-conformable matrices");

      double[][] result = MatCreate(aRows, bCols);

      for (int i = 0; i < aRows; ++i) // each row of A
        for (int j = 0; j < bCols; ++j) // each col of B
          for (int k = 0; k < aCols; ++k)
            result[i][j] += matA[i][k] * matB[k][j];

      return result;
    }

    
    public static double[][] MatTranspose(double[][] m)
    {
      int nr = m.Length;
      int nc = m[0].Length;
      double[][] result = MatCreate(nc, nr);  // note
      for (int i = 0; i < nr; ++i)
        for (int j = 0; j < nc; ++j)
          result[j][i] = m[i][j];
      return result;
    }

    // -------

    public static double[][] MatInverse(double[][] m)
    {
      // assumes determinant is not 0
      // that is, the matrix does have an inverse
      int n = m.Length;
      double[][] result = MatCreate(n, n); // make a copy
      for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
          result[i][j] = m[i][j];

      double[][] lum; // combined lower & upper
      int[] perm;  // out parameter
      MatDecompose(m, out lum, out perm);  // ignore return

      double[] b = new double[n];
      for (int i = 0; i < n; ++i)
      {
        for (int j = 0; j < n; ++j)
          if (i == perm[j])
            b[j] = 1.0;
          else
            b[j] = 0.0;

        double[] x = Reduce(lum, b); // 
        for (int j = 0; j < n; ++j)
          result[j][i] = x[j];
      }
      return result;
    }

    private static int MatDecompose(double[][] m,
      out double[][] lum, out int[] perm)
    {
      // Crout's LU decomposition for matrix determinant
      // and inverse.
      // stores combined lower & upper in lum[][]
      // stores row permutations into perm[]
      // returns +1 or -1 according to even or odd number
      // of row permutations.
      // lower gets dummy 1.0s on diagonal (0.0s above)
      // upper gets lum values on diagonal (0.0s below)

      // even (+1) or odd (-1) row permutations
      int toggle = +1;
      int n = m.Length;

      // make a copy of m[][] into result lu[][]
      lum = MatCreate(n, n);
      for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
          lum[i][j] = m[i][j];

      // make perm[]
      perm = new int[n];
      for (int i = 0; i < n; ++i)
        perm[i] = i;

      for (int j = 0; j < n - 1; ++j) // note n-1
      {
        double max = Math.Abs(lum[j][j]);
        int piv = j;

        for (int i = j + 1; i < n; ++i) // pivot index
        {
          double xij = Math.Abs(lum[i][j]);
          if (xij > max)
          {
            max = xij;
            piv = i;
          }
        } // i

        if (piv != j)
        {
          double[] tmp = lum[piv]; // swap rows j, piv
          lum[piv] = lum[j];
          lum[j] = tmp;

          int t = perm[piv]; // swap perm elements
          perm[piv] = perm[j];
          perm[j] = t;

          toggle = -toggle;
        }

        double xjj = lum[j][j];
        if (xjj != 0.0)
        {
          for (int i = j + 1; i < n; ++i)
          {
            double xij = lum[i][j] / xjj;
            lum[i][j] = xij;
            for (int k = j + 1; k < n; ++k)
              lum[i][k] -= xij * lum[j][k];
          }
        }

      } // j

      return toggle;  // for determinant
    } // MatDecompose

    private static double[] Reduce(double[][] luMatrix,
      double[] b) // helper
    {
      int n = luMatrix.Length;
      double[] x = new double[n];
      //b.CopyTo(x, 0);
      for (int i = 0; i < n; ++i)
        x[i] = b[i];

      for (int i = 1; i < n; ++i)
      {
        double sum = x[i];
        for (int j = 0; j < i; ++j)
          sum -= luMatrix[i][j] * x[j];
        x[i] = sum;
      }

      x[n - 1] /= luMatrix[n - 1][n - 1];
      for (int i = n - 2; i >= 0; --i)
      {
        double sum = x[i];
        for (int j = i + 1; j < n; ++j)
          sum -= luMatrix[i][j] * x[j];
        x[i] = sum / luMatrix[i][i];
      }

      return x;
    } // Reduce

    // -------

    private static int NumNonCommentLines(string fn,
      string comment)
    {
      int ct = 0;
      string line = "";
      FileStream ifs = new FileStream(fn,
        FileMode.Open);
      StreamReader sr = new StreamReader(ifs);
      while ((line = sr.ReadLine()) != null)
        if (line.StartsWith(comment) == false)
          ++ct;
      sr.Close(); ifs.Close();
      return ct;
    }

    public static double[][] MatLoad(string fn,
      int[] usecols, char sep, string comment)
    {
      // count number of non-comment lines
      int nRows = NumNonCommentLines(fn, comment);
      int nCols = usecols.Length;
      double[][] result = MatCreate(nRows, nCols);
      string line = "";
      string[] tokens = null;
      FileStream ifs = new FileStream(fn, FileMode.Open);
      StreamReader sr = new StreamReader(ifs);

      int i = 0;
      while ((line = sr.ReadLine()) != null)
      {
        if (line.StartsWith(comment) == true)
          continue;
        tokens = line.Split(sep);
        for (int j = 0; j < nCols; ++j)
        {
          int k = usecols[j];  // into tokens
          result[i][j] = double.Parse(tokens[k]);
        }
        ++i;
      }
      sr.Close(); ifs.Close();
      return result;
    }

    public static double[][] MatMakeDesign(double[][] m)
    {
      // add a leading column of 1s = a design matrix
      int nRows = m.Length; int nCols = m[0].Length;
      double[][] result = MatCreate(nRows, nCols + 1);
      for (int i = 0; i < nRows; ++i)
        result[i][0] = 1.0;

      for (int i = 0; i < nRows; ++i)
      {
        for (int j = 0; j < nCols; ++j)
        {
          result[i][j + 1] = m[i][j];
        }
      }
      return result;
    }

    public static void MatShow(double[][] m,
      int dec, int wid)
    {
      for (int i = 0; i < m.Length; ++i)
      {
        for (int j = 0; j < m[0].Length; ++j)
        {
          double v = m[i][j];
          if (Math.Abs(v) < 1.0e-8) v = 0.0; // hack
          Console.Write(v.ToString("F" +
            dec).PadLeft(wid));
        }
        Console.WriteLine("");
      }
    }

    public static void VecShow(double[] vec,
      int dec, int wid, bool newLine)
    {
      for (int i = 0; i < vec.Length; ++i)
      {
        double x = vec[i];
        if (Math.Abs(x) < 1.0e-8) x = 0.0;  // hack
        Console.Write(x.ToString("F" +
          dec).PadLeft(wid));
      }
      if (newLine == true)
        Console.WriteLine("");
    }

  } // class Utils

} // ns