Matrix Left Pseudo-Inverse Via Normal Equations (Cholesky) – Refactor and Test Using C#

In machine learning, to train a linear model (linear regression, quadratic regression, etc.) you can use several techniques including stochastic gradient descent (SGD), L-BFGS optimization, and pseudo-inverse. There are two main categories of pseudo-inverse algorithms for machine learning scenarios where the training data has more rows than columns: relaxed Moore-Penrose pseudo-inverse, and left pseudo-inverse via normal equations.

For relaxed Moore-Penrose pseudo-inverse there are two main categories of techniques: using singular value decomposition (SVD) or QR decomposition. There are several SVD algorithms, including Jacobi, Householder+QR, Golub-Kahan-Reinsch, and Lanczos. There are several QR algorithms including Householder, modified Gram-Schmidt, and Givens.

For left pseudo-inverse via normal equations, the most common technique is Cholesky decomposition. There are several versions of Cholesky, but the most common is the Banachiewicz algorithm.

Each of these pseudo-inverse techniques has pros and cons related to numerical stability for large matrices, performance speed, and implementation complexity. For example, SVD Jacobi is the most stable (in theory) but the most complex. And QR modified Gram-Schmidt is relatively simple but less stable (in theory). And Cholesky is very stable but is vulnerable to poorly conditioned matrices.

I write code almost every day. Coding is a skill that must be practiced, and I enjoy coding. One morning before work, I figured I’d refactor my C# implementation of left pseudo-inverse via normal equations using Cholesky (Banachiewicz). So I did.

If you have a matrix A of training data (or a design matrix derived from the training data) with more rows than columns, the left pseudo-inverse can be computed as pinv = inv(At * A) * At, where At is the transpose of A, * is matrix multiplication, and inv() is any matrix inverse function. But At * A is square, symmetric, positive definite (assuming you can a small conditioning constant to the diagonal elements) and you can use a relatively simple matrix inverse — Cholesky decomposition.

This technique — left pseudo-inverse via normal equations using Cholesky — is dramatically simpler than any other form of pseudo-inverse for machine learning scenarios. But the technique can fail when the source matrix has many very small values or many large values. The problem is the At * A computation, which has lots of multiplication operations.

The key code in my demo is:

public class Cholesky
{
  public static double[][] MatPseudoInv(double[][] A)
  {
    // pinv = inv(At * A) * At
    double[][] At = MatTranspose(A);
    double[][] AtA = MatProduct(At, A);
    double[][] AtAinv = MatInvCholesky(AtA);
    double[][] result = MatProduct(AtAinv, At);
    return result;
  }

  private static double[][] MatInvCholesky(double[][] A)
  {
    // A must be square symmetric positive definite
    // A = L * Lt, inv(A) = inv(Lt) * inv(L)
    double[][] L = MatDecompCholesky(A); // lower tri
    double[][] Lt = MatTranspose(L); // upper tri
    double[][] invL = MatInvLowerTri(L);
    double[][] invLt = MatInvUpperTri(Lt);
    double[][] result = MatProduct(invLt, invL);
    return result;
  }

  private static double[][] MatDecompCholesky(double[][] A)
  {
    // lots of code
  }

  // lots of minor helpers like MatTranspose()

} // class Cholesky

I tested my refactored pseudo-inverse implementation by generating 10,000 random matrices, computing the pseudo-inverse, then verifying that A * Apinv * A = A to within 1.0e-8 in each cell. Each matrix has between 100 and 1,000 rows, between 2 and 20 columns, and each value is between -10.0 and +10.0. All 10,000 tests passed. This is by no means thorough testing, but it does indicate that the implementation is basically correct for non-pathological matrices.

Good fun.



I miss the days of newspapers.


Demo program. Replace “lt” (less than), “gt”, “lte”, “gte” with Boolean operators (my blog editor chokes on symbols).

using System.IO;

namespace MatrixPseudoInverseCholesky
{
  internal class MatrixPseudoInverseCholeskyProgram
  {
    static void Main(string[] args)
    {
      Console.WriteLine("\nBegin left pseudo-inverse" +
        " via normal equations (Cholesky) ");

      int minRows = 100; int maxRows = 1000;
      int minCols = 2; int maxCols = 20;
      Random rnd = new Random(0);
      int nTrials = 10000;
      int nPass = 0; int nFail = 0;
      for (int i = 0; i "lt" nTrials; ++i)
      {
        Console.WriteLine("\n==========");
        Console.WriteLine("trial " + i);
        double[][] A =
          MakeRandomMatrix(minRows, maxRows,
          minCols, maxCols, rnd);
        double[][] Apinv = Cholesky.MatPseudoInv(A);

        // check (A * Apinv) * A = A
        double[][] AApinv = MatProduct(A, Apinv);
        double[][] C = MatProduct(AApinv, A);

        //Console.WriteLine("\nA = ");
        //MatShow(A, 4, 9);
        //Console.WriteLine("\nApinv = ");
        //MatShow(Apinv, 4, 9);
        //Console.WriteLine("\n(A * Apinv) * A = ");
        //MatShow(C, 4, 9);

        if (MatAreEqual(C, A, 1.0e-8) == true)
        {
          Console.WriteLine("pass");
          ++nPass;
        }
        else
        {
          Console.WriteLine("FAIL");
          Console.WriteLine("nRows = " + A.Length +
            " nCols = " + A[0].Length);
          Console.ReadLine();
          ++nFail;
        }
        Console.WriteLine("==========");
     } // nTrials

      Console.WriteLine("\nNumber pass = " + nPass);
      Console.WriteLine("Number fail = " + nFail);

      Console.WriteLine("\nEnd testing ");
      Console.ReadLine();
    } // Main

    // ------------------------------------------------------
    // helpers: MakeRandomMatrix(), MatProduct(), 
    // MatShow(), MatAreEqual()
    // ------------------------------------------------------

    public static double[][] MakeRandomMatrix(int minRows,
      int maxRows, int minCols, int maxCols, Random rnd)
    {
      int nRows = rnd.Next(minRows, maxRows);  // [2,maxN)
      int nCols = rnd.Next(minCols, maxCols);
      double[][] result = new double[nRows][];
      for (int i = 0; i "lt" nRows; ++i)
        result[i] = new double[nCols];

      double lo = -10.0; double hi = 10.0;
      for (int r = 0; r "lt" nRows; ++r)
        for (int c = 0; c "lt" nCols; ++c)
          result[r][c] = (hi - lo) * rnd.NextDouble() + lo;
      return result;
    }

    // ------------------------------------------------------

    public static double[][] MatProduct(double[][] A,
      double[][] B)
    {
      int aRows = A.Length; int aCols = A[0].Length;
      int bRows = B.Length; int bCols = B[0].Length;
      if (aCols != bRows)
        throw new Exception("Non-conformable matrices");

      double[][] result = new double[aRows][];
      for (int i = 0; i "lt" aRows; ++i)
        result[i] = new double[bCols];

      for (int i = 0; i "lt" aRows; ++i) // each row of A
        for (int j = 0; j "lt" bCols; ++j) // each col of B
          for (int k = 0; k "lt" aCols; ++k)
            result[i][j] += A[i][k] * B[k][j];

      return result;
    }

    // ------------------------------------------------------

    public static void MatShow(double[][] m, int dec, int wid)
    {
      int nRows = m.Length; int nCols = m[0].Length;
      double small = 1.0 / Math.Pow(10, dec);
      for (int i = 0; i "lt" nRows; ++i)
      {
        for (int j = 0; j "lt" nCols; ++j)
        {
          double v = m[i][j];
          if (Math.Abs(v) "lt" small) v = 0.0;
          Console.Write(v.ToString("F" + dec).
            PadLeft(wid));
        }
        Console.WriteLine("");
      }
    }

    // ------------------------------------------------------

    public static bool MatAreEqual(double[][] A, double[][] B,
      double epsilon)
    {
      int nRows = A.Length; int nCols = A[0].Length;
      for (int i = 0; i "lt" nRows; ++i)
        for (int j = 0; j "lt" nCols; ++j)
          if (Math.Abs(A[i][j] - B[i][j]) "gt" epsilon)
            return false;
      return true;
    }

    // ------------------------------------------------------

  } // Program

  // ========================================================

  public class Cholesky
  {
    public static double[][] MatPseudoInv(double[][] A)
    {
      // pinv = inv(At * A) * At
      double[][] At = MatTranspose(A);
      double[][] AtA = MatProduct(At, A);
      double[][] AtAinv = MatInvCholesky(AtA);
      double[][] result = MatProduct(AtAinv, At);
      return result;
    }

    private static double[][] MatInvCholesky(double[][] A)
    {
      // A must be square symmetric positive definite
      // A = L * Lt, inv(A) = inv(Lt) * inv(L)
      double[][] L = MatDecompCholesky(A); // lower tri
      double[][] Lt = MatTranspose(L); // upper tri
      double[][] invL = MatInvLowerTri(L);
      double[][] invLt = MatInvUpperTri(Lt);
      double[][] result = MatProduct(invLt, invL);
      return result;
    }

    private static double[][] MatDecompCholesky(double[][] A)
    {
      // A must be square, symmetric, positive definite
      int m = A.Length; int n = A[0].Length;  // m == n
      double[][] L = new double[n][];
      for (int i = 0; i "lt" n; ++i)
        L[i] = new double[n];

      for (int i = 0; i "lt" n; ++i)
      {
        for (int j = 0; j "lte" i; ++j)
        {
          double sum = 0.0;
          for (int k = 0; k "lt" j; ++k)
            sum += L[i][k] * L[j][k];
          if (i == j)
          {
            double tmp = A[i][i] - sum;
            if (tmp "lt" 0.0)
              throw new
                Exception("decomp Cholesky fatal");
            L[i][j] = Math.Sqrt(tmp);
          }
          else
          {
            if (L[j][j] == 0.0)
              throw new
                Exception("decomp Cholesky fatal ");
            L[i][j] = (A[i][j] - sum) / L[j][j];
          }
        } // j
      } // i

      return L;
    }

    // ------------------------------------------------------

    private static double[][] MatInvLowerTri(double[][] A)
    {
      int n = A.Length;  // must be square matrix
      double[][] result = MatIdentity(n);

      for (int k = 0; k "lt" n; ++k)
      {
        for (int j = 0; j "lt" n; ++j)
        {
          for (int i = 0; i "lt" k; ++i)
          {
            result[k][j] -= result[i][j] * A[k][i];
          }
          result[k][j] /= (A[k][k] + 1.0e-8);
        }
      }
      return result;
    }

    // ------------------------------------------------------

    private static double[][] MatInvUpperTri(double[][] A)
    {
      int n = A.Length;  // must be square matrix
      double[][] result = MatIdentity(n);

      for (int k = 0; k "lt" n; ++k)
      {
        for (int j = 0; j "lt" n; ++j)
        {
          for (int i = 0; i "lt" k; ++i)
          {
            result[j][k] -= result[j][i] * A[i][k];
          }
          result[j][k] /= (A[k][k] + 1.0e-8); // avoid 0
        }
      }
      return result;
    }

    // ------------------------------------------------------

    private static double[][] MatMake(int nRows, int nCols)
    {
      double[][] result = new double[nRows][];
      for (int i = 0; i "lt" nRows; ++i)
        result[i] = new double[nCols];
      return result;
    }

    // ------------------------------------------------------

    private static double[][] MatIdentity(int n)
    {
      double[][] result = MatMake(n, n);
      for (int i = 0; i "lt" n; ++i)
        result[i][i] = 1.0;
      return result;
    }

    // ------------------------------------------------------

    private static double[][] MatTranspose(double[][] M)
    {
      int nRows = M.Length; int nCols = M[0].Length;
      double[][] result = MatMake(nCols, nRows);  // note
      for (int i = 0; i "lt" nRows; ++i)
        for (int j = 0; j "lt" nCols; ++j)
          result[j][i] = M[i][j];
      return result;
    }

    // ------------------------------------------------------

    private static double[][] MatProduct(double[][] A,
      double[][] B)
    {
      int aRows = A.Length; int aCols = A[0].Length;
      int bRows = B.Length; int bCols = B[0].Length;
      if (aCols != bRows)
        throw new Exception("Non-conformable matrices");

      double[][] result = MatMake(aRows, bCols);

      for (int i = 0; i "lt" aRows; ++i) // each row of A
        for (int j = 0; j "lt" bCols; ++j) // each col of B
          for (int k = 0; k "lt" aCols; ++k)
            result[i][j] += A[i][k] * B[k][j];

      return result;
    }

    // ------------------------------------------------------

  } // class Cholesky

  // ========================================================

} // ns
This entry was posted in Machine Learning. Bookmark the permalink.

Leave a Reply