Matrix Pseudo-Inverse via Normal Equations (Left Pseudo-Inverse) Using C#

One Sunday morning, I figured I'd refactor one of my implementations of matrix pseudo-inverse via normal equations, sometimes called the left pseudo-inverse. The technique is not a general-purpose pseudo-inverse such as the Moore-Penrose pseudo-inverse; it is used in machine learning scenarios with a matrix of training data (or a design matrix derived from the training data matrix) where the number of rows is greater than or equal to the number of columns. (There is a right pseudo-inverse for non-ML scenarios where the number of rows is less than the number of columns.)

If you have a source matrix A with nRows >= nCols, the pseudo-inverse via normal equations is computed as inv(At * A) * At, where At is the transpose of A and inv() is any standard matrix inverse. But because At * A is square, symmetric, and positive definite (if a small conditioning constant is added to the diagonal), it is possible to use the relatively simple inverse via Cholesky decomposition.
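As a small worked example of the formula (my own, not from the original post), take a 3x2 matrix A, so nRows >= nCols holds:

```latex
A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{bmatrix}, \qquad
A^T A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}, \qquad
(A^T A)^{-1} = \frac{1}{3} \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}

A^{+} = (A^T A)^{-1} A^T
      = \frac{1}{3} \begin{bmatrix} 2 & -1 & 1 \\ -1 & 2 & 1 \end{bmatrix},
\qquad A^{+} A = I_2
```

The result A+ acts as a left inverse of A: multiplying A+ * A gives the 2x2 identity matrix.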

My main goal in the refactoring was to put the essential code into a container class called Cholesky to reduce the number of isolated methods.

My refactoring looks like:

public class Cholesky
{

  public static double[][] MatPseudoInv(double[][] A)
  {
    // inv(At * A) * At
    double[][] At = MatTranspose(A);
    double[][] AtA = MatProduct(At, A);
    for (int i = 0; i < AtA.Length; ++i)
      AtA[i][i] += 1.0e-8; // condition before inv
    double[][] AtAinv = MatInvCholesky(AtA);
    double[][] pinv = MatProduct(AtAinv, At);
    return pinv;
  } 

  // MatInvCholesky() and many other helper functions
}

I dropped my refactored code into a test harness that I had. I generated 10,000 random matrices, where each matrix has between 100 and 1,000 rows, between 2 and 20 columns, and each value in the matrix between -10.0 and +10.0. The MatPseudoInv() function passed all of these tests as expected.
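A minimal sketch of the kind of correctness check the harness performs (my own reconstruction -- the exact harness code is not shown in the post): generate a random tall matrix, compute its pseudo-inverse, and verify that pinv(A) * A is close to the identity. It assumes the Cholesky class shown above is in scope.

```csharp
using System;

public class HarnessSketch
{
  public static void Main(string[] args)
  {
    Random rnd = new Random(0);
    int nRows = 100, nCols = 5;  // nRows >= nCols

    // random matrix with values in [-10.0, +10.0]
    double[][] A = new double[nRows][];
    for (int i = 0; i < nRows; ++i)
    {
      A[i] = new double[nCols];
      for (int j = 0; j < nCols; ++j)
        A[i][j] = 20.0 * rnd.NextDouble() - 10.0;
    }

    double[][] pinv = Cholesky.MatPseudoInv(A);  // shape nCols x nRows

    // verify pinv * A is (approximately) the nCols x nCols identity
    bool ok = true;
    for (int i = 0; i < nCols; ++i)
    {
      for (int j = 0; j < nCols; ++j)
      {
        double sum = 0.0;
        for (int k = 0; k < nRows; ++k)
          sum += pinv[i][k] * A[k][j];
        double expected = (i == j) ? 1.0 : 0.0;
        if (Math.Abs(sum - expected) > 1.0e-6) ok = false;
      }
    }
    Console.WriteLine(ok ? "PASS" : "FAIL");
  }
}
```

The 1.0e-6 tolerance is a judgment call: the 1.0e-8 conditioning constant added to the diagonal of At * A perturbs the result slightly, so an exact-equality check would fail.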

The downside to pseudo-inverse via normal equations is the cost of the (Xt * X) matrix multiplication operation. Suppose X is the training data design matrix, and it has 1,000 rows and 10 columns. Then Xt * X has shape (10 x 1000) * (1000 x 10) = 10 x 10 -- small and simple to invert. But the number of multiplication and addition operations needed to compute Xt * X is approximately 10 * 10 * (2 * 1000) = 200,000, because each of the 100 entries of Xt * X is an inner product of two length-1,000 vectors. That is a lot of arithmetic, and if there are very small values or very large values in X, there could be arithmetic overflow or underflow.
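The operation count generalizes in an obvious way. For a design matrix X with n rows and d columns, forming Xt * X costs roughly:

```latex
% each of the d \times d entries of X^T X is an inner product of two
% length-n vectors: about n multiplications and n additions
\text{ops}(X^T X) \approx d \cdot d \cdot (2n)
\qquad \text{e.g. } 10 \cdot 10 \cdot (2 \cdot 1000) = 200{,}000
```

So the cost grows linearly in the number of training rows but quadratically in the number of columns, which is why the technique suits tall, narrow design matrices.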



I love stop motion animated movies. They're kind of like refactored reality. There are a surprising number of these movies.

Left: In "James and the Giant Peach" (1996), young orphan James runs away from his evil aunts Spiker and Sponge. He finds a way to grow a peach to giant size. He meets Mr. Grasshopper, Mr. Centipede, Mrs. Ladybug, Miss Spider, Mr. Earthworm, and Mrs. Glowworm. This is a wildly creative movie that has a happy ending. My personal grade = A-.

Right: In "Kubo and the Two Strings" (2016), young boy Kubo has a magic musical instrument sort of like a Japanese guitar. Kubo is menaced by his two aunts and has many adventures. This is another wildly creative movie that has a happy ending. My personal grade = solid A.

Both of these movies got great reviews from critics and audiences, but they were box office disappointments that lost money because not enough people were willing to take a chance on unusual stories with unusual style.


Demo code. Replace "lt" (less than), "gt", "lte", "gte" with Boolean operator symbols (my blog editor chokes on symbols).

  public class Cholesky
  {
    // container class for MatPseudoInv()

    public static double[][] MatPseudoInv(double[][] A)
    {
      // left pseudo-inverse via normal equations
      // nRows must be gte nCols
      // inv(At * A) * At
      double[][] At = MatTranspose(A);
      double[][] AtA = MatProduct(At, A);
      for (int i = 0; i "lt" AtA.Length; ++i)
        AtA[i][i] += 1.0e-8; // condition before inv
      double[][] AtAinv = MatInvCholesky(AtA);
      double[][] pinv = MatProduct(AtAinv, At);
      return pinv;
    } // MatPseudoInv()

    // ------------------------------------------------------

    private static double[][] MatDecompCholesky(double[][] A)
    {
      // decompose A to L such that L * Lt = A
      // A must be square
      int m = A.Length; int n = A[0].Length;  // m == n
      double[][] L = MatMake(n, n);  // or m

      for (int i = 0; i "lt" n; ++i)
      {
        for (int j = 0; j "lte" i; ++j)
        {
          double sum = 0.0;
          for (int k = 0; k "lt" j; ++k)
            sum += L[i][k] * L[j][k];
          if (i == j)
          {
            double tmp = A[i][i] - sum;
            if (tmp "lt" 0.0)
              throw new
                Exception("MatDecompCholesky: negative diagonal");
            L[i][j] = Math.Sqrt(tmp);
          }
          else
          {
            if (L[j][j] == 0.0)
              throw new
                Exception("MatDecompCholesky: zero diagonal");
            L[i][j] = (A[i][j] - sum) / L[j][j];
          }
        } // j
      } // i

      return L;
    } // MatDecompCholesky()

    private static double[][] MatInvCholesky(double[][] A)
    {
      int m = A.Length; int n = A[0].Length;  // m == n
      double[][] L = MatDecompCholesky(A);

      double[][] result = MatMake(n, n); // make Identity
      for (int i = 0; i "lt" n; ++i)
        result[i][i] = 1.0;

      for (int k = 0; k "lt" n; ++k)
      {
        for (int j = 0; j "lt" n; j++)
        {
          for (int i = 0; i "lt" k; i++)
          {
            result[k][j] -= result[i][j] * L[k][i];
          }
          result[k][j] /= L[k][k];
        }
      }

      for (int k = n - 1; k "gte" 0; --k)
      {
        for (int j = 0; j "lt" n; j++)
        {
          for (int i = k + 1; i "lt" n; i++)
          {
            result[k][j] -= result[i][j] * L[i][k];
          }
          result[k][j] /= L[k][k];
        }
      }
      return result;
    } // MatInvCholesky()

    private static double[][] MatMake(int nRows, int nCols)
    {
      double[][] result = new double[nRows][];
      for (int i = 0; i "lt" nRows; ++i)
        result[i] = new double[nCols];
      return result;
    }

    // ------------------------------------------------------

    private static double[][] MatTranspose(double[][] M)
    {
      int nr = M.Length; int nc = M[0].Length;
      double[][] result = new double[nc][]; // note
      for (int i = 0; i "lt" nc; ++i)
        result[i] = new double[nr];
      for (int i = 0; i "lt" nr; ++i)
        for (int j = 0; j "lt" nc; ++j)
          result[j][i] = M[i][j]; // note
      return result;
    }

    // ------------------------------------------------------

    private static double[][] MatProduct(double[][] A,
      double[][] B)
    {
      int aRows = A.Length; int aCols = A[0].Length;
      int bRows = B.Length; int bCols = B[0].Length;
      if (aCols != bRows)
        throw new Exception("Non-conformable matrices");

      double[][] result = new double[aRows][];
      for (int i = 0; i "lt" aRows; ++i)
        result[i] = new double[bCols];

      for (int i = 0; i "lt" aRows; ++i) // each row of A
        for (int j = 0; j "lt" bCols; ++j) // each col of B
          for (int k = 0; k "lt" aCols; ++k)
            result[i][j] += A[i][k] * B[k][j];
      return result;
    }

    // ------------------------------------------------------

  } // class Cholesky
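A hypothetical demo program (not part of the original post) that exercises MatPseudoInv() on the tiny 3x2 matrix from the worked example earlier; it assumes the Cholesky class above is in the same project.

```csharp
using System;

public class PseudoInvDemo
{
  public static void Main(string[] args)
  {
    // 3x2 source matrix (nRows >= nCols)
    double[][] A = new double[3][];
    A[0] = new double[] { 1.0, 0.0 };
    A[1] = new double[] { 0.0, 1.0 };
    A[2] = new double[] { 1.0, 1.0 };

    double[][] pinv = Cholesky.MatPseudoInv(A);  // shape 2x3

    for (int i = 0; i < pinv.Length; ++i)
    {
      for (int j = 0; j < pinv[0].Length; ++j)
        Console.Write(pinv[i][j].ToString("F4").PadLeft(10));
      Console.WriteLine("");
    }
    // approximately:
    //    0.6667   -0.3333    0.3333
    //   -0.3333    0.6667    0.3333
    // (tiny deviations due to the 1.0e-8 conditioning constant)
  }
}
```

The printed values match the hand-computed result (1/3) * [[2, -1, 1], [-1, 2, 1]] to display precision.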
This entry was posted in Machine Learning.
