Testing From-Scratch Pseudo-Inverse via SVD-Jacobi versus via QR-Householder Using Python

Computing the pseudo-inverse of a matrix is difficult. The pseudo-inverse is used in several machine learning algorithms, such as closed-form training of linear models.
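
For example, here is a minimal sketch (with made-up data) of closed-form linear regression using the built-in NumPy pinv() function:

import numpy as np

X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])  # design matrix with a leading bias column
y = np.array([2.1, 3.9, 6.2])  # target values
w = np.linalg.pinv(X) @ y  # least-squares weights: w = pinv(X) * y
print(w)  # approximately [-0.03, 2.05] = (intercept, slope)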

There are several types of pseudo-inverse; the most common is the Moore-Penrose pseudo-inverse. There are several algorithms for computing it. Two common techniques work via singular value decomposition (SVD) and via QR decomposition. There are several algorithms for SVD (Jacobi, Golub-Reinsch, and others), and there are several algorithms for QR (Householder, Gram-Schmidt, and others). In short, there are dozens of algorithms and variations that can be used to compute a pseudo-inverse.
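
Both routes can be demonstrated with NumPy built-ins. Here is a minimal sketch, assuming a tall matrix with full column rank (the same assumption the from-scratch QR code below relies on):

import numpy as np

A = np.random.rand(5, 3)  # tall matrix, full column rank

# via SVD: pinv(A) = V * diag(1/s) * trans(U)
U, s, Vh = np.linalg.svd(A, full_matrices=False)
pinv_svd = Vh.T @ np.diag(1.0 / s) @ U.T

# via QR: A = Q*R, so pinv(A) = inv(R) * trans(Q)
Q, R = np.linalg.qr(A, mode='reduced')
pinv_qr = np.linalg.solve(R, Q.T)  # solve rather than explicit inverse

print(np.allclose(pinv_svd, np.linalg.pinv(A)))  # True
print(np.allclose(pinv_qr, np.linalg.pinv(A)))   # True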

I implemented the pseudo-inverse from scratch in Python, using SVD-Jacobi and QR-Householder. Let me note that both algorithms are quite complex, especially SVD-Jacobi. I ran some tests with the two from-scratch implementations and compared them with the built-in numpy.linalg.pinv() function. The results:

Begin pseudo-inverse SVD vs. QR experiment

Start testing my_pinv_svd()
trial 40 FAIL (93, 88)
trial 84 FAIL (98, 94)
trial 104 FAIL (94, 82)
trial 145 FAIL (91, 90)
trial 154 FAIL (92, 92)
trial 178 FAIL (99, 93)
trial 180 FAIL (96, 87)
trial 185 FAIL (99, 96)
trial 191 FAIL (86, 85)
Num pass = 191
Num fail = 9

Start testing my_pinv_qr()
Num pass = 200
Num fail = 0

Start testing np.linalg.pinv()
Num pass = 200
Num fail = 0

End experiment

For each technique, I ran 200 trials. Each trial generated a matrix with a random size between 2×2 and 99×99 (with at least as many rows as columns, because my QR implementation requires it), and with random values between -10.0 and +10.0. A trial passes if A * pinv(A) * A reconstructs the source matrix A within np.allclose() tolerance (the first Moore-Penrose condition). The from-scratch QR-Householder version and the built-in linalg.pinv() function passed all 200 trials, but the SVD-Jacobi version failed for 9 large (roughly 90×90) matrices.
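
In code, the pass/fail check looks like this (shown here with the built-in pinv() for illustration; the demo program below applies the same check to the from-scratch versions):

import numpy as np

A = np.random.rand(6, 4)  # arbitrary test matrix
Apinv = np.linalg.pinv(A)
C = np.matmul(A, np.matmul(Apinv, A))
print(np.allclose(A, C))  # True means the trial passes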

Bottom line: Use the built-in linalg.pinv() function if possible. If you must implement from scratch, the QR-Householder algorithm is preferable to the SVD-Jacobi algorithm. If you must implement from scratch using SVD-Jacobi, apply it only to small matrices (say 50×50 or smaller).



I’m not sure if I gained a love of matrix algebra from my love of chess, or vice versa. One of the ways I gauge AI image generation progress is by asking for images of people playing chess. The most recent images are much better than they were a few months ago, but AI still has a way to go.

Left: This was an interesting result. The chess board only has seven ranks but otherwise the set is pretty good. I’m puzzled by the meaning of the chess player’s welcoming posture.

Center: Not very good. All the pieces are black (there are no white pieces), and the pieces don't make sense (two black queens, etc.). And I don't like the fuzziness of the background.

Right: Apparently some sort of fusion of 1980s hairstyles, handcuff chess, and 1980s neon color schemes.


Demo program:

# pinv_svd_vs_pinv_qr.py

import numpy as np

np.set_printoptions(precision=6, suppress=True, floatmode='fixed')

rnd = np.random.RandomState(1)

# -----------------------------------------------------------

def my_qr(A, mode):
  # mode = 'complete' or 'reduced'
  # this version is defined only for m >= n
  m, n = A.shape
  k = np.minimum(m,n)
  R = A.copy()
  Q = np.eye(m)

  if m == n: end = n-1
  else: end = n
  for i in range(end):
    H = np.eye(m)

    # Householder routine follows
    a = R[i:, i]
    norm_a = np.copysign(np.linalg.norm(a), a[0])
    v = a / (a[0] + norm_a)
    v[0] = 1.0
    h = np.eye(len(a))
    V = np.matmul(v[:, None], v[None, :])
    h -= (2.0 / np.dot(v, v)) * V

    H[i:, i:] = h  # copy h into lower right
    Q = np.matmul(Q, H)
    R = np.matmul(H, R)

  # 'reduced' : returns Q, R with dims (m, k), (k, n)
  if mode == 'complete':
    return Q, R
  elif mode == 'reduced':
    return Q[0:m,0:k], R[0:k,0:n]

  return Q, R

# -----------------------------------------------------------

def mat_inv_upper_tri(U):
  # invert upper-triangular U, column by column;
  # the inverse is also upper triangular
  n = len(U)
  result = np.eye(n)
  for k in range(n):
    for j in range(n):
      for i in range(k):
        result[j][k] -= result[j][i] * U[i][k]
      result[j][k] /= U[k][k]
  return result

# -----------------------------------------------------------

def my_pinv_qr(A):
  # A = Q*R, so pinv(A) = inv(R) * trans(Q)
  # (requires A to have full column rank)
  Q, R = my_qr(A, 'reduced') 
  result = np.matmul(mat_inv_upper_tri(R), Q.T)
  return result

# -----------------------------------------------------------
# -----------------------------------------------------------

def my_svd(M):
  # one-sided Jacobi rotation algorithm
  DBL_EPSILON = 1.0e-15  # approximately
  A = np.copy(M)  # working copy U
  m = len(A)
  n = len(A[0])

  Q = np.eye(n)  # working copy V
  t = np.zeros(n)  # working copy s

  # init counters
  count = 1
  sweep = 0
  sweep_max = max(5 * n, 12)  # heuristic

  tolerance = 10 * m * DBL_EPSILON  # heuristic
  # store the column error estimates in t
  for j in range(n):
    cj = A[:, j]  # get col j
    sj = np.linalg.norm(cj)
    t[j] = DBL_EPSILON * sj

  # orthogonalize A by plane rotations
  while count > 0 and sweep <= sweep_max:
    # initialize rotation counter
    count = n * (n - 1) // 2  # number of column pairs
    for j in range(n-1):
      for k in range(j+1, n):
        cj = A[:, j]
        ck = A[:, k]
        p = 2 * np.dot(cj, ck)
        a = np.linalg.norm(cj)
        b = np.linalg.norm(ck)

        # test for columns j,k orthogonal,
        # or dominant errors 
        abserr_a = t[j]
        abserr_b = t[k]

        q = (a * a) - (b * b)
        v = np.sqrt(p**2 + q**2)  # hypot()
 
        sorted = (a >= b)
        orthog = (abs(p) <= tolerance * (a * b))
        noisya = (a < abserr_a)
        noisyb = (b < abserr_b)

        if sorted and (orthog or \
          noisya or noisyb):
          count -= 1
          continue

        # calculate rotation angles
        if v == 0 or not sorted:
          cosine = 0.0
          sine = 1.0
        else:
          cosine = np.sqrt((v + q) / (2.0 * v))
          sine = p / (2.0 * v * cosine)

        # apply rotation to A (U)
        for i in range(m):
          Aik = A[i][k]
          Aij = A[i][j]
          A[i][j] = Aij * cosine + Aik * sine
          A[i][k] = -Aij * sine + Aik * cosine

        # update singular values
        t[j] = abs(cosine) * abserr_a + \
          abs(sine) * abserr_b
        t[k] = abs(sine) * abserr_a + \
          abs(cosine) * abserr_b

        # apply rotation to Q (V)
        for i in range(n):
          Qij = Q[i][j]
          Qik = Q[i][k]
          Q[i][j] = Qij * cosine + Qik * sine
          Q[i][k] = -Qij * sine + Qik * cosine

    sweep += 1
  # while

  # compute singular values
  prev_norm = -1.0
  for j in range(n):
    column = A[:, j]  # by ref
    norm = np.linalg.norm(column)
    # determine if singular value is zero
    if norm == 0.0 or prev_norm == 0.0 or \
      (j "gt" 0 and norm "lte" tolerance * prev_norm):
      t[j] = 0.0
      for i in range(len(column)):
        column[i] = 0.0  # updates A indirectly
      prev_norm = 0.0
    else:
      t[j] = norm
      for i in range(len(column)):
        column[i] = column[i] * (1.0 / norm)
      prev_norm = norm

  if count > 0:
    print("Jacobi iterations did not converge")

  U = A  # mxn
  s = t
  Vh = np.transpose(Q)

  if m < n:
    U = U[:, 0:m]
    s = t[0:m]
    Vh = Vh[0:m, :]

  return U, s, Vh

# -----------------------------------------------------------

def my_pinv_svd(A):
  # U,s,Vh = svd(); pinv = V * Sinv * Uh
  U, s, Vh = my_svd(A)
  Sinv = np.diag(s)
  for i in range(len(s)):
    Sinv[i][i] = 1.0 / Sinv[i][i]  # assumes all singular values nonzero
  V = Vh.T
  Uh = U.T
  return np.matmul(V, np.matmul(Sinv, Uh))

# -----------------------------------------------------------
# -----------------------------------------------------------

def make_matrix(max_rows, max_cols, min_x, max_x, rnd):
  nr = rnd.randint(2, max_rows)
  nc = rnd.randint(2, max_cols)
  if nr < nc:
    nr, nc = nc, nr  # ensure rows >= cols for my_qr()
  A = (max_x - min_x) * rnd.rand(nr,nc) + min_x
  return A

# -----------------------------------------------------------

print("\nBegin pseudo-inverse SVD vs. QR experiment ")

print("\nStart testing my_pinv_svd() ")
n_pass = 0; n_fail = 0
rnd.seed(0)
for i in range(200):
  A = make_matrix(100, 100, -10.0, 10.0, rnd)
  Apinv = my_pinv_svd(A)
  C = np.matmul(A, np.matmul(Apinv, A))
  if np.allclose(A, C):
    n_pass += 1
  else:
    print("trial " + str(i), end="")
    print(" FAIL ", end="")
    print(A.shape)
    n_fail += 1
print("Num pass = " + str(n_pass))
print("Num fail = " + str(n_fail))

print("\nStart testing my_pinv_qr() ")
n_pass = 0; n_fail = 0
rnd.seed(0)
for i in range(200):
  A = make_matrix(100, 100, -10.0, 10.0, rnd)
  Apinv = my_pinv_qr(A)
  C = np.matmul(A, np.matmul(Apinv, A))
  if np.allclose(A, C):
    n_pass += 1
  else:
    print("trial " + str(i), end="")
    print(" FAIL ", end="")
    print(A.shape)
    n_fail += 1
print("Num pass = " + str(n_pass))
print("Num fail = " + str(n_fail))

print("\nStart testing np.linalg.pinv() ")
n_pass = 0; n_fail = 0
rnd.seed(0)
for i in range(200):
  A = make_matrix(100, 100, -10.0, 10.0, rnd)
  Apinv = np.linalg.pinv(A)
  C = np.matmul(A, np.matmul(Apinv, A))
  if np.allclose(A, C):
    n_pass += 1
  else:
    print("trial " + str(i), end="")
    print(" FAIL ", end="")
    print(A.shape)
    n_fail += 1
print("Num pass = " + str(n_pass))
print("Num fail = " + str(n_fail))

print("\nEnd experiment ")