Matrix Inverse Using Cayley-Hamilton from Scratch Python

Computing the inverse of a matrix is a fundamental task in machine learning algorithms. The Wikipedia article on matrix inverse lists 14 different algorithms; each algorithm has multiple variations, and each variation has dozens of implementation alternatives. The sheer number of techniques is a direct indication of how tricky it is to compute a matrix inverse.

One of the most interesting techniques to compute a matrix inverse is based on the Cayley-Hamilton theorem. It is by far (in my opinion) the simplest matrix inverse algorithm, but it is only feasible for matrices up to about 200-by-200, and only when the matrix values are roughly evenly distributed between positive and negative. Put another way, matrix inverse using Cayley-Hamilton is really more of a curiosity than a practical technique.

One evening, I just couldn’t get to sleep, so I figured I’d code Cayley-Hamilton matrix inverse using from-scratch Python and NumPy.

Suppose you have a 5-by-5 square matrix A that can be inverted:

  1.0   2.0   3.0   1.0   5.0
  0.0   5.0   4.0   1.0   4.0
  6.0   1.0   0.0   2.0   2.0
  1.0   4.0   5.0   3.0   2.0
  0.0   2.0   4.0   0.0   1.0

The first step is to compute the “coefficients of the characteristic polynomial of A”. There are several ways to do this, but for now, just assume you can get the coefficients.

For a 5-by-5 matrix there will be 6 coefficients stored in a vector c. The first cell of c, c[0], will always hold a 1.0 value. For the demo matrix A, it turns out that the coefficients are:

  1   -10   -21    37    373   -610
 [0]  [1]   [2]   [3]    [4]    [5]

The inverse of matrix A is:

Ainv = (A^4 - 10*A^3 - 21*A^2 + 37*A + 373*I) / 610

The A^4 means A * A * A * A. Coefficient c[0] is always 1 and goes with the A^4 term, so it can effectively be ignored. Cell c[1] goes with A^3, cell c[2] goes with A^2, and cell c[3] goes with A. The next-to-last cell, c[4], goes with I, the identity matrix. The last cell, c[5], is the negative of the final division scaling factor.
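The formula can be checked directly with NumPy. This is just a quick sketch that hard-codes the coefficients shown above; np.linalg.matrix_power computes the powers of A.

```python
# verify Ainv = (A^4 - 10*A^3 - 21*A^2 + 37*A + 373*I) / 610
import numpy as np

A = np.array([
  [1., 2., 3., 1., 5.],
  [0., 5., 4., 1., 4.],
  [6., 1., 0., 2., 2.],
  [1., 4., 5., 3., 2.],
  [0., 2., 4., 0., 1.]])

mp = np.linalg.matrix_power  # mp(A, k) computes A^k
Ainv = (mp(A, 4) - 10 * mp(A, 3) - 21 * mp(A, 2) +
  37 * A + 373 * np.eye(5)) / 610

print(np.allclose(Ainv, np.linalg.inv(A)))  # True
```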

If you examine this example for a bit, you should see the pattern. Putting this together for a matrix A with size n-by-n gives a stunningly short function to compute an inverse.

This example points out the key weakness of Cayley-Hamilton. For a 100-by-100 matrix, you’d need to compute powers of A up to A^99. This is time-consuming, and is subject to round-off error, arithmetic overflow, and underflow.
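The numeric blow-up is easy to demonstrate. Here is a quick sketch (not part of the demo program) using a 100-by-100 matrix with all-positive values, which is the worst case for this technique:

```python
# entries of A^k grow roughly geometrically with k when A has a
# large dominant eigenvalue, so high powers lose precision
import numpy as np

rng = np.random.default_rng(0)
B = rng.random((100, 100))   # 100-by-100, all values in [0, 1)
Bk = np.eye(100)
for _ in range(99):          # compute B^99 one multiply at a time
  Bk = np.matmul(Bk, B)
print(np.max(np.abs(Bk)))    # astronomically large, roughly 1e168
```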

The simplicity of Cayley-Hamilton matrix inverse depends on being able to compute the coefficients of the characteristic polynomial. This is another very deep math topic, but fortunately one that can be implemented relatively easily. There are several algorithms to compute the coefficients; the easiest, and the one used by the demo program, is the Faddeev-LeVerrier algorithm. Faddeev-LeVerrier has the same limitations as the Cayley-Hamilton inverse technique (practical only for relatively small matrices with mixed positive and negative values).
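As a sanity check, NumPy’s built-in np.poly() function also returns the characteristic polynomial coefficients (it works from the eigenvalues rather than the Faddeev-LeVerrier recurrence), so it can be used to verify a from-scratch implementation:

```python
# np.poly() on a square matrix returns the n+1 coefficients of the
# characteristic polynomial, highest-order term first
import numpy as np

A = np.array([
  [1., 2., 3., 1., 5.],
  [0., 5., 4., 1., 4.],
  [6., 1., 0., 2., 2.],
  [1., 4., 5., 3., 2.],
  [0., 2., 4., 0., 1.]])

print(np.poly(A))  # approximately [1. -10. -21. 37. 373. -610.]
```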

The output of my demo program is:

Cayley-Hamilton matrix inverse scratch Python

Source matrix A:
[[1.0 2.0 3.0 1.0 5.0]
 [0.0 5.0 4.0 1.0 4.0]
 [6.0 1.0 0.0 2.0 2.0]
 [1.0 4.0 5.0 3.0 2.0]
 [0.0 2.0 4.0 0.0 1.0]]

Polynomial coefficients:
[   1.0  -10.0  -21.0   37.0  373.0 -610.0]

Inverse via my_inverse():
[[-0.0738  0.0000  0.1967 -0.1066  0.1885]
 [-0.2984  0.4000  0.0623 -0.0754 -0.0820]
 [ 0.0836 -0.2000 -0.0230  0.0541  0.3197]
 [ 0.1082 -0.2000 -0.0885  0.4230 -0.4098]
 [ 0.2623  0.0000 -0.0328 -0.0656 -0.1148]]

Inverse via np.linalg.inv():
[[-0.0738  0.0000  0.1967 -0.1066  0.1885]
 [-0.2984  0.4000  0.0623 -0.0754 -0.0820]
 [ 0.0836 -0.2000 -0.0230  0.0541  0.3197]
 [ 0.1082 -0.2000 -0.0885  0.4230 -0.4098]
 [ 0.2623 -0.0000 -0.0328 -0.0656 -0.1148]]

End demo

An interesting side effect of using Cayley-Hamilton to compute a matrix inverse is that when you compute the coefficients of the characteristic polynomial, the last coefficient gives you the matrix determinant, up to sign. If n is even, the determinant is the last coefficient. If n is odd, the determinant is -1 times the last coefficient. So for the demo, n = 5 is odd, and so the determinant of A is -1 * -610 = 610.
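In code, the determinant side effect looks like this. The sketch uses np.poly() to stand in for the my_poly() function in the demo program, since both return the same coefficients:

```python
# determinant of A = (-1)^n * (last characteristic poly coefficient)
import numpy as np

A = np.array([
  [1., 2., 3., 1., 5.],
  [0., 5., 4., 1., 4.],
  [6., 1., 0., 2., 2.],
  [1., 4., 5., 3., 2.],
  [0., 2., 4., 0., 1.]])

n = len(A)
c = np.poly(A)            # characteristic polynomial coefficients
det = (-1)**n * c[n]      # n = 5 is odd, so det = -1 * -610 = 610
print(det, np.linalg.det(A))  # both approximately 610.0
```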

A fun and interesting exploration.



The matrix design of a chess board completely fascinated me from the very first time I saw a chess set. I was a pretty good player — my Servite High School chess team won the Orange County California Chess Championship in my junior and senior years. Now I only play against the computer and I’m lucky to get a draw. Here are two interesting Art Deco style chess sets.


Complete demo program:

# matrix_inverse_cayley_hamilton.py

import numpy as np

def my_poly(A):
  # coeffs of characteristic polynomial of A
  # A is a square matrix type np.float64
  # Faddeev-LeVerrier algorithm
  # could have used the built-in numpy.poly()
  n = len(A)
  c = np.zeros(n+1, dtype=np.float64)  # result
  c[0] = 1.0
  Ak = np.copy(A)
  for k in range(1, n+1):
    ck = -1 * Ak.trace() / k
    c[k] = ck
    for i in range(n):
      Ak[i][i] += ck
    Ak = np.matmul(A, Ak)
  # determinant is (-1)^n * coeff[n]
  # if n % 2 == 0: print("\ndet = %0.1f" % c[n])
  # else: print("\ndet = %0.1f" % (-1 * c[n]))
  
  return c

def my_inverse(A):
  # Cayley–Hamilton technique
  n = len(A)  # n gte 2
  coeffs = my_poly(A)  # len is n+1
  Afactor = np.copy(A)  # A, A*A, A*A*A, . . .
  result = coeffs[n-1] * np.eye(n)
  k = n-2
  while k >= 0:  # from n-2 down to 0
    result += coeffs[k] * Afactor
    Afactor = np.matmul(Afactor, A)
    k -= 1
  result /= -1 * coeffs[n]
  return result

print("\nCayley-Hamilton matrix inverse scratch Python ")
np.set_printoptions(precision=1, 
  suppress=True, floatmode='fixed')

A = np.array([
[1., 2., 3., 1., 5.],
[0., 5., 4., 1., 4.],
[6., 1., 0., 2., 2.],
[1., 4., 5., 3., 2.],
[0., 2., 4., 0., 1.]])

print("\nSource matrix A: ")
print(A)

print("\nPolynomial coefficients: ")
coeffs = my_poly(A)
print(coeffs)

Ainv = my_inverse(A)
print("\nInverse via my_inverse(): ")
np.set_printoptions(precision=4)
print(Ainv)

Ainv = np.linalg.inv(A)
print("\nInverse via np.linalg.inv(): ")
print(Ainv)

# Adet = np.linalg.det(A)
# print("\nDeterminant via np.linalg.det(): ")
# print("%0.1f " % Adet)

print("\nEnd demo ")