Gaussian Process Regression from Scratch as Simple as Possible Using Python

Gaussian process regression (GPR) is an interesting technique to predict a single numeric value. Just for hoots, I decided to implement a from-scratch version of the simplest possible example of a GPR system. My system works, but it requires two manually set hyperparameters, and the internal kernel function (the squared exponential function, aka the radial basis function) is hard-coded.

I set up a simple demo problem with five training items like so:

  X = [[1,3], [2,3], [3,2], [4,2], [5,1]]
  y =  [2.0,   4.0,   1.0,   5.0,   3.0]

The interface of my demo looks like:

  # create model
  gpr_model = GPR(theta=1.0, len_scale=1.0, noise=0)

  # train model
  gpr_model.learn(train_X, train_y)  # stores refs to data

  # use model
  print("Predicting for X = (6.0, 3.0) ")
  X = np.array([[6.0, 3.0]], dtype=np.float64)
  y, std = gpr_model.predict(X)
  print("\nPredicted y = %0.4f with std = %0.4f " % \
    (y[0], std[0]))

Notice that GPR spits out predicted value(s) and also standard deviation(s) so you get a measure of uncertainty in the prediction.

In a non-demo GPR system, the theta and len_scale values, which are used by the internal kernel function, could be programmatically optimized. However, some research suggests that programmatically optimizing theta and len_scale isn't always a good idea because it can lead to a massively overfitted model. I've seen this effect myself. Briefly, if you have no test data and only a small number of training items, which is often the case because GPR was designed for very small datasets, it's usually best to manually tune the hyperparameters.

The noise parameter is optional. It actually serves two purposes. First, it discourages the matrix inversion operations from blowing up. This is sometimes called conditioning the matrix. Second, noise adds a form of regularization that discourages model overfitting.
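To see the conditioning effect in isolation, here is a minimal sketch (not part of the demo code) that adds a noise term to a nearly singular covariance matrix before inverting it. The matrix values and the noise=0.01 value are made up for illustration:

```python
import numpy as np

# a nearly singular 2x2 covariance matrix: two almost-identical
# training items produce two almost-identical rows
K = np.array([[1.0, 0.9999],
              [0.9999, 1.0]])

noise = 0.01  # hypothetical value; the demo itself uses noise=0
K_conditioned = K + noise * np.identity(2)

# the conditioned matrix inverts cleanly
inv = np.linalg.inv(K_conditioned)
print(np.allclose(np.matmul(K_conditioned, inv), np.identity(2)))
```

With noise = 0 the inverse of a matrix like this still computes, but it is numerically fragile; even a small diagonal bump makes the inversion well behaved, at the cost of slightly smoothing the model's fit.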

GPR has the reputation of being extremely complicated — and it is. But the demo code is surprisingly short if you can track down the key math equations. My demo was the result of aggregating about two dozen resources I found on the Internet.


The Gaussian process regression prediction equations for the means and variances are:

  means = K* (Ky)^-1 y
  covar = K** - K* (Ky)^-1 (K*)^T

The various K symbols are covariance matrices that are made using a kernel function. The three K matrices are made from 1.) Ky: the training data against itself (with the noise term added to the diagonal), 2.) K**: the data-to-predict against itself, and 3.) K*: the training data against the data-to-predict. The predicted standard deviations are the square roots of the diagonal of the covar matrix.

Extending the demo involves programmatically optimizing the theta and len_scale hyperparameters. This can be done in several ways. For example, the scikit library uses L-BFGS to maximize the log marginal likelihood of the training data. This is a lot of code if implemented from scratch; I estimate about four times as much code as the basic demo. Furthermore, I'm moderately skeptical that this idea is a good one. Another approach would be to set up arrays of possible values for theta and len_scale, then do a grid search to find the combination that gives the smallest root mean squared error. I will put this idea on my personal TODO list.
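As a rough sketch of the grid search idea, the code below scores each (theta, len_scale) pair by leave-one-out root mean squared error on the training data, since plain training-data RMSE would be near zero for a noise-free GPR that interpolates its training points. The candidate value lists, the small noise term, and the leave-one-out scoring are all my assumptions, not part of the demo:

```python
import numpy as np

def kernel_mat(A, B, theta, len_scale):
  # squared exponential kernel, matching the demo's kernel_func
  result = np.zeros((len(A), len(B)))
  for i in range(len(A)):
    for j in range(len(B)):
      d2 = np.linalg.norm(A[i] - B[j])**2
      result[i][j] = theta * np.exp(-d2 / (2 * len_scale**2))
  return result

def loo_rmse(X, y, theta, len_scale, noise=0.01):
  # leave-one-out RMSE: predict each item from the other items
  n = len(X); sum_sq = 0.0
  for i in range(n):
    idx = [j for j in range(n) if j != i]
    Ky = kernel_mat(X[idx], X[idx], theta, len_scale) + \
      noise * np.identity(n - 1)
    Kstar = kernel_mat(X[i:i+1], X[idx], theta, len_scale)
    pred = np.matmul(Kstar, np.linalg.solve(Ky, y[idx]))[0]
    sum_sq += (pred - y[i])**2
  return np.sqrt(sum_sq / n)

train_X = np.array([[1,3], [2,3], [3,2], [4,2], [5,1]],
  dtype=np.float64)
train_y = np.array([2.0, 4.0, 1.0, 5.0, 3.0], dtype=np.float64)

best = None
for theta in [0.5, 1.0, 2.0]:        # hypothetical candidates
  for len_scale in [0.5, 1.0, 2.0]:  # hypothetical candidates
    err = loo_rmse(train_X, train_y, theta, len_scale)
    if best is None or err < best[0]:
      best = (err, theta, len_scale)
print("best (rmse, theta, len_scale) =", best)
```

With only two hyperparameters and tiny datasets, this exhaustive search is cheap, and it avoids the overfitting risk of likelihood-based optimization mentioned above.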

Very interesting stuff.



Shadow photography is an art form where shadows are projected onto the target subject. In my opinion, the key to a successful result is fine-tuning the balance of the amount of shadows projected and the geometry of the shadows. Here are three nice examples by Sølve Sundsbø.


Demo code:

# synthetic_gpr_io.py
# Gaussian process regression with hard-coded kernel function
#  and hard-coded kernel parameters

import numpy as np

# -----------------------------------------------------------

class GPR:
  def __init__(self, theta, len_scale, noise):
    self.theta = theta  # used by kernel function
    self.len_scale = len_scale  # kernel function
    self.noise = noise   # avoids inverse() blow-up
    self.train_X = None  # supplied by learn()
    self.train_y = None
    self.inv_covar_mat = None  # needed in predict()

  def kernel_func(self, x1, x2):
    # squared exponential aka RBF
    dist_sq = np.linalg.norm(x1 - x2)**2
    term = -1 / (2 * self.len_scale**2)
    return self.theta * np.exp(dist_sq * term)

  def compute_covar_mat(self, x1, x2):
    n1 = len(x1); n2 = len(x2)
    result = np.zeros((n2,n1))  # very tricky indexing
    for j in range(n2):  # duplicated effort but simple
      for i in range(n1):
        result[j][i] = self.kernel_func(x2[j], x1[i])
    return result

  # the train data should be stored because 
  # it's used to make predictions
  def learn(self, train_X, train_y):
    self.train_X = train_X
    self.train_y = train_y
    covar_mat = self.compute_covar_mat(train_X, train_X)
    self.inv_covar_mat = np.linalg.inv(covar_mat + \
      self.noise * np.identity(len(train_X)))

  def predict(self, X):
    K1 = self.compute_covar_mat(self.train_X, X)
    tmp_K = self.compute_covar_mat(X, X)
    K2 = tmp_K - np.matmul(K1, \
      np.matmul(self.inv_covar_mat, K1.T)) 

    means = np.dot(K1, np.dot(self.train_y, \
      self.inv_covar_mat)).flatten()
    stds = np.sqrt(np.diag(K2))
    return means, stds

# -----------------------------------------------------------

def main():
  # 0. prepare
  print("\nBegin scratch Gaussian process regression IO ")
  np.random.seed(1)  # not used
  np.set_printoptions(precision=8, suppress=True)

  # 1. load data
  print("\nLoading train data ")
  train_X = np.array([[1,3], [2,3], [3,2], [4,2], [5,1]],
    dtype=np.float64)
  train_y = np.array([2.0, 4.0, 1.0, 5.0, 3.0], dtype=np.float64)
  print("train_X = "); print(train_X)
  print("train_y = "); print(train_y)

  # 2. create and train model
  print("\nCreating and training model ")
  gpr_model = GPR(theta=1.0, len_scale=1.0, noise=0)
  gpr_model.learn(train_X, train_y)  # stores refs to data

  # 3. predict the training data
  print("\nPredicting the training data ")
  means, stds = gpr_model.predict(train_X)
  print("\nPredicted y values: ")
  print(means)
  print("\nPredicted standard deviations: ")
  print(stds)

  # 4. predict new data
  print("\nPredicting for X = (6.0, 3.0) ")
  X = np.array([[6.0, 3.0]], dtype=np.float64)
  y, std = gpr_model.predict(X)
  print("\nPredicted y = %0.4f with std = %0.4f " % \
    (y[0], std[0]))

  print("\nEnd GPR scratch IO demo ")

if __name__ == "__main__":
  main()