Showdown: Gaussian Process Regression vs Neural Network Regression

The goal of a regression problem is to predict a single numeric value. For example, you might want to predict the price of a house based on its area in square feet, number of bedrooms, tax rate, and so on.

Based on my experience, two of the most powerful machine learning regression techniques are Gaussian process regression (GPR) and neural network (NN) regression. I decided to do a few experiments. My observations: GPR is easier to tune and usually gives decent results. NN is more difficult to tune, but a well-tuned NN model often gives slightly better results than GPR.

These are generalizations, of course; for any given problem scenario, one technique may be better than the other.

I generated synthetic data with 10 predictor variables (and one target value to predict). Each predictor value is between -1.0 and +1.0 to simulate divide-by-k normalization. Each target value is between 0.0 and 1.0. I made 1000 training items and 200 test items. I used a neural network to generate the synthetic data but none of the generator information was used for the NN prediction model, so the NN models had no advantage over the GPR models. The point is that there is an underlying latent model that is discoverable in theory by either technique.

Note: If there are n training items, then in theory GPR constructs and inverts an n x n kernel matrix, which is an O(n^3) operation. Therefore, GPR is best suited for relatively small datasets. My demo data of 1,000 training items was intended to stress GPR. In practice, GPR can use more efficient ways to invert the kernel matrix.
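To make that cost concrete, here is a minimal from-scratch sketch (NumPy only; the function names are my own, not scikit's) of the expensive step at the heart of exact GPR: building an n x n RBF kernel matrix and solving the associated linear system.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
  # pairwise squared distances, then the RBF (Gaussian) kernel
  d2 = np.sum(A**2, axis=1)[:, None] + \
    np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
  return np.exp(-0.5 * d2 / (length_scale**2))

def gpr_predict(train_X, train_y, test_X, alpha=0.05):
  n = len(train_X)
  K = rbf_kernel(train_X, train_X)      # n x n kernel matrix
  # solving this linear system is the O(n^3) bottleneck
  w = np.linalg.solve(K + alpha * np.eye(n), train_y)
  K_star = rbf_kernel(test_X, train_X)  # m x n cross-kernel
  return K_star @ w                     # predicted mean values

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 10))
y = rng.uniform(0, 1, size=100)
preds = gpr_predict(X, y, X[:3])
print(preds.shape)  # (3,)
```

Production implementations (including scikit's) use a Cholesky factorization rather than a general solve, but the cubic scaling in n is the same.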

I used the scikit library GaussianProcessRegressor module for GPR like so:

krnl = ConstantKernel(1.0, (1e-3, 1e3)) + RBF(1.0, (1e-3, 1e3))

model = GaussianProcessRegressor(kernel=krnl, normalize_y=True,
  random_state=0, alpha=0.05)

model.fit(train_X, train_y)

Note: By default, scikit will try to optimize the kernel function parameters. This sometimes fails. I probably should have instructed scikit not to optimize, like so: krnl = ConstantKernel(1.0, constant_value_bounds="fixed") + RBF(1.0, length_scale_bounds="fixed").

I used the PyTorch library for the NN like so:

net = Net().to(device)  # 10-(20-20)-1

net.train()
train(net, train_ds, bs=20, lr=0.01, me=1000, le=200)

I evaluated the two prediction models using an accuracy function where a prediction that’s within 10% of the true target value is scored as correct.
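The "within 10 percent" criterion can be expressed as a short vectorized function (a sketch with my own naming; the demo code below uses an explicit loop for GPR and an all-at-once tensor version for the NN):

```python
import numpy as np

def accuracy(pred, y, pct_close=0.10):
  # a prediction counts as correct when it is within
  # pct_close of the true target value
  correct = np.abs(pred - y) < np.abs(pct_close * y)
  return np.mean(correct)

y = np.array([0.50, 0.20, 0.80])
pred = np.array([0.54, 0.30, 0.79])  # off by 8%, 50%, 1.25%
print(accuracy(pred, y))  # 2 of 3 correct = 0.6667
```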

The GPR model scored 43.7% accuracy on the training data (437 out of 1000 correct) and 32.0% accuracy on the test data (64 out of 200 correct).

The NN model scored 62.4% on the training data (624 out of 1000 correct) and 39.0% on the test data (78 out of 200 correct).

Different combinations of hyperparameters gave significantly different results. Sometimes GPR was better and sometimes NN was better. Therefore, I could draw no strong conclusions.

There are other factors to consider when comparing GPR and NNs. NNs can handle predictor data that has categorical values (such as a color variable with possible values "red", "white", or "blue") but GPR can accept only strictly numeric data. (There are some resources that claim GPR works well with categorical data by using one-hot encoding, but there's no solid research on this topic.) GPR predictions produce a confidence interval (such as "predicted y = 0.55 with a standard deviation = 0.02") but NNs just spit out a predicted value. GPR works decently with small sets of training data but NNs typically require more data than GPR to get good results. Some Internet resources state that a GPR model is more interpretable than a NN model but I don't buy that claim; I believe both GPR and NN models are essentially opaque and you need specialized analysis, such as sensitivity and robustness, to make sense of these models.
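For reference, one-hot encoding a categorical predictor such as a color variable looks like this (a minimal sketch; whether the resulting 0-1 columns are meaningful inputs to an RBF kernel is exactly the open question):

```python
import numpy as np

colors = ["red", "white", "blue", "white"]
categories = ["red", "white", "blue"]  # fixed ordering

# each categorical value becomes a 0-1 vector with a single 1
one_hot = np.array([[1.0 if c == cat else 0.0
  for cat in categories] for c in colors])
print(one_hot)
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]
```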

By the way, a third ML technique for regression — kernel ridge regression (KRR) — has an informal reputation of being less powerful than GPR and NN. But this is not correct. GPR and KRR are very closely related. The underlying mathematics of GPR and KRR are essentially the same, and for the same kernel function and alpha/noise parameter values, GPR and KRR give identical predictions. I’ll post on this topic at some point.
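The GPR/KRR equivalence is easy to check with scikit (a sketch under stated assumptions: a fixed RBF kernel with no hyperparameter optimization, normalize_y=False, and the same alpha for both models; note that RBF length_scale=1.0 corresponds to KernelRidge gamma=0.5 because gamma = 1 / (2 * length_scale**2)):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (100, 10))
y = rng.uniform(0, 1, 100)
X_new = rng.uniform(-1, 1, (10, 10))

# same kernel, same regularization for both models
gpr = GaussianProcessRegressor(kernel=RBF(1.0), alpha=0.05,
  optimizer=None, normalize_y=False)  # fixed kernel, no y-scaling
krr = KernelRidge(kernel="rbf", gamma=0.5, alpha=0.05)

gpr.fit(X, y)
krr.fit(X, y)
print(np.allclose(gpr.predict(X_new), krr.predict(X_new)))  # True
```

Both models predict with k(X_new, X) @ inverse(K + alpha * I) @ y; the difference is interpretation (GPR also yields a posterior standard deviation, KRR does not).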



If machine learning models are thought of as detectives, then Gaussian process regression and neural networks can be considered collaborators (not competitors) working to unearth the truth in data. Here are three of my favorite detective collaborations in movies. Left: Sherlock Holmes and Dr. John Watson (here actors Basil Rathbone and Nigel Bruce from the 1940s). Center: Hercule Poirot and Captain Arthur Hastings (here actors David Suchet and Hugh Fraser in the long-running British TV series). Right: Charlie Chan and Number One Son, Lee Chan (here actors Warner Oland and Keye Luke from the 1930s).


Demo code:

# synthetic_gpr_vs_nn.py
# Gaussian process regression vs. neural network

# Anaconda3-2022.10  Python 3.9.13
# scikit 1.0.2  Windows 10/11 

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.gaussian_process.kernels import ConstantKernel

import torch as T
device = T.device('cpu')  # apply to Tensor or Module

# -----------------------------------------------------------

def accuracy_gpr(model, data_X, data_y, pct_close):
  # correct within pct_close of true target value
  n_correct = 0; n_wrong = 0

  for i in range(len(data_X)):
    X = data_X[i].reshape(1, -1)  # one-item batch
    y = data_y[i]
    pred = model.predict(X)       # predicted target value

    if np.abs(pred - y) < np.abs(pct_close * y):
      n_correct += 1
    else:
      n_wrong += 1
  acc = (n_correct * 1.0) / (n_correct + n_wrong)
  return acc

# -----------------------------------------------------------

class SyntheticDataset(T.utils.data.Dataset):
  def __init__(self, src_file):

    tmp_x = np.loadtxt(src_file, usecols=(0,1,2,3,4,5,6,7,8,9),
      delimiter="\t", comments="#", dtype=np.float32)
    tmp_y = np.loadtxt(src_file, usecols=10, delimiter="\t",
      comments="#", dtype=np.float32)
    tmp_y = tmp_y.reshape(-1,1)  # 2D required

    self.x_data = T.tensor(tmp_x, dtype=T.float32).to(device)
    self.y_data = T.tensor(tmp_y, dtype=T.float32).to(device)

  def __len__(self):
    return len(self.x_data)

  def __getitem__(self, idx):
    preds = self.x_data[idx]
    trgts = self.y_data[idx] 
    return (preds, trgts)  # as a tuple

# -----------------------------------------------------------

class Net(T.nn.Module):
  def __init__(self):
    super(Net, self).__init__()
    self.hid1 = T.nn.Linear(10, 20)  # 10-(20-20)-1
    self.hid2 = T.nn.Linear(20, 20)
    self.oupt = T.nn.Linear(20, 1)

    T.nn.init.xavier_uniform_(self.hid1.weight)
    T.nn.init.zeros_(self.hid1.bias)
    T.nn.init.xavier_uniform_(self.hid2.weight)
    T.nn.init.zeros_(self.hid2.bias)
    T.nn.init.xavier_uniform_(self.oupt.weight)
    T.nn.init.zeros_(self.oupt.bias)

  def forward(self, x):
    z = T.tanh(self.hid1(x))
    z = T.tanh(self.hid2(z))
    z = self.oupt(z)  # regression: no activation
    return z

# -----------------------------------------------------------

def accuracy_nn(model, ds, pct_close):
  # all-at-once (quick)
  # assumes model.eval()
  X = ds.x_data  # all inputs
  Y = ds.y_data  # all targets
  n_items = len(X)
  with T.no_grad():
    pred = model(X)  # all predicted target values
 
  n_correct = T.sum((T.abs(pred - Y) < T.abs(pct_close * Y)))
  result = (n_correct.item() / n_items)  # scalar
  return result  

# -----------------------------------------------------------

def train(model, ds, bs, lr, me, le):
  # dataset, bat_size, lrn_rate, max_epochs, log interval
  train_ldr = T.utils.data.DataLoader(ds, batch_size=bs,
    shuffle=True)
  loss_func = T.nn.MSELoss()
  optimizer = T.optim.Adam(model.parameters(), lr=lr)

  for epoch in range(0, me):
    epoch_loss = 0.0  # for one full epoch
    for (b_idx, batch) in enumerate(train_ldr):
      X = batch[0]  # predictors
      y = batch[1]  # target values
      optimizer.zero_grad()
      oupt = model(X)
      loss_val = loss_func(oupt, y)  # a tensor
      epoch_loss += loss_val.item()  # accumulate
      loss_val.backward()  # compute gradients
      optimizer.step()     # update weights

    if epoch % le == 0:
      print("epoch = %4d  |  loss = %0.4f"  % \
        (epoch, epoch_loss)) 

# -----------------------------------------------------------

def main():
  # 0. prepare
  print("Begin scikit Gaussian process regression ")
  print("Predict synthetic data ")
  np.random.seed(1)
  np.set_printoptions(precision=4, suppress=True)

# -----------------------------------------------------------

  # 1. load data
  print("\nLoading train and test data for GPR ")
  train_file = ".\\Data\\synthetic_train.txt"
  train_X = np.loadtxt(train_file, delimiter="\t", 
    usecols=(0,1,2,3,4,5,6,7,8,9),
    comments="#", dtype=np.float32)
  train_y = np.loadtxt(train_file, delimiter="\t", 
    usecols=10, comments="#", dtype=np.float32) 

  test_file = ".\\Data\\synthetic_test.txt"
  test_X = np.loadtxt(test_file, delimiter="\t",
    usecols=(0,1,2,3,4,5,6,7,8,9),
    comments="#", dtype=np.float32)
  test_y = np.loadtxt(test_file, delimiter="\t",
    usecols=10, comments="#", dtype=np.float32) 
  print("Done ")

  print("\nData: ")
  print(train_X[0:2][:])
  print(". . .")
  print("\nActual targets: ")
  print(train_y[0:2])
  print(". . . ")

# -----------------------------------------------------------

  # 2. create and train GPR model
  print("\nCreating GPR model with Constant + RBF kernel ")
  krnl = ConstantKernel(1.0, (1e-3, 1e3)) + RBF(1.0, (1e-3, 1e3))

  model = GaussianProcessRegressor(kernel=krnl, normalize_y=True,
    random_state=0, alpha=0.05)

  print("\nTraining model ")
  model.fit(train_X, train_y)
  print("Done ")

  # 3. compute model accuracy
  print("\nComputing accuracy (within 0.10 of true target) ")
  acc_train = accuracy_gpr(model, train_X, train_y, 0.10)
  print("\nAccuracy on train data = %0.4f " % acc_train)
  acc_test = accuracy_gpr(model, test_X, test_y, 0.10)
  print("Accuracy on test data = %0.4f " % acc_test)

  print("\nEnd GPR prediction ")

# ===========================================================

  # 4. create and train NN model
  print("\nBegin PyTorch NN regression ")
  T.manual_seed(0)
  np.random.seed(0)

  # 5. create Dataset objects for NN
  print("\nCreating synthetic data Datasets ")
  train_ds = SyntheticDataset(train_file)  # 1000 rows
  test_ds = SyntheticDataset(test_file)  # 200 rows

  # 6. create network
  print("\nCreating 10-(20-20)-1 neural network ")
  net = Net().to(device)

  # 7. train model
  print("\nbat_size = 20 ")
  print("loss = MSELoss() ")
  print("optimizer = Adam ")
  print("lrn_rate = 0.01 ")

  print("\nStarting training")
  net.train()
  train(net, train_ds, bs=20, lr=0.01, me=1000, le=200)
  print("Done ")

# -----------------------------------------------------------

  # 8. evaluate NN model accuracy
  print("\nComputing model accuracy (within 0.10 of true) ")
  net.eval()
  acc_train = accuracy_nn(net, train_ds, 0.10)  # all-at-once
  print("Accuracy on train data = %0.4f" % acc_train)

  acc_test = accuracy_nn(net, test_ds, 0.10)  # all-at-once
  print("Accuracy on test data = %0.4f" % acc_test)

# -----------------------------------------------------------

  print("\nEnd GPR vs NN demo ")

if __name__ == "__main__":
  main()

Program to generate the synthetic data:

# make_synthetic.py
# create synthetic train and test datasets for regression
# PyTorch 2.0-CPU  Anaconda3-2022.10  Python 3.9.13
# Windows 10/11 

import numpy as np
import torch as T  # non-standard alias
device = T.device('cpu')  # apply to Tensor or Module

# -----------------------------------------------------------

class Net(T.nn.Module):
  def __init__(self, n_in):
    super(Net, self).__init__()
    self.hid1 = T.nn.Linear(n_in, 100)  # n-(100)-1
    self.oupt = T.nn.Linear(100, 1)
  
    lim = 0.80
    T.nn.init.uniform_(self.hid1.weight, -lim, lim) 
    T.nn.init.uniform_(self.hid1.bias, -lim, lim)
    T.nn.init.uniform_(self.oupt.weight, -lim, lim) 
    T.nn.init.uniform_(self.oupt.bias, -lim, lim)

  def forward(self, x):
    z = T.tanh(self.hid1(x))
    z = T.sigmoid(self.oupt(z))  # oupt in [0.0, 1.0]
    return z

# -----------------------------------------------------------

def create_data_file(net, n_in, fn, n_items):
  f = open(fn, "w")
  x_lo = -1.0; x_hi = 1.0
  for i in range(n_items):
    s = ""
    X = (x_hi - x_lo) * np.random.random(size=(1,n_in)) + x_lo
    for j in range(n_in):
      s += ("%0.4f" % X[0][j]) + "\t"
    X = T.tensor(X, dtype=T.float32)

    with T.no_grad():
      y = net(X).item()
    y += np.random.normal(loc=0.0, scale=0.01)  # noise
    if y < 0.0: y = 0.01 * np.random.random() + 0.01  # keep positive
    s += ("%0.4f" % y) + "\n"

    f.write(s)
  f.close()

# -----------------------------------------------------------

def main():
  # 0. get started
  print("\nCreating synthetic datasets for regression ")
  np.random.seed(1)
  T.manual_seed(1) 

  # 1. create neural generator model
  n_in = 10
  print("\nCreating " + str(n_in) + "-(100)-1 regression NN ")
  net = Net(n_in).to(device)

  # 2. use model to create data
  print("\nCreating data files ")
  create_data_file(net, n_in, ".\\synthetic_train.txt", 1000)
  create_data_file(net, n_in, ".\\synthetic_test.txt", 200)

  print("\nEnd create synthetic data ")

# -----------------------------------------------------------

if __name__=="__main__":
  main()