“Kernel Ridge Regression with Cholesky Inverse Training Using JavaScript” in Visual Studio Magazine

I wrote an article titled “Kernel Ridge Regression with Cholesky Inverse Training Using JavaScript” in the January 2026 edition of Microsoft Visual Studio Magazine. See https://visualstudiomagazine.com/articles/2026/01/06/kernel-ridge-regression-with-cholesky-inverse-training-using-javascript.aspx.

There are approximately a dozen common regression techniques. Examples include linear regression, nearest neighbors regression, quadratic regression, decision tree regression (several variations, such as random forest and gradient boosting), and neural network regression.

Kernel ridge regression uses a kernel function that computes a measure of similarity between two data items, and a ridge regularization technique to discourage model overfitting. Model overfitting occurs when a model predicts well on the training data, but predicts poorly on new, previously unseen data. Ridge regularization is also called L2 regularization.

My article presents a demo of kernel ridge regression, implemented from scratch, using the JavaScript language. The output of the demo is:

Begin Kernel Ridge Regression with Cholesky matrix inverse

Loading train (200) and test (40) from file

First three train X:
 -0.1660   0.4406  -0.9998  -0.3953  -0.7065
  0.0776  -0.1616   0.3704  -0.5911   0.7562
 -0.9452   0.3409  -0.1654   0.1174  -0.7192

First three train y:
   0.4840
   0.1568
   0.8054

Setting RBF gamma = 0.3
Setting alpha noise = 0.005

Creating and training KRR model using Cholesky inverse
Done

Model weights:
  -2.0218   -1.1406    0.0758   -0.6265    0.5722
  -0.9905    0.6912    0.4807    0.6496   -0.7364 
. . .
  -0.2014   -1.6270   -0.5825   -0.0487    1.2897

Computing model accuracy

Train acc (within 0.10) = 0.9950
Test acc (within 0.10) = 0.9500

Train MSE = 0.0000
Test MSE = 0.0002

Predicting for x =
  -0.1660    0.4406   -0.9998   -0.3953   -0.7065
Predicted y = 0.4941

End demo

The demo program uses the radial basis function (RBF) kernel function to measure the similarity between two data items. RBF values range from 1.0 (identical items) down toward 0.0 (increasingly dissimilar items).
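In JavaScript, an RBF kernel function might look like the following sketch. This is just an illustration I'm including for clarity, not the article's exact code, and it assumes the exp(-gamma * squared-distance) form of RBF that matches the demo's gamma parameter:

// a minimal sketch of an RBF kernel function (illustration only)
// gamma is the spread parameter; the demo uses gamma = 0.3
function rbfKernel(x1, x2, gamma) {
  let sumSq = 0.0;
  for (let i = 0; i < x1.length; i++) {
    const d = x1[i] - x2[i];
    sumSq += d * d;
  }
  return Math.exp(-gamma * sumSq);  // 1.0 for identical items, toward 0.0 as items differ
}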

There are two main ways to train a kernel ridge regression model. The first technique, and the one used by the demo program presented in this article, involves creating an n-by-n kernel matrix that compares all the training data items with each other. Then ridge regularization is applied by adding a small constant, usually named alpha, to the diagonal elements of the kernel matrix. Then the matrix inverse of the kernel matrix is computed. The inverse of the kernel matrix is multiplied by the vector of training y values, which gives the model weights.
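A simplified sketch of the matrix-inverse training technique looks something like the code below. The rbfKernel() function is the one sketched above, and matInverse() is a hypothetical helper; the article's actual implementation may be organized differently:

// a simplified sketch of matrix-inverse training (illustration only)
// assumes rbfKernel() as above and a hypothetical matInverse() helper
function trainKRR(trainX, trainY, gamma, alpha) {
  const n = trainX.length;
  const K = [];                        // n-by-n kernel matrix
  for (let i = 0; i < n; i++) {
    K.push(new Array(n));
    for (let j = 0; j < n; j++)
      K[i][j] = rbfKernel(trainX[i], trainX[j], gamma);
    K[i][i] += alpha;                  // ridge (L2) regularization on the diagonal
  }
  const Kinv = matInverse(K);          // e.g., via Cholesky decomposition
  const wts = new Array(n).fill(0.0);  // weights = Kinv * trainY
  for (let i = 0; i < n; i++)
    for (let j = 0; j < n; j++)
      wts[i] += Kinv[i][j] * trainY[j];
  return wts;
}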

The second technique to train a kernel ridge regression model uses stochastic gradient descent (SGD). SGD is an iterative process that loops through the training data multiple times, adjusting the model weights slowly so that the model reduces its error between predicted y values and target y values.
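Conceptually, SGD training might look something like the following sketch. The update rule here (squared error plus an L2 penalty on the weights) and the names are my own illustration, not the article's code:

// a conceptual sketch of SGD training for KRR (illustration only)
// the prediction for item x is the weighted sum of kernel similarities
// to every training item: yPred = sum_j wts[j] * rbfKernel(trainX[j], x, gamma)
function trainKRRSGD(trainX, trainY, gamma, alpha, lrnRate, maxEpochs) {
  const n = trainX.length;
  const wts = new Array(n).fill(0.0);
  for (let epoch = 0; epoch < maxEpochs; epoch++) {
    for (let i = 0; i < n; i++) {      // for each training item
      let yPred = 0.0;
      for (let j = 0; j < n; j++)
        yPred += wts[j] * rbfKernel(trainX[j], trainX[i], gamma);
      const err = yPred - trainY[i];
      for (let j = 0; j < n; j++)      // adjust each weight slightly
        wts[j] -= lrnRate * (err * rbfKernel(trainX[j], trainX[i], gamma)
                             + alpha * wts[j]);  // L2 / ridge penalty
    }
  }
  return wts;
}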

Computing a matrix inverse is one of the most challenging problems in numerical programming. There are over a dozen algorithms to compute a matrix inverse, and each algorithm has several variations, and each variation has multiple implementation designs.

As it turns out, because the regularized kernel matrix for kernel ridge regression is symmetric and positive definite, it's possible to use a specialized matrix inverse technique based on Cholesky decomposition. A Cholesky-based inverse is simpler than general-purpose inverse algorithms.
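Standard Cholesky decomposition factors a symmetric positive definite matrix A into a lower-triangular matrix L such that A = L * L^T. A minimal sketch (illustration only; the article walks through its own implementation):

// a minimal sketch of Cholesky decomposition (illustration only)
// A must be symmetric positive definite; returns lower-triangular L where A = L * L^T
function choleskyDecomp(A) {
  const n = A.length;
  const L = [];
  for (let i = 0; i < n; i++) L.push(new Array(n).fill(0.0));
  for (let i = 0; i < n; i++) {
    for (let j = 0; j <= i; j++) {
      let sum = 0.0;
      for (let k = 0; k < j; k++) sum += L[i][k] * L[j][k];
      if (i === j)
        L[i][j] = Math.sqrt(A[i][i] - sum);   // diagonal entry
      else
        L[i][j] = (A[i][j] - sum) / L[j][j];  // below-diagonal entry
    }
  }
  return L;
}

Once L has been computed, the inverse of the kernel matrix can be obtained by solving triangular systems using forward and backward substitution, which is much less work than a general-purpose matrix inverse.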

My article concludes with a recap:

* Kernel ridge regression (KRR) is a machine learning technique to predict a numeric value.

* Kernel ridge regression requires a kernel function that computes a measure of similarity between two training items.

* The most common kernel function is the radial basis function (RBF).

* There are two ways to train a KRR model: kernel matrix inverse and stochastic gradient descent (SGD).

* Both training techniques require an alpha constant for ridge (aka L2) regularization to discourage model overfitting.

* For KRR matrix inverse training, you must compute the inverse of a kernel matrix of RBF applied to all pairs of training items.

* There are many techniques to compute a matrix inverse. Cholesky decomposition is a specialized, relatively simple technique that can be used for kernel matrices.



Inexplicably, one of my favorite comedy movie bits is the “bat in the house” bit. There are a surprisingly large number of movies that feature this idea.

Left: The first example that I’m aware of is “The Laurel-Hardy Murder Case” (1930). The boys are in a creepy old mansion. A bat gets in the house and under a bed sheet. The boys think it’s a ghost as it chases them around the house. Very funny movie.

Right: Another old example is “Spooks” (1953) featuring The Three Stooges. The boys are in a haunted house. A hilariously fake bat chases them around. A funny movie.

