“Kernel Ridge Regression with Cholesky Inverse Training Using C#” in Visual Studio Magazine

I wrote an article titled “Kernel Ridge Regression with Cholesky Inverse Training Using C#” in the September 2025 edition of Microsoft Visual Studio Magazine. See https://visualstudiomagazine.com/articles/2025/09/15/kernel-ridge-regression-with-cholesky-inverse-training-using-csharp.aspx.

The goal of a machine learning regression problem is to predict a single numeric value. For example, you might want to predict a person’s bank savings account balance based on their age, years of work experience, and so on.

There are approximately a dozen common regression techniques. Examples include linear regression, k-nearest neighbors regression, decision tree regression (several types, such as random forest), and neural network regression. Each technique has pros and cons. A technique that often produces accurate predictions for complex data is called kernel ridge regression. Note: “kernel ridge regression” is very different from the similarly named “ridge regression.”

My article presents a demo of kernel ridge regression (KRR), implemented from scratch, using the C# language. There are two main ways to train a KRR model. One way is to use stochastic gradient descent (SGD) to iteratively update the model weights. The second way is to compute a kernel matrix and use its inverse to solve for the model weights. My article uses the second approach.
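In a bit more detail (this is the standard KRR closed form, paraphrased rather than quoted from the article): if K is the kernel matrix, alpha is the ridge constant, I is the identity matrix, and y is the vector of training target values, then the trained weights are

w = inverse(K + alpha * I) * y

and a prediction for a new input x is the weighted sum of the RBF similarities between x and every training item.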

The article has a lot of information. The summary is:

* Kernel ridge regression (KRR) is a machine learning technique to predict a numeric value.

* Kernel ridge regression requires a kernel function that computes a measure of similarity between two training items.

* The most common kernel function is the radial basis function (RBF).

* There are two common forms of the RBF function, one parameterized by gamma and one by sigma. (A short sketch of both appears after this list.)

* There are two ways to train a KRR model, kernel matrix inverse and stochastic gradient descent (SGD).

* Both training techniques require an alpha constant for ridge (aka L2) regularization to deter model overfitting.

* For KRR matrix inverse training, you must compute the inverse of a kernel matrix of RBF applied to all pairs of training items.

* For KRR matrix inverse training, alpha is added to the diagonal elements of the kernel matrix, which prevents model overfitting and also conditions the matrix so that computing the inverse is less likely to fail.

* There are many techniques to compute a matrix inverse. Cholesky decomposition is a specialized, relatively simple technique that can be used for kernel matrices. (A sketch of the idea also follows this list.)

* The matrix inverse training technique often works well for small and medium-size datasets, but it is complex and can fail.

* The SGD training technique can be used with a dataset of any size, but it requires a learning rate and a maximum number of epochs, both of which must be determined by trial and error. (A rough SGD sketch follows as well.)
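To make the RBF bullet points concrete, here is a minimal C# sketch of the two forms of the RBF kernel. The class and method names (KernelDemo, RbfGamma, RbfSigma) are my own for illustration; the article's demo code may organize things differently.

using System;

public static class KernelDemo
{
  // RBF, gamma form: k(x1, x2) = exp(-gamma * ||x1 - x2||^2)
  public static double RbfGamma(double[] x1, double[] x2, double gamma)
  {
    double sumSq = 0.0;
    for (int i = 0; i < x1.Length; ++i)
      sumSq += (x1[i] - x2[i]) * (x1[i] - x2[i]);
    return Math.Exp(-gamma * sumSq);
  }

  // RBF, sigma form: k(x1, x2) = exp(-||x1 - x2||^2 / (2 * sigma^2))
  public static double RbfSigma(double[] x1, double[] x2, double sigma)
  {
    double sumSq = 0.0;
    for (int i = 0; i < x1.Length; ++i)
      sumSq += (x1[i] - x2[i]) * (x1[i] - x2[i]);
    return Math.Exp(-sumSq / (2.0 * sigma * sigma));
  }
}

The two forms are equivalent when gamma = 1 / (2 * sigma^2). A kernel value of 1.0 means the two items are identical; values near 0.0 mean the items are very dissimilar.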
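Next, a sketch of the matrix inverse training idea: build the kernel matrix of RBF applied to all pairs of training items, add alpha to the diagonal, then solve for the weights. One caveat: the article computes the explicit matrix inverse using Cholesky decomposition, while the sketch below uses the closely related shortcut of solving the linear system (K + alpha * I) * w = y directly from the Cholesky factor, which yields the same weights without forming the full inverse. These methods continue the KernelDemo sketch above and reuse its RbfGamma().

  // build the kernel matrix, add alpha to its diagonal, solve for the weights
  public static double[] TrainWeights(double[][] trainX, double[] trainY,
    double gamma, double alpha)
  {
    int n = trainX.Length;
    double[][] K = new double[n][];
    for (int i = 0; i < n; ++i)
    {
      K[i] = new double[n];
      for (int j = 0; j < n; ++j)
        K[i][j] = RbfGamma(trainX[i], trainX[j], gamma);
      K[i][i] += alpha;  // ridge regularization; also conditions the matrix
    }
    return CholeskySolve(K, trainY);
  }

  // predicted y = weighted sum of RBF similarities to every training item
  public static double Predict(double[] x, double[][] trainX,
    double[] wts, double gamma)
  {
    double sum = 0.0;
    for (int i = 0; i < trainX.Length; ++i)
      sum += wts[i] * RbfGamma(x, trainX[i], gamma);
    return sum;
  }

  // solve A * w = b for symmetric positive definite A via Cholesky: A = L * Lt
  private static double[] CholeskySolve(double[][] A, double[] b)
  {
    int n = b.Length;
    double[][] L = new double[n][];
    for (int i = 0; i < n; ++i) L[i] = new double[n];

    for (int i = 0; i < n; ++i)  // decomposition
    {
      for (int j = 0; j <= i; ++j)
      {
        double sum = A[i][j];
        for (int k = 0; k < j; ++k)
          sum -= L[i][k] * L[j][k];
        if (i == j)
          L[i][i] = Math.Sqrt(sum);  // NaN if matrix is not positive definite
        else
          L[i][j] = sum / L[j][j];
      }
    }

    double[] z = new double[n];  // forward substitution: L * z = b
    for (int i = 0; i < n; ++i)
    {
      double sum = b[i];
      for (int k = 0; k < i; ++k)
        sum -= L[i][k] * z[k];
      z[i] = sum / L[i][i];
    }

    double[] w = new double[n];  // back substitution: Lt * w = z
    for (int i = n - 1; i >= 0; --i)
    {
      double sum = z[i];
      for (int k = i + 1; k < n; ++k)
        sum -= L[k][i] * w[k];
      w[i] = sum / L[i][i];
    }
    return w;
  }

A typical calling pattern would be wts = TrainWeights(trainX, trainY, gamma, alpha) followed by yPred = Predict(x, trainX, wts, gamma), where the gamma and alpha values must be tuned for the problem at hand.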
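Last, a rough sketch of the SGD alternative from the final bullet point. This is my simplified illustration, not the article's demo code: for simplicity it applies the ridge penalty directly to each weight rather than using the exact KRR regularization term, and the learnRate and maxEpochs arguments are placeholders that must be found by trial and error.

  // SGD alternative: iteratively nudge the weights; K is the precomputed
  // kernel matrix where K[i][j] = RBF(trainX[i], trainX[j])
  public static double[] TrainWeightsSGD(double[][] K, double[] trainY,
    double alpha, double learnRate, int maxEpochs)
  {
    int n = trainY.Length;
    double[] w = new double[n];  // one weight per training item, start at 0.0
    Random rnd = new Random(0);
    int[] indices = new int[n];
    for (int i = 0; i < n; ++i) indices[i] = i;

    for (int epoch = 0; epoch < maxEpochs; ++epoch)
    {
      for (int i = n - 1; i > 0; --i)  // Fisher-Yates: scramble visit order
      {
        int r = rnd.Next(0, i + 1);
        (indices[i], indices[r]) = (indices[r], indices[i]);
      }
      foreach (int i in indices)
      {
        double pred = 0.0;  // current prediction for training item i
        for (int j = 0; j < n; ++j)
          pred += w[j] * K[i][j];
        double err = pred - trainY[i];
        for (int j = 0; j < n; ++j)  // gradient step with L2 penalty
          w[j] -= learnRate * (err * K[i][j] + alpha * w[j]);
      }
    }
    return w;
  }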



Machine learning starts with data items, often called data points. Each data item/point is relatively insignificant, but together the items/points constitute a statistical population or sample. Here are three memorable (to me anyway) science fiction covers that illustrate an unfortunate population of humans.

Left: “The Well of the Worlds” (1965) by author Henry Kuttner. Art by Alex Schomburg.

Center: “The Atlantic Abomination” (1968) by author John Brunner. Art by Ed Emshwiller. The cover shows a huge alien with mind control. Hundreds of humans carry the alien, and when it is faced with a canyon, the alien simply has thousands of humans become landfill. Scary to me when I read it as a young man.

Right: This is the January 1953 cover of “If: Worlds of Science Fiction Magazine”. The art is by Anton Kurka, but it doesn’t correspond to any of the nine stories in the issue. The table of contents notes that the cover illustration suggests “The Ultimate Re-Sowing of the Human Race — 4000 AD” but, to the best of my knowledge, no such story exists.

