I wrote an article titled “Kernel Ridge Regression with Cholesky Inverse Training Using JavaScript” in the January 2026 edition of Microsoft Visual Studio Magazine. See https://visualstudiomagazine.com/articles/2026/01/06/kernel-ridge-regression-with-cholesky-inverse-training-using-javascript.aspx.
There are approximately a dozen common regression techniques. Examples include linear regression, nearest neighbors regression, quadratic regression, decision tree regression (which has several variations, such as random forest and gradient boosting), and neural network regression.
Kernel ridge regression uses a kernel function that computes a measure of similarity between two data items, and a ridge regularization technique to discourage model overfitting. Model overfitting occurs when a model predicts well on the training data, but predicts poorly on new, previously unseen data. Ridge regularization is also called L2 regularization.
My article presents a demo of kernel ridge regression, implemented from scratch, using the JavaScript language. The output of the demo is:
Begin Kernel Ridge Regression with Cholesky matrix inverse

Loading train (200) and test (40) from file

First three train X:
  -0.1660   0.4406  -0.9998  -0.3953  -0.7065
   0.0776  -0.1616   0.3704  -0.5911   0.7562
  -0.9452   0.3409  -0.1654   0.1174  -0.7192
First three train y:
   0.4840
   0.1568
   0.8054

Setting RBF gamma = 0.3
Setting alpha noise = 0.005

Creating and training KRR model using Cholesky inverse
Done

Model weights:
  -2.0218  -1.1406   0.0758  -0.6265   0.5722
  -0.9905   0.6912   0.4807   0.6496  -0.7364
  . . .
  -0.2014  -1.6270  -0.5825  -0.0487   1.2897

Computing model accuracy
Train acc (within 0.10) = 0.9950
Test acc (within 0.10) = 0.9500
Train MSE = 0.0000
Test MSE = 0.0002

Predicting for x = -0.1660 0.4406 -0.9998 -0.3953 -0.7065
Predicted y = 0.4941

End demo
The demo program uses the radial basis function (RBF) kernel function to measure the similarity between two data items. RBF values range from 1.0 (identical items) down to 0.0 (increasing dissimilarity).
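To make the idea concrete, here is a minimal sketch of an RBF kernel function in JavaScript. The function name and signature are illustrative assumptions, not necessarily the exact code used in the demo program.

```javascript
// Minimal sketch of an RBF kernel -- illustrative, not necessarily the
// article's exact implementation.
function rbfKernel(x1, x2, gamma) {
  let sumSq = 0.0;  // squared Euclidean distance between the two vectors
  for (let i = 0; i < x1.length; ++i) {
    const d = x1[i] - x2[i];
    sumSq += d * d;
  }
  // 1.0 for identical items, approaching 0.0 for very different items
  return Math.exp(-gamma * sumSq);
}
```

With gamma = 0.3, as in the demo, two identical items give a kernel value of 1.0, and the value decays toward 0.0 as the items move farther apart.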
There are two main ways to train a kernel ridge regression model. The first technique, and the one used by the demo program presented in this article, involves creating an n-by-n kernel matrix that compares all the training data items with each other. Ridge regularization is applied by adding a small constant, usually named alpha, to the diagonal elements of the kernel matrix. The inverse of the regularized kernel matrix is then computed and multiplied by the vector of training y values, which gives the model weights.
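In outline form, the matrix inverse training technique looks something like the sketch below. This is a simplified illustration under my own naming assumptions: matInverse() stands in for the Cholesky-based inverse discussed later, and rbfKernel() is the function sketched above.

```javascript
// Sketch of kernel-matrix-inverse training and prediction (illustrative
// names, not the article's exact code). trainX is an array of n feature
// vectors, trainY an array of n target values.
function trainKRR(trainX, trainY, gamma, alpha) {
  const n = trainX.length;

  // n-by-n kernel matrix of pairwise RBF similarities
  const K = [];
  for (let i = 0; i < n; ++i) {
    const row = [];
    for (let j = 0; j < n; ++j) {
      row.push(rbfKernel(trainX[i], trainX[j], gamma));
    }
    K.push(row);
  }

  // ridge (L2) regularization: add alpha to the diagonal
  for (let i = 0; i < n; ++i) {
    K[i][i] += alpha;
  }

  // invert the regularized kernel matrix -- matInverse() is assumed here,
  // for example a Cholesky-based inverse as discussed below
  const Kinv = matInverse(K);

  // weights = inverse(K + alpha * I) times trainY, one weight per training item
  const wts = new Array(n).fill(0.0);
  for (let i = 0; i < n; ++i) {
    for (let j = 0; j < n; ++j) {
      wts[i] += Kinv[i][j] * trainY[j];
    }
  }
  return wts;
}

// a prediction is the weighted sum of RBF similarities between the new
// input x and each training item
function predictKRR(x, trainX, wts, gamma) {
  let sum = 0.0;
  for (let i = 0; i < trainX.length; ++i) {
    sum += wts[i] * rbfKernel(x, trainX[i], gamma);
  }
  return sum;
}
```

Notice that the number of model weights equals the number of training items (200 in the demo), which is why the demo displays a long list of weights.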
The second technique to train a kernel ridge regression model uses stochastic gradient descent (SGD). SGD is an iterative process that loops through the training data multiple times, adjusting the model weights slowly so that the model reduces its error between predicted y values and target y values.
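For comparison, a very rough sketch of SGD-style training is shown below. The article's demo does not use this technique, and the learning rate, epoch count, and exact gradient formulation here are my own simplifying assumptions.

```javascript
// Rough sketch of SGD training for KRR (not used by the article's demo;
// details are simplified assumptions). Minimizes squared error plus an
// L2 penalty on the weights.
function trainKRRSGD(trainX, trainY, gamma, alpha, lrnRate, maxEpochs) {
  const n = trainX.length;
  const wts = new Array(n).fill(0.0);

  for (let epoch = 0; epoch < maxEpochs; ++epoch) {
    for (let i = 0; i < n; ++i) {
      // predicted y for training item i using the current weights
      let pred = 0.0;
      for (let j = 0; j < n; ++j) {
        pred += wts[j] * rbfKernel(trainX[i], trainX[j], gamma);
      }
      const err = pred - trainY[i];

      // nudge each weight a small amount to reduce the error
      for (let j = 0; j < n; ++j) {
        const grad = err * rbfKernel(trainX[i], trainX[j], gamma) + alpha * wts[j];
        wts[j] -= lrnRate * grad;
      }
    }
  }
  return wts;
}
```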
Computing a matrix inverse is one of the most challenging problems in numerical programming. There are over a dozen algorithms to compute a matrix inverse, and each algorithm has several variations, and each variation has multiple implementation designs.
As it turns out, because the kernel matrix for kernel ridge regression is symmetric and, after the alpha constant is added to its diagonal, positive definite, it's possible to use a specialized matrix inverse algorithm based on Cholesky decomposition. A Cholesky decomposition inverse is simpler than general-purpose inverse algorithms.
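For reference, one common way to write the decomposition step is shown below. This is a textbook Cholesky formulation, not necessarily the article's exact implementation. It produces a lower-triangular matrix L such that the original matrix equals L times its transpose.

```javascript
// Textbook Cholesky decomposition sketch (illustrative). A must be
// symmetric positive definite, such as the regularized RBF kernel matrix.
// Returns lower-triangular L such that A = L * transpose(L).
function choleskyDecomp(A) {
  const n = A.length;
  const L = [];
  for (let i = 0; i < n; ++i) L.push(new Array(n).fill(0.0));

  for (let i = 0; i < n; ++i) {
    for (let j = 0; j <= i; ++j) {
      let sum = 0.0;
      for (let k = 0; k < j; ++k) {
        sum += L[i][k] * L[j][k];
      }
      if (i === j) {
        L[i][j] = Math.sqrt(A[i][i] - sum);   // diagonal element
      } else {
        L[i][j] = (A[i][j] - sum) / L[j][j];  // below-diagonal element
      }
    }
  }
  return L;
}
```

Because L is triangular, computing its inverse (and from it the inverse of the kernel matrix) requires only forward and back substitution, which is why this approach is simpler than a general-purpose matrix inverse.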
My article concludes with a recap:
* Kernel ridge regression (KRR) is a machine learning technique to predict a numeric value.
* Kernel ridge regression requires a kernel function that computes a measure of similarity between two training items.
* The most common kernel function is the radial basis function (RBF).
* There are two ways to train a KRR model: kernel matrix inverse and stochastic gradient descent (SGD).
* Both training techniques require an alpha constant for ridge (aka L2) regularization to discourage model overfitting.
* For KRR matrix inverse training, you must compute the inverse of a kernel matrix of RBF applied to all pairs of training items.
* There are many techniques to compute a matrix inverse. Cholesky decomposition is a specialized, relatively simple technique that can be used for kernel matrices.

Inexplicably, one of my favorite comedy movie bits is the “bat in the house” bit. There are a surprisingly large number of movies that feature this idea.
Left: The first example that I’m aware of is “The Laurel-Hardy Murder Case” (1930). The boys are in a creepy old mansion. A bat gets in the house and under a bed sheet. The boys think it’s a ghost as it chases them around the house. Very funny movie.
Right: Another old example is “Spooks” (1953) featuring The Three Stooges. The boys are in a haunted house. A hilariously fake bat chases them around. A funny movie.

