“Simple k-NN Regression Using C#” in Visual Studio Magazine

I wrote an article titled “Simple k-NN Regression Using C#” in the November 2024 edition of Microsoft Visual Studio Magazine. See https://visualstudiomagazine.1105cms01.com/Articles/2024/11/15/Simple-k-NN-Regression-Using-Csharp.aspx.

The goal of a machine learning regression problem is to predict a single numeric value. For example, you might want to predict the price of a house based on its square footage, number of bedrooms, property tax rate, and so on. Common regression techniques include multiple linear regression, tree-based regression (decision tree, AdaBoost, random forest, bagging), neural network regression, and k-nearest neighbors (k-NN) regression.

The VSM article presents a demo of k-NN regression using the C# language. Compared to other regression techniques, k-NN regression is often slightly less accurate, but it is very simple to implement and customize, and its results are highly interpretable.

The demo program loads synthetic training and test data into memory. The data looks like:

-0.1660,  0.4406, -0.9998, -0.3953, -0.7065,  0.4840
 0.0776, -0.1616,  0.3704, -0.5911,  0.7562,  0.1568
-0.9452,  0.3409, -0.1654,  0.1174, -0.7192,  0.8054
 0.9365, -0.3732,  0.3846,  0.7528,  0.7892,  0.1345
. . .

The first five values on each line are the x predictors. The last value on each line is the target y variable to predict. The demo creates a k-NN regression model, evaluates the model accuracy on the training and test data, and then uses the model to predict the target y value for a new, previously unseen data item. The demo output is:

Creating k-NN model, k=4
Accuracy train (within 0.15): 0.7950
Accuracy test (within 0.15): 0.7750
Predicting for: [0.7462, 0.4006, -0.0590, 0.6543, -0.0083]
y = 0.2137
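
The "within 0.15" accuracy metric counts a prediction as correct when it lands within 0.15 of the true target value. A minimal sketch of that kind of evaluation in C# (the helper name and demo values are my own illustration, not code from the article):

```csharp
using System;

class AccuracyDemo
{
  // Fraction of predictions that land within a closeness threshold
  // of their true target values.
  public static double Accuracy(double[] preds, double[] targets,
    double within)
  {
    int numCorrect = 0;
    for (int i = 0; i < preds.Length; ++i)
      if (Math.Abs(preds[i] - targets[i]) < within)
        ++numCorrect;
    return (double)numCorrect / preds.Length;
  }

  static void Main()
  {
    double[] preds   = { 0.48, 0.16, 0.80 };              // hypothetical predictions
    double[] targets = { 0.4840, 0.1568, 0.1345 };        // y values from the data above
    Console.WriteLine(Accuracy(preds, targets, 0.15).ToString("F4"));
    // 2 of the 3 predictions are within 0.15 of their targets
  }
}
```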

The k-NN regression technique is very simple. To make a prediction for an input vector x, you examine the training / reference data and find the k closest items to x, typically using Euclidean distance. The value of k is often set to 3 or 5, and must be determined by trial and error. The predicted y value is the average of the known y values of those k closest training items.
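
In code, the prediction step amounts to sorting the training items by their distance to x and averaging the y values of the top k. A minimal C# sketch of this idea (my own illustration with tiny made-up data, not the article's implementation):

```csharp
using System;
using System.Linq;

class KnnRegressionSketch
{
  // Predict y for input x: find the k training items nearest to x
  // (Euclidean distance) and return the average of their y values.
  public static double Predict(double[][] dataX, double[] dataY,
    double[] x, int k)
  {
    return dataX
      .Select((row, i) => (dist: Math.Sqrt(
        row.Zip(x, (a, b) => (a - b) * (a - b)).Sum()), idx: i))
      .OrderBy(t => t.dist)
      .Take(k)
      .Average(t => dataY[t.idx]);
  }

  static void Main()
  {
    double[][] trainX = {
      new double[] { 0.0, 0.0 }, new double[] { 1.0, 0.0 },
      new double[] { 0.0, 1.0 }, new double[] { 5.0, 5.0 } };
    double[] trainY = { 0.10, 0.20, 0.30, 0.90 };
    double y = Predict(trainX, trainY, new double[] { 0.1, 0.1 }, 3);
    Console.WriteLine(y.ToString("F4"));  // average of the 3 nearest y values: 0.2000
  }
}
```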

In the early days of machine learning, several variations of k-NN regression were explored, for example, weighting the predicted value by the distances to the nearest neighbors. In most cases, the added complexity of these variations does not significantly improve k-NN regression and so the variations are not widely used today.
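
For example, an inverse-distance-weighted version replaces the simple average with a weighted average so that nearer neighbors contribute more. A hedged C# sketch of that variation (the epsilon constant, method name, and demo data are my choices):

```csharp
using System;
using System.Linq;

class WeightedKnnSketch
{
  // Distance-weighted k-NN regression: each of the k nearest neighbors
  // contributes its y value weighted by the inverse of its distance to x.
  public static double PredictWeighted(double[][] dataX, double[] dataY,
    double[] x, int k)
  {
    var nearest = dataX
      .Select((row, i) => (dist: Math.Sqrt(
        row.Zip(x, (a, b) => (a - b) * (a - b)).Sum()), idx: i))
      .OrderBy(t => t.dist)
      .Take(k);
    double num = 0.0, den = 0.0;
    foreach (var (dist, idx) in nearest)
    {
      double w = 1.0 / (dist + 1.0e-8);  // epsilon guards against division by zero
      num += w * dataY[idx];
      den += w;
    }
    return num / den;
  }

  static void Main()
  {
    double[][] trainX = {
      new double[] { 0.0, 0.0 }, new double[] { 1.0, 0.0 },
      new double[] { 0.0, 1.0 }, new double[] { 5.0, 5.0 } };
    double[] trainY = { 0.10, 0.20, 0.30, 0.90 };
    // the nearest neighbor (y = 0.10) dominates, pulling the result
    // below the unweighted average of 0.20
    double y = PredictWeighted(trainX, trainY, new double[] { 0.1, 0.1 }, 3);
    Console.WriteLine(y.ToString("F4"));
  }
}
```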



It’s unlikely that Earth has near neighbors with intelligent life. But that didn’t stop one of my all-time favorite cartoonists, Gahan Wilson (1930-2019), from speculating.

