I wrote an article titled “Random Neighborhoods Regression Using C#” in the February 2025 edition of Microsoft Visual Studio Magazine. See https://visualstudiomagazine.com/Articles/2025/02/03/Random-Neighborhoods-Regression.aspx.
In regular k-nearest neighbors regression, to predict the target value y for an input vector x, you find the k nearest training items to x and then return the average of the y values associated with those k items.
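The basic k-NN prediction step can be sketched in a few lines. This is a Python illustration (the article's demo is in C#), and the `knn_predict` function name and the tiny demo data are my own, not from the article:

```python
import numpy as np

def knn_predict(train_x, train_y, x, k):
    """Predict y for x as the mean y of the k nearest training items (Euclidean distance)."""
    dists = np.sqrt(np.sum((train_x - x) ** 2, axis=1))  # distance from x to every training item
    nearest = np.argsort(dists)[:k]                      # indices of the k closest items
    return np.mean(train_y[nearest])

# tiny demo: two clusters of training items
train_x = np.array([[0.1, 0.2], [0.3, 0.1], [0.9, 0.8], [0.8, 0.9]])
train_y = np.array([0.3, 0.4, 1.7, 1.7])
print(knn_predict(train_x, train_y, np.array([0.85, 0.85]), 2))  # prints 1.7
```

With k = 2, the two nearest items to (0.85, 0.85) are the last two training items, so the prediction is the average of their y values, 1.7.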
In random neighborhoods regression, you create an ensemble (collection) of several k-nearest neighbor regressor systems, each using a different subset of the source training data, and each with a different value of k. The predicted y value is the average of the predictions made by the collection of k-nearest neighbor systems.
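The ensemble idea can be sketched as follows. This is a Python sketch under my own naming, not the article's C# implementation: each "neighborhood" is a random with-replacement sample of training rows paired with its own randomly chosen k, and the final prediction averages the per-neighborhood k-NN predictions:

```python
import numpy as np

rng = np.random.default_rng(0)

def knn_predict(train_x, train_y, x, k):
    # mean y of the k nearest training items (Euclidean distance)
    nearest = np.argsort(np.linalg.norm(train_x - x, axis=1))[:k]
    return np.mean(train_y[nearest])

def build_neighborhoods(train_x, train_y, num_hoods, pct_data, min_k, max_k):
    # each neighborhood = (row indices sampled with replacement, its own k value)
    n_rows = int(pct_data * len(train_x))
    return [(rng.integers(0, len(train_x), size=n_rows),  # sample rows with replacement
             rng.integers(min_k, max_k + 1))              # k drawn from [min_k, max_k]
            for _ in range(num_hoods)]

def ensemble_predict(train_x, train_y, hoods, x):
    # final prediction = average of the per-neighborhood k-NN predictions
    preds = [knn_predict(train_x[rows], train_y[rows], x, k) for rows, k in hoods]
    return np.mean(preds)

# synthetic data shaped like the demo: 200 items, 5 predictors each
train_x = rng.uniform(-1, 1, size=(200, 5))
train_y = np.sum(train_x, axis=1)  # synthetic target: sum of the predictors
hoods = build_neighborhoods(train_x, train_y, num_hoods=6, pct_data=0.90, min_k=2, max_k=7)
print(ensemble_predict(train_x, train_y, hoods, train_x[0]))
```

Because each regressor sees a different sample of the data and uses a different k, the individual predictions differ slightly, and averaging them smooths out the idiosyncrasies of any single regressor.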
The article presents a complete end-to-end demo of random neighborhoods regression. The demo data looks like:
-0.1660, 0.4406, -0.9998, -0.3953, -0.7065, 0.4840
0.0776, -0.1616, 0.3704, -0.5911, 0.7562, 0.1568
-0.9452, 0.3409, -0.1654, 0.1174, -0.7192, 0.8054
0.9365, -0.3732, 0.3846, 0.7528, 0.7892, 0.1345
. . .
The first five values on each line are the x predictors. The last value on each line is the target y value to predict.
The key parts of the demo program output are:
Creating and training random neighborhoods regression model
Setting numNeighborhoods = 6
Setting pctData = 0.90
Setting minK = 2
Setting maxK = 7
Done

Evaluating model
Accuracy train (within 0.15) = 0.8100
Accuracy test (within 0.15) = 0.7500

Predicting for x = -0.1660 0.4406 -0.9998 -0.3953 -0.7065
neighborhood [ 0]  k = 3 :  pred y = 0.5287
neighborhood [ 1]  k = 5 :  pred y = 0.5972
neighborhood [ 2]  k = 2 :  pred y = 0.5193
neighborhood [ 3]  k = 5 :  pred y = 0.5972
neighborhood [ 4]  k = 5 :  pred y = 0.5823
neighborhood [ 5]  k = 5 :  pred y = 0.5972
Predicted y = 0.5703
The pctData parameter value of 0.90 means each k-NN regressor uses a randomly selected 90% of the 200 training items. The training items are selected "with replacement," which means each regressor's subset can, and almost certainly will, contain duplicate training items. Because each regressor sees a slightly different version of the training data, the ensemble is less prone to overfitting than a single k-NN regressor would be.
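A quick way to see that with-replacement sampling nearly always produces duplicates is to draw 90% of 200 indices and count the distinct values. This is my own Python illustration, not part of the article's demo:

```python
import numpy as np

rng = np.random.default_rng(1)
n_train = 200
sample = rng.integers(0, n_train, size=int(0.90 * n_train))  # 180 draws with replacement
n_unique = len(np.unique(sample))
print(n_unique, "unique items out of", len(sample))  # fewer unique values means duplicates
```

On average, only about 200 * (1 - e^(-0.9)) ≈ 119 of the 200 training items appear in a given sample of 180 draws, so each regressor's subset is meaningfully different from the others.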
The minK = 2 and maxK = 7 parameter values mean each k-NN regressor uses between 2 and 7 nearest neighbors when computing its prediction. The values of 2 and 7 have worked well for me across a wide range of regression problem scenarios.
In the early days of machine learning, random neighborhoods regression was used quite often, at least by me and many of my colleagues, but it is not used nearly as often today. I'm not sure exactly why its use has declined, because the technique often works well across a wide range of problem scenarios and can make a nice addition to your personal machine learning toolkit.

I’ve always been fascinated by old electro-mechanical games. The very first pinball game to use bumpers instead of pins (think thin nails) was the “Bumper” game. It was introduced in 1936 by Bally Manufacturing and instantly made pin designs obsolete.
The bumpers were not like those used today. The bumpers were primitive coil-springs that really didn’t cause a pinball to rebound very much. The end of the spring extended below the playing field and allowed hits on the bumper to score points.
Notice there are no flippers. Players could only influence the pinball by gently shaking the entire game. Flippers were not introduced until 1947. Active, electrically powered "pop" bumpers, which caused a pinball to really ricochet, were introduced in 1948.

