Weighted k-NN Classification Using C# in Visual Studio Magazine

I wrote an article titled “Weighted k-NN Classification Using C#” in the May 2022 edition of Microsoft Visual Studio Magazine. See https://visualstudiomagazine.com/articles/2022/05/19/weighted-k-nn-classification.aspx.

I walk through an example using a set of 30 data items, where each item consists of a person’s annual income, education (in years), and happiness score (0, 1, or 2). The goal is to predict a person’s happiness score from their income and education.

The (synthetic) data looks like:

ID   Income   Educ   Happy
00   0.32     0.43     0
01   0.26     0.54     0
. . .
29   0.71     0.22     2

The goal of the article demo is to predict the happiness score of a person who has (income, education) of (0.62, 0.35).

The demo program specifies k = 6 and then computes the six closest data points to the (0.62, 0.35) item to classify. Of the six closest points, three are class 0, one is class 1, and two are class 2. A simple majority-rule voting scheme would conclude that (0.62, 0.35) is class 0. The demo uses a weighted voting scheme where closer data points receive more weight. The weighted votes are (0.5711, 0.1782, 0.2508). The largest value is at index [0] and so the prediction is happiness = 0.
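The article describes the demo's exact weighting scheme; one common approach consistent with the description is inverse-distance weighting, where each neighbor's vote is proportional to 1 / distance and the votes are normalized to sum to 1. Here is a minimal sketch of that idea (the neighbor distances and class labels below are hypothetical, not the demo's values):

```csharp
using System;

class WeightedVoteDemo
{
  static void Main()
  {
    // hypothetical distances and class labels of the k = 6 nearest neighbors
    double[] dists = new double[] { 0.05, 0.11, 0.13, 0.20, 0.25, 0.31 };
    int[] classes = new int[] { 0, 2, 0, 1, 2, 0 };
    int numClasses = 3;

    // inverse-distance weights, normalized so they sum to 1
    double[] inv = new double[dists.Length];
    double sum = 0.0;
    for (int i = 0; i < dists.Length; ++i) {
      inv[i] = 1.0 / dists[i];
      sum += inv[i];
    }

    // accumulate each neighbor's weight into its class's vote
    double[] votes = new double[numClasses];
    for (int i = 0; i < dists.Length; ++i)
      votes[classes[i]] += inv[i] / sum;

    // predicted class = index of the largest weighted vote
    int pred = 0;
    for (int c = 1; c < numClasses; ++c)
      if (votes[c] > votes[pred]) pred = c;
    Console.WriteLine("Predicted class = " + pred);
  }
}
```

Because closer neighbors get larger weights, a class with fewer but nearer neighbors can outvote a class with more but farther ones.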

After distances from the item-to-be-classified to all data points have been computed, the k-NN algorithm must find the k-nearest (smallest) distances. The demo uses a neat overload of the .NET Array.Sort method to sort the distances and the associated data indices in parallel. The key code is:

int[] ordering = new int[N];  // indices 0 . . N-1
for (int i = 0; i < N; ++i)
  ordering[i] = i;
double[] distancesCopy = new double[N];  // sort a copy, preserving the original
Array.Copy(distances, distancesCopy, distances.Length);
Array.Sort(distancesCopy, ordering);  // sorts both arrays, keyed on the distances

The result is that the first value in the array “ordering” is the index of the data item with the smallest distance, the second value in ordering is the index of the second-closest item, and so on. A very nice feature of the .NET Array class.
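The parallel-sort idea above can be demonstrated end to end with a tiny self-contained program (the four distances are made up for illustration):

```csharp
using System;

class SortDemo
{
  static void Main()
  {
    // hypothetical distances from the item-to-classify to four data items
    double[] distances = new double[] { 0.40, 0.10, 0.30, 0.20 };
    int N = distances.Length;

    int[] ordering = new int[N];
    for (int i = 0; i < N; ++i)
      ordering[i] = i;

    // sort a copy so the original distances array is preserved;
    // ordering is rearranged in parallel with the sorted keys
    double[] distancesCopy = new double[N];
    Array.Copy(distances, distancesCopy, N);
    Array.Sort(distancesCopy, ordering);

    // ordering is now [1, 3, 2, 0]: item 1 is closest, item 0 is farthest
    Console.WriteLine(string.Join(" ", ordering));
  }
}
```

For k-NN, the first k entries of ordering identify the k nearest neighbors.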

The two main advantages of weighted k-NN classification are 1.) it’s simple to understand and implement, and 2.) the results are easily interpretable. The two main disadvantages of weighted k-NN classification are 1.) there is no principled way to choose the value of k, and 2.) the technique works well only with strictly numeric predictor values because it’s usually not possible to define a good distance function for non-numeric data points.
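The distance function the disadvantages refer to is typically Euclidean distance when all predictors are numeric. A minimal sketch (the article has the demo's actual implementation; the data values here are taken from the example table):

```csharp
using System;

class DistanceDemo
{
  // Euclidean distance between two numeric predictor vectors
  static double Distance(double[] a, double[] b)
  {
    double sum = 0.0;
    for (int i = 0; i < a.Length; ++i)
      sum += (a[i] - b[i]) * (a[i] - b[i]);
    return Math.Sqrt(sum);
  }

  static void Main()
  {
    double[] itemToClassify = new double[] { 0.62, 0.35 };
    double[] dataItem00 = new double[] { 0.32, 0.43 };
    Console.WriteLine(Distance(itemToClassify, dataItem00).ToString("F4"));
  }
}
```

Notice the predictors are already normalized to roughly the same scale, which matters because Euclidean distance would otherwise be dominated by the predictor with the largest magnitudes.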



The nearest neighbor to Earth is Venus (105 million miles). Edgar Rice Burroughs is best known for his Tarzan novels and his Mars / John Carter novels. Burroughs also wrote five novels about Carson Napier who ends up on Venus. I like the Venus series but not as much as the books of the Mars series. Here are covers of the first three Venus series books, by three different artists. Left: “Pirates of Venus” (first published 1932) by artist Roy Krenkel. Center: “Lost on Venus” (1933) by artist Frank Frazetta. Right: “Carson of Venus” (1938) by artist John Coleman Burroughs, the author’s son.


This entry was posted in Machine Learning.