Weighted k-NN Classification Using C#

I wrote an article titled “Weighted k-NN Classification Using C#” in the May 2019 issue of Microsoft MSDN Magazine. See https://msdn.microsoft.com/en-us/magazine/mt833446.

The k-NN (k-nearest neighbors) algorithm is conceptually simple but there are several tricky implementation details. In the article I use an example where the goal is to predict a person’s political leaning (conservative, moderate, liberal) based on their age and annual medical expenses. You start with training data that has known, correct political leanings. Then you set k to some value, for example 5. Then, to predict the political leaning of a new, previously unseen data item, you find the k closest items in the training data.
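The article's demo program is written in C#, but the nearest-neighbors search itself is easy to sketch. Here is a minimal illustration in Python, using a small made-up set of normalized (age, expenses) training items; the data values and labels are hypothetical, not the article's dataset:

```python
import math

# Hypothetical normalized training data: (age, expenses) -> political leaning.
# These values are made up for illustration; the article uses its own dataset.
train_data = [
    ((0.25, 0.40), "conservative"),
    ((0.30, 0.50), "conservative"),
    ((0.55, 0.65), "moderate"),
    ((0.60, 0.30), "liberal"),
    ((0.20, 0.45), "conservative"),
    ((0.70, 0.60), "liberal"),
]

def euclidean(a, b):
    # Standard Euclidean distance between two feature tuples.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_nearest(item, data, k):
    # Sort all training items by distance to the new item; keep the k closest.
    ranked = sorted(data, key=lambda pair: euclidean(item, pair[0]))
    return ranked[:k]

# Classify-time query: find the 3 training items closest to a new item.
neighbors = k_nearest((0.28, 0.44), train_data, 3)
for features, label in neighbors:
    print(features, label)
```

Sorting the whole training set is O(n log n) per query, which is fine for small datasets; a production implementation would typically track only the k best distances in a single pass.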

Suppose three of the k = 5 closest training items are “conservative”, one item is “moderate”, and one item is “liberal”. Then you use a voting scheme to determine the prediction. If you use simple majority rule (every close training item has an equally weighted vote), then the prediction for the new item is “conservative” because it has 3 votes compared to 1 vote each for “moderate” and “liberal”.
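The majority-rule vote described above can be expressed in a few lines. This sketch (in Python rather than the article's C#) tallies the labels of the k = 5 hypothetical neighbors from the example:

```python
from collections import Counter

# Labels of the k = 5 nearest neighbors from the example above.
labels = ["conservative", "conservative", "conservative", "moderate", "liberal"]

# Simple majority rule: each neighbor casts one equally weighted vote.
votes = Counter(labels)
prediction, count = votes.most_common(1)[0]
print(prediction, count)  # conservative wins with 3 of 5 votes
```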

An implementation must deal with the following issues. How do you measure “closest”? (You normalize the data and use Euclidean distance.) How do you vote so that training items that are close to the new item-to-classify get more weight than training items that are farther away? (You use a technique called inverse distance weighting.) How do you pick a value for k? (You use trial and error and intuition.)
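To show how inverse distance weighting changes the outcome, here is an illustrative Python sketch (the article's code is C#). Each neighbor votes with weight 1/d, where d is its distance to the item being classified; the distances below are made up. Notice that weighted voting can overturn a simple majority when one label's neighbors are very close:

```python
# Inverse distance weighting: each of the k nearest neighbors votes with
# weight 1/d, so closer neighbors count for more.
# EPS guards against division by zero when a distance is exactly 0.
EPS = 1.0e-6

# (distance to the item-to-classify, label) -- hypothetical values.
neighbors = [
    (0.10, "moderate"),
    (0.20, "conservative"),
    (0.25, "conservative"),
    (0.40, "liberal"),
    (0.90, "liberal"),
]

weights = {}
for d, label in neighbors:
    weights[label] = weights.get(label, 0.0) + 1.0 / (d + EPS)

prediction = max(weights, key=weights.get)
print(weights)
print(prediction)  # "moderate" -- its single very close neighbor outweighs
                   # the two votes each for "conservative" and "liberal"
```

A common refinement, discussed in the article, is to normalize the weights so they sum to 1.0, which makes the vote totals easier to interpret as pseudo-probabilities.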

The weighted k-NN classification algorithm has received increased attention recently for two reasons. First, by using neural autoencoding, k-NN can deal with mixed numeric and non-numeric predictor values. Second, compared to many other classification algorithms, the results of weighted k-NN are relatively easy to interpret. Interpretability has become increasingly important due to legal requirements on ML techniques from regulations such as the European Union’s GDPR.



Bad Neighbors. The third image from the left only makes sense if you’ve seen the wonderful animated film “My Neighbor Totoro” (1988).
