The k-NN Classification Technique is Simple and Complicated

The k-NN (“k-nearest neighbors”) classification technique is very simple . . . and very complicated. By that I mean k-NN is conceptually very simple, but actually implementing k-NN is surprisingly complicated.

Suppose you have some data that represents people’s age, income, debt, and a class (0, 1, 2) that indicates how likely they are to buy something from your company. Let’s say you have 100 such data points. You want to use age, income, and debt to predict the class. First you set a value for k. Suppose you set k = 4. Next you specify a source item to predict, say (0.39, 0.534, 0.152). Then you compute the mathematical distance from the source item to each of your 100 data points, find the k = 4 nearest points, and order them by distance. Suppose the four closest points are:

idx  age   income  debt  distance  class
========================================
[26] 0.39  0.539  0.151  0.0051      1
[13] 0.40  0.531  0.157  0.0116      0
[99] 0.38  0.561  0.149  0.0289      2
[57] 0.36  0.540  0.149  0.0307      1
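The distance-and-sort step can be sketched in Python. This is just a minimal illustration, not the demo code: the data rows are the four neighbors from the table above (a real dataset would hold all 100 points), and the variable names are my own.

```python
import math

# Training data: (age, income, debt, class). These are the four rows
# from the table above; a real dataset would have all 100 points.
data = [
    (0.39, 0.539, 0.151, 1),  # idx [26]
    (0.40, 0.531, 0.157, 0),  # idx [13]
    (0.38, 0.561, 0.149, 2),  # idx [99]
    (0.36, 0.540, 0.149, 1),  # idx [57]
]

item = (0.39, 0.534, 0.152)  # source item to classify
k = 4

def euclidean(a, b):
    # Euclidean distance over the three predictor values
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# distance from the source item to every data point, then the k nearest
dists = [(euclidean(item, row[:3]), row[3]) for row in data]
dists.sort(key=lambda t: t[0])
nearest = dists[:k]
for d, c in nearest:
    print(f"dist = {d:.4f}  class = {c}")
```

Running this reproduces the distances in the table (0.0051, 0.0116, 0.0289, 0.0307).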

With this information, using k = 4, it’s clear that the predicted class should be c = 1 because two out of the four closest points have class 1 (a majority-rule voting scheme), and the closest point has class 1 as well. There are several other voting schemes you can use.
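Majority-rule voting can be sketched in a few lines of Python (a minimal sketch, assuming the classes of the k nearest neighbors have already been found):

```python
from collections import Counter

# classes of the k = 4 nearest neighbors, from the table above
neighbor_classes = [1, 0, 2, 1]

# majority-rule voting: the most common class among the neighbors wins
predicted = Counter(neighbor_classes).most_common(1)[0][0]
print(predicted)  # class 1: two of the four neighbors have class 1
```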

In short, the main idea behind k-NN is to find the closest neighbors to the item you want to classify, and then see what those closest neighbors are like.

Somewhat surprisingly, implementing k-NN classification in a programming language like C# or Python is much trickier than I expected. There wasn’t one major hurdle; there were lots of small tricky details.


A demo of k-NN classification using the C# language.

The image above shows an example of k-NN classification using raw C# (without an external code library). It uses a voting technique called inverse weights.
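Inverse-weights voting can be sketched as follows. This is a Python sketch of the general idea, not the demo’s C# code: each neighbor votes with weight 1/distance, so closer neighbors count for more, and the class with the largest weight total wins.

```python
# (distance, class) pairs for the k = 4 nearest neighbors, from the table
neighbors = [(0.0051, 1), (0.0116, 0), (0.0289, 2), (0.0307, 1)]

# inverse-distance weighting: each neighbor votes with weight 1/distance
# (a real implementation must guard against a zero distance)
votes = {}
for dist, cls in neighbors:
    votes[cls] = votes.get(cls, 0.0) + 1.0 / dist

predicted = max(votes, key=votes.get)
print(predicted)  # class 1: its two neighbors dominate the weighted vote
```

A common variant normalizes the weights so they sum to 1, which makes the result look like a set of pseudo-probabilities; the predicted class is the same either way.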

I’m going to deliver a one-day, hands-on workshop at the upcoming Microsoft Azure + AI Conference. The event runs Nov. 17-22, 2019 in Las Vegas. See https://www.azureaiconf.com. My all-day workshop is titled “Practical Machine Learning Using C#” and is on Friday, Nov. 22, 2019. One of the six techniques the workshop will cover is k-NN classification. If you attend the conference, be sure to track me down and maybe we can be nearest neighbors at a bar after the workshop.



Three more or less random images from an Internet image search for paintings of nearest neighbors at a bar.

This entry was posted in Machine Learning.