One of the most fundamental machine learning classification techniques is k-nearest neighbors (k-NN). If you have a set of labeled data points, and you want to classify a new unlabeled data item, you compute the distance of the unlabeled item to all of the labeled points, then examine the closest k labeled data points, and find the most common label.
Sort of.
The details are a bit tricky and are best explained by a concrete example. I set up 30 dummy data points that look like:
[ 0, 0.32, 0.43, 0]
[ 1, 0.26, 0.54, 0]
. . .
[12, 0.39, 0.43, 1]
. . .
[29, 0.71, 0.22, 2]
Each value represents a person. The first value is an ID, which is synonymous with the zero-based index. The second value is the person’s income (normalized). The third value is the person’s education level (normalized). The fourth value is the class label to predict, which can be 0, 1, or 2. The class labels are arbitrary but you can imagine that they are a measure of the person’s happiness.
The item to predict is set as [0.62, 0.35]. If you look at the graph of the data points (colored red, blue, green) and the item-to-predict (colored yellow), and you set k = 3, the 3 closest points to the yellow item are red, red, green, so if you use a majority voting rule, you’d predict the unlabeled item is class red = 0.
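The plain majority-vote version can be sketched in a few lines of NumPy. This is an illustrative helper (the function name and the three-point dataset are mine, not from the demo code); it uses the three nearest neighbors listed further below.

```python
import numpy as np

def knn_majority(item, points, labels, k):
    # Euclidean distance from item to every labeled point
    dists = np.sqrt(np.sum((points - item) ** 2, axis=1))
    nearest = np.argsort(dists)[:k]        # indices of the k closest
    counts = np.bincount(labels[nearest])  # raw vote count per class
    return int(np.argmax(counts))          # most common label wins

points = np.array([[0.61, 0.42], [0.55, 0.32], [0.55, 0.41]])
labels = np.array([0, 0, 1])
print(knn_majority(np.array([0.62, 0.35]), points, labels, k=3))  # 0
```

Note that `np.argmax` breaks ties by taking the first maximum, so a tie between classes silently favors the lower class index.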
I set k = 6 in my demo code. The problem with a simple majority voting rule is that it doesn’t take into account how close the k points are. A better approach is to give more weight to points that are close to the data item that is being predicted. For the demo data, the k = 6 closest data points and their distances are:
(0.61, 0.42)  class = 0  dist = 0.0707  1/dist = 14.1421
(0.55, 0.32)  class = 0  dist = 0.0762  1/dist = 13.1306
(0.55, 0.41)  class = 1  dist = 0.0922  1/dist = 10.8465
(0.64, 0.24)  class = 2  dist = 0.1118  1/dist =  8.9443
(0.49, 0.32)  class = 0  dist = 0.1334  1/dist =  7.4953
(0.71, 0.22)  class = 2  dist = 0.1581  1/dist =  6.3246
The inverse distances are used so that a smaller distance gives a larger voting weight. The sum of the inverse distances is 14.1421 + . . . + 6.3246 = 60.8834. The voting weights are the inverse distances divided by their sum, so the weights sum to 1:
class = 0  weight = 14.1421 / 60.8834 = 0.2323
class = 0  weight = 13.1306 / 60.8834 = 0.2157
class = 1  weight = 10.8465 / 60.8834 = 0.1782
class = 2  weight =  8.9443 / 60.8834 = 0.1469
class = 0  weight =  7.4953 / 60.8834 = 0.1231
class = 2  weight =  6.3246 / 60.8834 = 0.1039
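The weight computation can be reproduced directly. Here I feed in the six rounded distances shown above, so the resulting weights may differ from the values in the table in the last decimal place (the demo code uses full-precision distances):

```python
import numpy as np

# The six rounded neighbor distances from the table above
dists = np.array([0.0707, 0.0762, 0.0922, 0.1118, 0.1334, 0.1581])

inv = 1.0 / dists          # smaller distance -> larger raw vote
wts = inv / np.sum(inv)    # normalize so the weights sum to 1
print(np.round(wts, 4))
```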
The voting for each class is:
class 0 : 0.2323 + 0.2157 + 0.1231 = 0.5711
class 1 : 0.1782
class 2 : 0.1469 + 0.1039 = 0.2508
Because class 0 has the most voting weight, the prediction is that the unlabeled data item is class 0.
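The per-class accumulation is just a weighted tally. One compact way to express it (`np.bincount` with a `weights` argument; the arrays here are copied from the table above) is:

```python
import numpy as np

wts = np.array([0.2323, 0.2157, 0.1782, 0.1469, 0.1231, 0.1039])
classes = np.array([0, 0, 1, 2, 0, 2])  # class of each of the k neighbors

# Sum the weights that fall into each class bin
votes = np.bincount(classes, weights=wts)
print(np.round(votes, 4))  # [0.5711 0.1782 0.2508]
print(int(np.argmax(votes)))  # predicted class: 0
```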
The main advantage of k-NN classification is simplicity. The disadvantages: the technique works only on strictly numeric data (which must be normalized so that no single feature dominates the distance calculation), there's no principled way to choose k, and different values of k can give very different predictions.
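Normalization matters because a feature on a large scale (say, raw income in dollars) would otherwise swamp a feature on a small scale (years of education) in the distance computation. A minimal sketch of min-max normalization, one common choice (the article doesn't say which scheme was used; the raw values here are hypothetical):

```python
import numpy as np

# Hypothetical raw data: income (dollars), education (years)
raw = np.array([[28000.0, 12.0],
                [52000.0, 16.0],
                [41000.0, 14.0]])

mins = raw.min(axis=0)
maxs = raw.max(axis=0)
norm = (raw - mins) / (maxs - mins)  # each column rescaled to [0, 1]
print(norm)
```

With min-max scaling, any item to classify later must be transformed using the same training-set mins and maxs.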

Where is planet Earth’s nearest inhabited neighbor? Several scientists have pointed out that broadcasting messages to outer space has slight but serious risks. When high-tech civilizations meet lower-tech civilizations, the results are usually not good for the lower-tech civilization — that would be Earth.
Left: In “Kronos” (1957), aliens send a giant energy harvesting machine to Earth. Center: In “Invaders from Mars” (1953), a Martian intelligence comes to Earth with plans of conquest. Right: In “The Crawling Eye” (1958), aliens land in the Swiss Alps and begin terra-forming Earth to suit their needs as a prelude to conquest.
Demo code:
# weighted_knn.py
# weighted k-nearest neighbors demo
# Anaconda3-2020.02 (Python 3.7.6)

import numpy as np

def dist_func(item, data_point):
  sum = 0.0
  for i in range(2):
    diff = item[i] - data_point[i+1]  # skip the ID at [0]
    sum += diff * diff
  return np.sqrt(sum)

def make_weights(k, distances):
  result = np.zeros(k, dtype=np.float32)
  sum = 0.0
  for i in range(k):
    result[i] = 1.0 / distances[i]
    sum += result[i]
  result /= sum
  return result

def show(v):
  print("idx = %3d (%3.2f %3.2f) class = %2d " \
    % (v[0], v[1], v[2], v[3]), end="")

def main():
  print("\nBegin weighted k-NN demo \n")

  print("Normalized income-education data looks like: ")
  print("[id = 0, 0.32, 0.43, class = 0]")
  print(" . . . ")
  print("[id = 29, 0.71, 0.22, class = 2]")

  # ID/idx, income, education, class (arbitrary)
  data = np.array([
    [0, 0.32, 0.43, 0], [1, 0.26, 0.54, 0],
    [2, 0.27, 0.60, 0], [3, 0.37, 0.36, 0],
    [4, 0.37, 0.68, 0], [5, 0.49, 0.32, 0],
    [6, 0.46, 0.70, 0], [7, 0.55, 0.32, 0],
    [8, 0.57, 0.71, 0], [9, 0.61, 0.42, 0],
    [10, 0.63, 0.51, 0], [11, 0.62, 0.63, 0],
    [12, 0.39, 0.43, 1], [13, 0.35, 0.51, 1],
    [14, 0.39, 0.63, 1], [15, 0.47, 0.40, 1],
    [16, 0.48, 0.50, 1], [17, 0.45, 0.61, 1],
    [18, 0.55, 0.41, 1], [19, 0.57, 0.53, 1],
    [20, 0.56, 0.62, 1], [21, 0.28, 0.12, 1],
    [22, 0.31, 0.24, 1], [23, 0.22, 0.30, 1],
    [24, 0.38, 0.14, 1], [25, 0.58, 0.13, 2],
    [26, 0.57, 0.19, 2], [27, 0.66, 0.14, 2],
    [28, 0.64, 0.24, 2], [29, 0.71, 0.22, 2]],
    dtype=np.float32)

  item = np.array([0.62, 0.35], dtype=np.float32)
  print("\nNearest neighbors (k=6) to (0.62, 0.35): \n")

  # 1. compute all distances to item
  N = len(data)
  k = 6
  c = 3  # number of classes
  distances = np.zeros(N)
  for i in range(N):
    distances[i] = dist_func(item, data[i])

  # 2. get ordering of distances
  ordering = distances.argsort()

  # 3. get and show info for k nearest
  k_near_dists = np.zeros(k, dtype=np.float32)
  sum_inv_dists = 0.0
  for i in range(k):
    idx = ordering[i]
    show(data[idx])  # pretty formatting
    print("distance = %0.4f " % distances[idx], end="")
    k_near_dists[i] = distances[idx]  # save dists
    print(" 1/dist = %8.4f " % (1.0/distances[idx]))
    sum_inv_dists += 1.0/distances[idx]
  print("\nsum inv distances = %9.4f " % sum_inv_dists)

  # 4. vote
  votes = np.zeros(c, dtype=np.float32)
  wts = make_weights(k, k_near_dists)
  print("\nVoting weights (inverse distance technique): ")
  for i in range(len(wts)):
    print("%7.4f" % wts[i], end="")
  print("\n\nPredicted class: ")
  for i in range(k):
    idx = ordering[i]
    pred_class = np.int64(data[idx][3])
    votes[pred_class] += wts[i]
  for i in range(c):
    print("[%d] %0.4f" % (i, votes[i]))

  print("\nEnd weighted k-NN demo ")

if __name__ == "__main__":
  main()
