I wrote an article titled “Data Clustering with K-Means Using Python” in the March 2018 issue of Visual Studio Magazine. See https://visualstudiomagazine.com/articles/2018/03/27/clustering-with-k-means-using-python.aspx.
The idea of clustering is pretty simple: take a dataset and group items together so that similar items are in the same group/cluster (and therefore dissimilar items are in different groups/clusters). After clustering, you can examine the results to see if any interesting patterns emerge, or you can identify outliers, which is a form of anomaly detection. But as always, the details are quite tricky.
There are several clustering algorithms. In my article, I explained how to implement one of the most common, the k-means technique (sometimes referred to as Lloyd's algorithm). In its basic form, k-means alternates between assigning each item to the cluster with the closest mean and recomputing each cluster's mean, until the assignments stop changing. However, k-means is really more of a heuristic than a detailed algorithm, meaning that there are many different specific approaches you can use.
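To make the assign/update loop concrete, here is a minimal pure-Python sketch of basic Lloyd-style k-means. It is not the code from the article. It assumes data is a list of equal-length numeric tuples, uses squared Euclidean distance, and starts from k randomly chosen data items. The names `kmeans`, `sq_dist`, `max_iter`, and `seed` are hypothetical, chosen just for this illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length numeric tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(data, k, max_iter=100, seed=0):
    """Basic Lloyd-style k-means on a list of numeric tuples (illustrative sketch).

    Returns (clustering, means): clustering[i] is the cluster id of data[i].
    """
    rnd = random.Random(seed)
    # Initialization (an assumption): use k randomly chosen data items as starting means
    means = [tuple(p) for p in rnd.sample(data, k)]
    clustering = [-1] * len(data)
    for _ in range(max_iter):
        # Assignment step: each item joins the cluster whose mean is closest
        new_clustering = [min(range(k), key=lambda j: sq_dist(p, means[j]))
                          for p in data]
        if new_clustering == clustering:
            break  # no item changed cluster, so the process has converged
        clustering = new_clustering
        # Update step: each mean becomes the average of its assigned items
        for j in range(k):
            members = [p for p, c in zip(data, clustering) if c == j]
            if members:  # keep the previous mean if a cluster ends up empty
                means[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return clustering, means

data = [(1.0, 1.0), (2.0, 1.0), (10.0, 9.0), (11.0, 9.0)]
clustering, means = kmeans(data, k=2)
print(clustering)  # first two items share one cluster id, last two share the other
print(means)       # (1.5, 1.0) and (10.5, 9.0), in some order
```

The random starting means are exactly the part that the many k-means variants tend to change, which is why the initialization step is kept separate and easy to swap out.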
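Many of the different k-means approaches involve the initialization phase. As it turns out, getting the k-means algorithm started well is very important. This is because finding an optimal clustering is an NP-hard problem, which means that it's not practical to get an optimal clustering (you'd essentially have to try every possible clustering). Instead, k-means settles on a locally good clustering that depends heavily on the starting means. In fact, one variant of k-means, called k-means++, uses a fairly complicated initialization routine that picks each new starting mean with probability proportional to its squared distance from the nearest mean already chosen.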
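The k-means++ seeding rule can be sketched in a few lines of pure Python. This is the standard textbook rule, not necessarily the initialization used in the article. It assumes data is a list of numeric tuples and uses squared Euclidean distance; the names `kmeanspp_init` and `sq_dist` and the `seed` parameter are hypothetical. The returned means could serve as the starting means of an ordinary k-means loop in place of purely random ones.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length numeric tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeanspp_init(data, k, seed=0):
    """k-means++ style seeding (illustrative sketch): returns k starting means."""
    rnd = random.Random(seed)
    # First mean: a data item chosen uniformly at random
    means = [tuple(rnd.choice(data))]
    while len(means) < k:
        # D(x)^2: squared distance from each item to its closest mean so far
        d2 = [min(sq_dist(p, m) for m in means) for p in data]
        if sum(d2) == 0:
            # Every item coincides with a chosen mean; fall back to a uniform pick
            means.append(tuple(rnd.choice(data)))
        else:
            # Next mean: an item chosen with probability proportional to D(x)^2,
            # so items far from existing means are favored (chosen items have weight 0)
            means.append(tuple(rnd.choices(data, weights=d2, k=1)[0]))
    return means

data = [(1.0, 1.0), (2.0, 1.0), (10.0, 9.0), (11.0, 9.0)]
print(kmeanspp_init(data, k=2))  # two distinct items from data
```

The design choice is that distant items are likely, but not guaranteed, to be picked, which spreads the starting means out while still leaving some randomness to avoid always choosing extreme outliers.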
Anyway, in the article I show exactly how to implement one possible variation of k-means clustering using the Python language. The advantage of a custom implementation is that it gives you total control over the many different options you can apply.
The biggest downside to k-means clustering is that the technique can be used only with data that is all numeric, because it relies on computing means and distances. There are techniques for clustering non-numeric or mixed numeric and non-numeric data, but they are much more difficult.

“Liverpool from Wapping” (1875), John Grimshaw. Ships, people, and buildings in three different clusters.

