I work at a very large tech company. Somewhat by luck, I’ve been working with machine learning for many years — even when it wasn’t popular in the 1990s. Now that ML/AI are among the hottest topics in tech, I’ve been giving a lot of training talks and workshops at my company to experienced programmers and engineers who want to get up to speed with ML/AI.
In my mind, data clustering using k-means is one of the six fundamental ML techniques all developers and engineers should know. By that I mean they should understand what types of problems can and cannot be solved using each technique, understand the strengths and weaknesses of each technique, and most importantly, be able to implement each technique from scratch (without using a code library) in a language like C#, Java, or Python. I believe any developer with intermediate or better programming skill can learn this in an intense one-day workshop.

Here is the k-means clustering demo that I explain in my fundamental machine learning techniques workshop.
One idea about data clustering that seems to cause trouble for people who are new to ML is that clustering is an exploratory, somewhat subjective process. There are several possible goals when performing clustering. One common goal is to cluster the source data, then visually examine the clustered data to see if any significant patterns emerge. Another goal is to cluster the source data, then identify items within clusters that are far away from their cluster means, indicating they are anomalous in some way.
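The second goal, finding items that are far from their cluster means, can be sketched in a few lines of plain Python. This is only a minimal illustration, not the workshop demo: the function names (`cluster_means`, `most_distant_items`), the tiny data set, and the cluster labels are all hypothetical, and it assumes the data has already been clustered and that Euclidean distance is a sensible measure for it.

```python
import math

def distance(a, b):
    # Euclidean distance between two equal-length numeric tuples
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cluster_means(data, labels, k):
    # Mean vector of each cluster; None for a cluster with no items
    dim = len(data[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for item, c in zip(data, labels):
        counts[c] += 1
        for j, v in enumerate(item):
            sums[c][j] += v
    return [[s / counts[c] for s in sums[c]] if counts[c] else None
            for c in range(k)]

def most_distant_items(data, labels, k, top_n=1):
    # Rank every item by distance to its own cluster mean, largest first
    means = cluster_means(data, labels, k)
    scored = [(distance(item, means[c]), i)
              for i, (item, c) in enumerate(zip(data, labels))]
    scored.sort(reverse=True)
    return [(i, d) for d, i in scored[:top_n]]

# Hypothetical, already-clustered data: item 3 sits well away from
# the other members of cluster 0
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (3.0, 3.0),
        (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
labels = [0, 0, 0, 0, 1, 1, 1]

for index, dist in most_distant_items(data, labels, k=2, top_n=1):
    print(index, data[index], round(dist, 2))
```

How far is "far enough" to call an item anomalous is a judgment call that depends on the problem, which is part of what makes clustering exploratory.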
The k-means clustering algorithm is really more of a heuristic (a set of general guidelines) than a well-defined step-by-step process. There are dozens of significant variations of the k-means technique. This is one reason why trying to use a canned code library for data clustering usually doesn’t work too well. Data clustering is usually most effective (well, in my experience anyway) when you have complete control over the clustering code and you customize the code to a specific problem.
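To make the idea concrete, here is a minimal from-scratch sketch of one common basic variant: pick k data items as initial means, assign each item to its nearest mean, recompute the means, and repeat until the assignments stop changing. It is just one of the many possible variations, not the workshop code. The function name `kmeans`, its parameters, the choice of random initialization, the decision to leave an empty cluster's mean unchanged, and the sample data are all assumptions made for illustration.

```python
import math
import random

def distance(a, b):
    # Euclidean distance between two equal-length numeric tuples
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(data, k, max_iter=100, seed=0):
    rnd = random.Random(seed)
    # Initialization choice (an assumption): k distinct data items as means
    means = [list(p) for p in rnd.sample(data, k)]
    labels = [-1] * len(data)
    for _ in range(max_iter):
        # Assignment step: each item goes to its nearest current mean
        new_labels = [min(range(k), key=lambda c: distance(item, means[c]))
                      for item in data]
        if new_labels == labels:
            break  # no item changed cluster, so stop
        labels = new_labels
        # Update step: recompute each mean from its current members
        for c in range(k):
            members = [item for item, lab in zip(data, labels) if lab == c]
            if members:  # design choice: an empty cluster keeps its old mean
                means[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, means

# Hypothetical data with two well-separated groups
data = [(1.0, 1.0), (1.5, 1.5), (2.0, 2.0),
        (8.0, 8.0), (8.5, 8.5), (9.0, 9.0)]
labels, means = kmeans(data, k=2)
print(labels)
print(means)
```

Every commented choice here (initialization, distance function, stopping rule, empty-cluster handling) is a place where a real problem might call for something different, which is the kind of control you give up with a canned library.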
I’m going to deliver the same content that I use at my company in a workshop at the upcoming Microsoft Azure + AI Conference. The event runs Nov. 17-22, 2019 in Las Vegas. See https://www.azureaiconf.com. My all-day workshop is titled “Practical Machine Learning Using C#” and is on Friday, Nov. 22, 2019. If you attend the conference, be sure to track me down and maybe we can chat about clustering or machine learning.

Las Vegas is quite an interesting place. Left: Tough choice at the Excalibur Hotel between getting married or eating next door at Pizza Hut. Center: A video poker machine where you play 100 hands at a time. Seems a bit extreme to me. Right: I saw this weird and creepy pulsating red heart over some slot machines at the Aria Hotel. Ugh. Strange.