Data clustering is a classical machine learning technique for grouping similar data items together. By far the most common clustering technique is k-means clustering. Three other more-or-less common clustering techniques are DBSCAN ("density-based spatial clustering of applications with noise"), Gaussian mixture model clustering, and spectral clustering.
A fifth clustering technique is self-organizing map (SOM) clustering. I put together an implementation of SOM clustering, from scratch, using the C# language.
Compared to other techniques, SOM clustering constructs clusters so that clusters with similar IDs are closer to each other than clusters with dissimilar IDs. For example, suppose you set k = 5, so each data item falls into one of clusters 0, 1, 2, 3, 4. Data items in clusters 0 and 1 are relatively closer to each other than data items in clusters 0 and 3.
Self-organizing maps can be used for several purposes other than clustering. A standard, non-clustering SOM creates a square matrix, such as 4-by-4, called the map. Each cell of the SOM map holds a vector that represents a cluster. For a clustering SOM, even though you could create a square matrix map, it makes more sense to create a 1-by-k matrix/vector map.
For my demo program, I used a small 12-item subset of the Penguin dataset. The data is:
0, 39.5, 17.4, 186, 3800
0, 40.3, 18.0, 195, 3250
0, 36.7, 19.3, 193, 3450
0, 38.9, 17.8, 181, 3625
1, 46.5, 17.9, 192, 3500
1, 45.4, 18.7, 188, 3525
1, 45.2, 17.8, 198, 3950
1, 46.1, 18.2, 178, 3250
2, 46.1, 13.2, 211, 4500
2, 48.7, 14.1, 210, 4450
2, 46.5, 13.5, 210, 4550
2, 45.4, 14.6, 211, 4800
Each line represents a penguin. The columns are species (0 = Adelie, 1 = Chinstrap, 2 = Gentoo), bill length, bill depth, flipper length, body mass. The full Penguin dataset has 344 items, 333 of which have no missing values.
The demo does not use the species labels, but notice that the data is artificially organized so that it’s already clustered.
The key calling statements in my demo are:
// 1. load data
string fn = "..\\..\\..\\Data\\penguin_12.txt";
double[][] X = ClusterSOM.MatLoad(fn,
new int[] { 1, 2, 3, 4 }, ',', "#"); // no labels
// 2. standardize data
double[][] stdX = ClusterSOM.MatStandard(X);
// 3. create ClusterSOM object and cluster
int k = 3; // number clusters
double lrnRateMax = 0.50;
int stepsMax = 1000;
ClusterSOM som = new ClusterSOM(stdX, k, seed: 0);
som.Cluster(lrnRateMax, stepsMax);
// 4. get and display clustering result
int[] clustering = som.GetClustering();
ClusterSOM.VecShow(clustering, wid: 3);
The clustering result is:
2 2 2 2 1 1 1 1 0 0 0 0
Data items in clusters 2 and 1 are close. Data items in clusters 1 and 0 are close. Data items in clusters 2 and 0 are relatively farther apart.
SOM clustering is an iterative process and requires a maximum learning rate (the learning rate is decreased during SOM map construction) and a maximum number of iteration steps.
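The decay schedule is linear: at each step, the fraction of steps remaining scales both the learning rate and the neighborhood range. A minimal stand-alone sketch of the idea (class and method names here are mine, not part of the demo program):

```csharp
using System;

public class DecayDemo
{
  // learning rate decays linearly from lrnRateMax to 0
  public static double CurrLrnRate(int step, int stepsMax,
    double lrnRateMax)
  {
    double pctLeft = 1.0 - ((step * 1.0) / stepsMax);
    return pctLeft * lrnRateMax;
  }

  // neighborhood range decays linearly from rangeMax to 0
  public static int CurrRange(int step, int stepsMax, int rangeMax)
  {
    double pctLeft = 1.0 - ((step * 1.0) / stepsMax);
    return (int)(pctLeft * rangeMax);
  }

  static void Main()
  {
    int stepsMax = 1000; int rangeMax = 4;  // 1 + 3 for a 1-by-3 map
    for (int step = 0; step < stepsMax; step += 250)
      Console.WriteLine("step " + step.ToString().PadLeft(4) +
        ": lrnRate = " +
        CurrLrnRate(step, stepsMax, 0.50).ToString("F4") +
        "  range = " + CurrRange(step, stepsMax, rangeMax));
  }
}
```

At step 0 the full learning rate (0.50) and range (4) apply; halfway through, both have fallen to half their maximum values.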
The source data should be normalized/standardized so that columns with large magnitudes, like body mass, don’t overwhelm columns with small magnitudes, like bill length. The standardized data looks like:
-1.1657   0.3300  -0.8774  -0.1661
-0.9475   0.6163  -0.0943  -1.2103
-1.9291   1.2366  -0.2683  -0.8306
. . .
The penguin species labels are not used.
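The z-score standardization of a single column can be sketched as a small stand-alone helper (this mirrors what the demo's MatStandard method does, using the biased standard deviation that divides by n rather than n-1; the class and method names are mine):

```csharp
using System;

public class StandardizeDemo
{
  // z-score standardize one column: (x - mean) / stddev
  public static double[] ZScore(double[] col)
  {
    int n = col.Length;
    double mean = 0.0;
    foreach (double v in col) mean += v;
    mean /= n;

    double ss = 0.0;  // sum of squared deviations
    foreach (double v in col) ss += (v - mean) * (v - mean);
    double sd = Math.Sqrt(ss / n);  // biased: divide by n

    double[] result = new double[n];
    for (int i = 0; i < n; ++i)
      result[i] = (col[i] - mean) / sd;
    return result;
  }

  static void Main()
  {
    // bill length column of the 12-item demo data
    double[] billLen = { 39.5, 40.3, 36.7, 38.9, 46.5, 45.4,
      45.2, 46.1, 46.1, 48.7, 46.5, 45.4 };
    double[] z = ZScore(billLen);
    Console.WriteLine(z[0].ToString("F4"));  // first standardized value
  }
}
```

The first standardized bill length value comes out to about -1.1657, matching the first entry of the standardized data shown above.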
The demo 1-by-3 SOM map has three cells, one for each cluster. Each cell has a representative vector that you can think of as a central vector of the data items assigned to the cluster. For the demo data, the constructed SOM map is:
k = 0:   0.8296  -1.4315   1.2489   1.2322
k = 1:   0.5188   0.7038  -0.4929  -0.5406
k = 2:  -1.3109   0.6217  -0.6869  -0.6074
Each data item is assigned to the cluster that has the closest map node:
k = 0: 8 9 10 11
k = 1: 4 5 6 7
k = 2: 0 1 2 3
This means data items 8, 9, 10, 11 are assigned to cluster 0 because those items are closest to the map node 0 vector: (0.8296 -1.4315 1.2489 1.2322).
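The assignment rule is just a nearest-vector search: compute the Euclidean distance from a data item to each of the k map node vectors and pick the argmin. A minimal sketch (hypothetical helper, not the demo's ClosestNode method; the test item is made up):

```csharp
using System;

public class NearestNodeDemo
{
  // return index of the map node vector closest to item
  public static int Nearest(double[] item, double[][] nodes)
  {
    int best = 0;
    double smallest = double.MaxValue;
    for (int k = 0; k < nodes.Length; ++k)
    {
      double sum = 0.0;  // squared Euclidean distance
      for (int d = 0; d < item.Length; ++d)
        sum += (item[d] - nodes[k][d]) * (item[d] - nodes[k][d]);
      if (sum < smallest) { smallest = sum; best = k; }
    }
    return best;  // index of closest map node = cluster ID
  }

  static void Main()
  {
    // the three map node vectors from the demo run
    double[][] nodes = new double[][] {
      new double[] { 0.8296, -1.4315, 1.2489, 1.2322 },   // k = 0
      new double[] { 0.5188, 0.7038, -0.4929, -0.5406 },  // k = 1
      new double[] { -1.3109, 0.6217, -0.6869, -0.6074 }  // k = 2
    };
    // a made-up standardized data item, close to node 0
    double[] item = new double[] { 1.0, -1.2, 1.1, 1.0 };
    Console.WriteLine("assigned cluster = " + Nearest(item, nodes));
  }
}
```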
The demo program iterates for 1000 steps. This value was determined by preliminary trial and error, monitoring the construction of the SOM map by computing the sum of the Euclidean distances (SED) between each data item and its closest map node. When the SED stops decreasing significantly, at about step 1000 for the demo data, that's a sign that the map-building steps can stop.
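The SED monitoring idea can be sketched as a stand-alone helper (hypothetical names; assumes the data items, the map node vectors, and a cluster assignment for each item):

```csharp
using System;

public class SedDemo
{
  // sum of Euclidean distances from each item to its assigned node
  public static double Sed(double[][] items, double[][] nodes,
    int[] assign)
  {
    double sum = 0.0;
    for (int i = 0; i < items.Length; ++i)
    {
      double sq = 0.0;
      for (int d = 0; d < items[i].Length; ++d)
      {
        double diff = items[i][d] - nodes[assign[i]][d];
        sq += diff * diff;
      }
      sum += Math.Sqrt(sq);  // Euclidean distance for this item
    }
    return sum;
  }

  static void Main()
  {
    // tiny 2D example: one node at the origin, two items
    double[][] items = new double[][] {
      new double[] { 0.0, 0.0 }, new double[] { 3.0, 4.0 } };
    double[][] nodes = new double[][] { new double[] { 0.0, 0.0 } };
    int[] assign = new int[] { 0, 0 };
    Console.WriteLine(Sed(items, nodes, assign));  // 0 + 5 = 5
  }
}
```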
The demo program computes the distances between each pair of map node vectors:

[0]  0.00  3.29  3.99
[1]  3.29  0.00  1.84
[2]  3.99  1.84  0.00
The largest distance is between cluster 0 and cluster 2 (3.99), as expected.
I don’t use SOM clustering very often. When I do use it, I typically use it as a check against k-means or DBSCAN clustering.

Most data clustering techniques are unsupervised / automatic. I've always been fascinated by old, coin-operated mechanical automata. You'd drop a penny or a nickel into the machine and watch the action. Many early coin-operated automata were rather morbid. Left: In the "St. Dennistoun Mortuary" automaton, circa 1900, the doors of the mortuary open to reveal bodies. The doctors examine the bodies while the two women outside dab their tears with handkerchiefs. Right: In "American Execution", circa 1920, the condemned man in the electric chair has a metal cap lowered onto his head, the switch is thrown, and the condemned man's head lights up as he's electrocuted.
Demo program.

using System;
using System.Collections.Generic;
using System.IO;

namespace ClusterSOM
{
  class ClusterSOMProgram
  {
    static void Main(string[] args)
    {
      Console.WriteLine("\nBegin self-organizing" +
        " map (SOM) clustering using C#");

      // 1. load data
      Console.WriteLine("\nLoading 12-item Penguin subset ");
      string fn = "..\\..\\..\\Data\\penguin_12.txt";
      double[][] X = ClusterSOM.MatLoad(fn,
        new int[] { 1, 2, 3, 4 }, ',', "#");
      Console.WriteLine("\nX: ");
      ClusterSOM.MatShow(X, 1, 8, true);

      //int[] y = ClusterSOM.VecLoad(fn, 0, "#");
      //Console.WriteLine("\ny labels: ");
      //ClusterSOM.VecShow(y, 3);

      // 2. standardize data
      double[][] stdX = ClusterSOM.MatStandard(X);
      Console.WriteLine("\nStandardized data: ");
      ClusterSOM.MatShow(stdX, 4, 9, true);

      // 3. create ClusterSOM object and cluster
      int k = 3;  // number clusters
      double lrnRateMax = 0.50;
      int stepsMax = 1000;
      Console.WriteLine("\nSetting num clusters k = " + k);
      Console.WriteLine("Setting lrnRateMax = " +
        lrnRateMax.ToString("F2"));
      Console.WriteLine("Setting stepsMax = " + stepsMax);

      Console.WriteLine("\nComputing SOM clustering ");
      ClusterSOM som = new ClusterSOM(stdX, k, seed: 0);
      som.Cluster(lrnRateMax, stepsMax);
      Console.WriteLine("Done ");

      // 4. show the SOM
      Console.WriteLine("\nSOM map nodes: ");
      for (int kk = 0; kk < k; ++kk)
      {
        Console.Write("k = " + kk + ": ");
        ClusterSOM.VecShow(som.map[0][kk], 4, 9);
      }

      Console.WriteLine("\nSOM mapping: ");
      for (int kk = 0; kk < k; ++kk)
      {
        Console.Write("k = " + kk + ": ");
        ClusterSOM.ListShow(som.mapping[0][kk]);
      }

      double[][] betweenDists = som.GetBetweenNodeDists();
      Console.WriteLine("\nBetween map node distances: ");
      ClusterSOM.MatShow(betweenDists, 2, 6, true);

      // 5. show clustering result
      Console.WriteLine("\nclustering: ");
      int[] clustering = som.GetClustering();
      ClusterSOM.VecShow(clustering, wid: 3);

      Console.WriteLine("\nEnd SOM clustering demo");
      Console.ReadLine();
    } // Main

    // ------------------------------------------------------
  } // Program

  public class ClusterSOM
  {
    public int k;            // number clusters
    public double[][] data;  // data to cluster
    public double[][][] map; // [r][c][vec]
    public List<int>[][] mapping; // [r][c](List of indices)
    public Random rnd;       // to initialize map cells

    // ------------------------------------------------------

    public ClusterSOM(double[][] X, int k, int seed)
    {
      int nRows = 1;  // for map
      int nCols = k;  // for map
      this.k = k;
      int dim = X[0].Length;
      this.rnd = new Random(seed);
      this.data = X;

      // map is 1-by-k matrix; each cell gets a random vector
      this.map = new double[nRows][][]; // [r][c][vec]
      for (int i = 0; i < nRows; ++i)
      {
        this.map[i] = new double[nCols][];
        for (int j = 0; j < nCols; ++j)
        {
          this.map[i][j] = new double[dim];
          for (int d = 0; d < dim; ++d)
            this.map[i][j][d] = this.rnd.NextDouble();
        }
      }

      this.mapping = new List<int>[nRows][]; // [r][c][lst]
      for (int i = 0; i < nRows; ++i)
      {
        this.mapping[i] = new List<int>[nCols];
        for (int j = 0; j < nCols; ++j)
          this.mapping[i][j] = new List<int>();
      }
    } // ctor

    // ------------------------------------------------------

    public void Cluster(double lrnRateMax, int stepsMax)
    {
      int n = this.data.Length;
      int dim = this.data[0].Length;
      int nRows = 1;       // for map
      int nCols = this.k;  // for map
      int rangeMax = nRows + nCols;

      // compute the map
      for (int step = 0; step < stepsMax; ++step)
      {
        // show progress periodically
        if (step % (stepsMax / 5) == 0)
        {
          Console.Write("map build step = " +
            step.ToString().PadLeft(4));
          double sum = 0.0;  // sum of Euclidean distances
          for (int ix = 0; ix < n; ++ix)
          {
            int[] RC = ClosestNode(ix);
            int kk = RC[1];  // RC[0] is always 0
            double[] item = this.data[ix];
            double[] node = this.map[0][kk];
            double dist = EucDist(item, node);
            sum += dist;  // sum Euclidean distances
          }
          Console.WriteLine(" | SED = " +
            sum.ToString("F4").PadLeft(9));
        }

        double pctLeft = 1.0 - ((step * 1.0) / stepsMax);
        int currRange = (int)(pctLeft * rangeMax);
        double currLrnRate = pctLeft * lrnRateMax;

        // pick random data index
        int idx = this.rnd.Next(0, n);
        // get (row,col) of closest map node -- the 'bmu'
        int[] bmuRC = ClosestNode(idx);
        // move each map node in range closer to the data item
        for (int i = 0; i < nRows; ++i)
        {
          for (int j = 0; j < nCols; ++j)
          {
            if (ManDist(bmuRC[0], bmuRC[1], i, j) <= currRange)
            {
              for (int d = 0; d < dim; ++d)
                this.map[i][j][d] = this.map[i][j][d] +
                  currLrnRate * (this.data[idx][d] -
                  this.map[i][j][d]);
            }
          } // j
        } // i
      } // step

      // map has been created; compute mapping
      for (int idx = 0; idx < n; ++idx)
      {
        // map coords of the node closest to data[idx]
        int[] rc = ClosestNode(idx);
        int r = rc[0]; int c = rc[1];
        this.mapping[r][c].Add(idx);
      }
      // results in this.mapping
      return;
    } // Cluster()

    // ------------------------------------------------------

    public int[] GetClustering()
    {
      // cluster ID for every data item
      int n = this.data.Length;
      int[] result = new int[n];  // ID for each item
      for (int kk = 0; kk < this.k; ++kk)
      {
        for (int i = 0; i < this.mapping[0][kk].Count; ++i)
        {
          int dataIdx = this.mapping[0][kk][i];
          result[dataIdx] = kk;
        }
      }
      return result;
    }

    public double[][] GetBetweenNodeDists()
    {
      // between map node distances
      double[][] result = new double[this.k][];
      for (int kk = 0; kk < this.k; ++kk)
        result[kk] = new double[this.k];
      for (int i = 0; i < this.k; ++i)
      {
        for (int j = i; j < this.k; ++j)
        {
          double dist =
            EucDist(this.map[0][i], this.map[0][j]);
          result[i][j] = dist;
          result[j][i] = dist;
        }
      }
      return result;
    }

    // ------------------------------------------------------
    // private helper functions
    // ------------------------------------------------------

    private static int ManDist(int r1, int c1,
      int r2, int c2)
    {
      // Manhattan distance between two SOM map cells
      return Math.Abs(r1 - r2) + Math.Abs(c1 - c2);
    }

    // ------------------------------------------------------

    private static double EucDist(double[] v1,
      double[] v2)
    {
      // Euclidean distance between two data items
      double sum = 0;
      for (int i = 0; i < v1.Length; ++i)
        sum += (v1[i] - v2[i]) * (v1[i] - v2[i]);
      return Math.Sqrt(sum);
    }

    // ------------------------------------------------------

    private int[] ClosestNode(int idx)
    {
      // (r,c) coords in map of node closest to data[idx]
      double smallDist = double.MaxValue;
      int[] result = new int[] { 0, 0 };  // (row, col)
      for (int i = 0; i < this.map.Length; ++i)
      {
        for (int j = 0; j < this.map[0].Length; ++j)
        {
          double dist = EucDist(this.data[idx],
            this.map[i][j]);
          if (dist < smallDist)
          {
            smallDist = dist;
            result[0] = i;
            result[1] = j;
          }
        }
      }
      return result;
    }

    // ------------------------------------------------------
    // misc. public utility functions for convenience:
    // MatLoad, VecLoad, MatShow, VecShow, ListShow,
    // MatStandard
    // ------------------------------------------------------

    public static double[][] MatLoad(string fn,
      int[] usecols, char sep, string comment)
    {
      // count number of non-comment lines
      int nRows = 0;
      string line = "";
      FileStream ifs = new FileStream(fn, FileMode.Open);
      StreamReader sr = new StreamReader(ifs);
      while ((line = sr.ReadLine()) != null)
        if (line.StartsWith(comment) == false)
          ++nRows;
      sr.Close(); ifs.Close();

      // make result matrix
      int nCols = usecols.Length;
      double[][] result = new double[nRows][];
      for (int r = 0; r < nRows; ++r)
        result[r] = new double[nCols];

      line = "";
      string[] tokens = null;
      ifs = new FileStream(fn, FileMode.Open);
      sr = new StreamReader(ifs);
      int i = 0;
      while ((line = sr.ReadLine()) != null)
      {
        if (line.StartsWith(comment) == true)
          continue;
        tokens = line.Split(sep);
        for (int j = 0; j < nCols; ++j)
        {
          int k = usecols[j];  // index into tokens
          result[i][j] = double.Parse(tokens[k]);
        }
        ++i;
      }
      sr.Close(); ifs.Close();
      return result;
    }

    // ------------------------------------------------------

    public static int[] VecLoad(string fn, int usecol,
      string comment)
    {
      char dummySep = ',';
      double[][] tmp = MatLoad(fn, new int[] { usecol },
        dummySep, comment);
      int n = tmp.Length;
      int[] result = new int[n];
      for (int i = 0; i < n; ++i)
        result[i] = (int)tmp[i][0];
      return result;
    }

    // ------------------------------------------------------

    public static void MatShow(double[][] M, int dec,
      int wid, bool showIndices)
    {
      double small = 1.0 / Math.Pow(10, dec);
      for (int i = 0; i < M.Length; ++i)
      {
        if (showIndices == true)
        {
          int pad = M.Length.ToString().Length;
          Console.Write("[" + i.ToString().
            PadLeft(pad) + "]");
        }
        for (int j = 0; j < M[0].Length; ++j)
        {
          double v = M[i][j];
          if (Math.Abs(v) < small) v = 0.0;
          Console.Write(v.ToString("F" + dec).
            PadLeft(wid));
        }
        Console.WriteLine("");
      }
    }

    // ------------------------------------------------------

    public static void VecShow(int[] vec, int wid)
    {
      int n = vec.Length;
      for (int i = 0; i < n; ++i)
        Console.Write(vec[i].ToString().PadLeft(wid));
      Console.WriteLine("");
    }

    // ------------------------------------------------------

    public static void VecShow(double[] vec, int decimals,
      int wid)
    {
      int n = vec.Length;
      for (int i = 0; i < n; ++i)
        Console.Write(vec[i].ToString("F" + decimals).
          PadLeft(wid));
      Console.WriteLine("");
    }

    // ------------------------------------------------------

    public static void ListShow(List<int> lst)
    {
      int n = lst.Count;
      for (int i = 0; i < n; ++i)
      {
        Console.Write(lst[i] + " ");
      }
      Console.WriteLine("");
    }

    // ------------------------------------------------------

    public static double[][] MatStandard(double[][] data)
    //public static double[][] MatStandard(double[][] data,
    //  out double[] means, out double[] stds)
    {
      // scikit-style z-score biased normalization
      int nRows = data.Length;
      int nCols = data[0].Length;

      // make result matrix
      double[][] result = new double[nRows][];
      for (int r = 0; r < nRows; ++r)
        result[r] = new double[nCols];

      // compute means
      double[] mns = new double[nCols];
      for (int j = 0; j < nCols; ++j)
      {
        double sum = 0.0;
        for (int i = 0; i < nRows; ++i)
          sum += data[i][j];
        mns[j] = sum / nRows;
      } // j

      // compute std devs
      double[] sds = new double[nCols];
      for (int j = 0; j < nCols; ++j)
      {
        double sum = 0.0;
        for (int i = 0; i < nRows; ++i)
          sum += (data[i][j] - mns[j]) *
            (data[i][j] - mns[j]);
        sds[j] = Math.Sqrt(sum / nRows);  // biased
      } // j

      // normalize
      for (int j = 0; j < nCols; ++j)
      {
        for (int i = 0; i < nRows; ++i)
          result[i][j] =
            (data[i][j] - mns[j]) / sds[j];
      } // j

      //means = mns;
      //stds = sds;
      return result;
    }

    // ------------------------------------------------------
  } // class ClusterSOM
} // ns
