I Give a Talk About Fuzzy C-Means Data Clustering

Quite some time ago, I was working with data clustering algorithms. Data clustering is the process of grouping data so that similar items end up in the same group/cluster, and items in different clusters are dissimilar.

There are many different clustering algorithms. The two most common are k-means clustering (along with a minor variation called k-means++) and Gaussian mixture model clustering. Both work only with strictly numeric data, so you can’t have categorical variables like sex = (male, female) or hair_color = (brown, blonde, black, red, gray).

Note that clustering non-numeric data is surprisingly tricky. I devised several clustering algorithms for non-numeric data, and wrote them up in technical articles:

Data Clustering Using Category Utility
https://msdn.microsoft.com/en-us/magazine/dn198247.aspx

Data Clustering Using Entropy Minimization
https://visualstudiomagazine.com/articles/2013/02/01/data-clustering-using-entropy-minimization.aspx

Data Clustering Using Naive Bayes Inference
https://msdn.microsoft.com/en-us/magazine/jj991980.aspx

Anyway, one of the lesser-known clustering algorithms for numeric data is called fuzzy C-means clustering. The key idea of fuzzy clustering is that instead of definitively assigning a data item to exactly one of the k clusters, each data item is assigned one membership value for each cluster. The membership values are between 0 and 1, sum to 1 for each item, and indicate the degree to which the item belongs to each cluster.
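To make the idea concrete, here is a minimal sketch of the standard textbook form of fuzzy C-means using NumPy. It is not code from my original work. The function name fuzzy_c_means, the fuzziness exponent m = 2.0, the random initialization, the stopping tolerance, and the six height/weight items are all assumptions made up for illustration.

```python
import numpy as np

def fuzzy_c_means(X, k, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Minimal fuzzy C-means sketch. Returns (centers, memberships)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships; each row is normalized to sum to 1.
    U = rng.random((n, k))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        # Cluster centers are means weighted by membership^m.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every item to every center, shape (n, k).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # guard against division by zero
        # Closer centers get larger membership; rows are renormalized.
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

# Made-up (height, weight) items, for illustration only.
X = np.array([[65.0, 120.0], [66.0, 125.0], [72.0, 185.0],
              [73.0, 190.0], [69.0, 150.0], [68.0, 155.0]])
centers, U = fuzzy_c_means(X, k=3)
print(np.round(U, 2))  # one row per item, one column per cluster
```

Because the starting memberships are random, which cluster ends up with which index depends on the seed. In practice you would probably also want to normalize height and weight first so that weight doesn't dominate the distance calculation.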

Suppose you set k = 3, and each data item represents a person’s height and weight. Then the result of fuzzy C-means clustering might look something like:

Height  Weight   k=0   k=1   k=2
=================================
 65.0    120.0   0.82  0.08  0.10
 72.0    185.0   0.10  0.30  0.60
. . .

Here, the person whose (height, weight) is (65.0, 120.0) belongs mostly to the k=0 cluster, with a membership value of 0.82.
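If you want an ordinary hard clustering from fuzzy results, one simple option is to pick the cluster with the largest membership value for each item. The sketch below uses the two rows from the table above; the variable names are just for illustration.

```python
import numpy as np

# Membership rows copied from the example table above.
memberships = np.array([
    [0.82, 0.08, 0.10],  # (height, weight) = (65.0, 120.0)
    [0.10, 0.30, 0.60],  # (height, weight) = (72.0, 185.0)
])

# Each item's membership values should sum to 1.
row_sums = memberships.sum(axis=1)

# Hard assignment: index of the largest membership in each row.
hard = memberships.argmax(axis=1)
print(hard)  # [0 2]
```

Collapsing to the largest value throws away the extra information, for example that the second person has a non-trivial 0.30 membership in cluster k=1.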

Well, in the end, fuzzy C-means data clustering isn’t used very much, because the additional membership information you get is somewhat difficult to interpret and act on.



Fuzzy hats. Sometimes nice on women, sometimes not. But never, ever nice on guys.
