I’ve been working on a set of related programming projects over the past couple of weeks. Classification, cluster analysis, and rule set extraction are closely related topics. Suppose you have a set of data points (also called vectors or tuples) of some sort. These data points could be numeric abstractions such as geometric points, like (0, 3, -1), or they could be rows of a SQL database, like (Smith, Stan, $21.33, Developer). Now suppose you have a set of known categories, such as c0 = "likely to vote Democratic", c1 = "likely to vote Republican", and so on. Programmatic classification is the process of assigning each data point to one of those categories. Programmatic clustering is similar to classification except that you don’t have known categories; instead, the data points are grouped into clusters of similar data points.
Both classification and clustering can be supervised or unsupervised. With a supervised approach, a preliminary set of training data points is manually classified or clustered, and that information is then used to classify or cluster new data points.
There is a huge body of research on classification and cluster analysis. However, the majority of this research deals with purely numerical data such as (3.0, 5.0, 2.0). There is much less research on categorical data such as (red, small, hot). The main reason is that most classification and clustering algorithms rely on some form of difference function. It’s not too hard to compute a number that represents the difference between (2.0, 3.0, 4.0) and (1.0, 3.5, 2.7), but it’s a harder problem to determine the difference between (red, small, hot) and (blue, large, cold).
Anyway, I’ve found what I believe to be some very cool new ways to perform classification and clustering of categorical data. This is where rule set extraction enters the mix: after clustering your data, how can you extract a set of if..then rules that correspond to the clustering result?
Again, I’m working on some ideas that really fascinate me.
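To make the point about difference functions concrete, here is a minimal Python sketch. It is not one of the new techniques I mentioned; it only contrasts an ordinary Euclidean distance for numeric tuples with a naive "count the mismatches" distance for categorical tuples. The function names euclidean_distance and mismatch_distance are my own, made up for this illustration.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric tuples."""
    if len(a) != len(b):
        raise ValueError("tuples must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mismatch_distance(a, b):
    """Fraction of positions where two equal-length categorical tuples differ.

    This is a naive stand-in for a categorical difference function:
    every pair of unequal values counts as equally different.
    """
    if len(a) != len(b):
        raise ValueError("tuples must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Numeric case: the difference is a well-defined number (about 1.71).
numeric_diff = euclidean_distance((2.0, 3.0, 4.0), (1.0, 3.5, 2.7))

# Categorical case: all three positions differ, so the result is 1.0.
all_differ = mismatch_distance(("red", "small", "hot"), ("blue", "large", "cold"))

# Only the middle position differs, so the result is 1/3.
one_differs = mismatch_distance(("red", "small", "hot"), ("red", "large", "hot"))

print(numeric_diff, all_differ, one_differs)
```

The mismatch count shows why the categorical case is harder. It treats red vs. blue exactly the same as red vs. any other color, and it has no notion that small is closer to medium than to large. Any information like that has to come from somewhere other than the raw values.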