I spend 30 minutes each morning before work looking at technical topics. One morning I decided to dissect the scikit-learn library SpectralClustering module. Spectral clustering is much more complicated than the two most common techniques, k-means clustering and DBSCAN clustering.
The first step in spectral clustering is to compute an affinity matrix — the similarity of each data item with every other data item. There are many ways to do this, including RBF (radial basis function) similarity, nearest neighbors connectivity, epsilon radius neighbors, heat kernel applied to distances, and so on. An affinity matrix is similar to, but technically a bit different than, a similarity matrix.
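As a quick illustration of the RBF option, here is a minimal sketch using the scikit-learn rbf_kernel() helper, which computes exp(-gamma * ||x - y||^2) for every pair of rows. The two data points are arbitrary values chosen for the example:

```python
# minimal sketch: an RBF affinity matrix for two 2-D points
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

X = np.array([[0.20, 0.20],
              [0.40, 0.10]])
A = rbf_kernel(X, gamma=1.0)  # 2x2 affinity matrix
# diagonal entries are exp(0) = 1.0
# off-diagonal entry is exp(-1.0 * 0.05) = 0.9512
print(A)
```

With gamma = 1.0, larger squared distances give affinity values that decay toward 0.0, and identical points give exactly 1.0.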
One of the things that I don’t like about spectral clustering is that there are dozens of ways to implement it, as opposed to a technique like k-means which has fewer design options.
The scikit default is RBF with gamma = 1.0. To explore, I located the scikit source code on my machine and decorated the code by adding print() statements to display the internal affinity matrix. My installation of the scikit code was located at C:\Users\(user)\Anaconda3\Lib\site-packages\sklearn\cluster in file _spectral.py. I added two print statements:
. . .
  else:
    params = self.kernel_params
    if params is None:
      params = {}
    if not callable(self.affinity):
      params["gamma"] = self.gamma
      params["degree"] = self.degree
      params["coef0"] = self.coef0
    self.affinity_matrix_ = pairwise_kernels(
      X, metric=self.affinity, filter_params=True, **params
    )

  # === added:
  print("\nAffinity matrix computed inside library: ")
  print(self.affinity_matrix_); input()  # is an array
  # ===

  random_state = check_random_state(self.random_state)
. . .
To verify my understanding, I wrote a from-scratch function to compute the same affinity matrix. I got the same result as the internal affinity matrix, as expected.
I ran a few experiments with small datasets, and the RBF affinity approach did not seem to work quite as well as nearest neighbors affinity or a custom epsilon radius neighbors affinity. Curious. I need to investigate. Update: After more experiments, RBF affinity with a better choice of gamma works just as well as nearest neighbors affinity.
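For reference, switching the demo to nearest neighbors affinity is just a matter of changing the constructor parameters. This is a sketch on the same 10-point demo data; n_neighbors=3 is an assumed value for illustration, not a tuned choice:

```python
# sketch: spectral clustering with nearest neighbors affinity
# instead of RBF; n_neighbors=3 is an arbitrary example value
import numpy as np
from sklearn.cluster import SpectralClustering

X = np.array([[0.20, 0.20], [0.20, 0.70], [0.30, 0.90],
              [0.50, 0.50], [0.50, 0.60], [0.60, 0.50],
              [0.70, 0.90], [0.40, 0.10], [0.90, 0.70],
              [0.80, 0.20]])
clustering = SpectralClustering(n_clusters=2,
  affinity='nearest_neighbors', n_neighbors=3,
  assign_labels='kmeans', random_state=0).fit(X)
print(clustering.labels_)  # one cluster ID (0 or 1) per data item
```

With a small n_neighbors value, scikit-learn may warn that the connectivity graph is not fully connected; the clustering still runs.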

The values on the diagonal of the affinity matrix are 1.0 because each data item has maximum RBF similarity with itself. For data item [0], the closest point is item [7] which has the second largest RBF value (0.9512).
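These two observations are easy to check programmatically against the same 10-point demo data, using the scikit-learn rbf_kernel() helper rather than the library internals:

```python
# check: diagonal of the RBF affinity matrix is all 1.0, and
# item [7] is the closest item to item [0]
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

X = np.array([[0.20, 0.20], [0.20, 0.70], [0.30, 0.90],
              [0.50, 0.50], [0.50, 0.60], [0.60, 0.50],
              [0.70, 0.90], [0.40, 0.10], [0.90, 0.70],
              [0.80, 0.20]])
A = rbf_kernel(X, gamma=1.0)
print(np.diag(A))       # all 1.0: max similarity with self
row0 = A[0].copy()
row0[0] = -1.0          # mask the self-similarity entry
print(np.argmax(row0))  # 7: the closest item to item [0]
print(row0[7])          # 0.9512...
```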
At that point, it was time to go to work, but I was satisfied I gained a new bit of knowledge.
My next steps will be to explore how the affinity matrix is converted to a Laplacian matrix (a complex problem), which is then decomposed using eigenvalue decomposition (a complex problem), and then those results are used (somehow) to cluster the source data. Update: I did this. See https://jamesmccaffreyblog.com/2023/11/27/spectral-data-clustering-from-scratch-using-csharp/.
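As a rough sketch of that pipeline, not scikit-learn's exact internals, the idea is: build a normalized graph Laplacian from the affinity matrix, take the eigenvectors for the smallest eigenvalues as a low-dimensional embedding, then apply k-means to the embedding rows. The number of eigenvectors to keep (2 here) is an assumption tied to n_clusters=2:

```python
# rough sketch of the spectral clustering pipeline:
# affinity -> normalized Laplacian -> eigenvectors -> k-means
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from scipy.sparse.csgraph import laplacian
from sklearn.cluster import KMeans

X = np.array([[0.20, 0.20], [0.20, 0.70], [0.30, 0.90],
              [0.50, 0.50], [0.50, 0.60], [0.60, 0.50],
              [0.70, 0.90], [0.40, 0.10], [0.90, 0.70],
              [0.80, 0.20]])
A = rbf_kernel(X, gamma=1.0)
L = laplacian(A, normed=True)     # symmetric normalized Laplacian
evals, evecs = np.linalg.eigh(L)  # eigenvalues in ascending order
embedding = evecs[:, :2]          # 2 smallest-eigenvalue vectors
labels = KMeans(n_clusters=2, n_init=10,
  random_state=0).fit_predict(embedding)
print(labels)  # one cluster ID per data item
```

The scikit-learn implementation adds several refinements (eigen solvers, diffusion scaling, the 'discretize' and 'cluster_qr' label assignment options), but this is the core idea.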

Three old movies that have high similarity. Left: In "The Beast From 20,000 Fathoms" (1953), a prehistoric creature rampages through New York City. Center: In "Godzilla" (1954), a prehistoric creature rampages through Tokyo. Right: In "The Giant Behemoth" (1959), a prehistoric creature rampages through London. I like all three of these movies.
Demo code:
# spectral_demo_2.py

import numpy as np
from sklearn.cluster import SpectralClustering

# SpectralClustering(n_clusters=8, *, eigen_solver=None,
#   n_components=None, random_state=None, n_init=10,
#   gamma=1.0, affinity='rbf', n_neighbors=10,
#   eigen_tol='auto', assign_labels='kmeans', degree=3,
#   coef0=1, kernel_params=None, n_jobs=None,
#   verbose=False)

print("\nBegin spectral clustering demo ")
np.set_printoptions(precision=4, suppress=True,
  floatmode='fixed', linewidth=100)

X = np.array([[0.20, 0.20],
              [0.20, 0.70],
              [0.30, 0.90],
              [0.50, 0.50],
              [0.50, 0.60],
              [0.60, 0.50],
              [0.70, 0.90],
              [0.40, 0.10],
              [0.90, 0.70],
              [0.80, 0.20]])
print("\nData: ")
print(X)

print("\nApplying SpectralClustering() ")
clustering = SpectralClustering(n_clusters=2,
  # affinity='nearest_neighbors', n_neighbors=3,
  affinity='rbf', gamma=1.0,
  assign_labels='kmeans', random_state=0).fit(X)
print("\nResult: ")
print(clustering.labels_)

def my_rbf(x1, x2, gamma):
  # documentation: "np.exp(-gamma * d(X,X) ** 2)"
  dim = len(x1)
  dist_sq = 0.0  # squared Euclidean distance
  for i in range(dim):
    dist_sq += (x1[i] - x2[i]) * (x1[i] - x2[i])
  return np.exp(-gamma * dist_sq)

print("\naffinity computed from scratch: ")
my_affinity = np.zeros((10,10), dtype=np.float64)
for i in range(10):
  for j in range(10):
    my_affinity[i][j] = my_rbf(X[i], X[j], 1.0)
print(my_affinity)

print("\nEnd demo ")
