Four common data clustering techniques are k-means, DBSCAN, Gaussian mixture model, and spectral clustering. Each of these four techniques has many variations. Recently, I’ve been looking at spectral clustering.
The definitive reference for spectral clustering is “A Tutorial on Spectral Clustering” by U. von Luxburg. It’s available online.
Spectral clustering to k clusters involves the following five steps. Each step has several possible techniques; I’ve listed the most common.
1. load the n x d data matrix X into memory (n items, d dimensions)
2. compute the n x n affinity matrix A from X
(nearest neighbors connectivity, RBF similarity, mapped distance, radius neighbors)
3. compute the n x n graph Laplacian matrix L from A
(un-normalized, normalized random walk, normalized symmetric)
4. compute the k smallest eigenvectors E of L
(several algorithms)
5. compute cluster labels from E
(k-means, SVD discretize, QR, or any custom clustering algorithm)
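The five steps above are bundled into scikit-learn’s SpectralClustering class. A minimal sketch, using a made-up six-point dataset (the data values and parameter choices here are for illustration only):

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# two obvious groups of three points each (made-up data)
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [2.0, 2.0], [2.1, 2.0], [2.0, 2.1]])

# affinity="rbf" handles step 2; assign_labels selects the
# step-5 technique: "kmeans", "discretize", or "cluster_qr"
# ("cluster_qr" requires scikit-learn 1.1 or later)
sc = SpectralClustering(n_clusters=2, affinity="rbf",
  assign_labels="kmeans", random_state=0)
labels = sc.fit_predict(X)
print(labels)  # first three points get one label, last three the other
```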
In step #5, each of the three common ways to convert the smallest eigenvectors (more precisely, the eigenvectors that correspond to the k smallest eigenvalues) to cluster labels has pros and cons.
Using k-means is by far the simplest approach, works well in practice, and is the most common technique, but it must be run several times because each run can give a different clustering result.
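For instance, scikit-learn’s KMeans mitigates the restart issue with its n_init parameter, which keeps the best of several random starts. A small sketch; the 6 x 2 matrix below is an invented stand-in for the eigenvector matrix from step #4:

```python
import numpy as np
from sklearn.cluster import KMeans

# invented 6 x 2 stand-in for the n x k eigenvector matrix E
E = np.array([[0.0, 1.0], [0.1, 0.9], [0.5, 0.5],
              [0.9, 0.1], [1.0, 0.0], [0.6, 0.4]])

# n_init=10: run k-means from 10 random starts and keep the
# lowest-inertia result, which dampens run-to-run variation
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(E)
print(km.labels_)
```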
The SVD discretize technique uses singular value decomposition, which makes it internally iterative and complex, but it needs only one pass. (See the code below.)
The QR technique uses QR decomposition with pivoting, plus SVD decomposition. It looks deceptively simple because all the work is done by the QR and SVD decompositions:
def cluster_qr(vectors):
  k = vectors.shape[1]
  _, _, piv = qr(vectors.T, pivoting=True)
  ut, _, v = svd(vectors[piv[:k], :].T)
  vectors = abs(np.dot(vectors, np.dot(ut, v.conj())))
  return vectors.argmax(axis=1)
I put together a demo using Python. I created a dummy dataset with 18 items where items [3], [4], and [5] form an obvious cluster inside a ring formed by the other 15 data items.
My demo uses radius neighbors to compute the affinity matrix. I copied the code for the SVD discretize and QR clustering functions from the scikit-learn library source.
Based on my experiments, I conclude that in most spectral clustering scenarios, using k-means to compute cluster labels from the eigenvectors is the best approach because it’s the simplest, and simple usually trumps elegant.

One important cluster of science fiction novel themes is “space opera” — colorful, dramatic, large-scale science fiction adventure. My favorite is the Mars series by author Edgar Rice Burroughs. Another well-known series is the Lensman series by author E.E. Smith. It consists of six novels written from 1948 to 1954. “Galactic Patrol” (1950) is my favorite of the six. The hero is Kimball Kinnison. He is a space policeman who has just graduated from the Space Academy and battles space pirates. The Lensman series hasn’t held up nearly as well as other early sci-fi series but it’s still pretty good. Here are covers from three different editions. Left: By Ric Binkley (1950). Center: By Jack Gaughan (1970). Right: By John Schoenherr (1964).
Demo code:

# spectral_scratch.py

import numpy as np

print("\nBegin spectral clustering demo ")

np.set_printoptions(precision=4, suppress=True,
  floatmode='fixed', linewidth=400)

# 0. set / load data
X = np.array([[0.20, 0.20],
              [0.20, 0.70],
              [0.30, 0.90],
              [0.50, 0.50],
              [0.50, 0.60],
              [0.60, 0.50],
              [0.70, 0.90],
              [0.40, 0.10],
              [0.90, 0.70],
              [0.80, 0.20],
              [0.10, 0.50],
              [0.20, 0.40],
              [0.60, 0.10],
              [0.70, 0.10],
              [0.90, 0.40],
              [0.90, 0.50],
              [0.80, 0.80],
              [0.50, 0.90]])
N = 18
k = 2
print("\nData: ")
print(X[0:3,:])
print(". . . ")

# 1. compute affinity matrix from X
def my_dist(x1, x2):
  # Euclidean distance between two vectors
  dim = len(x1)
  sum = 0.0
  for i in range(dim):
    sum += (x1[i] - x2[i]) * (x1[i] - x2[i])
  result = np.sqrt(sum)
  return result

eps = 0.30
print("\nAffinity (radius = 0.3) computed from scratch: ")
A = np.zeros((N,N), dtype=np.float64)
for i in range(N):
  for j in range(N):
    d = my_dist(X[i], X[j])
    if d < eps:
      A[i][j] = d
print(A[0:3,:])
print(". . . ")

# 2. compute graph Laplacian matrix from A
# degree matrix
D = np.zeros((N,N), dtype=np.float64)
for i in range(N):
  sum = 0.0
  for j in range(N):
    sum += A[i][j]
  D[i][i] = sum
L = D - A
print("\nGraph Laplacian (un-normalized):")
print(L[0:3,:])
print(". . . ")
# 3. compute k smallest eigenvectors from L
print("\nComputing k-smallest (2) eigenvectors ")
from scipy import linalg
evals, evecs = linalg.eig(L)
evals = np.real(evals)
evecs = np.real(evecs)
indices = np.argsort(evals) # smallest to largest
extracted_evecs = evecs[0:N, indices[0:k]]  # columns for the k smallest eigenvalues
# 4. apply k-means to smallest eigenvectors
print("\nUsing eigenvectors to cluster ")
from sklearn.cluster import KMeans
km = KMeans(n_clusters=k, random_state=0).fit(extracted_evecs)
print("\nclustering via KMeans module: ")
print(km.labels_)
# -----------------------------------------------------------
from scipy.linalg import LinAlgError, qr, svd

def discretize(vectors, *, copy=True, max_svd_restarts=30,
  n_iter_max=20, random_state=None):
  from scipy.sparse import csc_matrix

  vectors = np.copy(vectors)
  eps = np.finfo(float).eps
  n_samples, n_components = vectors.shape

  # normalize the eigenvector columns, then the rows
  norm_ones = np.sqrt(n_samples)
  for i in range(vectors.shape[1]):
    vectors[:, i] = (vectors[:, i] / \
      np.linalg.norm(vectors[:, i])) * norm_ones
    if vectors[0, i] != 0:
      vectors[:, i] = -1 * vectors[:, i] * \
        np.sign(vectors[0, i])
  vectors = vectors / np.sqrt((vectors ** \
    2).sum(axis=1))[:, np.newaxis]

  svd_restarts = 0
  has_converged = False
  while (svd_restarts < max_svd_restarts) and \
    not has_converged:
    rotation = np.zeros((n_components, n_components))
    rotation[:, 0] = \
      vectors[random_state.randint(n_samples), :].T
    c = np.zeros(n_samples)
    for j in range(1, n_components):
      c += np.abs(np.dot(vectors, rotation[:, j - 1]))
      rotation[:, j] = vectors[c.argmin(), :].T

    last_objective_value = 0.0
    n_iter = 0
    while not has_converged:
      n_iter += 1
      t_discrete = np.dot(vectors, rotation)
      labels = t_discrete.argmax(axis=1)
      # one-hot indicator matrix of the current labels
      vectors_discrete = csc_matrix(
        (np.ones(len(labels)), (np.arange(0, n_samples),
        labels)), shape=(n_samples, n_components))
      t_svd = vectors_discrete.T * vectors
      try:
        U, S, Vh = np.linalg.svd(t_svd)
      except LinAlgError:
        svd_restarts += 1
        print("SVD did not converge, trying again")
        break
      ncut_value = 2.0 * (n_samples - S.sum())
      if (abs(ncut_value - last_objective_value) < \
        eps) or (n_iter > n_iter_max):
        has_converged = True
      else:
        # otherwise calculate rotation and continue
        last_objective_value = ncut_value
        rotation = np.dot(Vh.T, U.T)

  if not has_converged:
    raise LinAlgError("SVD did not converge")
  return labels

# -----------------------------------------------------------

rnd = np.random.RandomState(1)
lbls_discretize = discretize(extracted_evecs,
  random_state=rnd)
print("\nclustering via SVD discretize: ")
print(lbls_discretize)
# -----------------------------------------------------------
def cluster_qr(vectors):
  k = vectors.shape[1]
  _, _, piv = qr(vectors.T, pivoting=True)
  ut, _, v = svd(vectors[piv[:k], :].T)
  vectors = abs(np.dot(vectors, np.dot(ut, v.conj())))
  return vectors.argmax(axis=1)
# -----------------------------------------------------------
lbls_qr = cluster_qr(extracted_evecs)
print("\nclustering via QR: ")
print(lbls_qr)
# -----------------------------------------------------------
print("\nEnd demo ")
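As a sanity check (not part of the demo above), the same 18 points can be run through scikit-learn’s SpectralClustering. Note that this sketch uses an RBF affinity rather than the radius-neighbors affinity computed from scratch in the demo, so the resulting labels need not match the demo’s:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

X = np.array([[0.20, 0.20], [0.20, 0.70], [0.30, 0.90],
              [0.50, 0.50], [0.50, 0.60], [0.60, 0.50],
              [0.70, 0.90], [0.40, 0.10], [0.90, 0.70],
              [0.80, 0.20], [0.10, 0.50], [0.20, 0.40],
              [0.60, 0.10], [0.70, 0.10], [0.90, 0.40],
              [0.90, 0.50], [0.80, 0.80], [0.50, 0.90]])

# RBF affinity (step 2) with k-means label assignment (step 5)
sc = SpectralClustering(n_clusters=2, affinity="rbf",
  assign_labels="kmeans", random_state=0)
labels = sc.fit_predict(X)
print(labels)
```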

