Three Ways to Convert Spectral Clustering Eigenvectors to Cluster Labels

Four common data clustering techniques are k-means, DBSCAN, Gaussian mixture model, and spectral. Each of these four techniques has many variations. Recently, I’ve been looking at spectral clustering.

The standard definitive reference for spectral clustering is “A Tutorial on Spectral Clustering” by U. von Luxburg. It’s available online.

Spectral clustering to k clusters involves the following five steps. Each step has several possible techniques; I’ve listed the most common.

1. load data nxd X matrix into memory (n items, d dim)
2. compute nxn affinity matrix A from X
   (nearest connectivity, RBF similarity, mapped distance,
    radius neighbors)
3. compute nxn Laplacian graph matrix L from A
   (un-normalized, normalized random walk, normalized symmetric)
4. compute k smallest eigenvectors E from L
   (several algorithms)
5. compute cluster labels from E
   (k-means, SVD discretize, QR, any custom clustering algorithm)
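The five steps map directly onto scikit-learn's SpectralClustering class. A minimal sketch (assuming scikit-learn 1.1 or later, which is when the "cluster_qr" option for step #5 was added):

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# four toy 2D items forming two obvious clusters
X = np.array([[0.10, 0.10], [0.15, 0.10],
              [0.90, 0.90], [0.85, 0.95]])

# steps 1-5 in one call: affinity="rbf" selects the step #2 technique,
# assign_labels selects the step #5 technique
# ("kmeans", "discretize", or "cluster_qr")
sc = SpectralClustering(n_clusters=2, affinity="rbf",
  assign_labels="kmeans", random_state=0)
labels = sc.fit_predict(X)
print(labels)
```

The library handles steps #2 through #5 internally, so this is mostly useful as a baseline to check a from-scratch implementation against.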

In step #5, each of the three common ways to convert smallest eigenvectors (actually, the eigenvectors that correspond to the k smallest eigenvalues) to cluster labels has pros and cons.

Using k-means is by far the simplest, works well in practice, and is the most common technique, but it must be run several times because each run can give a different clustering result.
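With scikit-learn's KMeans, the repeated runs are built in: the n_init parameter restarts k-means from different random centroids and keeps the result with the lowest inertia. A quick sketch, where the toy matrix E stands in for the n-by-k eigenvector matrix of step #4:

```python
import numpy as np
from sklearn.cluster import KMeans

# toy stand-in for an n x k matrix of smallest eigenvectors
E = np.array([[0.00, 0.10], [0.00, 0.20],
              [1.00, 1.10], [1.00, 1.20]])

# n_init=10: run k-means 10 times from random centroids,
# keep the lowest-inertia (within-cluster SSE) result
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(E)
print(km.labels_)
```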

The SVD discretize technique uses singular value decomposition, which means it is internally iterative and complex, but only needs one pass. (See code below).

The QR technique uses QR decomposition with column pivoting, plus an SVD. It looks deceptively simple because all the work is done by the qr() and svd() library functions:

import numpy as np
from scipy.linalg import qr, svd

def cluster_qr(vectors):
  k = vectors.shape[1]  # number of clusters = number of eigenvector columns
  _, _, piv = qr(vectors.T, pivoting=True)  # pivoted QR picks k representative rows
  ut, _, v = svd(vectors[piv[:k], :].T)  # orthogonal rotation from the k x k block
  vectors = abs(np.dot(vectors, np.dot(ut, v.conj())))
  return vectors.argmax(axis=1)  # nearest rotated axis = cluster label

I put together a demo using Python. I created a dummy dataset with 18 items where items [3], [4], [5] form an obvious cluster inside a ring of the other 15 data items.

My demo uses radius neighbors to compute the affinity matrix. I copied the code for the SVD discretize and QR clustering functions from the scikit-learn library source.
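For reference, a from-scratch radius neighbors computation can be cross-checked against scikit-learn's radius_neighbors_graph() function. A sketch using the first four demo items (one caveat: I believe the library includes neighbors at exactly the radius boundary, where a strict d < eps check would exclude them):

```python
import numpy as np
from sklearn.neighbors import radius_neighbors_graph

X = np.array([[0.20, 0.20], [0.20, 0.70],
              [0.30, 0.90], [0.50, 0.50]])

# mode="distance" stores Euclidean distances as edge weights;
# include_self=False zeroes the diagonal
A = radius_neighbors_graph(X, radius=0.30, mode="distance",
  include_self=False).toarray()
print(A)
```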

Based on my experiments, I conclude that in most spectral clustering scenarios, using k-means to compute cluster labels from the eigenvectors is the best approach because it’s the simplest, and simple usually trumps elegant.



One important cluster of science fiction novel themes is “space opera” — colorful, dramatic, large-scale science fiction adventure. My favorite is the Mars series by author Edgar Rice Burroughs. Another well-known series is the Lensman series by author E.E. Smith. It consists of six novels written from 1948 to 1954. “Galactic Patrol” (1950) is my favorite of the six. The hero is Kimball Kinnison, a space policeman who has just graduated from the Space Academy and battles space pirates. The Lensman series hasn’t held up nearly as well as some other early sci-fi series, but it’s still pretty good. Here are covers from three different editions. Left: By Ric Binkley (1950). Center: By Jack Gaughan (1970). Right: By John Schoenherr (1964).


Demo code:

# spectral_scratch.py

import numpy as np

print("\nBegin spectral clustering demo ")

np.set_printoptions(precision=4, suppress=True,
  floatmode='fixed', linewidth=400)

# 0. set / load data
X = np.array([[0.20, 0.20],
[0.20, 0.70],
[0.30, 0.90],
[0.50, 0.50],
[0.50, 0.60],
[0.60, 0.50],
[0.70, 0.90],
[0.40, 0.10],
[0.90, 0.70],
[0.80, 0.20],
[0.10, 0.50],
[0.20, 0.40],
[0.60, 0.10],
[0.70, 0.10],
[0.90, 0.40],
[0.90, 0.50],
[0.80, 0.80],
[0.50, 0.90]])

N = 18
k = 2
print("\nData: ")
print(X[0:3,:])
print(". . . ")

# 1. compute affinity matrix from X

def my_dist(x1, x2):
  dim = len(x1)
  sum = 0.0
  for i in range(dim):
    sum += (x1[i] - x2[i]) * (x1[i] - x2[i])
  result = np.sqrt(sum)
  return result

eps = 0.30
print("\nAffinity (radius = 0.3) computed from scratch: ")
A = np.zeros((N,N), dtype=np.float64)
for i in range(N):
  for j in range(N):
    d = my_dist(X[i], X[j])
    if d < eps:
      A[i][j] = d  # store distance as edge weight (radius neighbors style)
print(A[0:3,:])
print(". . . ")

# 2. compute graph Laplacian matrix from A

# degree matrix
D = np.zeros((N,N), dtype=np.float64)
for i in range(N):
  sum = 0.0
  for j in range(N):
    sum += A[i][j]
  D[i][i] = sum

L = D - A  # each row of the un-normalized Laplacian sums to zero
print("\nGraph Laplacian (un-normalized):")
print(L[0:3,:])
print(". . . ")

# 3. compute k smallest eigenvectors from L

print("\nComputing k-smallest (2) eigenvectors ")
from scipy import linalg
evals, evecs = linalg.eig(L)  # L is symmetric, so linalg.eigh() also works
evals = np.real(evals)  # eig() returns complex; imaginary parts are zero here
evecs = np.real(evecs)

indices = np.argsort(evals)  # smallest to largest
extracted_evecs = evecs[0:N, indices[0:k]]  # columns for the k smallest eigenvalues

# 4. apply k-means to smallest eigenvectors

print("\nUsing eigenvectors to cluster ")

from sklearn.cluster import KMeans
km = KMeans(n_clusters=k, random_state=0).fit(extracted_evecs)
print("\nclustering via KMeans module: ")
print(km.labels_)

# -----------------------------------------------------------

from scipy.linalg import LinAlgError, qr, svd

def discretize(vectors, *, copy=True, max_svd_restarts=30,
  n_iter_max=20, random_state=None):
  from scipy.sparse import csc_matrix

  vectors = np.copy(vectors)
  eps = np.finfo(float).eps
  n_samples, n_components = vectors.shape
  norm_ones = np.sqrt(n_samples)
  for i in range(vectors.shape[1]):
    vectors[:, i] = \
      (vectors[:, i] / np.linalg.norm(vectors[:, i])) * \
      norm_ones
    if vectors[0, i] != 0:
      vectors[:, i] = -1 * vectors[:, i] * \
      np.sign(vectors[0, i])
  vectors = vectors / np.sqrt((vectors ** \
  2).sum(axis=1))[:, np.newaxis]
  svd_restarts = 0
  has_converged = False
  while (svd_restarts < max_svd_restarts) and \
    not has_converged:
    rotation = np.zeros((n_components, n_components))
    rotation[:, 0] = \
      vectors[random_state.randint(n_samples), :].T
    c = np.zeros(n_samples)
    for j in range(1, n_components):
      c += np.abs(np.dot(vectors, rotation[:, j - 1]))
      rotation[:, j] = vectors[c.argmin(), :].T
    last_objective_value = 0.0
    n_iter = 0
    while not has_converged:
      n_iter += 1
      t_discrete = np.dot(vectors, rotation)
      labels = t_discrete.argmax(axis=1)
      vectors_discrete = csc_matrix(
        (np.ones(len(labels)), (np.arange(0, n_samples),
        labels)), shape=(n_samples, n_components))
      t_svd = vectors_discrete.T * vectors
      try:
        U, S, Vh = np.linalg.svd(t_svd)
      except LinAlgError:
        svd_restarts += 1
        print("SVD did not converge, trying again")
        break

      ncut_value = 2.0 * (n_samples - S.sum())
      if (abs(ncut_value - last_objective_value) < \
        eps) or (n_iter > n_iter_max):
        has_converged = True
      else:
        # otherwise calculate rotation and continue
        last_objective_value = ncut_value
        rotation = np.dot(Vh.T, U.T)

  # after the restart loop exits (must be outside the while loop,
  # so an SVD failure triggers a restart rather than an immediate raise)
  if not has_converged:
    raise LinAlgError("SVD did not converge")
  return labels

# -----------------------------------------------------------

rnd = np.random.RandomState(1)
lbls_discretize = discretize(extracted_evecs, 
  random_state=rnd)
print("\nclustering via SVD discretize: ")
print(lbls_discretize)

# -----------------------------------------------------------

def cluster_qr(vectors):
  k = vectors.shape[1]
  _, _, piv = qr(vectors.T, pivoting=True)
  ut, _, v = svd(vectors[piv[:k], :].T)
  vectors = abs(np.dot(vectors, np.dot(ut, v.conj())))
  return vectors.argmax(axis=1)

# -----------------------------------------------------------

lbls_qr = cluster_qr(extracted_evecs)
print("\nclustering via QR: ")
print(lbls_qr)

# -----------------------------------------------------------

print("\nEnd demo ")