Bottom line: I recently implemented a function to compute the singular value decomposition (SVD) of a matrix, from scratch, using Python. That implementation relied on many built-in NumPy functions such as np.diag(). I refactored the implementation to remove all those NumPy calls, making a version of SVD that’s even more from-scratch. This implementation uses the Householder + QR method. It works, but it’s not as stable as my older version that uses the Jacobi algorithm. Put another way, the code in this blog post is interesting but not practical.
Computing the singular value decomposition (SVD) of a matrix is one of the most difficult problems in numerical programming. Many researchers worked for many years to develop a solution. The standard version of SVD (LAPACK) is over 10,000 lines of Fortran code!
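As a reminder of what an SVD routine must produce, here is a minimal sketch using the NumPy library implementation as a reference: the factorization A = U * diag(s) * Vt, where the columns of U and the rows of Vt are orthonormal. The example matrix here is arbitrary, chosen just for illustration.

```python
import numpy as np

# reference: what any SVD routine must deliver
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# A is recovered from the three factors
print(np.allclose(A, U @ np.diag(s) @ Vt))   # True
# columns of U are orthonormal
print(np.allclose(U.T @ U, np.eye(2)))       # True
```

Any from-scratch version has to pass exactly these two checks, which is how the demo below is validated.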
I’ve plugged away at a from-scratch implementation of SVD for many years. When I got some free time recently, I was able to dedicate about 50 hours to a pretty decent implementation (estimated accuracy of about 1.0e-10, and reasonable performance on matrices with up to about 1,000 rows).
The first part of my demo shows an example of SVD on a tall (more rows than columns) matrix:
Begin SVD demo

TALL MATRIX

A =
[[ 1  2  3  4  5]
 [ 0 -3  5 -7  9]
 [ 2  0 -2  0 -2]
 [ 4 -1  5  6  1]
 [ 3  6  8  2  2]
 [ 5 -2  4 -4  3]]

U =
[[ 0.319267  0.295047  0.358160  0.465731  0.652040]
 [ 0.625260 -0.614481  0.189087  0.174205 -0.138090]
 [-0.129989  0.032917 -0.342862 -0.160369  0.572690]
 [ 0.284056  0.494504 -0.490845  0.494308 -0.381366]
 [ 0.486033  0.495867  0.263713 -0.648653 -0.103790]
 [ 0.416300 -0.209426 -0.638701 -0.248874  0.267559]]

s = [15.967661 12.793149  6.297366  5.706879  2.478679]

Vt =
[[ 0.296544  0.035215  0.708796 -0.130799  0.625557]
 [ 0.217254  0.416871  0.261754  0.803400 -0.255056]
 [-0.745282  0.555722 -0.030757  0.039094  0.365040]
 [-0.187161 -0.609726 -0.196995  0.579569  0.467438]
 [ 0.523820  0.379983 -0.623971  0.003540  0.438033]]

reconstructed A
[[ 1.000000  2.000000  3.000000  4.000000  5.000000]
 [ 0.000000 -3.000000  5.000000 -7.000000  9.000000]
 [ 2.000000  0.000000 -2.000000  0.000000 -2.000000]
 [ 4.000000 -1.000000  5.000000  6.000000  1.000000]
 [ 3.000000  6.000000  8.000000  2.000000  2.000000]
 [ 5.000000 -2.000000  4.000000 -4.000000  3.000000]]

====================

np.linalg.svd results:

U =
[[-0.319267  0.295047  0.358160 -0.465731  0.652040]
 [-0.625260 -0.614481  0.189087 -0.174205 -0.138090]
 [ 0.129989  0.032917 -0.342862  0.160369  0.572690]
 [-0.284056  0.494504 -0.490845 -0.494308 -0.381366]
 [-0.486033  0.495867  0.263713  0.648653 -0.103790]
 [-0.416300 -0.209426 -0.638701  0.248874  0.267559]]

s = [15.967661 12.793149  6.297366  5.706879  2.478679]

Vt =
[[-0.296544 -0.035215 -0.708796  0.130799 -0.625557]
 [ 0.217254  0.416871  0.261754  0.803400 -0.255056]
 [-0.745282  0.555722 -0.030757  0.039094  0.365040]
 [ 0.187161  0.609726  0.196995 -0.579569 -0.467438]
 [ 0.523820  0.379983 -0.623971  0.003540  0.438033]]
I validated the demo by using the np.linalg.svd() function to make sure I got the same results. Note that the signs of the values in the columns of U and the rows of Vt are arbitrary for SVD.
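The sign ambiguity is easy to see directly: flipping the sign of column j of U together with row j of Vt leaves the product U * diag(s) * Vt unchanged, so two correct SVD routines can disagree on signs and still reconstruct the same A. A minimal sketch (the random matrix is just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# flip one matched column/row pair
U2 = U.copy(); Vt2 = Vt.copy()
U2[:, 0] *= -1.0
Vt2[0, :] *= -1.0

# the reconstruction is identical either way
print(np.allclose(U @ np.diag(s) @ Vt,
                  U2 @ np.diag(s) @ Vt2))  # True
```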
For the second part of the demo, I used a wide (more columns than rows) matrix:
WIDE MATRIX

A =
[[ 3.000000  1.000000  1.000000  0.000000  5.000000]
 [-1.000000  3.000000  1.000000 -2.000000  4.000000]
 [ 0.000000  2.000000  2.000000  1.000000 -3.000000]]

U =
[[-0.727149  0.195146 -0.658158]
 [-0.621912 -0.593195  0.511219]
 [ 0.290654 -0.781049 -0.552705]]

s = [7.639218 4.023860 3.232785]

Vt =
[[-0.204149 -0.263322 -0.100502  0.200868 -0.915716]
 [ 0.292911 -0.781970 -0.487131  0.100734  0.235122]
 [-0.768902 -0.071118 -0.387390 -0.487240  0.127506]]

reconstructed A
[[ 3.000000  1.000000  1.000000 -0.000000  5.000000]
 [-1.000000  3.000000  1.000000 -2.000000  4.000000]
 [ 0.000000  2.000000  2.000000  1.000000 -3.000000]]

====================

np.linalg.svd results:

U =
[[-0.727149 -0.195146 -0.658158]
 [-0.621912  0.593195  0.511219]
 [ 0.290654  0.781049 -0.552705]]

s = [7.639218 4.023860 3.232785]

Vt =
[[-0.204149 -0.263322 -0.100502  0.200868 -0.915716]
 [-0.292911  0.781970  0.487131 -0.100734 -0.235122]
 [-0.768902 -0.071118 -0.387390 -0.487240  0.127506]]

End demo
The refactoring took me a long time. For example, the original statement:
R[k:, k:] -= 2 * np.outer(v, v @ R[k:, k:])
uses slicing, vector-matrix multiplication, the np.outer() function, and scalar multiplication. It was replaced by:
tmp1 = my_row_col_to_end(R, k)      # R[k:, k:]
tmp2 = my_vec_mat_product(v, tmp1)  # v @ R[k:, k:]
tmp3 = my_outer(v, tmp2)            # np.outer(v, v @ R[k:, k:])
B = my_scalar_mult(2.0, tmp3)
my_subtract_corner(R, k, B)         # modify R in-place
Five statements that call five from-scratch functions.
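A quick way to gain confidence in such a refactoring is to check the explicit-loop version against the NumPy one-liner on a small random case. This sketch inlines the loops rather than calling the my_*() helpers, but it computes the same Householder update:

```python
import numpy as np

rng = np.random.default_rng(1)
R = rng.standard_normal((5, 5))
k = 2
v = rng.standard_normal(5 - k)

# NumPy one-liner version
R_np = R.copy()
R_np[k:, k:] -= 2 * np.outer(v, v @ R_np[k:, k:])

# explicit-loop version of the same update
R_loop = R.copy()
w = np.zeros(5 - k)              # w = v @ R[k:, k:]
for j in range(5 - k):
  for i in range(5 - k):
    w[j] += v[i] * R_loop[i + k][j + k]
for i in range(5 - k):           # R[k:, k:] -= 2 * outer(v, w)
  for j in range(5 - k):
    R_loop[i + k][j + k] -= 2.0 * v[i] * w[j]

print(np.allclose(R_np, R_loop))  # True
```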
I enjoyed the effort and learned a lot about SVD.

While I was refactoring my SVD code, I faced many situations where I had to decide whether to truncate code (to reduce the number of lines) or not (for better clarity). Human limb truncation is usually not optional.
When I was growing up, my father was a doctor and my mother was a nurse, so my brother and sister and I grew up in a kind of medical environment. We learned early on that physical anomalies were nothing to be afraid of. Here are two amusing examples. Left: Clever math. Right: Clever tattoo.
Demo code. Caution: almost certainly has a few minor bugs for edge cases.
# svd_from_scratchier.py
import numpy as np
import math
# -----------------------------------------------------------
# 31 helper functions to replace NumPy calls
# -----------------------------------------------------------
def my_norm(v):
  sum = 0.0
  for i in range(len(v)):
    sum += v[i] * v[i]
  return math.sqrt(sum)

def my_norm_matrix(A):
  m,n = A.shape
  sum = 0.0
  for i in range(m):
    for j in range(n):
      sum += A[i][j] * A[i][j]
  return math.sqrt(sum)

def my_diag(A):
  # extract diagonal of A as a vector
  m,n = A.shape
  result = np.zeros(m)
  for i in range(m):
    result[i] = A[i][i]
  return result

def my_create_diag(v):
  # matrix from diag vector
  n = len(v)
  result = np.zeros((n,n))
  for i in range(n):
    result[i][i] = v[i]
  return result

def my_eye(n):
  result = np.zeros((n,n))
  for i in range(n):
    result[i][i] = 1.0
  return result

def my_copy_matrix(A):
  m,n = A.shape
  result = np.zeros((m,n))
  for i in range(m):
    for j in range(n):
      result[i][j] = A[i][j]
  return result

def my_copy_vector(v):
  n = len(v)
  result = np.zeros(n)
  for i in range(n):
    result[i] = v[i]
  return result

def my_col_to_end(A, k):
  # last part of col k, from [k][k] to end
  m,n = A.shape
  result = np.zeros(m-k)
  for i in range(0, m-k):
    result[i] = A[i+k][k]
  return result

def my_row_col_to_end(A, k):
  # row k to end, col k to end
  m,n = A.shape
  result = np.zeros((m-k,n-k))
  for i in range(0, m-k):
    for j in range(0, n-k):
      result[i][j] = A[i+k][j+k]
  return result

def my_vec_mat_product(v, A):
  m,n = A.shape
  result = np.zeros(n)
  for j in range(n):
    for k in range(m):
      result[j] += v[k] * A[k][j]
  return result

def my_mat_vec_product(A, v):
  m,n = A.shape
  result = np.zeros(m)
  for i in range(m):
    for k in range(n):
      result[i] += A[i][k] * v[k]
  return result

def my_outer(v1, v2):
  result = np.zeros((v1.size, v2.size))
  for i in range(v1.size):
    for j in range(v2.size):
      result[i][j] = v1[i] * v2[j]
  return result

def my_scalar_mult(x, A):
  # x times each element of A
  m,n = A.shape
  result = np.zeros((m,n))
  for i in range(m):
    for j in range(n):
      result[i][j] = x * A[i][j]
  return result
def my_subtract_corner(A, k, C):
  # subtract elements in C from lower right of A,
  # starting at [k][k]
  m,n = A.shape
  for i in range(0, m-k):
    for j in range(0, n-k):
      A[i+k][j+k] -= C[i][j]
  return  # in-place

def my_extract_all_rows_col_k_to_end(A, k):
  m,n = A.shape
  result = np.zeros((m,n-k))
  for i in range(m):
    for j in range(n-k):
      result[i][j] = A[i][j+k]
  return result

def my_subtract_all_rows_col_k_to_end(A, k, C):
  # subtract elements in C from right part of A, in-place
  m,n = A.shape
  for i in range(0, m):
    for j in range(0, n-k):
      A[i][j+k] -= C[i][j]

def my_extract_first_k_cols(A, k):
  m,n = A.shape
  result = np.zeros((m,k))
  for i in range(m):
    for j in range(k):
      result[i][j] = A[i][j]
  return result

def my_extract_first_k_rows(A, k):
  m,n = A.shape
  result = np.zeros((k,n))
  for i in range(k):
    for j in range(n):
      result[i][j] = A[i][j]
  return result

def my_mat_subtract(A, B):
  m,n = A.shape
  result = np.zeros((m,n))
  for i in range(m):
    for j in range(n):
      result[i][j] = A[i][j] - B[i][j]
  return result

def my_mat_prod(A, B):
  a_rows = len(A)
  a_cols = len(A[0])
  b_rows = len(B)
  b_cols = len(B[0])
  result = np.zeros((a_rows, b_cols))
  for i in range(a_rows):
    for j in range(b_cols):
      for k in range(a_cols):
        result[i][j] += A[i][k] * B[k][j]
  return result
def my_mat_transpose(A):
  m,n = A.shape
  result = np.zeros((n,m))
  for i in range(m):
    for j in range(n):
      result[j][i] = A[i][j]
  return result

def my_argsort(v):
  n = len(v)
  augmented = []
  for i in range(n):
    tmp = [v[i], i]
    augmented.append(tmp)
  augmented.sort(key=lambda x: x[0])
  result = np.zeros(n, dtype=np.int64)
  for i in range(n):
    result[i] = augmented[i][1]
  return result

def my_reversed(v):
  n = len(v)
  result = np.zeros(n, dtype=np.int64)
  for i in range(n):
    result[i] = v[n-1-i]
  return result

def my_sort_using_idxs(v, idxs):
  n = len(v)
  result = np.zeros(n)
  for i in range(n):
    result[i] = v[idxs[i]]
  return result

def my_sort_cols_using_idxs(A, idxs):
  m,n = A.shape
  result = np.zeros((m,n))
  for i in range(m):
    for j in range(n):
      src_col = idxs[j]
      result[i][j] = A[i][src_col]
  return result

def my_clip_to_nonnegative(v):
  n = len(v)
  result = np.zeros(n)
  for i in range(n):
    if v[i] < 0.0:
      result[i] = 0.0
    else:
      result[i] = v[i]
  return result

def my_sqrt(v):
  n = len(v)
  result = np.zeros(n)
  for i in range(n):
    result[i] = math.sqrt(v[i])
  return result

def my_indices_where_nonzero(v, tol):
  # 1 == not zero
  n = len(v)
  result = np.zeros(n, dtype=np.int64)
  for i in range(n):
    if v[i] > tol:
      result[i] = 1  # non-zero
  return result

def my_remove_zeros(v, indices):
  # indices: 1 is non-zero, 0 is a near-zero
  n = len(v)
  lst = []
  for i in range(n):
    if indices[i] == 1:
      lst.append(v[i])
  result = np.array(lst)
  return result

def my_remove_cols(A, indices):
  m,n = A.shape
  ct_nonzeros = 0
  for i in range(len(indices)):
    if indices[i] == 1:
      ct_nonzeros += 1
  result = np.zeros((m, ct_nonzeros))
  k = 0  # destination col
  for j in range(n):
    if indices[j] == 0: continue  # next src col
    for i in range(m):
      result[i][k] = A[i][j]
    k += 1
  return result

def my_mat_div_vector(A, v):
  # divide each col j of A by v[j]
  m,n = A.shape
  if n != len(v): print("FATAL in my_mat_div_vector")
  result = np.zeros((m,n))
  for i in range(m):
    for j in range(n):
      result[i][j] = A[i][j] / v[j]
  return result
# -----------------------------------------------------------
# helpers for my_svd(): my_qr_decomp(), my_eigen()
# -----------------------------------------------------------

def my_qr_decomp(A):
  # reduced QR decomposition using Householder reflections
  # A: (m x n), m >= n
  # returns Q: (m x n), R: (n x n)
  # used by both my_eigen() and my_svd()
  # A = A.astype(float).copy()  # assume A is float
  m, n = A.shape
  # Q = np.eye(m)
  Q = my_eye(m)
  # R = A.copy()
  R = my_copy_matrix(A)
  for k in range(n):
    # x = R[k:, k]
    x = my_col_to_end(R, k)
    # normx = np.linalg.norm(x)
    normx = my_norm(x)
    if normx == 0:
      continue
    # v = x.copy()
    v = my_copy_vector(x)
    sign = 1.0 if x[0] >= 0 else -1.0
    v[0] += sign * normx
    # v /= np.linalg.norm(v)
    v /= my_norm(v)

    # apply reflector to R
    # R[k:, k:] -= 2 * np.outer(v, v @ R[k:, k:])
    tmp1 = my_row_col_to_end(R, k)  # R[k:, k:]
    tmp2 = my_vec_mat_product(v, tmp1)
    tmp3 = my_outer(v, tmp2)
    B = my_scalar_mult(2.0, tmp3)
    my_subtract_corner(R, k, B)

    # accumulate Q
    # Q[:, k:] -= 2 * np.outer(Q[:, k:] @ v, v)
    tmp4 = my_extract_all_rows_col_k_to_end(Q, k)
    tmp5 = my_mat_vec_product(tmp4, v)
    tmp6 = my_outer(tmp5, v)
    C = my_scalar_mult(2.0, tmp6)
    my_subtract_all_rows_col_k_to_end(Q, k, C)

  # return Q[:, :n], R[:n, :]
  Q_part = my_extract_first_k_cols(Q, n)
  R_part = my_extract_first_k_rows(R, n)
  return Q_part, R_part
# -----------------------------------------------------------

def my_eigen(A, tol=1.0e-12, max_iter=2000):
  # eigen-decomposition of a symmetric matrix using QR iteration
  # returns eigenvalues and eigenvectors
  # A = A.astype(float).copy()  # assume float
  AA = my_copy_matrix(A)
  n = AA.shape[0]
  # V = np.eye(n)
  V = my_eye(n)  # eigenvectors (columns)
  for _ in range(max_iter):
    # QR decomposition (using our own QR)
    Q, R = my_qr_decomp(AA)
    # A_new = R @ Q
    A_new = my_mat_prod(R, Q)
    # V = V @ Q
    V = my_mat_prod(V, Q)
    # check convergence (off-diagonal norm)
    # off = A_new - np.diag(np.diag(A_new))
    off = \
      my_mat_subtract(A_new, my_create_diag(my_diag(A_new)))
    # if np.linalg.norm(off) < tol:
    if my_norm_matrix(off) < tol:
      AA = A_new
      break
    AA = A_new
  # eigvals = np.diag(AA)
  eigvals = my_diag(AA)
  return eigvals, V
# -----------------------------------------------------------

def my_svd(A, tol=1.0e-12):
  # SVD using only the from-scratch helper functions
  # A = np.asarray(A, float)  # assume A is float
  m, n = A.shape
  if m >= n:
    Q, R = my_qr_decomp(A)
    # symmetric eigenproblem
    # RtR = R.T @ R
    RtR = my_mat_prod(my_mat_transpose(R), R)
    eigvals, V = my_eigen(RtR, tol=tol)
    # sort large to small
    # idx = np.argsort(eigvals)[::-1]
    sort_idxs = my_reversed(my_argsort(eigvals))
    # eigvals = eigvals[idx]
    eigvals = my_sort_using_idxs(eigvals, sort_idxs)
    # V = V[:, idx]
    V = my_sort_cols_using_idxs(V, sort_idxs)
    # s = np.sqrt(np.clip(eigvals, 0, None))
    s = my_sqrt(my_clip_to_nonnegative(eigvals))
    # nonzero = s > tol
    # U = Q @ (R @ V[:, nonzero] / s[nonzero])
    idxs = my_indices_where_nonzero(s, tol)
    s_nonzero = my_remove_zeros(s, idxs)
    V_nonzero = my_remove_cols(V, idxs)
    # tmp1 = R @ V_nonzero
    tmp1 = my_mat_prod(R, V_nonzero)
    tmp2 = my_mat_div_vector(tmp1, s_nonzero)
    # U = Q @ tmp2
    U = my_mat_prod(Q, tmp2)
    # return U, s[nonzero], V[:, nonzero].T
    return U, s_nonzero, my_mat_transpose(V_nonzero)
  else:
    # wide matrix: transpose trick
    # U, s, Vt = my_svd(A.T, tol); return Vt.T, s, U.T
    U, s, Vt = my_svd(my_mat_transpose(A), tol)
    return my_mat_transpose(Vt), s, my_mat_transpose(U)
# ===========================================================

print("\nBegin SVD demo ")
print("\nTALL MATRIX ")
np.set_printoptions(precision=6,
  suppress=True, floatmode='fixed')

A = np.array([[1, 2, 3, 4, 5],
              [0, -3, 5, -7, 9],
              [2, 0, -2, 0, -2],
              [4, -1, 5, 6, 1],
              [3, 6, 8, 2, 2],
              [5, -2, 4, -4, 3]], dtype=np.float64)
print("\nA = "); print(A)

U, s, Vt = my_svd(A)
print("\nU = "); print(U)
print("\ns = "); print(s)
print("\nVt = "); print(Vt)

A_reconstructed = my_mat_prod(U, \
  my_mat_prod(my_create_diag(s), Vt))
print("\nreconstructed A"); print(A_reconstructed)

print("\n====================")
print("\nnp.linalg.svd results: ")
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print("\nU = "); print(U)
print("\ns = "); print(s)
print("\nVt = "); print(Vt)

# ===========================================================

print("\nWIDE MATRIX ")
A = np.array([[3, 1, 1, 0, 5],
              [-1, 3, 1, -2, 4],
              [0, 2, 2, 1, -3]], dtype=np.float64)
print("\nA = "); print(A)

U, s, Vt = my_svd(A)
print("\nU = "); print(U)
print("\ns = "); print(s)
print("\nVt = "); print(Vt)

A_reconstructed = my_mat_prod(U, \
  my_mat_prod(my_create_diag(s), Vt))
print("\nreconstructed A"); print(A_reconstructed)

print("\n====================")
print("\nnp.linalg.svd results: ")
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print("\nU = "); print(U)
print("\ns = "); print(s)
print("\nVt = "); print(Vt)

print("\nEnd demo ")