Principal component analysis (PCA) is a classical statistics technique for dimensionality reduction. For example, the Penguin Dataset has 333 valid (no missing values) items. Each item has four numeric predictor variables (bill length, bill width, flipper length, body weight) and a species to predict (0 = Adelie, 1 = Chinstrap, 2 = Gentoo). In some scenarios, instead of dealing with all four predictors, you might want to reduce the data to just two predictors. Of course, this is just an example. A more realistic example would be a very large dataset with many predictor variables.
The classical technique for PCA looks like:
1. load data 2. normalize data 3. compute covariance matrix from normalized data 4. compute eigenvalues and eigenvectors directly from covariance matrix 5. sort eigenvalues from large to small 6. determine how many eigenvalues to use 7. extract associated eigenvectors 8. use eigenvectors to reduce data
In the classical technique, in step #4, eigenvalues and eigenvectors are computed directly from the covariance matrix, typically by using the QR decomposition algorithm, or by using the Jacobi algorithm.
I noticed that the scikit library PCA module documentation states that singular value decomposition (SVD) is used instead of computing eigenvalues and eigenvectors directly.
If you have a matrix A and apply SVD, you get a matrix U, a vector s of “singular values”, and a matrix Vh so that A = U * S * Vh where S is a matrix with the s vector values on the diagonal. To use SVD for PCA, you apply SVD directly to the normalized source data. The eigenvectors are stored in U. To compute the eigenvectors from the singular values, the computation is eigval(i) = s(i)^2 / (n-1) where n is the number of rows in the source data.
Put another way, step #4 in PCA becomes:
4. compute eigenvalues and eigenvectors indirectly from SVD applied to normalized data (using s and U results)
I had examples of classical PCA from scratch using C# (with the QR algorithm, and with the Jacobi algorithm, for computing eigenvalues and eigenvectors) but I decided to put together an example of PCA from scratch using C# with the SVD approach. I wanted to verify my knowledge.
The classical code for step #4 look like:
double[] eigenVals; double[][] eigenVecs; Eigen(covarMat, out eigenVals, out eigenVecs); // QR or Jacobi
The SVD version for step #4 looks like:
double[][] U; double[][] Vh; double[] s; int n = stdX.Length; int dim = 4; SVD_Jacobi(MatTranspose(stdX), out U, out Vh, out s); // compute eigenvalues from singular values double[] eigenVals = new double[dim]; for (int i = 0; i "lt"; dim; ++i) eigenVals[i] = (s[i] * s[i]) / (n - 1); // eigenvectors are in U double[][] eigenVecs = MatExtract(U, dim, dim);
I ran my from-scratch C# PCA using SVD version and a scikit PCA demo on a 9-item subset of the Penguin Dataset. Both programs gave identical results (except for the signs of eigenvectors, which is OK). The source data is:
# penguin_9.txt # species: 0 = Adelie, 1 = Chinstrap, 2 = Gentoo # bill length (mm), bill depth (mm), # flipper length (mm), body mass (g) # 0, 39.1, 18.7, 181.0, 3750.0 0, 39.5, 17.4, 186.0, 3800.0 0, 40.3, 18.0, 195.0, 3250.0 1, 46.5, 17.9, 192.0, 3500.0 1, 50.0, 19.5, 196.0, 3900.0 1, 51.3, 19.2, 193.0, 3650.0 2, 46.1, 13.2, 211.0, 4500.0 2, 50.0, 16.3, 230.0, 5700.0 2, 48.7, 14.1, 210.0, 4450.0
The SVD version of PCA is often preferred to the classical direct eigenvalues and eigenvectors version. A few Internet resources say something along the lines of, “SVD is more efficient than directly computing eigenvectors” or “Computing SVD singular values is more stable than computing eigenvalues directly”, but I haven’t seen any hard research evidence that demonstrates this. (There probably is such evidence but I just haven’t seen it).
Good fun.

Three images from an Internet search for “minimalist portrait”, ordered by increasing dimensionality reduction. The middle portrait is by famous artist Ichiro Tsuruta. The other two are anonymous.
Demo code for the PCA using SVD. Replace “lt”, “gt”, “lte”, “gte”, “and” with Boolean operator symbols.
using System;
using System.IO;
namespace PrincipalComponentSVD
{
internal class PrincipalComponentProgram
{
static void Main(string[] args)
{
Console.WriteLine("\nBegin PCA from scratch C#" +
" using SVD technique ");
// 0. load data
Console.WriteLine("\nLoading Penguin-9 data ");
string dataFile =
"..\\..\\..\\Data\\penguin_9.txt";
double[][] X = MatLoad(dataFile,
new int[] { 1, 2, 3, 4 }, ',', "#");
Console.WriteLine("Done ");
Console.WriteLine("\nSource data: ");
MatShow(X, 1, 9, true);
// 1. standardize data
Console.WriteLine("\nStandardizing (z-score," +
" biased) ");
double[] means;
double[] stds;
double[][] stdX = MatStandardize(X,
out means, out stds);
Console.WriteLine("\nStandardized data items ");
MatShow(stdX, 4, 9, nRows: 3);
Console.WriteLine("\nComputing " +
" eigenvalues and eigenvectors using SVD: ");
// 2. eigenvalues eigenvectors from standardized data
// SVD approach
// eigenvectors are in U, singular vals are in s
double[][] U;
double[][] Vh;
double[] s;
int n = stdX.Length;
int dim = stdX[0].Length;
SVD_Jacobi(MatTranspose(stdX), out U, out Vh, out s);
// compute eigenvalues from singular values
double[] eigenVals = new double[dim];
for (int i = 0; i "lt" dim; ++i)
eigenVals[i] = (s[i] * s[i]) / (n - 1);
// eigenvectors are in U
double[][] eigenVecs = MatExtract(U, dim, dim);
// double[][] eigenVecs = U; // works too
// double[][] eigenVecs = MatCopy(U); // works too
Console.WriteLine("\nEigenvectors (not sorted, " +
"as cols): ");
MatShow(eigenVecs, 4, 9, false);
Console.WriteLine("\nEigenvalues, not sorted: ");
VecShow(eigenVals, 4, 9);
// 3. sort eigenvals from large to smallest
int[] idxs = ArgSort(eigenVals); // to sort evecs
Array.Reverse(idxs);
Array.Sort(eigenVals);
Array.Reverse(eigenVals);
Console.WriteLine("\nEigenvalues (sorted): ");
VecShow(eigenVals, 4, 9);
// 4. sort eigenvectors
eigenVecs = MatExtractCols(eigenVecs, idxs); // sort
eigenVecs = MatTranspose(eigenVecs); // scikit format
Console.WriteLine("\nEigenvectors (sorted, rows): ");
MatShow(eigenVecs, 4, 9, false);
// 5. variance explained to guide number components
Console.WriteLine("\nComputing variance" +
" explained: ");
double[] varExplained = VarExplained(eigenVals);
VecShow(varExplained, 4, 9);
// 6. transform all data
Console.WriteLine("\nComputing transformed" +
" data (all components): ");
double[][] transformed =
MatProduct(stdX, MatTranspose(eigenVecs)); // all
MatShow(transformed, 4, 9, true);
// 7. reconstruct data just for fun / to verify
Console.WriteLine("\nReconstructing " +
" original data: ");
double[][] reconstructed =
MatProduct(transformed, eigenVecs);
for (int i = 0; i "lt" reconstructed.Length; ++i)
for (int j = 0; j "lt" reconstructed[0].Length; ++j)
reconstructed[i][j] =
(reconstructed[i][j] * stds[j]) + means[j];
MatShow(reconstructed, 1, 9, 3);
Console.WriteLine("\nEnd demo ");
Console.ReadLine();
}// Main
// ======================================================
static void SVD_Jacobi(double[][] mat, out double[][] U,
out double[][] Vh, out double[] s)
{
// works for m != n
double EPSILON = 1.0e-15;
double[][] A = MatCopy(mat); // working U
int m = A.Length;
int n = A[0].Length; // check m == n
double[][] Q = MatIdentity(n); // working V
double[] t = new double[n]; // working s
int ct = 1; // rotation counter
int pass = 0;
double tol = 10 * n * EPSILON; // heuristic
int passMax = 5 * n;
if (passMax "lt" 15) passMax = 15; // heuristic
// save the column error estimates
for (int j = 0; j "lt" n; ++j)
{
double[] cj = MatGetColumn(A, j);
double sj = VecNorm(cj);
t[j] = EPSILON * sj;
}
while (ct "gt" 0 "and" pass "lte" passMax)
{
ct = n * (n - 1) / 2; // rotation counter
for (int j = 0; j "lt" n - 1; ++j)
{
for (int k = j + 1; k "lt" n; ++k)
{
double sin; double cos;
double[] cj = MatGetColumn(A, j);
double[] ck = MatGetColumn(A, k);
double p = 2.0 * VecDot(cj, ck);
double a = VecNorm(cj);
double b = VecNorm(ck);
double q = a * a - b * b;
double v = Hypot(p, q);
double errorA = t[j];
double errorB = t[k];
bool sorted = false;
if (a "gte" b) sorted = true;
bool orthog = false;
if (Math.Abs(p) "lte" tol * (a * b)) orthog = true;
bool badA = false;
if (a "lt" errorA) badA = true;
bool badB = false;
if (b "lt" errorB) badB = true;
if (sorted == true "and" (orthog == true ||
badA == true || badB == true))
{
--ct;
continue;
}
// compute rotation angles
if (v == 0 || sorted == false)
{
cos = 0.0;
sin = 1.0;
}
else
{
cos = Math.Sqrt((v + q) / (2.0 * v));
sin = p / (2.0 * v * cos);
}
// apply rotation to A (U)
for (int i = 0; i "lt" m; ++i)
{
double Aik = A[i][k];
double Aij = A[i][j];
A[i][j] = Aij * cos + Aik * sin;
A[i][k] = -Aij * sin + Aik * cos;
}
// update singular values
t[j] = Math.Abs(cos) * errorA +
Math.Abs(sin) * errorB;
t[k] = Math.Abs(sin) * errorA +
Math.Abs(cos) * errorB;
// apply rotation to Q (V)
for (int i = 0; i "lt" n; ++i)
{
double Qij = Q[i][j];
double Qik = Q[i][k];
Q[i][j] = Qij * cos + Qik * sin;
Q[i][k] = -Qij * sin + Qik * cos;
} // i
} // k
} // j
++pass;
} // while
// compute singular values
double prevNorm = -1.0;
for (int j = 0; j "lt" n; ++j)
{
double[] column = MatGetColumn(A, j);
double norm = VecNorm(column);
// check if singular value is zero
if (norm == 0.0 || prevNorm == 0.0
|| (j "gt" 0 "and" norm "lte" tol * prevNorm))
{
t[j] = 0.0;
for (int i = 0; i "lt" m; ++i)
A[i][j] = 0.0;
prevNorm = 0.0;
}
else
{
t[j] = norm;
for (int i = 0; i "lt" m; ++i)
A[i][j] = A[i][j] * 1.0 / norm;
prevNorm = norm;
}
}
if (ct "gt" 0)
{
Console.WriteLine("Jacobi iterations did not" +
" converge");
}
U = A;
Vh = MatTranspose(Q);
s = t;
// to sync with default np.linalg.svd() shapes:
// if m "lt" n, extract 1st m columns of U
// extract 1st m values of s
// extract 1st m rows of Vh
// not applicable for matrix inverse.
// if (m "lt" n)
// {
// U = MatExtractFirstColumns(U, m);
// s = VecExtractFirst(s, m);
// Vh = MatExtractFirstRows(Vh, m);
// }
} // MatDecomposeSVD
// ======================================================
// --- helper functions ---------------------------------
//
// MatMake, MatCopy, MatIdentity, MatLoad,
// MatExtractFirstColumns, MatExtractFirstRows,
// MatStandardize, MatGetColumn, MatTranspose,
// MatProduct, MatNormColumns, MatExtractCols, VecNorm,
// VecDot, VecExtractFirst, Hypot, ArgSort,
// MatShow, VecShow
//
// ------------------------------------------------------
static double[][] MatMake(int r, int c)
{
double[][] result = new double[r][];
for (int i = 0; i "lt" r; ++i)
result[i] = new double[c];
return result;
}
static double[][] MatCopy(double[][] m)
{
int r = m.Length; int c = m[0].Length;
double[][] result = MatMake(r, c);
for (int i = 0; i "lt" r; ++i)
for (int j = 0; j "lt" c; ++j)
result[i][j] = m[i][j];
return result;
}
static double[][] MatIdentity(int n)
{
double[][] result = MatMake(n, n);
for (int i = 0; i "lt" n; ++i)
result[i][i] = 1.0;
return result;
}
static double[][] MatLoad(string fn, int[] usecols,
char sep, string comment)
{
// count number of non-comment lines, self-contained
//int nRows = NumNonCommentLines(fn, comment);
int nRows = 0;
string line = "";
FileStream ifs = new FileStream(fn, FileMode.Open);
StreamReader sr = new StreamReader(ifs);
while ((line = sr.ReadLine()) != null)
if (line.StartsWith(comment) == false)
++nRows;
sr.Close(); ifs.Close();
int nCols = usecols.Length;
// double[][] result = MatCreate(nRows, nCols);
double[][] result = new double[nRows][];
for (int r = 0; r "lt" nRows; ++r)
result[r] = new double[nCols];
line = "";
string[] tokens = null;
ifs = new FileStream(fn, FileMode.Open);
sr = new StreamReader(ifs);
int i = 0;
while ((line = sr.ReadLine()) != null)
{
if (line.StartsWith(comment) == true)
continue;
tokens = line.Split(sep);
for (int j = 0; j "lt" nCols; ++j)
{
int k = usecols[j]; // into tokens
result[i][j] = double.Parse(tokens[k]);
}
++i;
}
sr.Close(); ifs.Close();
return result;
}
static double[][] MatExtract(double[][] src,
int r, int c)
{
// extract upper left corner
double[][] result = MatMake(r, c);
for (int i = 0; i "lt" r; ++i)
for (int j = 0; j "lt" c; ++j)
result[i][j] = src[i][j];
return result;
}
static double[][]
MatExtractFirstColumns(double[][] src, int n)
{
int nRows = src.Length;
// int nCols = src[0].Length;
double[][] result = MatMake(nRows, n);
for (int i = 0; i "lt" nRows; ++i)
for (int j = 0; j "lt" n; ++j)
result[i][j] = src[i][j];
return result;
}
static double[][]
MatExtractFirstRows(double[][] src, int n)
{
// int nRows = src.Length;
int nCols = src[0].Length;
double[][] result = MatMake(n, nCols);
for (int i = 0; i "lt" n; ++i)
for (int j = 0; j "lt" nCols; ++j)
result[i][j] = src[i][j];
return result;
}
static double[][] MatStandardize(double[][] data,
out double[] means, out double[] stds)
{
// scikit style z-score biased normalization
int rows = data.Length;
int cols = data[0].Length;
double[][] result = MatMake(rows, cols);
// compute means
double[] mns = new double[cols];
for (int j = 0; j "lt" cols; ++j)
{
double sum = 0.0;
for (int i = 0; i "lt" rows; ++i)
sum += data[i][j];
mns[j] = sum / rows;
} // j
// compute std devs
double[] sds = new double[cols];
for (int j = 0; j "lt" cols; ++j)
{
double sum = 0.0;
for (int i = 0; i "lt" rows; ++i)
sum += (data[i][j] - mns[j]) *
(data[i][j] - mns[j]);
sds[j] = Math.Sqrt(sum / rows); // biased
} // j
// normalize
for (int j = 0; j "lt" cols; ++j)
{
for (int i = 0; i "lt" rows; ++i)
result[i][j] =
(data[i][j] - mns[j]) / sds[j];
} // j
means = mns;
stds = sds;
return result;
}
static double[] MatGetColumn(double[][] m, int j)
{
int rows = m.Length;
double[] result = new double[rows];
for (int i = 0; i "lt" rows; ++i)
result[i] = m[i][j];
return result;
}
static double[][] MatTranspose(double[][] m)
{
int r = m.Length;
int c = m[0].Length;
double[][] result = MatMake(c, r);
for (int i = 0; i "lt" r; ++i)
for (int j = 0; j "lt" c; ++j)
result[j][i] = m[i][j];
return result;
}
static double[][] MatProduct(double[][] matA,
double[][] matB)
{
int aRows = matA.Length;
int aCols = matA[0].Length;
int bRows = matB.Length;
int bCols = matB[0].Length;
if (aCols != bRows)
throw new Exception("Non-conformable matrices");
double[][] result = MatMake(aRows, bCols);
for (int i = 0; i "lt" aRows; ++i) // each row of A
for (int j = 0; j "lt" bCols; ++j) // each col of B
for (int k = 0; k "lt" aCols; ++k)
result[i][j] += matA[i][k] * matB[k][j];
return result;
}
static double[][] MatExtractCols(double[][] mat,
int[] cols)
{
int srcRows = mat.Length;
int srcCols = mat[0].Length;
int tgtCols = cols.Length;
double[][] result = MatMake(srcRows, tgtCols);
for (int i = 0; i "lt" srcRows; ++i)
{
for (int j = 0; j "lt" tgtCols; ++j)
{
int c = cols[j];
result[i][j] = mat[i][c];
}
}
return result;
}
static double VecNorm(double[] vec)
{
double sum = 0.0;
int n = vec.Length;
for (int i = 0; i "lt" n; ++i)
sum += vec[i] * vec[i];
return Math.Sqrt(sum);
}
static double VecDot(double[] v1, double[] v2)
{
int n = v1.Length;
double sum = 0.0;
for (int i = 0; i "lt" n; ++i)
sum += v1[i] * v2[i];
return sum;
}
static double[] VecExtractFirst(double[] vec, int n)
{
double[] result = new double[n];
for (int i = 0; i "lt" n; ++i)
result[i] = vec[i];
return result;
}
static double Hypot(double x, double y)
{
// fancy sqrt(x^2 + y^2)
double xabs = Math.Abs(x);
double yabs = Math.Abs(y);
double min, max;
if (xabs "lt" yabs)
{
min = xabs; max = yabs;
}
else
{
min = yabs; max = xabs;
}
if (min == 0)
return max;
double u = min / max;
return max * Math.Sqrt(1 + u * u);
}
static int[] ArgSort(double[] vec)
{
int n = vec.Length;
int[] idxs = new int[n];
for (int i = 0; i "lt" n; ++i)
idxs[i] = i;
Array.Sort(vec, idxs); // sort idxs on vec vals
return idxs;
}
// ------------------------------------------------------
static void MatShow(double[][] M, int dec,
int wid, bool showIndices)
{
double small = 1.0 / Math.Pow(10, dec);
for (int i = 0; i "lt" M.Length; ++i)
{
if (showIndices == true)
{
int pad = M.Length.ToString().Length;
Console.Write("[" + i.ToString().
PadLeft(pad) + "]");
}
for (int j = 0; j "lt" M[0].Length; ++j)
{
double v = M[i][j];
if (Math.Abs(v) "lt" small) v = 0.0;
Console.Write(v.ToString("F" + dec).
PadLeft(wid));
}
Console.WriteLine("");
}
}
// ------------------------------------------------------
static void MatShow(double[][] M, int dec,
int wid, int nRows)
{
double small = 1.0 / Math.Pow(10, dec);
for (int i = 0; i "lt" nRows; ++i)
{
for (int j = 0; j "lt" M[0].Length; ++j)
{
double v = M[i][j];
if (Math.Abs(v) "lt" small) v = 0.0;
Console.Write(v.ToString("F" + dec).
PadLeft(wid));
}
Console.WriteLine("");
}
if (nRows "lt" M.Length)
Console.WriteLine(". . . ");
}
// ------------------------------------------------------
static void VecShow(double[] vec,
int dec, int wid)
{
for (int i = 0; i "lt" vec.Length; ++i)
{
double x = vec[i];
if (Math.Abs(x) "lt" 1.0e-8) x = 0.0;
Console.Write(x.ToString("F" + dec).
PadLeft(wid));
}
Console.WriteLine("");
}
static void VecShow(int[] vec, int wid)
{
for (int i = 0; i "lt" vec.Length; ++i)
{
int x = vec[i];
Console.Write(x.ToString().
PadLeft(wid));
}
Console.WriteLine("");
}
// ------------------------------------------------------
static double[] VarExplained(double[] eigenVals)
{
// assumes eigenVals are sorted large to small
int n = eigenVals.Length;
double[] result = new double[n];
double sum = 0.0;
for (int i = 0; i "lt" n; ++i)
sum += eigenVals[i];
for (int i = 0; i "lt" n; ++i)
{
double pctExplained = eigenVals[i] / sum;
result[i] = pctExplained;
}
return result;
}
} // Program
} // ns
Here’s the scikit version:
# iris_pca.py
# demo of PCA on the Iris subset
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
# def mean(x): # np.mean(X, axis = 0)
# return sum(x)/len(x)
# def std(x): # np.std(X, axis = 0)
# return (sum((i - mean(x))**2 for i in x)/len(x))**0.5
# def Standardize_data(X):
# return (X - mean(X))/std(X)
print("\nBegin PCA using scikit ")
np.set_printoptions(precision=4, suppress=True,
floatmode='fixed')
print("\nLoading subset of Penguin data ")
data_file = ".\\Data\\penguin_9.txt"
X = np.loadtxt(data_file, delimiter=",",
usecols=[1,2,3,4], comments="#",
dtype=np.float64)
print("Done ")
print("\nSource data: ")
print(X)
print("\nStandardizing (z-score, biased) ")
scaler = StandardScaler()
scaler.fit(X)
X_std = scaler.transform(X)
# print(scaler.mean_)
# print(np.sqrt(scaler.var_))
print("\nStandardized data: ")
print(X_std[0:3,:])
print(". . . ")
pca = PCA(n_components=4)
pca.fit(X_std)
# covar_mat = pca.get_covariance()
# print("\ncovariance matrix: ")
# print(covar_mat)
eVals = pca.explained_variance_
print("\nEigenvalues (sorted): ")
print(eVals)
print("\nEigenvectors (sorted, rows): ")
eVecs = pca.components_
print(eVecs)
pctVar = pca.explained_variance_ratio_
print("\npercent variance explained: ")
print(pctVar)
transformed = pca.transform(X_std)
print("\nTransformed data (all components): ")
print(transformed)
# compute transformed from eigenvectors
# trans_check = np.matmul(X_std, eVecs.T)
# print("\nFirst 3 transformed verify: ")
# print(trans_check[0:3,0:2])
print("\nReconstructing " + \
" original data: ");
reconstructed = np.matmul(transformed, eVecs)
reconstructed *= np.sqrt(scaler.var_)
reconstructed += scaler.mean_
print(reconstructed[0:3,:])
print(". . . ")
print("\nEnd demo ")

.NET Test Automation Recipes
Software Testing
SciPy Programming Succinctly
Keras Succinctly
R Programming
2026 Visual Studio Live
2025 Summer MLADS Conference
2026 DevIntersection Conference
2025 Machine Learning Week
2025 Ai4 Conference
2026 G2E Conference
2026 iSC West Conference
You must be logged in to post a comment.