A few days ago I coded up a demo of anomaly detection using principal component analysis (PCA) reconstruction error. I implemented the PCA functionality — computation of the transformed data, the principal components, and the variance explained by each component — from semi-scratch, meaning I used the NumPy linalg (linear algebra) library eig() function to compute eigenvalues and eigenvectors.
And it was good.
But in the back of my mind, I was thinking that I should have verified my semi-from-scratch implementation of PCA because PCA is very, very complex and I could have made a mistake.

The from-scratch version (left) and the scikit version (right) are identical except that some of the transformed vectors and principal components differ by a factor of -1. This doesn’t affect anything.
So I took my original from-scratch PCA anomaly detection program and swapped out the PCA implementation from the scikit sklearn.decomposition library. And as expected, the results of the scikit-based PCA program were identical to the results of the from-scratch PCA program. Almost.
My from-scratch code looks like:
import numpy as np def my_pca(X): # returns transformed X, prin components, var explained dim = len(X[0]) # n_cols means = np.mean(X, axis=0) z = X - means # avoid changing X square_m = np.dot(z.T, z) (evals, evecs) = np.linalg.eig(square_m) trans_x = np.dot(z, evecs[:,0:dim]) prin_comp = evecs.T v = np.var(trans_x, axis=0, ddof=1) sv = np.sum(v) ve = v / sv # order everything based on variance explained ordering = np.argsort(ve)[::-1] # sort order high to low trans_x = trans_x[:,ordering] prin_comp = prin_comp[ordering,:] ve = ve[ordering] return (trans_x, prin_comp, ve) X = (load data from somewhere) (trans_x, p_comp, ve) = my_pca(X)
The scikit-based code looks like:
import numpy as np import sklearn.decomposition X = (load data from somewhere) pca = sklearn.decomposition.PCA().fit(X) trans_x = pca.transform(X) p_comp = pca.components_ ve = pca.explained_variance_ratio_
All the results were identical except that the internal transformed X values and the principal components, sometimes differed by a factor of -1. As it turns out this is OK because PCA computes variances and the sign doesn’t affect variance.
The advantage of using scikit PCA is simplicity. The advantages of using PCA from scratch are 1.) you get fine-tuned control, 2.) you remove an external dependency, 3.) you aren’t using a mysterious black box.
PCA is interesting and sometimes useful, but for tasks like dimensionality reduction and reconstruction, deep neural techniques have largely replaced PCA.

PCA was developed in 1901 by famous statistician Karl Pearson. I wonder if statisticians of that era imagined today’s deep neural technologies. Three images from the movie “Things to Come” (1936) based on the novel of the same name by author H. G. Wells.
Demo code:
# pca_recon_skikit.py
# exactly replicates iris_pca_recon.py scratch version
import numpy as np
import sklearn.decomposition
def reconstructed(X, n_comp, trans_x, p_comp):
means = np.mean(X, axis=0)
result = np.dot(trans_x[:,0:n_comp], p_comp[0:n_comp,:])
result += means
return result
def recon_error(X, XX):
diff = X - XX
diff_sq = diff * diff
errs = np.sum(diff_sq, axis=1)
return errs
def main():
print("\nBegin Iris PCA reconstruction using scikit ")
np.set_printoptions(formatter={'float': '{: 0.1f}'.format})
X = np.array([
[5.1, 3.5, 1.4, 0.2],
[5.4, 3.9, 1.7, 0.4],
[6.4, 3.2, 4.5, 1.5],
[5.7, 2.8, 4.5, 1.3],
[7.2, 3.6, 6.1, 2.5],
[6.9, 3.2, 5.7, 2.3]])
print("\nSource X: ")
print(X)
print("\nPerforming PCA computations ")
pca = sklearn.decomposition.PCA().fit(X)
trans_x = pca.transform(X)
p_comp = pca.components_
ve = pca.explained_variance_ratio_
print("Done ")
print("\nTransformed X: ")
np.set_printoptions(formatter={'float': '{: 0.4f}'.format})
print(trans_x)
print("\nPrincipal components: ")
np.set_printoptions(formatter={'float': '{: 0.4f}'.format})
print(p_comp)
print("\nVariance explained: ")
np.set_printoptions(formatter={'float': '{: 0.5f}'.format})
print(ve)
XX = reconstructed(X, 4, trans_x, p_comp)
print("\nReconstructed X using all components: ")
np.set_printoptions(formatter={'float': '{: 0.2f}'.format})
print(XX)
XX = reconstructed(X, 1, trans_x, p_comp)
print("\nReconstructed X using one component: ")
np.set_printoptions(formatter={'float': '{: 0.2f}'.format})
print(XX)
re = recon_error(X, XX)
print("\nReconstruction errors using one component: ")
np.set_printoptions(formatter={'float': '{: 0.3f}'.format})
print(re)
print("\nEnd PCA scikit ")
if __name__ == "__main__":
main()
.NET Test Automation Recipes
Software Testing
SciPy Programming Succinctly
Keras Succinctly
R Programming
2026 Visual Studio Live
2025 Summer MLADS Conference
2026 DevIntersection Conference
2025 Machine Learning Week
2025 Ai4 Conference
2026 G2E Conference
2026 iSC West Conference
You must be logged in to post a comment.