Simple theory: PCA reduces many features into fewer new features while keeping as much useful variation as possible. It finds the directions of greatest variance in the data and projects the points onto them.
A dataset with 1000 features cannot be visualised directly, and many of those features are usually correlated and therefore redundant. PCA (Principal Component Analysis) finds a smaller set of new features that capture most of the variance. When the original features are strongly correlated, you can go from 1000 dimensions to 50 and still keep around 95% of the variance, which makes models faster, makes visualisation possible, and reduces noise.
How PCA works
- Step 1: Standardise data (zero mean, unit variance) — PCA is scale-sensitive
- Step 2: Compute covariance matrix Σ — captures how features vary together
- Step 3: Find eigenvectors and eigenvalues of Σ — eigenvectors are the principal component directions
- Step 4: Sort eigenvectors by eigenvalue (highest = most variance explained)
- Step 5: Project data onto top K eigenvectors → K-dimensional representation
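The five steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a production implementation: the function name `pca_fit_transform` and the synthetic data are assumptions made for the example, and the covariance uses the 1/n convention.

```python
import numpy as np

def pca_fit_transform(X, k):
    """Project X (n_samples x n_features) onto its top-k principal components."""
    # Step 1: standardise each feature to zero mean and unit variance
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant features unscaled to avoid dividing by zero
    Z = (X - mean) / std

    # Step 2: covariance matrix of the standardised data (1/n convention)
    cov = Z.T @ Z / Z.shape[0]

    # Step 3: eigendecomposition; eigh is intended for symmetric matrices
    eigvals, eigvecs = np.linalg.eigh(cov)

    # Step 4: sort so the largest eigenvalue (most variance) comes first
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Step 5: project onto the top-k eigenvectors
    components = eigvecs[:, :k]
    return Z @ components, components, eigvals

# Synthetic data (an assumption for illustration): 5 features, two of them correlated
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 1] = 2.0 * X[:, 0] + 0.1 * rng.normal(size=200)

X_reduced, components, eigvals = pca_fit_transform(X, k=2)
print(X_reduced.shape)  # (200, 2)
```

Because the data are standardised first, the eigenvalues sum to the number of features, so each eigenvalue can be read as "how many original features' worth of variance" that component carries.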
Covariance matrix:
Σ = (1/n) × XᵀX (after mean-centering X)
Eigendecomposition:
Σ × vᵢ = λᵢ × vᵢ
vᵢ = i-th principal component direction (eigenvector)
λᵢ = variance explained by PCᵢ (eigenvalue)
Variance explained ratio:
VEᵢ = λᵢ / Σⱼ λⱼ (here Σⱼ is a sum over all eigenvalues, not the covariance matrix Σ)
Projection to K dimensions:
X_reduced = X × [v₁ | v₂ | ... | vₖ] (X mean-centered, as above)
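The formulas above can be checked numerically. The sketch below, on made-up data, verifies that each eigenvector satisfies Σ × vᵢ = λᵢ × vᵢ, that the variance ratios sum to 1, and that the variance of the data projected onto vᵢ equals λᵢ. All variable names are chosen for illustration only.

```python
import numpy as np

# Made-up data: 500 samples, 3 features with different spreads and some correlation
rng = np.random.default_rng(1)
mixing = np.array([[2.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.2]])
X = rng.normal(size=(500, 3)) @ mixing

Xc = X - X.mean(axis=0)              # mean-centre
n = Xc.shape[0]
sigma = Xc.T @ Xc / n                # Σ = (1/n) XᵀX

lam, V = np.linalg.eigh(sigma)       # eigenvalues come back in ascending order
lam, V = lam[::-1], V[:, ::-1]       # reorder so the largest is first

residual = sigma @ V - V * lam       # Σ vᵢ - λᵢ vᵢ, column by column
ve = lam / lam.sum()                 # variance explained ratio
X_reduced = Xc @ V[:, :2]            # projection to K = 2 dimensions
proj_var = X_reduced.var(axis=0)     # np.var defaults to the 1/n convention

print(np.abs(residual).max())        # close to zero
print(ve)
print(proj_var, lam[:2])             # the two arrays should match
```

The last check is what justifies calling λᵢ "the variance explained by PCᵢ": it is literally the variance of the data along direction vᵢ.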
Example (face images, 4096 pixels → PCA):
PC1 explains 42% variance (overall brightness)
PC2 explains 18% variance (left-right lighting)
PC3 explains 9% variance (facial structure)
Top 50 PCs explain 89% variance ← 98.8% compression
Eigenvalues tell you how much variance each component captures
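The arithmetic behind the face-image figures can be reproduced directly: keeping 50 of 4096 values per image is a 98.8% reduction, and the first three components together account for 69% of the variance. The helper `smallest_k` is a hypothetical function added for illustration, and `toy_eigvals` is made-up data rather than the eigenvalues of any real face dataset.

```python
import numpy as np

# Figures quoted in the face-image example
n_pixels, k = 4096, 50
reduction = 1 - k / n_pixels
print(f"{reduction:.1%}")   # 98.8%

top3 = 0.42 + 0.18 + 0.09   # PC1 + PC2 + PC3
print(f"{top3:.0%}")        # 69%

def smallest_k(eigvals, target):
    """Smallest number of components whose cumulative variance ratio reaches target."""
    ratios = np.sort(eigvals)[::-1] / np.sum(eigvals)
    cumulative = np.cumsum(ratios)
    # first position where the running total reaches the target, as a count
    k = int(np.searchsorted(cumulative, target)) + 1
    return min(k, len(ratios))

# Made-up eigenvalues, only to show how the helper behaves
toy_eigvals = np.array([4.0, 2.0, 1.0, 0.5, 0.5])
print(smallest_k(toy_eigvals, 0.90))  # 4
```

Note that the 98.8% figure counts only the per-image representation; reconstructing images also requires storing the 50 component vectors and the mean image once for the whole dataset.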
PCA in practice — worked example
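A worked example might look like the following sketch, which uses scikit-learn. The choice of the bundled digits dataset (1797 images, 64 pixel features each) is an assumption made so the example runs without a download; it stands in for the larger face images discussed above. Passing a float to `n_components` asks `PCA` to keep as many components as are needed to reach that fraction of the variance.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# 8x8 digit images flattened to 64 features (stand-in dataset for illustration)
X, y = load_digits(return_X_y=True)

# Standardise first, since PCA is scale-sensitive
X_scaled = StandardScaler().fit_transform(X)

# Keep enough components to retain 95% of the variance
pca = PCA(n_components=0.95, svd_solver="full")
X_reduced = pca.fit_transform(X_scaled)

print(X.shape, "->", X_reduced.shape)
print("components kept:", pca.n_components_)
print("variance retained:", pca.explained_variance_ratio_.sum())
print("first three ratios:", pca.explained_variance_ratio_[:3])
```

On new data, call `pca.transform` (after applying the same fitted scaler) rather than `fit_transform`, so the test set is projected onto the components learned from the training set.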
When to use PCA
- High-dimensional data (images, text embeddings, genomics) → speed up downstream models
- Correlated features: multicollinearity makes the coefficients of models like linear regression unstable and hard to interpret; PCA replaces the features with uncorrelated (orthogonal) components
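- Visualisation: plot PC1 against PC2 to look for cluster structure (most informative when the first two components capture a large share of the variance)
- Noise reduction: the last few PCs often capture mostly noise, so dropping them can improve the signal-to-noise ratio
- Before K-Means clustering on high-dimensional data, where pairwise distances become less informative as the number of dimensions grows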
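Two of the use cases above, decorrelating features and reducing dimensions before K-Means, can be combined in one scikit-learn pipeline. This is a sketch under assumptions: the digits dataset, 10 components, and 10 clusters are illustrative choices, not recommendations.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, _ = load_digits(return_X_y=True)   # 64 pixel features per image

# Scale, reduce 64 features to 10 components, then cluster in the reduced space
pipeline = make_pipeline(
    StandardScaler(),
    PCA(n_components=10, svd_solver="full"),
    KMeans(n_clusters=10, n_init=10, random_state=0),
)
labels = pipeline.fit_predict(X)

# The PCA scores fed to K-Means are mutually uncorrelated
scores = pipeline[:-1].transform(X)
corr = np.corrcoef(scores, rowvar=False)
off_diag = corr - np.diag(np.diag(corr))

print(scores.shape)             # (1797, 10)
print(np.abs(off_diag).max())   # close to zero
```

Wrapping the steps in a pipeline keeps the scaler and PCA fitted on the same data as the clustering step, which avoids accidentally transforming new data with differently fitted components.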