Learn PCA — Dimensionality Reduction - Free Visual AI and ML Lesson

A free visual AI and machine learning lesson with an interactive 3D visualisation, plain-English theory, and a quiz.

Simple theory: PCA compresses many features into a few new features while keeping as much of the variation in the data as possible. It finds the directions along which the data varies most and projects the points onto them.

A dataset with 1000 features is impossible to visualise directly, and many of those features are usually correlated, so they carry redundant information. PCA (Principal Component Analysis) finds a smaller set of new, uncorrelated features (the principal components) that capture most of the variance. On strongly correlated data you might go from 1000 dimensions to 50 while keeping 95% of the variance, which makes models faster, makes visualisation possible, and can filter out noise.

How PCA works

Step 1: Standardise data (zero mean, unit variance) — PCA is scale-sensitive
Step 2: Compute covariance matrix Σ — captures how features vary together
Step 3: Find eigenvectors and eigenvalues of Σ — eigenvectors are the principal component directions
Step 4: Sort eigenvectors by eigenvalue (highest = most variance explained)
Step 5: Project data onto top K eigenvectors → K-dimensional representation
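
The five steps above map almost line for line onto NumPy. The sketch below is a minimal illustration rather than a production implementation: it assumes rows are samples and columns are features, and that no feature is constant (its standard deviation would be zero). The function name `pca` and the synthetic dataset are hypothetical.

```python
import numpy as np


def pca(X, k):
    """Illustrative PCA following Steps 1-5; returns (X_reduced, components, eigenvalues)."""
    X = np.asarray(X, dtype=float)
    # Step 1: standardise each column to zero mean and unit variance
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Step 2: covariance matrix Sigma = (1/n) Z^T Z
    cov = Z.T @ Z / Z.shape[0]
    # Step 3: eigenvalues/eigenvectors; eigh is meant for symmetric matrices
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Step 4: eigh returns ascending eigenvalues, so sort them in descending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Step 5: project onto the top-k eigenvectors (the columns of eigvecs)
    components = eigvecs[:, :k]
    return Z @ components, components, eigvals


# Hypothetical data: 200 samples; features 0 and 1 strongly correlated, feature 2 independent
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X = np.hstack([base, 2 * base + 0.1 * rng.normal(size=(200, 1)), rng.normal(size=(200, 1))])

X_reduced, components, eigvals = pca(X, k=2)
print(X_reduced.shape)              # → (200, 2)
print(eigvals / eigvals.sum())      # variance explained ratio per component
```

Because features 0 and 1 are near copies of each other, PC1 should absorb roughly two thirds of the variance and the last component almost none, which is the redundancy PCA removes. `X.std(axis=0)` uses the 1/n convention, matching Σ = (1/n) × XᵀX.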

Covariance matrix:
  Σ = (1/n) × XᵀX  (X is the n × d data matrix with one sample per row, mean-centred first)

Eigendecomposition:
  Σ × vᵢ = λᵢ × vᵢ
  vᵢ = i-th principal component direction (eigenvector)
  λᵢ = variance explained by PCᵢ (eigenvalue)
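
A quick way to see the eigendecomposition equation in action is to let NumPy solve it and then check Σ × vᵢ = λᵢ × vᵢ numerically. The 2 × 2 matrix below is an arbitrary illustrative covariance matrix. `np.linalg.eigh` is NumPy's solver for symmetric (Hermitian) matrices; it returns the eigenvalues in ascending order and the eigenvectors as columns, which is why Step 4 re-sorts them.

```python
import numpy as np

# Arbitrary illustrative covariance matrix (symmetric, positive definite)
sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

# eigh: eigenvalues in ascending order, eigenvectors as the columns of eigvecs
eigvals, eigvecs = np.linalg.eigh(sigma)

for lam, v in zip(eigvals, eigvecs.T):
    # Check the defining equation Sigma @ v == lambda * v, and that v has unit length
    print(round(float(lam), 4), np.allclose(sigma @ v, lam * v), np.isclose(np.linalg.norm(v), 1.0))
```

The eigenvalues come out at about 0.557 and 2.443. They sum to the trace of Σ (2.0 + 1.0 = 3.0), the total variance, so PC1 here explains roughly 81% of it.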

Variance explained ratio:
  VEᵢ = λᵢ / (λ₁ + λ₂ + ... + λ_d)  (d = number of original features)
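
The ratio is easy to compute once the eigenvalues are sorted, and its running total is the usual way to pick K. The plain-Python sketch below uses hypothetical eigenvalues chosen to sum to 10, and a hypothetical helper `choose_k` that returns the smallest K whose cumulative variance explained reaches a target such as 95%.

```python
# Hypothetical eigenvalues, sorted in descending order (they sum to 10)
eigenvalues = [4.2, 2.5, 1.8, 0.9, 0.4, 0.2]

total = sum(eigenvalues)
ratios = [lam / total for lam in eigenvalues]    # VE_i = lambda_i / sum of all lambdas


def choose_k(ratios, target=0.95):
    """Smallest K whose cumulative variance explained reaches `target`."""
    cumulative = 0.0
    for k, ratio in enumerate(ratios, start=1):
        cumulative += ratio
        if cumulative >= target:
            return k
    return len(ratios)


print([round(r, 2) for r in ratios])   # → [0.42, 0.25, 0.18, 0.09, 0.04, 0.02]
print(choose_k(ratios, target=0.95))   # → 5
```

With these numbers the first four components reach only 94% of the variance, so a 95% target needs K = 5.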

Projection to K dimensions:
  X_reduced = X × [v₁ | v₂ | ... | v_K]  (X centred or standardised as in Step 1; the result is n × K)
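
The projection is a single matrix product, and multiplying back by the transposed component matrix gives the best K-dimensional reconstruction. The NumPy sketch below (random illustrative data, K = 2) checks a useful identity: with the 1/n convention, the squared reconstruction error averaged over samples equals the sum of the eigenvalues that were dropped, which is exactly the variance PCA throws away.

```python
import numpy as np

rng = np.random.default_rng(42)
# Illustrative data: 500 samples of 5 correlated features
X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))
Xc = X - X.mean(axis=0)                        # projection assumes mean-centred X

sigma = Xc.T @ Xc / len(Xc)                    # (1/n) Xc^T Xc
eigvals, eigvecs = np.linalg.eigh(sigma)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # descending order

K = 2
V_K = eigvecs[:, :K]                           # d x K matrix [v1 | v2]
X_reduced = Xc @ V_K                           # n x K projected data

X_hat = X_reduced @ V_K.T                      # map back to the original d dimensions
sq_error_per_sample = np.sum((Xc - X_hat) ** 2) / len(Xc)

print(X_reduced.shape)                                     # → (500, 2)
print(np.isclose(sq_error_per_sample, eigvals[K:].sum()))  # → True
```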

Example (face images, 4096 pixels → PCA):
  PC1 explains 42% of the variance  (overall brightness)
  PC2 explains 18% of the variance  (left-right lighting)
  PC3 explains  9% of the variance  (facial structure)
  Top 50 PCs explain 89% of the variance  ← 98.8% fewer dimensions (50 of 4096 kept)
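
The 98.8% figure is the fraction of dimensions dropped per image. A short check in Python, using the numbers from the face-image example:

```python
original_dims = 4096   # pixels per face image in the example
kept_dims = 50         # principal components kept

reduction = 1 - kept_dims / original_dims
print(f"{reduction:.1%}")   # → 98.8%
```

This counts only the coordinates stored per image. The 50 component vectors (each 4096 values long) and the mean image are stored once for the whole dataset on top of that, and the remaining 11% of the variance is lost.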

Eigenvalues tell you how much variance each component captures

PCA in practice — worked example
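
As a worked example that can be followed by hand, the plain-Python sketch below runs every step on a made-up dataset of four samples with two features. It assumes both features are already on the same scale, so Step 1 is reduced to mean-centring, and it uses the closed-form eigenvalues of a symmetric 2 × 2 matrix [[a, b], [b, c]], namely λ = (a + c)/2 ± √(((a − c)/2)² + b²), in place of a general eigen-solver.

```python
import math

# Made-up dataset: 4 samples x 2 features, assumed to share one scale,
# so only mean-centring (not full standardisation) is applied here.
X = [[2.0, 1.0], [4.0, 3.0], [6.0, 4.0], [8.0, 8.0]]
n = len(X)

# Step 1: mean-centre each feature
mx = sum(r[0] for r in X) / n
my = sum(r[1] for r in X) / n
Xc = [(x - mx, y - my) for x, y in X]

# Step 2: covariance matrix [[a, b], [b, c]] with the 1/n convention
a = sum(x * x for x, _ in Xc) / n
b = sum(x * y for x, y in Xc) / n
c = sum(y * y for _, y in Xc) / n

# Step 3: closed-form eigenvalues of a symmetric 2x2 matrix
half_trace = (a + c) / 2
spread = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
lam1, lam2 = half_trace + spread, half_trace - spread   # Step 4: already in descending order

# Eigenvector for lam1: (b, lam1 - a) satisfies Sigma v = lam1 v when b != 0
vx, vy = b, lam1 - a
norm = math.hypot(vx, vy)
v1 = (vx / norm, vy / norm)

# Step 5: project each centred sample onto PC1 (K = 1)
pc1_scores = [x * v1[0] + y * v1[1] for x, y in Xc]

print(f"PC1 explains {lam1 / (lam1 + lam2):.1%} of the variance")  # → PC1 explains 98.3% of the variance
print([round(s, 3) for s in pc1_scores])                           # → [-4.233, -1.411, 0.658, 4.986]
```

Here Σ works out to [[5.0, 5.5], [5.5, 6.5]], and PC1 points along roughly (0.66, 0.75), the diagonal direction in which both features grow together. Because it carries about 98% of the variance, each of these 2-D points can be summarised by a single number with little loss.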

When to use PCA

High-dimensional data (images, text embeddings, genomics) → speed up downstream models
Correlated features: correlated inputs make coefficients in models such as linear regression unstable (multicollinearity); PCA replaces them with uncorrelated, orthogonal components
Visualisation: project onto PC1 vs PC2 to look for cluster structure in high-dimensional data
Noise reduction: the lowest-variance PCs often capture mostly noise, so dropping them can improve the signal
Before K-Means clustering on high-dimensional data (distances become less informative as the number of dimensions grows)
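
The noise-reduction point can be checked on synthetic data. The NumPy sketch below builds a hypothetical signal that truly lives in 2 of 20 dimensions, adds noise to every feature, keeps only the top 2 principal components, and compares both versions against the clean signal. It assumes the true number of components is known; on real data K has to be chosen, for example from the cumulative variance-explained curve.

```python
import numpy as np

rng = np.random.default_rng(7)
n, d, k = 300, 20, 2

# Hypothetical clean signal that lies in a 2-D subspace of 20-D space
signal = 3.0 * (rng.normal(size=(n, k)) @ rng.normal(size=(k, d)))
noisy = signal + rng.normal(scale=1.0, size=(n, d))   # unit-variance noise on every feature

# PCA on the noisy data (1/n covariance convention)
mean = noisy.mean(axis=0)
Xc = noisy - mean
eigvals, eigvecs = np.linalg.eigh(Xc.T @ Xc / n)
V_k = eigvecs[:, ::-1][:, :k]                 # top-k principal directions

# Keep the top-k components and drop the other 18, which hold mostly noise
denoised = Xc @ V_k @ V_k.T + mean

err_noisy = np.mean((noisy - signal) ** 2)
err_denoised = np.mean((denoised - signal) ** 2)
print(f"MSE vs clean signal: noisy {err_noisy:.3f}, denoised {err_denoised:.3f}")
```

On this data the denoised error should come out at roughly a tenth of the noisy error, since only about 2 of the 20 noise dimensions survive the projection.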
