Classical ML - Advanced - 15 min

Learn PCA — Dimensionality Reduction

Last updated: 2026-05-13.

Simple theory: PCA reduces many features into fewer new features while keeping as much of the variation in the data as possible. It finds the directions along which the data varies most and projects the points onto them.

A dataset with 1000 features cannot be plotted directly, and many of its features are usually correlated and therefore partly redundant. PCA (Principal Component Analysis) finds a smaller set of new features, each a linear combination of the originals, that capture most of the variance. On strongly correlated data you might, for example, go from 1000 dimensions to 50 and still keep 95% of the variance, which makes models faster, makes visualisation possible, and can reduce noise.

How PCA works

  • Step 1: Standardise data (zero mean, unit variance) — PCA is scale-sensitive
  • Step 2: Compute covariance matrix Σ — captures how features vary together
  • Step 3: Find eigenvectors and eigenvalues of Σ — eigenvectors are the principal component directions
  • Step 4: Sort eigenvectors by eigenvalue (highest = most variance explained)
  • Step 5: Project data onto top K eigenvectors → K-dimensional representation

Covariance matrix:
  Σ = (1/n) × XᵀX  (after mean-centering X; many libraries divide by n − 1 instead, which rescales the eigenvalues but not the directions)

Eigendecomposition:
  Σ × vᵢ = λᵢ × vᵢ
  vᵢ = i-th principal component direction (eigenvector)
  λᵢ = variance explained by PCᵢ (eigenvalue)

Variance explained ratio:
  VEᵢ = λᵢ / Σⱼ λⱼ  (here Σⱼ is a sum over all eigenvalues, not the covariance matrix Σ)

Projection to K dimensions:
  X_reduced = X × [v₁ | v₂ | ... | vₖ]  (X mean-centered; the top K eigenvectors are the columns)
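
The five steps and the formulas above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a production implementation: the function name `pca`, the toy data, and the choice of `np.linalg.eigh` are assumptions made for the example, and it assumes no feature is constant (otherwise the standard deviation in Step 1 would be zero).

```python
import numpy as np

def pca(X, k):
    """Reduce X (n_samples x n_features) to k dimensions, following Steps 1-5."""
    # Step 1: standardise each feature (zero mean, unit variance).
    Z = (X - X.mean(axis=0)) / X.std(axis=0)

    # Step 2: covariance matrix, using the 1/n convention from the formula.
    n = Z.shape[0]
    cov = (Z.T @ Z) / n

    # Step 3: eigendecomposition. eigh is meant for symmetric matrices
    # and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)

    # Step 4: sort by eigenvalue, largest first.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Step 5: project onto the top-k eigenvectors.
    X_reduced = Z @ eigvecs[:, :k]
    explained_ratio = eigvals / eigvals.sum()
    return X_reduced, eigvecs[:, :k], explained_ratio

rng = np.random.default_rng(0)
# Hypothetical toy data: 200 samples, 3 features, two of them strongly correlated.
a = rng.normal(size=200)
X = np.column_stack([a, 2 * a + 0.1 * rng.normal(size=200), rng.normal(size=200)])

X_reduced, components, ratio = pca(X, k=2)
print(X_reduced.shape)   # (200, 2)
print(ratio.round(3))    # variance explained ratio per component, largest first
```

Because two of the three toy features are nearly copies of each other, most of the variance should land in the first two components, which is the redundancy PCA is designed to exploit.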

Example (face images, 4096 pixels → PCA):
  PC1 explains 42% variance  (overall brightness)
  PC2 explains 18% variance  (left-right lighting)
  PC3 explains  9% variance  (facial structure)
  Top 50 PCs explain 89% variance  ← 50 of 4096 dimensions kept, a 98.8% reduction
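
The arithmetic behind the face-image example can be checked directly. The short sketch below only reuses the figures quoted above; the variable names are illustrative.

```python
# Figures taken from the face-image example above.
n_pixels = 4096
n_kept = 50
top3 = [0.42, 0.18, 0.09]  # variance explained by PC1, PC2, PC3

# Running total of variance explained as components are added.
cumulative = []
total = 0.0
for share in top3:
    total += share
    cumulative.append(round(total, 2))

# Fraction of dimensions discarded when only 50 of 4096 are kept.
compression = 1 - n_kept / n_pixels

print(cumulative)             # [0.42, 0.6, 0.69]
print(f"{compression:.1%}")   # 98.8%
```

So the first three components already account for 69% of the variance, and the remaining 47 kept components add the other 20 percentage points needed to reach 89%.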

Eigenvalues tell you how much variance each component captures

PCA in practice — worked example
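
One way such a worked example might look in code is sketched below. It picks the smallest K that keeps 95% of the variance on synthetic data. The data (100 features driven by 5 hidden factors), the 95% threshold, and all variable names are assumptions made for illustration. The sketch uses the singular value decomposition of the centered data instead of an explicit covariance matrix, which yields the same directions and is the usual numerical route; libraries such as scikit-learn wrap the same idea in `sklearn.decomposition.PCA`.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: 500 samples, 100 features driven by 5 hidden factors plus small noise.
latent = rng.normal(size=(500, 5))
mixing = rng.normal(size=(5, 100))
X = latent @ mixing + 0.1 * rng.normal(size=(500, 100))

# Centre the data. The features already share a scale here,
# so unit-variance scaling is skipped in this sketch.
Xc = X - X.mean(axis=0)

# The rows of Vt are the principal directions; squared singular values
# divided by n equal the eigenvalues of the (1/n) covariance matrix.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
eigvals = S**2 / Xc.shape[0]
ratio = eigvals / eigvals.sum()

# Smallest K whose cumulative explained variance reaches 95%.
cumulative = np.cumsum(ratio)
K = int(np.searchsorted(cumulative, 0.95) + 1)

# Project onto the top K directions.
X_reduced = Xc @ Vt[:K].T
print(K, X_reduced.shape)
```

Since the data was built from only 5 factors, K should come out far below 100: almost all of the variance lives in a handful of directions and the rest is noise.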

When to use PCA

  • High-dimensional data (images, text embeddings, genomics) → speed up downstream models
  • Correlated features: multicollinearity makes the coefficients of models like linear regression unstable and hard to interpret; PCA produces uncorrelated (orthogonal) components
  • Visualisation: project to PC1 vs PC2 to look for cluster structure in high-dimensional data (a 2D projection will not always reveal it)
  • Noise reduction: the last few PCs often capture mostly noise, so dropping them can improve the signal-to-noise ratio
  • Before K-Means clustering on high-dimensional data (distances become less informative in high dimensions)
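
The point about correlated features can be seen on a small example. The sketch below builds two features that move together and shows that their PCA projections are uncorrelated. The data and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: two features that move together.
x1 = rng.normal(size=1000)
x2 = 0.8 * x1 + 0.3 * rng.normal(size=1000)
X = np.column_stack([x1, x2])
Xc = X - X.mean(axis=0)

# Correlation between the raw features (high by construction).
raw_corr = np.corrcoef(Xc, rowvar=False)[0, 1]

# Project onto the eigenvectors of the covariance matrix.
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
scores = Xc @ eigvecs

# Correlation between the two principal component scores (zero up to rounding error).
pc_corr = np.corrcoef(scores, rowvar=False)[0, 1]

print(raw_corr, pc_corr)
```

Note that uncorrelated is a weaker property than independent: PCA removes linear relationships between the new features, not every possible dependence.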

Practice questions

  1. What does the first principal component (PC1) represent?
  2. After PCA, your first 5 components explain 92% of variance. What does this mean?
  3. Why is PCA described as an 'unsupervised' technique?
  4. After applying PCA, what properties do the transformed features have?
