Learn K-Means Clustering

A free visual AI and machine learning lesson with an interactive 3D visualization, plain-English theory, and quiz.

Simple theory: K-means is an unsupervised clustering algorithm. It groups nearby points by repeatedly assigning points to the nearest center and moving each center to the middle of its group.

K-Means is unsupervised — no labels needed. Give it a pile of unlabelled data and it will find natural groups. Customer transactions become 'budget buyers', 'premium buyers', 'deal hunters'. Genes become functional clusters. News articles become topic groups. The algorithm finds structure; you interpret what it means.

The algorithm step by step

Step 1: Choose K — the number of clusters (the key hyperparameter you must set)
Step 2: Initialise K centroids randomly (or with K-Means++ for better starts)
Step 3: Assignment — assign each point xᵢ to the nearest centroid: cluster(xᵢ) = argmin‖xᵢ − μₖ‖²
Step 4: Update — move each centroid to the mean of its assigned points: μₖ = (1/|Cₖ|) Σ xᵢ∈Cₖ xᵢ
Step 5: Repeat steps 3–4 until assignments don't change (convergence)

K-Means objective (inertia / within-cluster sum of squares):
  J = Σₖ Σ_{xᵢ∈Cₖ} ‖xᵢ − μₖ‖²

where:
  K = number of clusters
  Cₖ = set of points assigned to cluster k
  μₖ = centroid of cluster k (mean of all points in Cₖ)

Example (2D, 3 clusters after convergence):
  Cluster 1 centroid: μ₁ = (2.1, 3.4),  14 points,  inertia contribution = 8.2
  Cluster 2 centroid: μ₂ = (7.8, 1.2),  11 points,  inertia contribution = 5.9
  Cluster 3 centroid: μ₃ = (5.0, 8.6),  10 points,  inertia contribution = 6.7
  Total inertia J = 20.8

Lower inertia = tighter, better-separated clusters. But J always decreases as K increases.

Choosing K — the elbow method

Plot inertia vs K. Inertia always decreases as K increases (more clusters = each cluster smaller = less spread). Look for the 'elbow' — the K where the rate of decrease sharply changes. Beyond the elbow, adding more clusters gives diminishing returns. There's no perfect automated method, but the elbow is a good starting heuristic.

Limitations of K-Means

Assumes spherical clusters — fails on elongated, crescent, or unequal-density clusters
K must be specified upfront — bad K gives meaningless clusters
Sensitive to outliers — one extreme point can pull a centroid far off
Different runs give different results (local minima) — always run multiple times (n_init=10)
Doesn't work well with high-dimensional data — distances become meaningless (curse of dimensionality)
Use DBSCAN for density-based clusters, Gaussian Mixture Models for soft cluster assignments

The algorithm step by step

Choosing K — the elbow method

Limitations of K-Means

Practice questions

Related AI learning resources