Classical ML - Intermediate - 12 min

Learn K-Means Clustering

A free visual AI and machine learning lesson with an interactive 3D visualization, plain-English theory, and quiz.

Last updated: 2026-05-13.

Simple theory: K-means is an unsupervised clustering algorithm. It groups nearby points by repeatedly assigning points to the nearest center and moving each center to the middle of its group.

K-Means is unsupervised — no labels needed. Give it a pile of unlabelled data and it will find natural groups. Customer transactions become 'budget buyers', 'premium buyers', 'deal hunters'. Genes become functional clusters. News articles become topic groups. The algorithm finds structure; you interpret what it means.

The algorithm step by step

  • Step 1: Choose K — the number of clusters (the key hyperparameter you must set)
  • Step 2: Initialise K centroids randomly (or with K-Means++ for better starts)
  • Step 3: Assignment — assign each point xᵢ to the nearest centroid: cluster(xᵢ) = argmin‖xᵢ − μₖ‖²
  • Step 4: Update — move each centroid to the mean of its assigned points: μₖ = (1/|Cₖ|) Σ xᵢ∈Cₖ xᵢ
  • Step 5: Repeat steps 3–4 until assignments don't change (convergence)
K-Means objective (inertia / within-cluster sum of squares):
  J = Σₖ Σ_{xᵢ∈Cₖ} ‖xᵢ − μₖ‖²

where:
  K = number of clusters
  Cₖ = set of points assigned to cluster k
  μₖ = centroid of cluster k (mean of all points in Cₖ)

Example (2D, 3 clusters after convergence):
  Cluster 1 centroid: μ₁ = (2.1, 3.4),  14 points,  inertia contribution = 8.2
  Cluster 2 centroid: μ₂ = (7.8, 1.2),  11 points,  inertia contribution = 5.9
  Cluster 3 centroid: μ₃ = (5.0, 8.6),  10 points,  inertia contribution = 6.7
  Total inertia J = 20.8

Lower inertia = tighter, better-separated clusters. But J always decreases as K increases.

Choosing K — the elbow method

Plot inertia vs K. Inertia always decreases as K increases (more clusters = each cluster smaller = less spread). Look for the 'elbow' — the K where the rate of decrease sharply changes. Beyond the elbow, adding more clusters gives diminishing returns. There's no perfect automated method, but the elbow is a good starting heuristic.

Limitations of K-Means

  • Assumes spherical clusters — fails on elongated, crescent, or unequal-density clusters
  • K must be specified upfront — bad K gives meaningless clusters
  • Sensitive to outliers — one extreme point can pull a centroid far off
  • Different runs give different results (local minima) — always run multiple times (n_init=10)
  • Doesn't work well with high-dimensional data — distances become meaningless (curse of dimensionality)
  • Use DBSCAN for density-based clusters, Gaussian Mixture Models for soft cluster assignments

Practice questions

  1. In K-Means, what is a 'centroid'?
  2. K-Means clustering requires you to specify K in advance. What is the main risk of choosing K too large?
  3. K-Means assigns each point to the nearest centroid using:
  4. Why does K-Means++ initialisation produce better results than random initialisation?

Related AI learning resources

Premium lesson notes and simulations | AI project templates | More Classical ML lessons