Generative AI - Advanced - 18 min

Learn Diffusion Models

A free visual AI and machine learning lesson with an interactive 3D visualization, plain-English theory, and quiz.

Last updated: 2026-05-13.

Diffusion models are the engine behind DALL-E 2, Stable Diffusion, Midjourney, and Sora. The idea is brilliantly counterintuitive: train a model to reverse the process of slowly adding noise to an image. Once trained, you can start from pure noise and run the reverse process to generate a new image. Unlike GANs, training is stable. Unlike VAEs, which tend to produce blurry samples, output can be photorealistic. The cost: tens to hundreds of forward passes through the network to generate one image (vs. one for GANs).

Forward process — simple, no learning

Add a little Gaussian noise at each step t = 1, ..., T:
  x_t = √(1 − β_t) · x_{t−1} + √β_t · ε   where a fresh ε ~ N(0, I) is drawn at every step and β_t ∈ (0, 1) sets the noise level

After T = 1000 steps with small β_t, x_T is essentially pure noise.
Closed form: x_t = √(α̃_t) · x_0 + √(1 − α̃_t) · ε   where α̃_t = (1 − β_1) · (1 − β_2) · ... · (1 − β_t) and ε ~ N(0, I).
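
The closed form can be tried out in a few lines of NumPy. This is a minimal sketch, not a reference implementation: the linear β schedule from 1e-4 to 0.02, the helper names make_schedule and q_sample, and the random 8×8 array standing in for an image are all assumptions made for illustration.

```python
import numpy as np

def make_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (an assumed choice) plus the running product of (1 - beta)."""
    betas = np.linspace(beta_start, beta_end, T)
    alpha_bar = np.cumprod(1.0 - betas)  # alpha_bar[t-1] = (1 - beta_1) * ... * (1 - beta_t)
    return betas, alpha_bar

def q_sample(x0, t, alpha_bar, rng):
    """Jump from x_0 straight to x_t (t is 1-indexed) using the closed form."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t - 1]) * x0 + np.sqrt(1.0 - alpha_bar[t - 1]) * eps
    return x_t, eps

rng = np.random.default_rng(0)
betas, alpha_bar = make_schedule()
x0 = rng.uniform(-1.0, 1.0, size=(8, 8))  # stand-in for an image scaled to [-1, 1]

for t in (1, 250, 500, 1000):
    x_t, _ = q_sample(x0, t, alpha_bar, rng)
    signal = np.sqrt(alpha_bar[t - 1])
    noise = np.sqrt(1.0 - alpha_bar[t - 1])
    print(f"t={t:4d}  signal weight={signal:.4f}  noise weight={noise:.4f}")
```

The printed weights show the signal term shrinking toward zero and the noise term growing toward one as t approaches T, which is why x_T looks like pure noise.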

Forward = mechanical · no learning needed

Reverse process — train a UNet

Train a network ε_θ(x_t, t) to predict the noise ε that was mixed into x_0 to produce x_t.

  Loss = || ε − ε_θ(x_t, t) ||²   averaged over training images x_0, random steps t, and random noise ε
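
One evaluation of this loss can be sketched as follows. The eps_model function here is a hypothetical placeholder that always predicts zero noise (a real model would be a trained UNet), the linear β schedule is an assumption, and the code takes the mean of the squared error rather than the sum, which only rescales the loss.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # assumed linear schedule
alpha_bar = np.cumprod(1.0 - betas)  # running product of (1 - beta)

def eps_model(x_t, t):
    """Hypothetical stand-in for the trained network; it always predicts zero noise."""
    return np.zeros_like(x_t)

def diffusion_loss(x0_batch, rng):
    """Monte Carlo estimate of the noise-prediction loss for one batch."""
    n = x0_batch.shape[0]
    t = rng.integers(1, T + 1, size=n)         # one random step in 1..T per example
    eps = rng.standard_normal(x0_batch.shape)  # the noise the model should recover
    # Reshape so each example's alpha_bar broadcasts over its pixels.
    a = alpha_bar[t - 1].reshape(n, *([1] * (x0_batch.ndim - 1)))
    x_t = np.sqrt(a) * x0_batch + np.sqrt(1.0 - a) * eps
    return np.mean((eps - eps_model(x_t, t)) ** 2)

rng = np.random.default_rng(0)
x0_batch = rng.uniform(-1.0, 1.0, size=(16, 8, 8))  # stand-in for 16 tiny images
loss = diffusion_loss(x0_batch, rng)
print(f"loss for the zero predictor: {loss:.3f}")
```

Because the placeholder predicts zero, its loss is just the average squared noise, which is close to 1. A trained network should score well below that baseline.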

At sampling time:
  • Start: x_T ~ N(0, I)   (pure noise)
  • For t = T, T−1, ..., 1:
      predicted noise = ε_θ(x_t, t)
      x_{t-1} = (x_t − β_t / √(1 − α̃_t) · predicted noise) / √(1 − β_t)  +  √β_t · z
      where z ~ N(0, I) for t > 1 and z = 0 for t = 1
  • Output: x_0   (sampled image)
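
The sampling loop above can be sketched on a toy problem where no training is needed. As an assumption for illustration, the "dataset" is one-dimensional with x_0 ~ N(2, 0.5²), so the ideal noise prediction has a closed form and eps_model can be written analytically in place of a trained UNet. The linear β schedule and the step noise √β_t are likewise assumed choices.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)  # assumed linear schedule
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

DATA_MEAN, DATA_STD = 2.0, 0.5      # toy 1-D "dataset": x_0 ~ N(2, 0.5^2)

def eps_model(x_t, t):
    """Analytic stand-in for a trained network: E[eps | x_t] for Gaussian data."""
    ab = alpha_bar[t - 1]
    var_t = ab * DATA_STD**2 + (1.0 - ab)  # variance of x_t under the forward process
    return np.sqrt(1.0 - ab) * (x_t - np.sqrt(ab) * DATA_MEAN) / var_t

def ddpm_sample(n, rng):
    """Run the reverse process from pure noise down to x_0 for n independent samples."""
    x = rng.standard_normal(n)             # x_T ~ N(0, I)
    for t in range(T, 0, -1):
        eps_hat = eps_model(x, t)
        # Remove the predicted noise, then rescale.
        mean = (x - betas[t - 1] / np.sqrt(1.0 - alpha_bar[t - 1]) * eps_hat) / np.sqrt(alphas[t - 1])
        noise = rng.standard_normal(n) if t > 1 else 0.0  # no noise on the final step
        x = mean + np.sqrt(betas[t - 1]) * noise
    return x

samples = ddpm_sample(2000, np.random.default_rng(0))
print(f"sample mean={samples.mean():.2f}, std={samples.std():.2f}")
```

Starting from standard normal noise, the samples end up distributed close to the toy data distribution, and changing the seed gives a different set of samples.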

Neural net: usually a UNet with attention layers + time conditioning.
T = 1000 in the DDPM paper (Ho et al., 2020); faster samplers such as DDIM and DPM-Solver reach comparable quality in roughly 20-50 steps.
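
To show how a faster sampler can skip steps, here is a minimal sketch of a deterministic DDIM-style update that visits only 50 of the 1000 timesteps. It reuses the same assumptions as a toy setup: one-dimensional Gaussian data x_0 ~ N(2, 0.5²), an analytic eps_model standing in for a trained network, and a linear β schedule. The function name ddim_sample and the evenly spaced timestep choice are illustrative, not taken from any particular library.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)  # assumed linear schedule
alpha_bar = np.cumprod(1.0 - betas)

DATA_MEAN, DATA_STD = 2.0, 0.5      # toy 1-D "dataset": x_0 ~ N(2, 0.5^2)

def eps_model(x_t, t):
    """Analytic stand-in for a trained network: E[eps | x_t] for Gaussian data."""
    ab = alpha_bar[t - 1]
    var_t = ab * DATA_STD**2 + (1.0 - ab)
    return np.sqrt(1.0 - ab) * (x_t - np.sqrt(ab) * DATA_MEAN) / var_t

def ddim_sample(n, num_steps, rng):
    """Deterministic DDIM-style sampling over a short, evenly spaced list of timesteps."""
    timesteps = np.linspace(T, 1, num_steps).round().astype(int)  # from T down to 1
    x = rng.standard_normal(n)  # x_T ~ N(0, I); the only source of randomness
    for i, t in enumerate(timesteps):
        ab_t = alpha_bar[t - 1]
        # After the last visited step the target is the clean image, so alpha_bar = 1.
        ab_prev = alpha_bar[timesteps[i + 1] - 1] if i + 1 < num_steps else 1.0
        eps_hat = eps_model(x, t)
        x0_hat = (x - np.sqrt(1.0 - ab_t) * eps_hat) / np.sqrt(ab_t)         # current guess of x_0
        x = np.sqrt(ab_prev) * x0_hat + np.sqrt(1.0 - ab_prev) * eps_hat  # re-noise to the earlier step
    return x

samples = ddim_sample(2000, 50, np.random.default_rng(1))
print(f"50-step sample mean={samples.mean():.2f}, std={samples.std():.2f}")
```

Each update first estimates the clean x_0 from the current x_t, then moves to the noise level of the next visited timestep, which is what lets the sampler take large jumps. With so few steps the spread of the toy samples comes out slightly narrower than the data, a small discretization error that shrinks as num_steps grows.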

Predict the noise · subtract a bit · iterate

Why diffusion works so well

  • Stable training: just predict noise — no two-player game like GANs.
  • High fidelity: each step makes only a small correction, so later steps can repair mistakes made by earlier ones.
  • Diversity: different noise seeds → different images.
  • Conditioning: feed text, a class label, or a guiding image into ε_θ as an extra input → DALL-E 2, Stable Diffusion.
  • Trade-off, slow inference: each image needs N forward passes through the UNet, one per sampling step (mitigated by latent diffusion + faster solvers).

Practice questions

  1. What is the network in a diffusion model trained to predict?
  2. What is the forward process in diffusion?
  3. Why are diffusion models slower than GANs at generation?
  4. What's the key insight of latent diffusion (Stable Diffusion)?
