A Generative Adversarial Network is a two-player game. The Generator takes noise and produces fake images. The Discriminator looks at real and fake images and tries to tell them apart. They're trained simultaneously: the Generator improves to fool the Discriminator, and the Discriminator improves to catch the Generator. At equilibrium, the Generator produces images indistinguishable from real ones.
Math
Generator G: noise z → fake image G(z)
Discriminator D: image → P(real)
Minimax loss:
min_G max_D E[log D(x_real)] + E[log(1 − D(G(z)))]
In practice, alternate updates:
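• Update D to maximise the objective above, log D(x_real) + log(1 − D(G(z))) (D thinks real is real, fake is fake)
• Update G to maximise log D(G(z)) (G wants D to think fake is real). This non-saturating form replaces minimising log(1 − D(G(z))) from the minimax loss, because it gives G stronger gradients early in training, when D rejects the fakes with high confidence.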
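The alternating updates can be sketched on a toy problem. Everything specific here is an assumption for illustration and not part of the original: real data drawn from N(4, 1), a linear generator G(z) = a·z + b, a logistic discriminator D(x) = sigmoid(w·x + c), and the names d_step, g_step and train. Gradients are written out by hand so that only the Python standard library is needed.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def generator(theta_g, z):
    """Toy generator (assumed form): G(z) = a*z + b."""
    a, b = theta_g
    return a * z + b


def discriminator(theta_d, x):
    """Toy discriminator (assumed form): D(x) = sigmoid(w*x + c) = P(real)."""
    w, c = theta_d
    return sigmoid(w * x + c)


def d_objective(theta_d, real, fake):
    """Batch estimate of E[log D(x_real)] + E[log(1 - D(G(z)))]."""
    real_term = sum(math.log(discriminator(theta_d, x)) for x in real) / len(real)
    fake_term = sum(math.log(1.0 - discriminator(theta_d, x)) for x in fake) / len(fake)
    return real_term + fake_term


def g_objective(theta_g, theta_d, noise):
    """Batch estimate of E[log D(G(z))], the non-saturating generator objective."""
    fake = [generator(theta_g, z) for z in noise]
    return sum(math.log(discriminator(theta_d, x)) for x in fake) / len(fake)


def d_step(theta_d, real, fake, lr):
    """One gradient-ascent step on d_objective with respect to (w, c)."""
    w, c = theta_d
    # d/dt log sigmoid(t) = 1 - sigmoid(t);  d/dt log(1 - sigmoid(t)) = -sigmoid(t)
    grad_w = (sum((1.0 - discriminator(theta_d, x)) * x for x in real) / len(real)
              - sum(discriminator(theta_d, x) * x for x in fake) / len(fake))
    grad_c = (sum(1.0 - discriminator(theta_d, x) for x in real) / len(real)
              - sum(discriminator(theta_d, x) for x in fake) / len(fake))
    return (w + lr * grad_w, c + lr * grad_c)


def g_step(theta_g, theta_d, noise, lr):
    """One gradient-ascent step on g_objective with respect to (a, b)."""
    a, b = theta_g
    w, _ = theta_d
    # Chain rule through D(G(z)): d/da = (1 - D) * w * z,  d/db = (1 - D) * w
    slack = [1.0 - discriminator(theta_d, generator(theta_g, z)) for z in noise]
    grad_a = sum(s * w * z for s, z in zip(slack, noise)) / len(noise)
    grad_b = sum(s * w for s in slack) / len(noise)
    return (a + lr * grad_a, b + lr * grad_b)


def train(steps=200, batch=64, lr=0.05, seed=0):
    """Alternate one D update and one G update per step."""
    rng = random.Random(seed)
    theta_d = (0.1, 0.0)  # (w, c)
    theta_g = (1.0, 0.0)  # (a, b)
    for _ in range(steps):
        real = [rng.gauss(4.0, 1.0) for _ in range(batch)]
        fake = [generator(theta_g, rng.gauss(0.0, 1.0)) for _ in range(batch)]
        theta_d = d_step(theta_d, real, fake, lr)      # D: real is real, fake is fake
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        theta_g = g_step(theta_g, theta_d, noise, lr)  # G: make D call fakes real
    return theta_g, theta_d
```

Calling train() returns the final (a, b) and (w, c). Because this discriminator is linear in x, it can only react to a difference in means, so the sketch shows the update pattern and is not a working image model. Real GANs replace both functions with neural networks and the hand-written gradients with automatic differentiation.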
Nash equilibrium: D outputs 0.5 for everything, G generates samples from the real data distribution.
Two networks · opposing losses · iteratively trained
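The 0.5 figure at equilibrium can be checked with a short derivation. The sketch below follows the standard argument for the minimax loss and introduces notation that the text above does not use: p_data for the density of real images and p_g for the density of the generator's outputs G(z).

```latex
\begin{align*}
V(D, G) &= \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)]
           + \mathbb{E}_{z}[\log(1 - D(G(z)))] \\
        &= \int \big( p_{\text{data}}(x) \log D(x)
           + p_g(x) \log(1 - D(x)) \big)\, dx \\
\intertext{For a fixed $G$, maximising the integrand pointwise in $D(x)$ gives}
D^*(x) &= \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)} \\
\intertext{so when $p_g = p_{\text{data}}$,}
D^*(x) &= \tfrac{1}{2}, \qquad
V(D^*, G) = \log\tfrac{1}{2} + \log\tfrac{1}{2} = -\log 4 .
\end{align*}
```

The pointwise step uses the fact that a·log y + b·log(1 − y) is maximised at y = a / (a + b) for positive a and b.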
Why GANs are tricky
- Mode collapse: generator finds one type of output that fools D, then produces only that. Diversity collapses.
- Training instability: gradients can vanish or explode, and the two networks can oscillate or diverge instead of converging. Tricks (Wasserstein loss, gradient penalty, spectral normalisation) help.
- No likelihood: can't directly evaluate the probability of an image — only generate samples.
- Hyperparameter sensitivity: learning rate, batch size, architecture all matter a lot.
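Of the stabilising tricks listed above, spectral normalisation is the easiest to show in isolation: each weight matrix of D is divided by an estimate of its largest singular value, which limits how steeply that layer can change its output. The sketch below estimates that value with power iteration in plain Python. The function names are illustrative, and running many iterations from a fixed start vector is a simplification chosen for clarity; practical implementations typically keep the vector between training steps and run only one or a few iterations each time.

```python
import math


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]


def transpose(W):
    return [list(col) for col in zip(*W)]


def normalise(v):
    """Scale v to unit Euclidean length (assumes v is not the zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def spectral_norm(W, iters=50):
    """Estimate the largest singular value of W by power iteration."""
    Wt = transpose(W)
    # Fixed start vector; assumed not orthogonal to the top singular vector.
    u = normalise([1.0] * len(W))
    for _ in range(iters):
        v = normalise(matvec(Wt, u))
        u = normalise(matvec(W, v))
    # sigma = u^T W v for the converged unit vectors u and v
    return sum(u_i * x for u_i, x in zip(u, matvec(W, v)))


def spectrally_normalise(W, iters=50):
    """Return W divided by its estimated largest singular value."""
    sigma = spectral_norm(W, iters)
    return [[w / sigma for w in row] for row in W]
```

For example, spectrally_normalise([[3.0, 0.0], [0.0, 1.0]]) rescales the matrix so that its largest singular value is approximately 1.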
Notable GAN milestones
- DCGAN (2015): among the first convolutional GANs to produce coherent 64×64 images of faces and bedrooms.
- Progressive GAN (2017): grow resolution during training, hit 1024×1024 photorealism.
- StyleGAN (2018-2020): per-layer style injection produces controllable, lifelike faces (thispersondoesnotexist.com).
- BigGAN (2018): class-conditional, ImageNet-quality photoreal generation at scale.
- After ~2022, diffusion models took over for high-quality generation, but GANs remain relevant for fast inference and style-consistent generation.