
Learn Graph Neural Networks


Last updated: 2026-05-13.

Most neural networks operate on grids (images), sequences (text), or independent vectors (tabular). Graph Neural Networks operate on graphs — nodes connected by edges. This unlocks problems where the structure matters: social networks, molecules, road maps, knowledge bases, fraud rings, and biology. The 2024 Nobel Prize in Chemistry recognised AI work on proteins: David Baker for computational protein design, and Demis Hassabis & John Jumper for AlphaFold's structure prediction. AlphaFold is not a pure GNN — its Evoformer is closer to a transformer — but it operates on inherently graph-like residue interactions, which is exactly the kind of structured biological problem GNNs are built for.

The message-passing framework

For each node v at layer k+1:

  h_v^(k+1) = UPDATE( h_v^k ,  AGGREGATE( { h_u^k : u ∈ N(v) } ) )

AGGREGATE: how to combine neighbour features  (mean, sum, max, attention)
UPDATE:    how to mix in the node's own feature (MLP, GRU, weighted sum)

Every GNN variant is a choice of AGGREGATE + UPDATE.
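
The update rule above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: it assumes mean aggregation and a fixed weighted-sum update, and the helper names (aggregate_mean, update_weighted_sum, message_passing_layer) and the mixing weight alpha are hypothetical choices made for this example.

```python
from typing import Dict, List

Vector = List[float]


def aggregate_mean(messages: List[Vector]) -> Vector:
    """AGGREGATE: element-wise mean of the neighbours' feature vectors."""
    dim = len(messages[0])
    return [sum(m[i] for m in messages) / len(messages) for i in range(dim)]


def update_weighted_sum(h_self: Vector, h_agg: Vector, alpha: float = 0.5) -> Vector:
    """UPDATE: fixed weighted sum of the node's own feature and the aggregate.

    alpha is a hypothetical, non-trained mixing weight; a real GNN would use
    trainable parameters here (for example an MLP or a GRU).
    """
    return [alpha * s + (1.0 - alpha) * a for s, a in zip(h_self, h_agg)]


def message_passing_layer(
    h: Dict[str, Vector], neighbours: Dict[str, List[str]]
) -> Dict[str, Vector]:
    """One layer: every node reads its neighbours' layer-k features."""
    new_h = {}
    for v, h_v in h.items():
        nbrs = neighbours.get(v, [])
        if nbrs:
            agg = aggregate_mean([h[u] for u in nbrs])
        else:
            # Assumption: an isolated node receives an all-zero aggregate.
            agg = [0.0] * len(h_v)
        new_h[v] = update_weighted_sum(h_v, agg)
    return new_h


# Path graph a - b - c with 2-dimensional node features.
neighbours = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
h0 = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
h1 = message_passing_layer(h0, neighbours)
print(h1["b"])  # [0.5, 0.75]
```

All new features are computed from the layer-k values in h, never from values already updated in the same pass, which matches the rule h_v^(k+1) = UPDATE(h_v^k, AGGREGATE(...)). Stacking k such layers lets information travel k hops.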

Three flavours you will see

  • GCN (Graph Convolutional Network): a symmetrically normalised average of each node's neighbours and, via a self-loop, the node itself. Fast, simple, and a widely used baseline.
  • GraphSAGE: sample a fixed number of neighbours and aggregate. Scales to graphs with millions of nodes.
  • GAT (Graph Attention Network): learn attention weights per neighbour edge — analogous to transformer attention but over a graph.
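
One way to compare the three flavours is to look at the weight each one gives to a neighbour's message. The sketch below is a simplified illustration under stated assumptions: features are scalars, the GAT attention parameters a_self and a_nbr are hypothetical fixed numbers (in a real GAT they are learned), the GraphSAGE variant uses the mean aggregator and samples without replacement, and all function names are made up for this example.

```python
import math
import random
from typing import Dict, List

Graph = Dict[str, List[str]]


def gcn_weights(v: str, neighbours: Graph) -> Dict[str, float]:
    """GCN: fixed weights 1/sqrt(d_v * d_u), with degrees counted including a self-loop."""
    def deg(n: str) -> int:
        return len(neighbours[n]) + 1  # +1 for the self-loop

    nodes = [v] + neighbours[v]  # the self-loop makes v its own neighbour
    return {u: 1.0 / math.sqrt(deg(v) * deg(u)) for u in nodes}


def sage_weights(v: str, neighbours: Graph, k: int, rng: random.Random) -> Dict[str, float]:
    """GraphSAGE (mean aggregator): uniform weights over at most k sampled neighbours."""
    nbrs = neighbours[v]
    sampled = rng.sample(nbrs, min(k, len(nbrs)))
    return {u: 1.0 / len(sampled) for u in sampled}


def gat_weights(
    v: str, neighbours: Graph, h: Dict[str, float], a_self: float, a_nbr: float
) -> Dict[str, float]:
    """GAT-style: softmax over LeakyReLU scores of (target, neighbour) pairs."""
    def leaky_relu(x: float, slope: float = 0.2) -> float:
        return x if x > 0 else slope * x

    scores = {u: leaky_relu(a_self * h[v] + a_nbr * h[u]) for u in neighbours[v]}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {u: math.exp(s - top) for u, s in scores.items()}
    total = sum(exps.values())
    return {u: e / total for u, e in exps.items()}


# Star graph: centre "c" connected to leaves "x", "y", "z".
neighbours = {"c": ["x", "y", "z"], "x": ["c"], "y": ["c"], "z": ["c"]}
h = {"c": 0.0, "x": 1.0, "y": 1.0, "z": 3.0}  # hypothetical scalar features

gcn = gcn_weights("c", neighbours)
sage = sage_weights("c", neighbours, k=2, rng=random.Random(0))
gat = gat_weights("c", neighbours, h, a_self=1.0, a_nbr=1.0)
```

In this sketch the GCN weights depend only on degrees, the GraphSAGE weights depend only on which neighbours were sampled, and the GAT weights depend on the features themselves, so the neighbour "z" with the largest feature receives the largest share.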

What can a GNN actually predict?

  • Node classification: which user is a bot? Each node's final embedding feeds a softmax classifier.
  • Edge prediction: which two users will become friends? Score the pair of node embeddings.
  • Graph classification: is this molecule toxic? Pool all node embeddings into one graph-level vector.
  • Node regression: how active will this user be next month?
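
The four tasks above differ mainly in the small "head" placed on top of the final node embeddings. The NumPy sketch below illustrates one plausible head per task. Everything in it is an assumption for illustration: the embeddings H and the weights W_cls and w_reg are random placeholders rather than trained values, the edge score is a sigmoid of a dot product, and the graph pooling is a plain mean.

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.normal(size=(4, 3))      # placeholder final embeddings: 4 nodes, 3 dims
W_cls = rng.normal(size=(3, 2))  # placeholder classifier weights: 2 classes
w_reg = rng.normal(size=3)       # placeholder regression weights


def softmax(z: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def node_class_probs(H: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Node classification: one probability vector per node."""
    return softmax(H @ W)


def edge_score(H: np.ndarray, u: int, v: int) -> float:
    """Edge prediction: sigmoid of the dot product of the two node embeddings."""
    return float(1.0 / (1.0 + np.exp(-(H[u] @ H[v]))))


def graph_embedding(H: np.ndarray) -> np.ndarray:
    """Graph classification: mean-pool all nodes into one graph-level vector."""
    return H.mean(axis=0)


def node_regression(H: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Node regression: one real-valued prediction per node."""
    return H @ w


probs = node_class_probs(H, W_cls)   # shape (4, 2), rows sum to 1
score = edge_score(H, 0, 1)          # a value in (0, 1)
g_vec = graph_embedding(H)           # shape (3,)
y_hat = node_regression(H, w_reg)    # shape (4,)
```

The graph-level vector g_vec would then feed its own classifier, in the same way each row of H feeds the node classifier.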
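
Simple GCN layer (Kipf & Welling):
  H^(k+1) = σ( D^(-1/2) · Â · D^(-1/2) · H^k · W^k )

Â = A + I  (adjacency + self-loops)
D = diagonal degree matrix of Â  (D_ii = Σ_j Â_ij)
H^k = node feature matrix at layer k
W^k = trainable weight matrix for layer k
σ = nonlinearity such as ReLU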

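This layer comes from a first-order approximation of spectral graph convolutions. It takes only a few lines to implement and remains a strong baseline on small academic benchmark graphs.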
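
The layer formula can be written almost literally in NumPy. The sketch below is a minimal dense version for small graphs, assuming D is the degree matrix of the self-loop-augmented adjacency Â (as in Kipf & Welling) and ReLU as the nonlinearity. The function name gcn_layer and the toy inputs are hypothetical; a practical implementation would use sparse matrices and trained weights.

```python
import numpy as np


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def gcn_layer(A: np.ndarray, H: np.ndarray, W: np.ndarray, activation=relu) -> np.ndarray:
    """One GCN layer: activation( D^(-1/2) (A + I) D^(-1/2) H W )."""
    A_hat = A + np.eye(A.shape[0])        # add self-loops
    d_hat = A_hat.sum(axis=1)             # degrees of A_hat, always >= 1
    d_inv_sqrt = 1.0 / np.sqrt(d_hat)
    # Scale row i by d_i^(-1/2) and column j by d_j^(-1/2) via broadcasting.
    A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
    return activation(A_norm @ H @ W)


# Path graph 0 - 1 - 2. Identity features and weights expose the
# normalised adjacency itself as the layer output.
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
H0 = np.eye(3)
W0 = np.eye(3)
H1 = gcn_layer(A, H0, W0)
```

With self-loops the degrees are 2, 3 and 2, so H1[0, 0] is 1/2, H1[0, 1] is 1/sqrt(6) and H1[1, 1] is 1/3. Nodes 0 and 2 are not adjacent, so H1[0, 2] stays 0 after a single layer.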

Practice questions

  1. What does one GNN layer do for each node?
  2. Which aggregation does a GCN use?
  3. What kind of problem is graph classification?
  4. Why is over-smoothing a problem in deep GNNs?
