Learn Graph Neural Networks

A free visual AI and machine learning lesson with an interactive 3D visualization, plain-English theory, and quiz.

Most neural networks operate on grids (images), sequences (text), or independent vectors (tabular). Graph Neural Networks operate on graphs — nodes connected by edges. This unlocks problems where the structure matters: social networks, molecules, road maps, knowledge bases, fraud rings, and biology. The 2024 Nobel Prize in Chemistry recognised AI work on proteins: David Baker for computational protein design, and Demis Hassabis & John Jumper for AlphaFold's structure prediction. AlphaFold is not a pure GNN — its Evoformer is closer to a transformer — but it operates on inherently graph-like residue interactions, which is exactly the kind of structured biological problem GNNs are built for.

The message-passing framework

For each node v at layer k+1:

  h_v^(k+1) = UPDATE( h_v^k ,  AGGREGATE( { h_u^k : u ∈ N(v) } ) )

AGGREGATE: how to combine neighbour features  (mean, sum, max, attention)
UPDATE:    how to mix in the node's own feature (MLP, GRU, weighted sum)

Every GNN variant is a choice of AGGREGATE + UPDATE.

Three flavours you will see

GCN (Graph Convolutional Network): symmetric normalised mean of neighbours. Fast, simple, dominant baseline.
GraphSAGE: sample a fixed number of neighbours and aggregate. Scales to graphs with millions of nodes.
GAT (Graph Attention Network): learn attention weights per neighbour edge — analogous to transformer attention but over a graph.

What can a GNN actually predict?

Node classification: which user is a bot? Each node's final embedding feeds a softmax classifier.
Edge prediction: which two users will become friends? Score the pair of node embeddings.
Graph classification: is this molecule toxic? Pool all node embeddings into one graph-level vector.
Node regression: how active will this user be next month?

Simple GCN layer (Kipf & Welling):
  H^(k+1) = σ( D^(-1/2) · Â · D^(-1/2) · H^k · W^k )

Â = A + I  (adjacency + self-loops)
D = diagonal degree matrix
W = trainable weight per layer

Spectral GCN — a few lines, dominant on small academic graphs.

The message-passing framework

Three flavours you will see

What can a GNN actually predict?

Practice questions

Related AI learning resources