Most neural networks operate on grids (images), sequences (text), or independent vectors (tabular). Graph Neural Networks operate on graphs — nodes connected by edges. This unlocks problems where the structure matters: social networks, molecules, road maps, knowledge bases, fraud rings, and biology. The 2024 Nobel Prize in Chemistry recognised AI work on proteins: David Baker for computational protein design, and Demis Hassabis & John Jumper for AlphaFold's structure prediction. AlphaFold is not a pure GNN — its Evoformer is closer to a transformer — but it operates on inherently graph-like residue interactions, which is exactly the kind of structured biological problem GNNs are built for.
The message-passing framework
For each node v at layer k+1:
h_v^(k+1) = UPDATE( h_v^k , AGGREGATE( { h_u^k : u ∈ N(v) } ) )
AGGREGATE: how to combine neighbour features (mean, sum, max, attention)
UPDATE: how to mix in the node's own feature (MLP, GRU, weighted sum)Every GNN variant is a choice of AGGREGATE + UPDATE.
Three flavours you will see
- GCN (Graph Convolutional Network): symmetric normalised mean of neighbours. Fast, simple, dominant baseline.
- GraphSAGE: sample a fixed number of neighbours and aggregate. Scales to graphs with millions of nodes.
- GAT (Graph Attention Network): learn attention weights per neighbour edge — analogous to transformer attention but over a graph.
What can a GNN actually predict?
- Node classification: which user is a bot? Each node's final embedding feeds a softmax classifier.
- Edge prediction: which two users will become friends? Score the pair of node embeddings.
- Graph classification: is this molecule toxic? Pool all node embeddings into one graph-level vector.
- Node regression: how active will this user be next month?
Simple GCN layer (Kipf & Welling):
H^(k+1) = σ( D^(-1/2) · Â · D^(-1/2) · H^k · W^k )
 = A + I (adjacency + self-loops)
D = diagonal degree matrix
W = trainable weight per layerSpectral GCN — a few lines, dominant on small academic graphs.