Learn Neural Network Architecture

A free visual AI and machine learning lesson with an interactive 3D visualization, plain-English theory, and quiz.

A neural network is made of neurons organised into layers. In a fully connected (dense) network, every neuron in one layer connects to every neuron in the next. The architecture (how many layers, and how many neurons per layer) determines what the network can and cannot learn. Too small, and it cannot fit the data; too large, and it can memorise noise instead of the underlying pattern.

The three kinds of layers

Input layer: one neuron per feature in your data. For a 28×28 image: 784 neurons. For a house with 5 features: 5 neurons. No computation happens here — it just holds the data.
Hidden layers: where the network learns. Each hidden neuron computes a weighted sum of all outputs from the previous layer, adds a bias, and applies an activation function. A network with several hidden layers (conventionally two or more) is called 'deep'; one with a single hidden layer is usually called 'shallow'.
Output layer: K neurons with softmax for K-class classification, one neuron with sigmoid for binary classification, or one neuron per target value with a linear (identity) activation for regression.
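The three layer kinds can be sketched as a tiny forward pass in plain Python. This is a minimal illustration, not a training-ready implementation: the 3 → 4 → 2 shape, the hand-picked weights, and the helper names (dense, relu, softmax) are all assumptions made for the example, and ReLU is just one possible hidden activation.

```python
import math

def relu(values):
    # Hidden-layer activation: negative sums are clipped to zero.
    return [max(0.0, v) for v in values]

def softmax(values):
    # Output activation for multi-class problems; subtracting the max
    # keeps exp() numerically stable without changing the result.
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def dense(inputs, weights, biases, activation):
    # One fully connected layer. weights[j] holds the incoming weights
    # of neuron j, one weight per value in `inputs`.
    sums = [sum(w * x for w, x in zip(neuron_w, inputs)) + b
            for neuron_w, b in zip(weights, biases)]
    return activation(sums)

# Input layer: no computation, it just holds the 3 features.
x = [1.0, 2.0, 3.0]

# Hypothetical hand-picked parameters for a 3 -> 4 -> 2 network.
W1 = [[0.1, 0.2, 0.3],
      [-0.1, 0.0, 0.1],
      [0.5, -0.5, 0.0],
      [0.0, 0.0, 0.0]]
b1 = [0.0, 0.1, 0.0, -1.0]
W2 = [[1.0, -1.0, 0.5, 0.0],
      [-1.0, 1.0, 0.5, 0.0]]
b2 = [0.0, 0.0]

hidden = dense(x, W1, b1, relu)         # 4 hidden activations
probs = dense(hidden, W2, b2, softmax)  # 2 class probabilities

print(hidden)
print(probs)
```

For binary classification the last layer would instead have one neuron with a sigmoid, and for regression one neuron per target with no activation.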

Parameters: what the network learns

Parameters in one layer:
  Weights = (neurons in previous layer) × (neurons in this layer)
  Biases  = neurons in this layer
  Total   = prev × curr + curr

Example: 3 → 4 → 2 network:
  Layer 1 (3→4): 3×4 + 4 = 16 params
  Layer 2 (4→2): 4×2 + 2 = 10 params
  Grand total   = 26 parameters

Every arrow in the diagram is a weight. More connections mean more parameters, which means more capacity (and more data needed to train).
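The per-layer formula can be checked with a few lines of Python. This is a small sketch; the function name count_params and the list-of-layer-sizes representation are assumptions for the example.

```python
def count_params(layer_sizes):
    """Return (per_layer, total) parameter counts for a dense network.

    layer_sizes lists the neurons in each layer, input first,
    e.g. [3, 4, 2] for a 3 -> 4 -> 2 network.
    """
    # Each layer has prev * curr weights plus curr biases.
    per_layer = [prev * curr + curr
                 for prev, curr in zip(layer_sizes, layer_sizes[1:])]
    return per_layer, sum(per_layer)

per_layer, total = count_params([3, 4, 2])
print(per_layer, total)  # [16, 10] 26
```

The same function shows how quickly counts grow: a 784 → 128 → 10 network for 28×28 images already has 101,770 parameters, almost all of them in the first layer.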

Depth vs width

A wider network (more neurons per layer) learns more features at the same level of abstraction. A deeper network (more layers) can learn hierarchical representations: in image models, for example, early layers tend to respond to edges, middle layers to shapes, and later layers to whole objects. In practice, depth is often more valuable than width for complex tasks: a deep, narrow network frequently outperforms a single-hidden-layer wide network with the same parameter count, though this is a tendency rather than a guarantee.
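A depth-versus-width comparison is only fair when both networks have roughly the same parameter budget. The sketch below shows one way to set that up: it counts the parameters of a deep, narrow network and then solves for the width of a single hidden layer that fits the same budget. The sizes used (20 inputs, 2 outputs, ten hidden layers of 32 neurons) and the function names are made up for illustration, and the code says nothing about which network would actually perform better.

```python
def total_params(layer_sizes):
    # Sum of (prev * curr weights + curr biases) over consecutive layers.
    return sum(prev * curr + curr
               for prev, curr in zip(layer_sizes, layer_sizes[1:]))

def matching_width(n_in, n_out, budget):
    # A network n_in -> h -> n_out has
    #   h * (n_in + n_out + 1) + n_out
    # parameters, so this returns the largest h that stays within budget.
    return (budget - n_out) // (n_in + n_out + 1)

# Hypothetical deep, narrow network: 10 hidden layers of 32 neurons.
deep = [20] + [32] * 10 + [2]
budget = total_params(deep)

# Single-hidden-layer network with (almost) the same parameter count.
wide_width = matching_width(20, 2, budget)
wide = [20, wide_width, 2]

print("deep:", budget, "params")
print("wide:", total_params(wide), "params, hidden width", wide_width)
```

With the budgets matched, any difference in validation performance can be attributed to the shape of the network rather than to its size.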

Rules of thumb

Start with 2–3 hidden layers for most tasks — more is rarely better without regularisation
Hidden layer width: 64–512 neurons depending on task complexity
The first hidden layer is often the widest (it captures many low-level patterns), with later layers narrowing toward the output
Output layer: 1 neuron (regression), 1 neuron with sigmoid (binary classification), K neurons with softmax (K-class classification)
If training loss stays high → network too small (underfitting) → add neurons or layers
If training loss low but validation loss high → network too large (overfitting) → add dropout or regularisation
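The last two rules can be written as a small diagnostic check. This is only a sketch: the function name diagnose and both thresholds (target_loss and gap_tolerance) are hypothetical, and sensible values depend entirely on the task and the loss function.

```python
def diagnose(train_loss, val_loss, target_loss, gap_tolerance):
    """Apply the underfitting/overfitting rules of thumb to one training run.

    target_loss:   hypothetical training loss considered "low enough".
    gap_tolerance: hypothetical acceptable gap between validation
                   and training loss.
    """
    if train_loss > target_loss:
        # Training loss stays high: the network is likely too small.
        return "underfitting: add neurons or layers"
    if val_loss - train_loss > gap_tolerance:
        # Training loss is low but validation loss is much higher.
        return "overfitting: add dropout or regularisation"
    return "ok"

# Example readings with made-up loss values and thresholds.
print(diagnose(0.90, 0.95, target_loss=0.30, gap_tolerance=0.10))
print(diagnose(0.05, 0.60, target_loss=0.30, gap_tolerance=0.10))
print(diagnose(0.20, 0.25, target_loss=0.30, gap_tolerance=0.10))
```

Checking underfitting first matters: a network that cannot fit the training data should be enlarged before any regularisation is added.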