Deep Learning - Intermediate - 12 min

Learn Neural Network Architecture

A free visual AI and machine learning lesson with an interactive 3D visualization, plain-English theory, and quiz.

Last updated: 2026-05-13.

A neural network is a set of neurons organised into layers. In a fully connected (dense) network, every neuron in one layer connects to every neuron in the next. The architecture (how many layers, and how many neurons per layer) determines what the network can and cannot learn. Too small, and it can't fit the data. Too large, and it can memorise noise instead of the underlying pattern.

The three kinds of layers

  • Input layer: one neuron per feature in your data. For a 28×28 image: 784 neurons. For a house with 5 features: 5 neurons. No computation happens here — it just holds the data.
  • Hidden layers: where the network learns. Each hidden neuron computes a weighted sum of all outputs from the previous layer, adds a bias, and applies an activation function. A network with several hidden layers (conventionally two or more) is called 'deep'.
  • Output layer: one neuron per class (multi-class classification), a single neuron (binary classification), or one neuron per target value (regression). Applies softmax for multi-class, sigmoid for binary, or a linear (identity) activation for regression.
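
To make the three layer types concrete, here is a minimal sketch of a forward pass through a small dense network, written in plain Python. The 3 → 4 → 2 layer sizes, the ReLU hidden activation, the random weight initialisation, and all function names are assumptions chosen for illustration, not part of the lesson's own visualization.

```python
import math
import random

def relu(values):
    # Hidden-layer activation (an assumed choice): negative sums become 0.
    return [max(0.0, z) for z in values]

def softmax(values):
    # Output activation for multi-class classification.
    # Subtracting the max keeps exp() numerically stable.
    m = max(values)
    exps = [math.exp(z - m) for z in values]
    total = sum(exps)
    return [e / total for e in exps]

def dense(inputs, weights, biases):
    # weights[j][i] connects input i to neuron j in this layer.
    # Each neuron: weighted sum of all inputs, plus its bias.
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def init_layer(n_in, n_out, rng):
    # Random weights and zero biases (illustrative initialisation only).
    weights = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(x, layers):
    # The input layer just holds x; every later layer computes.
    a = x
    for idx, (weights, biases) in enumerate(layers):
        z = dense(a, weights, biases)
        is_output = idx == len(layers) - 1
        a = softmax(z) if is_output else relu(z)
    return a

rng = random.Random(0)
sizes = [3, 4, 2]  # input, hidden, output
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
probs = forward([0.5, -1.2, 3.0], layers)
print(probs)  # two class probabilities that sum to 1
```

The network is untrained, so the probabilities themselves are meaningless; the point is only how data flows from the input layer through a hidden layer to a softmax output.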

Parameters: what the network learns

Parameters in one layer:
  Weights = (neurons in previous layer) × (neurons in this layer)
  Biases  = neurons in this layer
  Total   = prev × curr + curr

Example: 3 → 4 → 2 network:
  Layer 1 (3→4): 3×4 + 4 = 16 params
  Layer 2 (4→2): 4×2 + 2 = 10 params
  Grand total   = 26 parameters
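
The same arithmetic can be sketched as a short Python helper. The function names `layer_params` and `count_params` are hypothetical, chosen for this example; the formula is the prev × curr + curr rule above.

```python
def layer_params(prev, curr):
    # Weights (prev * curr) plus one bias per neuron in this layer.
    return prev * curr + curr

def count_params(sizes):
    # sizes lists the neurons per layer, e.g. [3, 4, 2] for 3 -> 4 -> 2.
    return sum(layer_params(p, c) for p, c in zip(sizes, sizes[1:]))

print(layer_params(3, 4))       # 16
print(layer_params(4, 2))       # 10
print(count_params([3, 4, 2]))  # 26
```

Passing a longer list, such as `[784, 128, 10]`, counts the parameters of a larger network in the same way.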

Every arrow in the diagram is a weight. More connections = more parameters = more capacity (and more data needed to train).

Depth vs width

A wider network (more neurons per layer) learns more features at the same level of abstraction. A deeper network (more layers) can learn hierarchical representations: in an image model, for example, early layers tend to pick up edges, middle layers shapes, and later layers whole objects. In practice, depth often pays off more than width on complex tasks: for the same parameter count, a deep, narrow network frequently outperforms a network with a single very wide hidden layer, although deeper networks are also harder to train.
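
To see what "the same parameter count" means for a deep and a wide network, the sketch below sizes a single-hidden-layer network to fit the parameter budget of a deep, narrow one. The problem size (20 inputs, 1 output), the hidden width of 16, and the function names are all assumptions for illustration. The code compares sizes only; it says nothing about which network would actually perform better.

```python
def count_params(sizes):
    # prev * curr weights plus curr biases, summed over adjacent layers.
    return sum(p * c + c for p, c in zip(sizes, sizes[1:]))

def widest_single_layer(n_in, n_out, budget):
    # For n_in -> h -> n_out the parameter count is
    #   h * (n_in + 1) + n_out * (h + 1) = h * (n_in + n_out + 1) + n_out,
    # so this returns the largest h that stays within the budget.
    return (budget - n_out) // (n_in + n_out + 1)

n_in, n_out = 20, 1                   # hypothetical problem size
deep = [n_in] + [16] * 10 + [n_out]   # 10 narrow hidden layers
budget = count_params(deep)

h = widest_single_layer(n_in, n_out, budget)
wide = [n_in, h, n_out]               # 1 wide hidden layer

print(budget, h, count_params(wide))  # 2801 127 2795
```

Under these assumptions, ten hidden layers of 16 neurons cost about as many parameters as one hidden layer of 127 neurons.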

Rules of thumb

  • Start with 2–3 hidden layers for most tasks — more is rarely better without regularisation
  • Hidden layer width: 64–512 neurons depending on task complexity
  • First hidden layer is usually wider (captures many low-level patterns), then narrows toward the output
  • Output layer: 1 neuron per target value (regression), 1 neuron with sigmoid (binary classification), K neurons with softmax (K-class classification)
  • If training loss stays high → network too small (underfitting) → add neurons or layers
  • If training loss is low but validation loss is high → network is overfitting (often too large for the data) → add dropout or regularisation, or reduce the network size
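
The last two rules of thumb can be sketched as a simple check on the final training and validation losses. The function name `diagnose` and both thresholds (`high_loss`, `gap_ratio`) are hypothetical values picked for illustration; what counts as a "high" loss depends on the task and the loss function.

```python
def diagnose(train_loss, val_loss, high_loss=0.5, gap_ratio=1.5):
    # high_loss and gap_ratio are assumed, task-dependent thresholds.
    if train_loss > high_loss:
        # The network cannot even fit the training data.
        return "underfitting: add neurons or layers"
    if val_loss > gap_ratio * train_loss:
        # Fits the training data but does not generalise.
        return "overfitting: add dropout or regularisation"
    return "ok: training and validation loss are close"

print(diagnose(0.90, 0.95))  # underfitting: add neurons or layers
print(diagnose(0.05, 0.40))  # overfitting: add dropout or regularisation
print(diagnose(0.20, 0.22))  # ok: training and validation loss are close
```

In practice it is more reliable to compare the full loss curves over training than two final numbers.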

Practice questions

  1. A network has architecture 4 → 6 → 3. How many parameters does the hidden layer (4→6) have?
  2. What is the purpose of hidden layers in a neural network?
  3. You double the number of neurons in every hidden layer. What roughly happens to the number of parameters between two adjacent hidden layers?
  4. What best describes the difference between depth and width in a network?
