Learn Train / Test Split

A free visual AI and machine learning lesson with an interactive 3D visualization, plain-English theory, and a quiz.

Simple theory: A train/test split separates examples used for learning from examples used for checking. The goal is to measure whether the model can handle new data, not just memorize the data it already saw.

You've built a model and it scores 98% on your data. Impressive — until you realise you tested it on the exact same data you trained it on. The model didn't learn; it memorised. The train/test split is the fundamental safeguard that gives you an honest answer: does this model actually work on new data it's never seen?

The three-way split

Dataset (N total samples)
  ├── Train  70–80%  ← model fits weights on this
  ├── Val    10–15%  ← tune hyperparameters, pick best model
  └── Test   10–15%  ← evaluate ONCE at the end, never during dev

With N = 10,000 samples:
  Train = 7,500   Val = 1,250   Test = 1,250

Use stratified splitting to preserve class ratios in each set
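The split sizes above can be reproduced with a simple shuffle-and-slice. This is a minimal sketch, assuming a 75 / 12.5 / 12.5 split and a purely random (not stratified) shuffle; `three_way_split` is a hypothetical helper, not a library function. In practice, libraries such as scikit-learn provide `train_test_split` for this job.

```python
import random

def three_way_split(n, train_frac=0.75, val_frac=0.125, seed=0):
    """Shuffle indices 0..n-1 and cut them into train / val / test index lists.

    Whatever remains after the train and validation slices becomes the test set.
    """
    if train_frac + val_frac >= 1:
        raise ValueError("train_frac + val_frac must leave room for a test set")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)   # fixed seed -> reproducible split
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test

train, val, test = three_way_split(10_000)
print(len(train), len(val), len(test))  # 7500 1250 1250
```

Because every index lands in exactly one slice, no sample can appear in both the training and test sets.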

Train set: the only data the model sees during fitting — weights are updated using this
Validation set: used to compare model variants and tune hyperparameters such as the learning rate, model depth, and regularisation strength
Test set: touched exactly once — after all development is done — to report final performance
k-Fold Cross-Validation: rotates the validation window k times for more reliable estimates on small datasets
Stratified split: ensures each set has the same class proportions as the full dataset (crucial for imbalanced data)
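Stratification can be sketched by splitting each class separately and then concatenating the pieces, so every class contributes the same fraction to the test set. This is a minimal two-way sketch; `stratified_split` and `positive_rate` are hypothetical helpers and the 900/100 labels are toy data. A three-way stratified split could apply it twice. scikit-learn's `train_test_split` offers the same behaviour through its `stratify` argument.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_frac=0.2, seed=0):
    """Split indices into (train, test) so each class keeps the same proportion."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_class):              # deterministic class order
        idx = by_class[label]
        rng.shuffle(idx)                        # shuffle within the class only
        n_test = round(len(idx) * test_frac)    # same fraction from every class
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return train, test

def positive_rate(labels, idx):
    """Fraction of label-1 examples among the given indices."""
    return sum(labels[i] for i in idx) / len(idx)

# Toy imbalanced labels: 900 negatives (0) and 100 positives (1), i.e. 10% positive
labels = [0] * 900 + [1] * 100
train, test = stratified_split(labels, test_frac=0.2)
print(len(train), len(test))                                      # 800 200
print(positive_rate(labels, train), positive_rate(labels, test))  # 0.1 0.1
```

With a purely random split, a small test set drawn from imbalanced data can end up with noticeably more or fewer positives than 10%. Stratifying removes that source of noise.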

k-Fold Cross-Validation

With small datasets (< 10,000 samples), a single random split gives unstable estimates: different random seeds give noticeably different scores. k-Fold cross-validation (typically k=5 or k=10) addresses this. Split the data into k equal folds, then train k models, each using a different fold for validation and the remaining k−1 folds for training, and average the k validation scores. The estimate is more reliable, but it costs roughly k× as much compute as a single split.
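The fold rotation described above comes down to generating index sets. This is a minimal sketch, assuming a single shuffle followed by contiguous folds; `k_fold_indices` is a hypothetical helper, and the model fitting itself is left as a comment. scikit-learn's `KFold` and `StratifiedKFold` classes implement the same idea.

```python
import random

def k_fold_indices(n, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs; each sample is in exactly one validation fold."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Fold sizes differ by at most one when n is not divisible by k
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size

for fold, (train, val) in enumerate(k_fold_indices(5_000, k=5), start=1):
    # A real run would fit a fresh model on `train` and score it on `val` here
    print(f"Fold {fold}: train on {len(train):,}, validate on {len(val):,}")
# each fold: train on 4,000, validate on 1,000
```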

k-Fold CV score = mean(score_fold_1, score_fold_2, ..., score_fold_k)

Example: 5-Fold CV on 5,000 samples
  Fold 1: Train on 4,000, Val on 1,000 → acc = 84.2%
  Fold 2: Train on 4,000, Val on 1,000 → acc = 83.1%
  Fold 3: Train on 4,000, Val on 1,000 → acc = 85.5%
  Fold 4: Train on 4,000, Val on 1,000 → acc = 82.8%
  Fold 5: Train on 4,000, Val on 1,000 → acc = 84.9%
  Final CV score = 84.1% ± 1.0%  (mean ± std)

The ± (standard deviation across folds) shows how much the score depends on which samples land in the validation fold: a large std means the estimate is unreliable
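The final CV line in the example can be checked with Python's `statistics` module. This sketch uses the population standard deviation (`pstdev`, dividing by k), which matches the ± 1.0% shown. `statistics.stdev` (sample std, dividing by k − 1) would give about 1.2% for the same five scores.

```python
import statistics

# Validation accuracies (%) from the 5-fold example above
fold_scores = [84.2, 83.1, 85.5, 82.8, 84.9]

mean = statistics.mean(fold_scores)
std = statistics.pstdev(fold_scores)   # population std across the k folds
print(f"CV score = {mean:.1f}% ± {std:.1f}%")  # CV score = 84.1% ± 1.0%
```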

Data leakage — the silent killer

Data leakage is when information from outside the training set sneaks into your model during development, making performance look better than it is in production.
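One common way leakage creeps in is preprocessing that is fitted on the full dataset before splitting, for example computing normalisation statistics that include the test rows. This is a minimal sketch, assuming simple z-score standardisation on a single feature; `fit_standardizer`, `transform`, and the toy values are hypothetical.

```python
import statistics

def fit_standardizer(values):
    """Learn z-score parameters (mean, std) from the values it is given."""
    return statistics.mean(values), statistics.pstdev(values)

def transform(values, mean, std):
    """Apply previously learned parameters; nothing is re-estimated here."""
    return [(v - mean) / std for v in values]

# Hypothetical single-feature data, already split into train and test
train_x = [1.0, 2.0, 3.0, 4.0]
test_x = [10.0, 12.0]

# Leaky: the statistics are computed on train + test, so the test rows
# influence how the training data is preprocessed
leaky_mean, leaky_std = fit_standardizer(train_x + test_x)

# Leak-free: fit on the training set only, then reuse those parameters for test
mean, std = fit_standardizer(train_x)
train_z = transform(train_x, mean, std)
test_z = transform(test_x, mean, std)

print(f"train-only mean = {mean:.2f}, leaky mean = {leaky_mean:.2f}")
# train-only mean = 2.50, leaky mean = 5.33
```

The same rule applies to any fitted preprocessing step, such as imputation or feature selection: fit it on the training data (or on each training fold during k-fold CV), then apply it unchanged to the held-out data.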

