Learn Train / Test Split

Last updated: 2026-05-13.

Simple theory: A train/test split separates examples used for learning from examples used for checking. The goal is to measure whether the model can handle new data, not just memorize the data it already saw.

You've built a model and it scores 98% on your data. Impressive, until you realise you tested it on the exact same data you trained it on. A score measured that way cannot tell learning apart from memorisation. The train/test split is the fundamental safeguard that gives you an honest answer: does this model actually work on new data it has never seen?
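The gap between a training score and a held-out score can be made concrete with a small experiment. The sketch below is illustrative only: it assumes scikit-learn and NumPy are installed, and the random data and the choice of an unpruned decision tree are assumptions made for the demonstration, not part of the lesson. Because the labels are random, there is nothing to learn, so any high score must come from memorisation.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))       # random features carrying no real signal
y = rng.integers(0, 2, size=1000)    # random 0/1 labels, unrelated to X

# Hold out 30% of the rows; the model never sees them during fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# An unpruned tree can keep splitting until every training row is isolated.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

train_acc = model.score(X_train, y_train)  # scored on data it was fitted on
test_acc = model.score(X_test, y_test)     # scored on unseen data

print(f"train accuracy: {train_acc:.2f}")
print(f"test accuracy:  {test_acc:.2f}")
```

The training accuracy comes out perfect while the test accuracy stays near chance, which is the situation the held-out score is designed to expose.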

The three-way split

Dataset (N total samples)
  ├── Train  70–80%  ← model fits weights on this
  ├── Val    10–15%  ← tune hyperparameters, pick best model
  └── Test   10–15%  ← evaluate ONCE at the end, never during dev

With N = 10,000 samples:
  Train = 7,500   Val = 1,250   Test = 1,250

Use stratified splitting to preserve class ratios in each set

  • Train set: the only data the model sees during fitting — weights are updated using this
  • Validation set: used to compare model variants, tune learning rate, depth, regularisation
  • Test set: touched exactly once — after all development is done — to report final performance
  • k-Fold Cross-Validation: rotates the validation window k times for more reliable estimates on small datasets
  • Stratified split: ensures each set has the same class proportions as the full dataset (crucial for imbalanced data)
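The 7,500 / 1,250 / 1,250 split described above can be sketched with two calls to scikit-learn's `train_test_split`. This is one possible approach, not the only one. The synthetic features and the 90/10 class imbalance are assumptions chosen to make the effect of `stratify` visible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 4))            # hypothetical feature matrix
y = np.array([0] * 9_000 + [1] * 1_000)     # hypothetical labels, 10% positive

# Step 1: carve off the test set (1,250 rows) and put it aside.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=1_250, stratify=y, random_state=0
)

# Step 2: split the remaining 8,750 rows into train (7,500) and val (1,250).
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=1_250, stratify=y_rest, random_state=0
)

# Each set should keep roughly the same 10% positive share as the full data.
for name, labels in [("train", y_train), ("val", y_val), ("test", y_test)]:
    print(name, len(labels), f"positive share = {labels.mean():.3f}")
```

Passing an integer to `test_size` requests an absolute number of rows, which keeps the set sizes exact. A fraction such as `test_size=0.125` would also work for the first step.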

k-Fold Cross-Validation

With small datasets (as a rough guide, fewer than about 10,000 samples), a single random split gives unstable estimates: different random seeds give different results. k-Fold cross-validation (typically k=5 or k=10) addresses this. Split the data into k equal folds, train k models, each holding out a different fold for validation, and average the k scores. The estimate is more reliable, but it takes roughly k times as long to run.

k-Fold CV score = mean(score_fold_1, score_fold_2, ..., score_fold_k)
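The fold rotation can be sketched in plain Python to show the mechanics. The helper `kfold_indices`, the toy labels, and the majority-class "model" are all hypothetical and chosen only to keep the example dependency-free. In practice a library routine such as scikit-learn's `KFold` would usually be used instead.

```python
import random
from statistics import mean

def kfold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs; each fold is the validation set once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # The first (n_samples % k) folds get one extra sample if k does not divide n.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(indices[start:start + size])
        start += size
    for i in range(k):
        val_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, val_idx

# Toy data (assumption): 5,000 labels, 70% of them class 0.
labels = [0] * 3_500 + [1] * 1_500

fold_scores = []
for train_idx, val_idx in kfold_indices(len(labels), k=5):
    train_labels = [labels[i] for i in train_idx]
    # "Training": a baseline that remembers the most common training label.
    majority = max(set(train_labels), key=train_labels.count)
    # "Validation": accuracy of that constant prediction on the held-out fold.
    accuracy = sum(labels[i] == majority for i in val_idx) / len(val_idx)
    fold_scores.append(accuracy)

cv_score = mean(fold_scores)  # k-Fold CV score = mean of the k fold scores
print(f"fold sizes: train={len(train_idx)}, val={len(val_idx)}")
print(f"CV score: {cv_score:.3f}")
```

With 5,000 samples and k=5, every iteration trains on 4,000 samples and validates on the remaining 1,000, and each sample is used for validation exactly once.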

Example: 5-Fold CV on 5,000 samples
  Fold 1: Train on 4,000, Val on 1,000 → acc = 84.2%
  Fold 2: Train on 4,000, Val on 1,000 → acc = 83.1%
  Fold 3: Train on 4,000, Val on 1,000 → acc = 85.5%
  Fold 4: Train on 4,000, Val on 1,000 → acc = 82.8%
  Fold 5: Train on 4,000, Val on 1,000 → acc = 84.9%
  Final CV score = 84.1% ± 1.0%  (mean ± population std across the 5 folds)

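The ± shows how much the score varies from fold to fold. A small std means the estimate is stable; a large std means the score depends heavily on which samples land in each fold, so the mean is less trustworthy.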
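The summary line in the example can be checked from the five fold accuracies using only the standard library. The reported ± 1.0% matches the population standard deviation (dividing by k). The sample standard deviation (dividing by k − 1) is slightly larger, so it is worth stating which one a report uses.

```python
from statistics import mean, pstdev, stdev

# Fold accuracies (in %) from the 5-Fold example above.
fold_scores = [84.2, 83.1, 85.5, 82.8, 84.9]

cv_mean = mean(fold_scores)          # average of the k fold scores
cv_pstd = pstdev(fold_scores)        # population std: divides by k
cv_sstd = stdev(fold_scores)         # sample std: divides by k - 1

print(f"mean           = {cv_mean:.1f}%")   # 84.1%
print(f"population std = {cv_pstd:.2f}%")   # 1.03%
print(f"sample std     = {cv_sstd:.2f}%")   # 1.15%
```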

Data leakage — the silent killer

Data leakage is when information from outside the training set, such as statistics computed on validation or test rows, sneaks into your model during development, making performance look better than it will be in production.
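One common source of leakage is fitting a preprocessing step on the full dataset before splitting. That particular example is an assumption here, as the lesson does not name a specific case. The sketch below, assuming scikit-learn is available and using synthetic data, shows one way to guard against it: wrap the scaler and the model in a pipeline so the scaler's statistics are computed from the training rows only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption), split before any preprocessing is fitted.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Leaky pattern to avoid: StandardScaler().fit(X) would use test rows
# when computing the mean and variance used to scale the training data.

# Safer pattern: the pipeline fits the scaler on X_train only, then
# reuses those training statistics when transforming X_test.
pipeline = make_pipeline(StandardScaler(), LogisticRegression())
pipeline.fit(X_train, y_train)

scaler = pipeline.named_steps["standardscaler"]
uses_train_only = np.allclose(scaler.mean_, X_train.mean(axis=0))
test_acc = pipeline.score(X_test, y_test)

print("scaler fitted on training rows only:", uses_train_only)
print(f"test accuracy: {test_acc:.3f}")
```

The same pipeline object can be passed to a cross-validation routine, in which case the scaler is refitted inside each fold rather than once on all the data.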

Practice questions

  1. Why must the test set never be used during model development?
  2. You are building a model for stock price prediction. Which split strategy is most appropriate?
  3. Your model scores 99% on training data and 72% on test data. What does this indicate?
  4. What is the purpose of the validation set (separate from the test set)?
