Computer Vision - Advanced - 15 min

Learn Object Detection

A free visual AI and machine learning lesson with an interactive 3D visualization, plain-English theory, and quiz.

Last updated: 2026-05-13.

Image classification asks 'what's in this picture?' Object detection asks something harder: 'what's in it AND where is each thing?' For every object the model finds, it must output a class label, a confidence score, AND a bounding box (x, y, width, height) that tightly wraps the object. Modern detectors run in real-time on phones — every face on a Zoom call, every car a Tesla sees, every defect a factory camera flags is a detection.

Three Generations of Detectors

  • Two-stage (Faster R-CNN, 2015): Stage 1 proposes candidate regions where objects might be (from a few hundred up to ~2,000, depending on configuration). Stage 2 classifies each region and refines its box. Typically more accurate but slower.
  • One-stage (YOLO, SSD, RetinaNet): predicts boxes + classes in a single forward pass over a dense grid of positions, often using predefined anchor boxes. Fast enough for real-time use, but historically slightly less accurate. The gap has largely closed since 2020.
  • Modern (DETR, 2020+): treats detection as a SET prediction problem with a transformer. No anchors, no NMS — the model directly outputs the final list of detections.

The YOLO Idea — Detection as a Single Pass

YOLO (You Only Look Once) divides the image into an S×S grid (e.g., 13×13).
Each grid cell predicts B bounding boxes, plus class probabilities:

  Per grid cell, output:
    For each of B boxes:
      • (x, y, w, h)         — box coordinates relative to cell
      • confidence            — probability the box contains AN object × IoU
    Per cell:
      • class_probs[C]        — probability for each of C classes

Final tensor shape: [S, S, B × 5 + C]

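Example with COCO's 80 classes: S=13, B=3, C=80 → [13, 13, 95] = 16,055 numbers per image.
(Later YOLO versions attach class probabilities to each box instead, giving B × (5 + C) values per cell.)
A single forward pass produces all detections, which is why the approach is so fast.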
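
To make the tensor layout concrete, here is a minimal sketch in plain Python. It assumes the per-cell layout described above (B boxes of 5 values each, followed by C shared class probabilities). The helper names `split_cell` and `class_scores` are hypothetical and chosen only for illustration; real YOLO implementations differ in detail between versions.

```python
# Illustrative values from the text; not tied to a specific YOLO release.
S, B, C = 13, 3, 80

values_per_cell = B * 5 + C            # each box: x, y, w, h, confidence; plus C class probs
total_values = S * S * values_per_cell


def split_cell(cell, num_boxes=B, num_classes=C):
    """Split one cell's flat prediction vector into boxes and class probabilities.

    Assumed layout: [box_0 (5 values), ..., box_{B-1} (5 values), class_probs (C values)].
    """
    assert len(cell) == num_boxes * 5 + num_classes
    boxes = [tuple(cell[i * 5:(i + 1) * 5]) for i in range(num_boxes)]
    class_probs = list(cell[num_boxes * 5:])
    return boxes, class_probs


def class_scores(box, class_probs):
    """Class-specific score for one box: box confidence times each class probability."""
    confidence = box[4]
    return [confidence * p for p in class_probs]


print(values_per_cell, total_values)  # 95 16055
```

Multiplying the box confidence by the cell's class probabilities gives one score per class for each box. Those scores are what later get thresholded and deduplicated.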

Post-processing: the raw predictions go through Non-Maximum Suppression (NMS)
to remove duplicate boxes covering the same object.

One forward pass · per-cell predictions · NMS to deduplicate

Bounding Box Coordinates

Two common formats:

  Corner format:  (x_min, y_min, x_max, y_max)
  Center format:  (x_center, y_center, width, height)

Many networks predict boxes in center format (easier for regression); the boxes are then converted to corner format for IoU calculations and visualization.

Boxes are usually normalised to [0, 1] coordinates (image-relative),
so the same box description is valid at any image resolution.
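
The conversions between the two formats are simple arithmetic. The sketch below is one possible way to write them; the function names are hypothetical, and boxes are assumed to be plain tuples of floats.

```python
def center_to_corner(box):
    """(x_center, y_center, width, height) -> (x_min, y_min, x_max, y_max)."""
    x_c, y_c, w, h = box
    return (x_c - w / 2, y_c - h / 2, x_c + w / 2, y_c + h / 2)


def corner_to_center(box):
    """(x_min, y_min, x_max, y_max) -> (x_center, y_center, width, height)."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2, (y_min + y_max) / 2, x_max - x_min, y_max - y_min)


def to_pixels(box, image_width, image_height):
    """Scale a normalised corner-format box to pixel coordinates."""
    x_min, y_min, x_max, y_max = box
    return (x_min * image_width, y_min * image_height,
            x_max * image_width, y_max * image_height)


corner = center_to_corner((0.5, 0.5, 0.5, 0.25))
print(corner)                       # (0.25, 0.375, 0.75, 0.625)
print(to_pixels(corner, 640, 480))  # (160.0, 180.0, 480.0, 300.0)
```

Because the box is stored in normalised form, the same tuple can be scaled to any display size by changing only the arguments to `to_pixels`.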

Both formats are equivalent — choice depends on convenience

Intersection-over-Union (IoU): the Detection Yardstick

IoU is THE metric for evaluating box predictions:

  IoU(A, B) = area(A ∩ B) / area(A ∪ B)

Values:
  • IoU = 0: no overlap
  • IoU = 1: boxes are identical
  • IoU ≥ 0.5: commonly counted as a 'correct' detection (the threshold behind mAP@0.5)
  • IoU ≥ 0.75: a stricter threshold that demands tight localization
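
The IoU formula above can be sketched in a few lines of Python. This version assumes corner-format boxes (x_min, y_min, x_max, y_max) with x_max ≥ x_min and y_max ≥ y_min. The function name `iou` is just a convenient label.

```python
def iou(a, b):
    """Intersection-over-Union of two corner-format boxes."""
    # Width and height of the overlap rectangle (zero if the boxes do not overlap).
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Guard against degenerate zero-area boxes.
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (0, 0, 2, 2)))  # 1.0  (identical boxes)
print(iou((0, 0, 1, 1), (2, 2, 3, 3)))  # 0.0  (no overlap)
print(iou((0, 0, 2, 2), (0, 0, 2, 1)))  # 0.5  (one box covers half of the other)
```

The union is computed as area(A) + area(B) minus the intersection, so the overlapping region is not counted twice.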

Used for:
  1. Training: matching predicted boxes to ground truth
  2. Non-Maximum Suppression: removing duplicate predictions
  3. Evaluation: mAP@0.5, mAP@0.5:0.95 metrics
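
As a rough illustration of how IoU feeds into evaluation, the sketch below matches predictions to ground-truth boxes for a single class at one IoU threshold and counts true positives, false positives, and false negatives. It is a simplified assumption-laden version: full mAP additionally averages precision over recall levels and classes, and mAP@0.5:0.95 averages over IoU thresholds from 0.5 to 0.95. The name `match_detections` is hypothetical.

```python
def iou(a, b):
    """IoU of two corner-format boxes (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_detections(predictions, ground_truth, iou_threshold=0.5):
    """Greedy matching for one class.

    predictions:  list of (box, confidence)
    ground_truth: list of boxes
    Returns (true_positives, false_positives, false_negatives).
    """
    unmatched = list(range(len(ground_truth)))
    tp = fp = 0
    # Highest-confidence predictions get first pick of the ground-truth boxes.
    for box, _conf in sorted(predictions, key=lambda p: p[1], reverse=True):
        best_idx, best_iou = None, 0.0
        for idx in unmatched:
            overlap = iou(box, ground_truth[idx])
            if overlap > best_iou:
                best_idx, best_iou = idx, overlap
        if best_idx is not None and best_iou >= iou_threshold:
            tp += 1
            unmatched.remove(best_idx)  # each ground-truth box is matched at most once
        else:
            fp += 1
    return tp, fp, len(unmatched)


ground_truth = [(0, 0, 2, 2), (5, 5, 7, 7)]
predictions = [((0, 0, 2, 2), 0.9), ((0, 0, 2, 1), 0.8), ((10, 10, 11, 11), 0.3)]
tp, fp, fn = match_detections(predictions, ground_truth)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(tp, fp, fn)  # 1 2 1
```

In this toy example the second prediction is a duplicate of an already matched object, so it counts as a false positive, and the object at (5, 5, 7, 7) is missed entirely.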

IoU is the contract between predictions and ground-truth boxes

Non-Maximum Suppression (NMS): Removing Duplicates

  • Problem: a single object often gets multiple predicted boxes (slight variations in position).
  • Solution (NMS):
      1. Sort predictions by confidence (highest first).
      2. Take the top one and add it to the final list.
      3. Remove all remaining boxes whose IoU with the kept one exceeds a threshold (typically 0.5).
      4. Repeat with the remaining boxes until none are left.
  • Result: ideally, each object ends up with one box, its highest-confidence prediction.
  • Per class: NMS is applied independently for each class, so overlapping objects of different classes (e.g., a person on a bicycle) each keep their own box.
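
The NMS procedure described above can be sketched in plain Python as follows. This is a minimal, unoptimised version under the assumption that detections are tuples of corner-format boxes, confidences, and class labels. The names `nms` and `per_class_nms` are hypothetical, and production libraries use vectorised implementations instead.

```python
def iou(a, b):
    """IoU of two corner-format boxes (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Greedy NMS over a list of (box, confidence) pairs; returns the kept pairs."""
    remaining = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)          # highest-confidence box still in play
        kept.append(best)
        # Drop every box that overlaps the kept one by more than the threshold.
        remaining = [d for d in remaining if iou(best[0], d[0]) <= iou_threshold]
    return kept


def per_class_nms(detections, iou_threshold=0.5):
    """Run NMS separately per class on (box, confidence, label) triples."""
    by_class = {}
    for box, conf, label in detections:
        by_class.setdefault(label, []).append((box, conf))
    result = []
    for label, dets in by_class.items():
        result.extend((box, conf, label) for box, conf in nms(dets, iou_threshold))
    return result


detections = [
    ((0, 0, 10, 10), 0.9, "person"),
    ((1, 1, 10, 10), 0.8, "person"),    # near-duplicate of the first box
    ((0, 2, 10, 10), 0.7, "bicycle"),   # overlaps heavily, but a different class
]
print(len(per_class_nms(detections)))  # 2
```

The second "person" box is suppressed because its IoU with the first is 0.81, while the "bicycle" box survives even though it overlaps the person, since it is only compared against boxes of its own class.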

Practice questions

  1. What is the key difference between image classification and object detection?
  2. What does Intersection-over-Union (IoU) measure?
  3. Why is Non-Maximum Suppression (NMS) needed in object detection?
  4. What is the main advantage of one-stage detectors like YOLO over two-stage ones like Faster R-CNN?
