Advanced Topics - Intermediate - 12 min

Learn AI Ethics & Bias

A free visual AI and machine learning lesson with an interactive 3D visualization, plain-English theory, and quiz.

Last updated: 2026-05-13.

A model is a mirror of its training data and its designer's choices. If the data reflects historical discrimination, the model perpetuates it — only faster, at scale, and harder to challenge. AI ethics is not philosophical hand-waving: it is the engineering discipline of catching unfair outcomes before they ship, documenting trade-offs honestly, and giving humans the tools to override the machine.

Where bias comes from

  • Historical bias — the world the data came from was already unequal.
  • Sampling bias — some groups are over- or under-represented in training data.
  • Measurement bias — the label is a proxy (arrest rate ≠ crime rate; clicks ≠ value).
  • Aggregation bias — one model for many groups when the right model differs per group.
  • Deployment bias — model used in a context different from where it was trained (e.g. hospital A → hospital B).
  • Feedback loop bias — model predictions shape future data (predictive policing sends officers to areas the model flagged, generating more arrests there, reinforcing the model).
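
The feedback loop in the last bullet can be illustrated with a toy simulation. This is a minimal sketch under strong assumptions, not a model of any real policing system: two hypothetical districts have the same true offence rate, the "model" simply flags whichever district has more recorded arrests, and arrests are only recorded where patrols are sent. The function name and all numbers are made up for illustration.

```python
def simulate_feedback_loop(recorded, true_rate, patrols=100, rounds=5):
    """Toy loop: flag the district with the most recorded arrests,
    send all patrols there, and record new arrests only in that district."""
    recorded = list(recorded)
    top_share_history = []
    for _ in range(rounds):
        # The "model": flag the district with the highest recorded count.
        flagged = max(range(len(recorded)), key=lambda i: recorded[i])
        # Arrests are only observed where officers are actually present.
        recorded[flagged] += patrols * true_rate[flagged]
        # Track the flagged district's share of all recorded arrests.
        top_share_history.append(recorded[flagged] / sum(recorded))
    return recorded, top_share_history


# Both districts have the same true rate; only the starting records differ.
final_counts, share_history = simulate_feedback_loop(
    recorded=[60, 40], true_rate=[0.1, 0.1]
)
print(final_counts)
print([round(s, 3) for s in share_history])
```

In this sketch the first district's share of recorded arrests climbs from 60% toward 73% over five rounds even though the underlying rates are identical, because the second district never generates new data.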

Common fairness definitions

  • Demographic parity: accept rate is equal across groups. P(ŷ=1 | A=0) = P(ŷ=1 | A=1).
  • Equal opportunity: true-positive rate is equal across groups (qualified applicants are equally likely to be accepted).
  • Equalised odds: both TPR and FPR are equal across groups.
  • Disparate impact ratio: lowest group acceptance rate / highest group acceptance rate ≥ 0.80 (the US '80% rule', also known as the four-fifths rule).
  • Counterfactual fairness: would this individual receive the same decision in a world where their group attribute were different?
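
The first three definitions reduce to comparing per-group rates, which can be sketched in a few lines. The helper below is illustrative only: `group_rates` and the toy labels are assumptions, and it expects binary 0/1 labels with at least one positive and one negative example in every group.

```python
def group_rates(y_true, y_pred, group):
    """Per-group selection rate, true-positive rate, and false-positive rate.

    Assumes binary 0/1 labels and that each group contains at least
    one actual positive and one actual negative.
    """
    rates = {}
    for g in sorted(set(group)):
        rows = [(t, p) for t, p, a in zip(y_true, y_pred, group) if a == g]
        preds_for_positives = [p for t, p in rows if t == 1]
        preds_for_negatives = [p for t, p in rows if t == 0]
        rates[g] = {
            # Demographic parity compares this value across groups.
            "selection_rate": sum(p for _, p in rows) / len(rows),
            # Equal opportunity compares TPR across groups.
            "tpr": sum(preds_for_positives) / len(preds_for_positives),
            # Equalised odds compares both TPR and FPR.
            "fpr": sum(preds_for_negatives) / len(preds_for_negatives),
        }
    return rates


# Toy data: four people per group, two qualified (y_true = 1) in each.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]

rates = group_rates(y_true, y_pred, group)
for g, r in rates.items():
    print(g, r)
```

On this toy data both groups have a selection rate of 0.5, so demographic parity holds, yet the TPR is 1.0 for group 0 and 0.5 for group 1, so equal opportunity and equalised odds do not.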

Disparate impact (DI):

  DI = P(ŷ = 1 | A = unprivileged) / P(ŷ = 1 | A = privileged)

  DI ≥ 0.80   →   passes the 80% rule
  DI < 0.80   →   generally regarded as evidence of adverse impact (US four-fifths guideline)

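The 80% rule is a common starting point, but no single metric captures fairness fully: when groups have different base rates, the definitions above generally cannot all hold at once.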
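
As a worked example of the DI ratio, here is a minimal sketch. The function name, the group labels "priv" and "unpriv", and the counts are hypothetical, and the function assumes the privileged group has a non-zero acceptance rate.

```python
def disparate_impact(y_pred, group, unprivileged, privileged):
    """DI = acceptance rate of the unprivileged group divided by the
    acceptance rate of the privileged group (assumed to be non-zero)."""
    def acceptance_rate(g):
        preds = [p for p, a in zip(y_pred, group) if a == g]
        return sum(preds) / len(preds)

    return acceptance_rate(unprivileged) / acceptance_rate(privileged)


# Hypothetical decisions: 6 of 10 privileged applicants accepted,
# 3 of 10 unprivileged applicants accepted.
y_pred = [1] * 6 + [0] * 4 + [1] * 3 + [0] * 7
group = ["priv"] * 10 + ["unpriv"] * 10

di = disparate_impact(y_pred, group, unprivileged="unpriv", privileged="priv")
passes_80_percent_rule = di >= 0.80
print(di, passes_80_percent_rule)
```

Here the acceptance rates are 0.3 and 0.6, so DI = 0.5 and the 80% rule is not met.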

Practical mitigations

  • Pre-processing: rebalance the training set, remove proxy features, generate synthetic minority samples.
  • In-processing: add a fairness regularisation term to the loss function.
  • Post-processing: adjust the decision threshold per group to equalise TPR or FPR.
  • Audit + monitor: measure fairness metrics in production, not just at training time.
  • Model cards & datasheets: document intended use, known limitations, demographic performance breakdowns.
  • Human-in-the-loop: high-stakes decisions (loans, parole, medical) keep a human reviewer in the chain.
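
The post-processing bullet can be sketched as follows: instead of one shared cut-off, choose a threshold per group so that each group reaches the same target TPR. This is an illustrative sketch, not a production recipe. The function names, the groups "A" and "B", the scores, and the 0.75 target are all assumptions.

```python
import math


def tpr(scores, y_true, threshold):
    """Share of actual positives whose score is at or above the threshold."""
    positives = [s for s, t in zip(scores, y_true) if t == 1]
    return sum(s >= threshold for s in positives) / len(positives)


def threshold_for_tpr(scores, y_true, target_tpr):
    """Highest threshold that accepts at least target_tpr of the positives."""
    positives = sorted(
        (s for s, t in zip(scores, y_true) if t == 1), reverse=True
    )
    # Number of positives that must be accepted to reach the target.
    k = max(1, math.ceil(target_tpr * len(positives)))
    return positives[k - 1]


# Hypothetical scores: the model scores group B's positives lower.
data = {
    "A": {"scores": [0.9, 0.8, 0.7, 0.6, 0.5, 0.3], "y_true": [1, 1, 1, 1, 0, 0]},
    "B": {"scores": [0.7, 0.6, 0.5, 0.4, 0.35, 0.2], "y_true": [1, 1, 1, 1, 0, 0]},
}

shared_threshold = 0.6
tpr_shared = {
    g: tpr(d["scores"], d["y_true"], shared_threshold) for g, d in data.items()
}

thresholds = {
    g: threshold_for_tpr(d["scores"], d["y_true"], target_tpr=0.75)
    for g, d in data.items()
}
tpr_adjusted = {
    g: tpr(d["scores"], d["y_true"], thresholds[g]) for g, d in data.items()
}
print(tpr_shared, thresholds, tpr_adjusted)
```

With the shared threshold of 0.6 the TPR is 1.0 for group A and 0.5 for group B. With per-group thresholds (0.7 for A, 0.5 for B) both groups reach 0.75. Equalising TPR this way does not by itself equalise FPR, so that needs a separate check.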

Practice questions

  1. Why does removing race/gender from training data not eliminate bias?
  2. What does the 80% rule check?
  3. What is feedback loop bias?
  4. Why can't a single metric guarantee fairness?
