Learn AI Ethics & Bias

A free visual AI and machine learning lesson with an interactive 3D visualization, plain-English theory, and quiz.

A model is a mirror of its training data and its designer's choices. If the data reflects historical discrimination, the model perpetuates it — only faster, at scale, and harder to challenge. AI ethics is not philosophical hand-waving: it is the engineering discipline of catching unfair outcomes before they ship, documenting trade-offs honestly, and giving humans the tools to override the machine.

Where bias comes from

Historical bias: the world the data came from was already unequal, so even accurately recorded data encodes that inequality.
Sampling bias: some groups are over- or under-represented in the training data relative to the population the model will serve.
Measurement bias: the label is only a proxy for the quantity of interest (arrest rate ≠ crime rate; clicks ≠ value).
Aggregation bias: one model is fitted to many groups when the right model differs per group.
Deployment bias: the model is used in a context different from the one it was trained in (e.g. trained at hospital A, deployed at hospital B).
Feedback loop bias: model predictions shape future data. Predictive policing sends officers to the areas the model flagged, which generates more arrests there, which in turn reinforces the model.
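The feedback loop is easiest to see in a toy simulation. The sketch below is illustrative only: the two districts, the officer count, the 80/20 patrol split and the identical true crime rate are all assumptions made up for the example, not figures from any real deployment. Both districts behave identically; the only difference is one extra recorded arrest at the start.

```python
def simulate_feedback_loop(rounds=20, officers=100, true_rate=0.1,
                           initial_arrests=(11.0, 10.0), flagged_share=0.8):
    """Toy model: patrols follow recorded arrests, arrests follow patrols.

    Both districts have the SAME true crime rate (an assumption of this toy).
    Returns district 0's share of all recorded arrests after each round.
    """
    arrests = list(initial_arrests)
    history = []
    for _ in range(rounds):
        # The "model" flags whichever district has more recorded arrests.
        flagged = 0 if arrests[0] >= arrests[1] else 1
        # Most officers go to the flagged district, the rest to the other one.
        patrols = [officers * (1 - flagged_share)] * 2
        patrols[flagged] = officers * flagged_share
        # Recorded arrests scale with patrol presence, not with true crime.
        for district in range(2):
            arrests[district] += patrols[district] * true_rate
        history.append(arrests[0] / sum(arrests))
    return history


initial_share = 11.0 / 21.0
history = simulate_feedback_loop()
print(f"share of recorded arrests in district 0: "
      f"start {initial_share:.2f}, after 20 rounds {history[-1]:.2f}")
```

In this toy, district 0's share of recorded arrests climbs from roughly half toward the patrol split, even though nothing about the underlying crime rate differs. The data ends up describing where officers were sent rather than where crime happened.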

Common fairness definitions

Demographic parity: the acceptance rate is equal across groups. P(ŷ=1 | A=0) = P(ŷ=1 | A=1).
Equal opportunity: the true-positive rate (TPR) is equal across groups, so qualified applicants are equally likely to be accepted. P(ŷ=1 | Y=1, A=0) = P(ŷ=1 | Y=1, A=1).
Equalised odds: both the TPR and the false-positive rate (FPR) are equal across groups.
Disparate impact ratio: min-group rate / max-group rate ≥ 0.80 (the US 'four-fifths' or '80% rule').
Counterfactual fairness: would this individual receive the same decision in a world where their group attribute were different?
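The first three definitions above come down to comparing a few per-group rates. The following is a minimal sketch in plain Python; the function name `group_rates` and the eight-row toy dataset are hypothetical, chosen only to make the arithmetic easy to check by hand.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, group):
    """Return selection rate, TPR and FPR for each value of the group attribute."""
    counts = defaultdict(lambda: {"n": 0, "pred_pos": 0, "pos": 0,
                                  "tp": 0, "neg": 0, "fp": 0})
    for t, p, g in zip(y_true, y_pred, group):
        c = counts[g]
        c["n"] += 1
        c["pred_pos"] += p
        if t == 1:
            c["pos"] += 1
            c["tp"] += p      # predicted 1 and truly 1
        else:
            c["neg"] += 1
            c["fp"] += p      # predicted 1 but truly 0
    rates = {}
    for g, c in counts.items():
        rates[g] = {
            "selection_rate": c["pred_pos"] / c["n"],
            "tpr": c["tp"] / c["pos"] if c["pos"] else float("nan"),
            "fpr": c["fp"] / c["neg"] if c["neg"] else float("nan"),
        }
    return rates


# Hypothetical toy data: four people in each group, two truly qualified per group.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]

rates = group_rates(y_true, y_pred, group)

# Demographic parity compares selection rates, equal opportunity compares TPR,
# and equalised odds requires both the TPR gap and the FPR gap to vanish.
parity_gap = abs(rates[0]["selection_rate"] - rates[1]["selection_rate"])
tpr_gap = abs(rates[0]["tpr"] - rates[1]["tpr"])
fpr_gap = abs(rates[0]["fpr"] - rates[1]["fpr"])
print(rates)
print(parity_gap, tpr_gap, fpr_gap)
```

On this toy data every gap is 0.5, so the classifier fails all three criteria. Counterfactual fairness is not covered by the sketch because it needs a causal model of the data, not just observed rates.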

Disparate impact (DI):

  DI = P(ŷ = 1 | A = unprivileged)
       --------------------------------
       P(ŷ = 1 | A = privileged)

  DI ≥ 0.80   →   passes the 80% rule
  DI < 0.80   →   generally treated as evidence of adverse impact under the US EEOC four-fifths guideline

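The 80% rule is a common starting point, but no single metric captures fairness fully. When base rates differ across groups, for example, demographic parity and equalised odds cannot both hold unless the classifier's TPR equals its FPR, so choosing a metric is itself a trade-off.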
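As a worked example of the DI formula, the sketch below computes the ratio on made-up numbers: 100 privileged applicants with 50 accepted and 100 unprivileged applicants with 35 accepted. The helper names and the group labels "priv" and "unpriv" are hypothetical.

```python
def selection_rate(y_pred, group, value):
    """Share of rows with group == value that received a positive prediction."""
    preds = [p for p, g in zip(y_pred, group) if g == value]
    if not preds:
        raise ValueError(f"no rows for group {value!r}")
    return sum(preds) / len(preds)


def disparate_impact(y_pred, group, unprivileged, privileged):
    """DI = P(y_hat = 1 | unprivileged) / P(y_hat = 1 | privileged)."""
    denominator = selection_rate(y_pred, group, privileged)
    if denominator == 0:
        raise ValueError("privileged group has a zero selection rate")
    return selection_rate(y_pred, group, unprivileged) / denominator


# Hypothetical decisions: 50 of 100 privileged and 35 of 100 unprivileged accepted.
y_pred = [1] * 50 + [0] * 50 + [1] * 35 + [0] * 65
group = ["priv"] * 100 + ["unpriv"] * 100

di = disparate_impact(y_pred, group, "unpriv", "priv")
passes = di >= 0.80
print(f"DI = {di:.2f}, passes 80% rule: {passes}")
```

Here the selection rates are 0.35 and 0.50, so DI = 0.35 / 0.50 = 0.70, which falls below the 0.80 threshold.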

Practical mitigations

Pre-processing: rebalance the training set, remove proxy features, generate synthetic minority samples.
In-processing: add a fairness regularisation term to the loss function.
Post-processing: adjust the decision threshold per group to equalise TPR or FPR.
Audit + monitor: measure fairness metrics in production, not just at training time.
Model cards & datasheets: document intended use, known limitations, demographic performance breakdowns.
Human-in-the-loop: for high-stakes decisions (loans, parole, medical care), keep a human reviewer in the chain with the authority to override the model.
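The post-processing item above can be sketched in a few lines. This is one simple way to do it under stated assumptions, not a reference implementation: the helper names, the scores and the target TPR of 0.75 are all hypothetical, and the code assumes a model that outputs a score where higher means "more likely positive".

```python
import math


def threshold_for_tpr(scores, labels, target_tpr):
    """Highest threshold at which at least target_tpr of true positives score >= it."""
    pos_scores = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not pos_scores:
        raise ValueError("no positive examples")
    # Number of true positives that must be accepted to reach the target.
    k = max(1, math.ceil(target_tpr * len(pos_scores)))
    return pos_scores[k - 1]


def tpr_at(scores, labels, threshold):
    """True-positive rate when accepting every score >= threshold."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    return sum(s >= threshold for s in pos) / len(pos)


# Hypothetical scores: the model scores group B systematically lower than group A.
scores_a = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels_a = [1, 1, 1, 1, 0, 0]
scores_b = [0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels_b = [1, 1, 1, 1, 0, 0]

# A single shared threshold gives very different TPRs.
shared = 0.7
tpr_shared_a = tpr_at(scores_a, labels_a, shared)
tpr_shared_b = tpr_at(scores_b, labels_b, shared)

# Per-group thresholds chosen to hit the same target TPR in both groups.
target = 0.75
thr_a = threshold_for_tpr(scores_a, labels_a, target)
thr_b = threshold_for_tpr(scores_b, labels_b, target)
tpr_a = tpr_at(scores_a, labels_a, thr_a)
tpr_b = tpr_at(scores_b, labels_b, thr_b)

print(f"shared threshold {shared}: TPR A={tpr_shared_a}, TPR B={tpr_shared_b}")
print(f"per-group thresholds A={thr_a}, B={thr_b}: TPR A={tpr_a}, TPR B={tpr_b}")
```

With the shared threshold the TPRs are 0.75 and 0.25; with per-group thresholds of 0.7 and 0.5 both groups reach 0.75. In practice the thresholds would be fitted on a held-out set and the resulting FPRs checked as well, since equalising one rate can widen the gap in another.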
