A model is a mirror of its training data and its designer's choices. If the data reflects historical discrimination, the model perpetuates it — only faster, at scale, and harder to challenge. AI ethics is not philosophical hand-waving: it is the engineering discipline of catching unfair outcomes before they ship, documenting trade-offs honestly, and giving humans the tools to override the machine.
Where bias comes from
- Historical bias — the world the data came from was already unequal.
- Sampling bias — some groups are over- or under-represented in training data.
- Measurement bias — the label is a proxy (arrest rate ≠ crime rate; clicks ≠ value).
- Aggregation bias — one model for many groups when the right model differs per group.
- Deployment bias — model used in a context different from where it was trained (e.g. hospital A → hospital B).
- Feedback loop bias — model predictions shape future data (predictive policing sends officers to areas the model flagged, generating more arrests there, reinforcing the model).
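The feedback loop in the last bullet can be made concrete with a toy simulation. This is a minimal sketch under invented assumptions, not a model of any real policing system: two districts share the same underlying rate, patrols are allocated in proportion to recorded arrests, and every name and number here (`simulate_patrol_feedback`, `true_rate`, the starting counts) is hypothetical.

```python
def simulate_patrol_feedback(initial_arrests, true_rate, patrols_per_round, rounds):
    """Toy model: patrols follow recorded arrests, and arrests follow patrols."""
    recorded = list(initial_arrests)
    for _ in range(rounds):
        total = sum(recorded)
        # Allocate patrols in proportion to the arrests recorded so far.
        patrols = [patrols_per_round * r / total for r in recorded]
        # Every patrol finds crime at the same true rate in every district.
        new_arrests = [p * true_rate for p in patrols]
        recorded = [r + n for r, n in zip(recorded, new_arrests)]
    return recorded


recorded = simulate_patrol_feedback(
    initial_arrests=[60.0, 40.0],  # district 0 merely starts with more records
    true_rate=0.5,                 # identical underlying rate in both districts
    patrols_per_round=100,
    rounds=20,
)
share = recorded[0] / sum(recorded)
print(f"District 0 share of recorded arrests: {share:.2f}")
print(f"Absolute gap in recorded arrests: {recorded[0] - recorded[1]:.0f}")
```

Under this proportional rule the initial 60/40 skew never corrects itself, even though both districts are identical, and the absolute gap in recorded arrests keeps growing. An allocation rule that favours the top-ranked district more than proportionally would push the share further towards that district.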
Common fairness definitions
- Demographic parity: the acceptance rate is equal across groups, i.e. P(ŷ=1 | A=0) = P(ŷ=1 | A=1).
- Equal opportunity: the true-positive rate (TPR) is equal across groups, i.e. P(ŷ=1 | y=1, A=0) = P(ŷ=1 | y=1, A=1), so qualified applicants are equally likely to be accepted.
- Equalised odds: both the TPR and the false-positive rate (FPR) are equal across groups.
- Disparate impact ratio: min-group rate / max-group rate ≥ 0.80 (the 'four-fifths' or '80% rule' from US EEOC guidelines).
- Counterfactual fairness: an individual would receive the same decision in a counterfactual world where their group attribute were different.
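The group-level definitions above reduce to comparing a few per-group rates. The following is a minimal sketch using only the standard library; the function name `group_rates` and the toy data are made up for illustration, and a real audit would need far larger samples.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, group):
    """Return selection rate, TPR and FPR for each value of the group attribute."""
    counts = defaultdict(lambda: {"n": 0, "pred_pos": 0, "pos": 0, "tp": 0, "neg": 0, "fp": 0})
    for yt, yp, g in zip(y_true, y_pred, group):
        c = counts[g]
        c["n"] += 1
        c["pred_pos"] += yp
        if yt == 1:
            c["pos"] += 1
            c["tp"] += yp  # predicted positive and actually positive
        else:
            c["neg"] += 1
            c["fp"] += yp  # predicted positive but actually negative
    rates = {}
    for g, c in counts.items():
        rates[g] = {
            "selection_rate": c["pred_pos"] / c["n"],
            "tpr": c["tp"] / c["pos"] if c["pos"] else float("nan"),
            "fpr": c["fp"] / c["neg"] if c["neg"] else float("nan"),
        }
    return rates


# Hypothetical toy data: labels, binary predictions, and a binary group attribute A.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]

rates = group_rates(y_true, y_pred, group)
dp_gap = abs(rates[0]["selection_rate"] - rates[1]["selection_rate"])  # demographic parity
eo_gap = abs(rates[0]["tpr"] - rates[1]["tpr"])                        # equal opportunity
fpr_gap = abs(rates[0]["fpr"] - rates[1]["fpr"])                       # second half of equalised odds
print(rates)
print(dp_gap, eo_gap, fpr_gap)
```

A gap of zero means the corresponding criterion holds exactly on this sample. Equalised odds needs both `eo_gap` and `fpr_gap` to be zero.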
Disparate impact (DI):
DI = P(ŷ = 1 | A = unprivileged) / P(ŷ = 1 | A = privileged)
DI ≥ 0.80 → passes the 80% rule
DI < 0.80 → evidence of disparate impact under US case law
This is a common starting point, but no single metric captures fairness fully.
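As a rough illustration of the DI ratio and the 0.80 cut-off, the sketch below computes DI from binary predictions. The helper `disparate_impact`, the group labels "a" and "b", and the counts are all invented for the example.

```python
def disparate_impact(y_pred, group, unprivileged, privileged):
    """DI = P(y_hat = 1 | A = unprivileged) / P(y_hat = 1 | A = privileged)."""
    def positive_rate(g):
        preds = [p for p, a in zip(y_pred, group) if a == g]
        if not preds:
            raise ValueError(f"no samples for group {g!r}")
        return sum(preds) / len(preds)

    privileged_rate = positive_rate(privileged)
    if privileged_rate == 0:
        raise ValueError("privileged group has no positive predictions; DI is undefined")
    return positive_rate(unprivileged) / privileged_rate


# Hypothetical outcome: 30 of 100 applicants accepted in group "b", 50 of 100 in group "a".
y_pred = [1] * 30 + [0] * 70 + [1] * 50 + [0] * 50
group = ["b"] * 100 + ["a"] * 100

di = disparate_impact(y_pred, group, unprivileged="b", privileged="a")
passes_80_rule = di >= 0.80
print(f"DI = {di:.2f}, passes 80% rule: {passes_80_rule}")
```

Here the acceptance rates are 0.30 and 0.50, so DI is 0.30 / 0.50 = 0.60 and the check fails.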
Practical mitigations
- Pre-processing: rebalance the training set, remove proxy features, generate synthetic minority samples.
- In-processing: add a fairness regularisation term to the loss function.
- Post-processing: adjust the decision threshold per group to equalise TPR or FPR.
- Audit + monitor: measure fairness metrics in production, not just at training time.
- Model cards & datasheets: document intended use, known limitations, demographic performance breakdowns.
- Human-in-the-loop: high-stakes decisions (loans, parole, medical) keep a human reviewer in the chain.
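The post-processing item above can be sketched as a search for one threshold per group that reaches a shared target TPR. This is a simplified illustration, not a production method: the helper names (`tpr_at`, `threshold_for_tpr`), the scores, and the 0.75 target are assumptions, and each group is assumed to contain at least one positive label.

```python
def tpr_at(scores, labels, threshold):
    """True-positive rate when predicting positive for score >= threshold."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    return sum(s >= threshold for s in positives) / len(positives)


def threshold_for_tpr(scores, labels, target_tpr):
    """Highest observed score that, used as a threshold, reaches the target TPR."""
    for t in sorted(set(scores), reverse=True):
        if tpr_at(scores, labels, t) >= target_tpr:
            return t
    return min(scores)  # only reached if the target exceeds 1.0


# Hypothetical scores: the model rates group "B" lower for equally qualified people.
data = {
    "A": {"scores": [0.9, 0.8, 0.7, 0.4, 0.6, 0.3], "labels": [1, 1, 1, 1, 0, 0]},
    "B": {"scores": [0.7, 0.55, 0.45, 0.35, 0.5, 0.2], "labels": [1, 1, 1, 1, 0, 0]},
}

# A single global threshold of 0.5 gives unequal TPRs across the two groups.
tpr_global = {g: tpr_at(d["scores"], d["labels"], 0.5) for g, d in data.items()}

# Per-group thresholds chosen to hit the same target TPR.
thresholds = {g: threshold_for_tpr(d["scores"], d["labels"], 0.75) for g, d in data.items()}
tpr_adjusted = {g: tpr_at(d["scores"], d["labels"], thresholds[g]) for g, d in data.items()}

print(tpr_global)
print(thresholds)
print(tpr_adjusted)
```

On this toy data the global threshold yields TPRs of 0.75 and 0.50, while thresholds of 0.7 for group A and 0.45 for group B bring both to 0.75. Lowering the threshold for group B also admits one of its negatives, so the FPRs diverge: equalising one metric usually costs something on another.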