Simple theory: A decision tree predicts by asking a sequence of simple yes/no feature questions. Each split narrows the data until the final leaf gives a class or number prediction.
A decision tree turns the training problem into a series of questions: 'Is income above $50k? Is age below 30? Is credit score above 700?' Each question splits the data into two groups. Keep splitting until each group is mostly one class. The result is a flowchart any human can follow and verify.
Gini impurity — measuring purity
At every node, the algorithm scores every possible split on every feature and picks the one that produces the purest child nodes. Gini impurity measures how mixed the classes are at a node.
Gini(node) = 1 − Σ pᵢ²
pᵢ = fraction of samples at this node belonging to class i
Examples:
Pure node (all class A): Gini = 1 − 1² = 0.00 ← perfect
50/50 split (binary class): Gini = 1 − (0.5² + 0.5²) = 0.50 ← worst
70/30 split: Gini = 1 − (0.7² + 0.3²) = 0.42Information Gain (entropy-based) gives the same splits — Gini is faster to compute
Choosing the best split
Weighted Gini after split on feature F at threshold t:
Gini_split = (n_left/n)·Gini(left) + (n_right/n)·Gini(right)
Algorithm tries every (feature, threshold) pair → picks lowest Gini_split
Example: 50 samples, split on income ≥ 40k
Left (20 samples, 18 class-0, 2 class-1): Gini = 1−(0.9²+0.1²) = 0.18
Right (30 samples, 12 class-0, 18 class-1): Gini = 1−(0.4²+0.6²) = 0.48
Gini_split = (20/50)×0.18 + (30/50)×0.48 = 0.072 + 0.288 = 0.36Lower weighted Gini = better split. Try all features and all thresholds, pick the best.
Key hyperparameters
- max_depth: limits how deep the tree grows — most important overfitting control
- min_samples_split: minimum samples required to split a node (higher = simpler tree)
- min_samples_leaf: minimum samples in any leaf (prevents tiny, noisy leaves)
- max_features: how many features to consider at each split (randomness for forests)
- criterion: 'gini' (faster) or 'entropy' (information gain) — usually similar results
When to use decision trees
- Need full interpretability (regulated industries: banking, medicine, law)
- Mixed numeric and categorical features with minimal preprocessing
- Fast inference needed (tree lookup is O(depth) — extremely fast at prediction time)
- As a building block for ensembles — Random Forests and Gradient Boosting are collections of trees
- Don't use alone for tabular ML competitions — ensembles always win