Before you train a single model, plot the data. Many modeling bugs, data leaks, and class imbalances show up in a chart long before they show up as a low score. Data viz is your first line of defense against shipping a broken model, and your last step before presenting results to humans.
Picking the right chart
Each chart type answers a different question. Pick the wrong one and you'll either miss insights or actively mislead. Start from the question you are asking, then match it to a chart:
- Bar — compare categories (sales by region, A/B test outcomes)
- Line — show a trend over time (loss curve, daily users)
- Scatter — relationship between two variables (height vs weight, feature correlations)
- Histogram — distribution of one variable (age, predicted probabilities)
- Box plot — five-number summary, compare distributions side by side
- Heatmap — 2D values (confusion matrix, correlation matrix, attention weights)
- Pair plot — every feature against every other (great for EDA)
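To make the list above concrete, here is a minimal sketch that draws six of these chart types on one figure with matplotlib. All data is synthetic and every variable name (regions, sales, loss, group_a, confusion, and so on) is a made-up placeholder, not something from a real dataset. The pair plot is left out because it is most easily drawn with a higher-level library.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)

# Synthetic placeholder data, one small dataset per chart type.
regions = ["North", "South", "East", "West"]
sales = [120, 95, 140, 80]
epochs = np.arange(1, 51)
loss = 1.0 / np.sqrt(epochs) + rng.normal(0, 0.01, size=epochs.size)
height = rng.normal(170, 10, size=200)
weight = 0.9 * height - 90 + rng.normal(0, 8, size=200)
group_a = rng.normal(170, 10, size=200)
group_b = rng.normal(165, 8, size=200)
confusion = np.array([[50, 3, 2], [4, 45, 6], [1, 7, 52]])

fig, axes = plt.subplots(2, 3, figsize=(15, 8))

# Bar: compare categories.
axes[0, 0].bar(regions, sales)
axes[0, 0].set_title("Bar: sales by region")

# Line: trend over an ordered axis such as time or epochs.
axes[0, 1].plot(epochs, loss)
axes[0, 1].set_title("Line: loss per epoch")

# Scatter: relationship between two variables.
axes[0, 2].scatter(height, weight, s=10, alpha=0.6)
axes[0, 2].set_title("Scatter: height vs weight")

# Histogram: distribution of one variable.
axes[1, 0].hist(height, bins=20)
axes[1, 0].set_title("Histogram: height")

# Box plot: compare two distributions side by side.
axes[1, 1].boxplot([group_a, group_b])
axes[1, 1].set_xticklabels(["group A", "group B"])
axes[1, 1].set_title("Box plot: height by group")

# Heatmap: 2D matrix of values, here a confusion matrix.
im = axes[1, 2].imshow(confusion, cmap="Blues")
fig.colorbar(im, ax=axes[1, 2])
axes[1, 2].set_title("Heatmap: confusion matrix")

fig.tight_layout()
fig.savefig("chart_gallery.png")
```

Swapping in your own arrays for the placeholders is usually enough to turn this into a first-pass look at a new dataset.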
The Python toolkit
Matplotlib is the foundation: verbose but powerful, and both Seaborn and pandas plotting build on it. Seaborn is a higher-level wrapper that gives statistical plots sensible, good-looking defaults. Plotly is a separate library, not built on Matplotlib, that produces interactive, web-ready charts (great for dashboards). Pick based on need: matplotlib for papers/reports, seaborn for EDA, plotly for live products.