Learn Pandas: DataFrames

If NumPy is for numerical arrays, pandas is for labeled tabular data: the mixed-type, named-column data that most real ML work involves. A typical workflow reads a CSV, cleans, joins, groups, and aggregates it, then writes a CSV back out. Almost every ML pipeline starts in pandas long before it touches a model.

Series and DataFrame

A Series is a 1D labeled array: a sequence of values plus an index, like one column of a table. A DataFrame is a 2D table of rows × columns, where each column is a Series with its own dtype. Each column has a name, and each row has an index label. Pandas operations are column-aware: you filter, sort, and group by column name, and pandas aligns the rows by their index labels.
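To make the relationship between the two concrete, here is a minimal sketch. The names and values (`ages`, `df`, the three people) are hypothetical toy data, not part of the lesson.

```python
import pandas as pd

# Hypothetical toy data for illustration only.
# A Series: values, an index of labels, and an optional name.
ages = pd.Series([34, 28, 45], index=["ana", "ben", "cho"], name="age")

# A DataFrame: several columns that share one row index.
df = pd.DataFrame(
    {
        "age": [34, 28, 45],
        "city": ["NYC", "LA", "NYC"],
        "score": [0.9, 0.7, 0.8],
    },
    index=["ana", "ben", "cho"],
)

age_column = df["age"]     # selecting one column gives back a Series
column_dtypes = df.dtypes  # one dtype per column
ben_age = ages["ben"]      # look up a value by its index label

print(type(age_column).__name__)
print(column_dtypes)
```

Selecting a single column returns a Series that carries the same row labels as the table it came from, which is why column-wise operations stay aligned.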

df.head()                                   # first 5 rows
df[df.age > 30]                             # filter rows
df.groupby('city').mean(numeric_only=True)  # aggregate numeric columns by group
df.merge(other, on='id')                    # SQL-style join
df['col'].apply(fn)                         # transform a column
df.pivot_table(...)                         # cross-tabulate

Core pandas verbs that cover most daily data work
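The verbs above are shown without data, so here is one way they might look on a small table. This is a sketch under assumptions: the tables `df` and `other`, their columns, and the `decade` transform are all hypothetical, chosen only so each call has something to act on.

```python
import pandas as pd

# Hypothetical toy tables; names and values are illustrative only.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "age": [25, 41, 37, 29],
    "city": ["NYC", "LA", "NYC", "LA"],
})
other = pd.DataFrame({"id": [1, 2, 3], "plan": ["free", "pro", "pro"]})

first_rows = df.head(2)                            # first 2 rows
over_30 = df[df["age"] > 30]                       # keep rows where age > 30
mean_age = df.groupby("city")["age"].mean()        # LA 35.0, NYC 31.0
joined = df.merge(other, on="id")                  # inner join by default, so id 4 drops out
decade = df["age"].apply(lambda a: a // 10 * 10)   # element-wise transform of one column
table = joined.pivot_table(                        # mean age per (city, plan) cell
    index="city", columns="plan", values="age", aggfunc="mean"
)

print(table)
```

Cells with no matching rows, such as LA on the free plan here, come out as NaN in the pivot table.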

The split-apply-combine pattern

The most powerful pandas pattern is `groupby().agg()`. Split the data by some key (e.g. by city), apply a function to each group (e.g. mean), then combine the results into a new table. Once you can think in split-apply-combine, half your data wrangling problems disappear.
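The three steps can be sketched on a toy table. The `sales` data and the output column names (`total`, `average`, `orders`) are hypothetical. The loop at the end spells out split, apply, and combine by hand to show what `groupby().agg()` does in one call.

```python
import pandas as pd

# Hypothetical toy data for illustration only.
sales = pd.DataFrame({
    "city": ["NYC", "LA", "NYC", "LA", "SF"],
    "amount": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Split by city, apply three aggregations, combine into one table.
summary = sales.groupby("city").agg(
    total=("amount", "sum"),
    average=("amount", "mean"),
    orders=("amount", "count"),
)

# The same idea written out by hand for a single aggregation.
manual = {}
for city, group in sales.groupby("city"):   # split: one sub-table per city
    manual[city] = group["amount"].sum()    # apply: reduce each group; combine: collect results

print(summary)
```

The result has one row per group key, sorted by key by default, so `summary.loc["NYC", "total"]` reads the NYC total directly.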

Read/write: `pd.read_csv`, `pd.read_parquet`, `pd.read_sql`, `df.to_csv`, `df.to_parquet`
Filter: `df[df.col > 10]`, `df.query('age > 30 and city == "NYC"')`
Group: `df.groupby('city').mean(numeric_only=True)`, `.agg({'col': ['min', 'max']})`
Reshape: `df.pivot`, `df.melt`, `df.stack`, `df.unstack`
Time series: `pd.to_datetime`, `df.resample('D').mean()` (needs a datetime index), `.rolling(7).mean()`
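A few of the calls listed above, sketched on a toy table. The `readings` data and its column names are hypothetical, and the CSV round trip goes through an in-memory string rather than a file so the example needs nothing on disk.

```python
import io

import pandas as pd

# Hypothetical toy data for illustration only.
readings = pd.DataFrame({
    "time": pd.to_datetime(
        ["2024-01-01 06:00", "2024-01-01 18:00", "2024-01-02 12:00", "2024-01-03 12:00"]
    ),
    "city": ["NYC", "NYC", "NYC", "NYC"],
    "temp": [2.0, 4.0, 6.0, 9.0],
    "humidity": [50.0, 60.0, 70.0, 80.0],
})

# Read/write: to_csv returns a string when no path is given.
csv_text = readings.to_csv(index=False)
round_trip = pd.read_csv(io.StringIO(csv_text), parse_dates=["time"])

# Filter with a query string instead of a boolean mask.
warm = readings.query('temp > 3 and city == "NYC"')

# Reshape: wide (one column per metric) to long (one row per measurement).
long_form = readings.melt(
    id_vars=["time", "city"],
    value_vars=["temp", "humidity"],
    var_name="metric",
    value_name="value",
)

# Time series: resample needs a datetime index, so set one first.
daily = readings.set_index("time")["temp"].resample("D").mean()
smooth = daily.rolling(2).mean()  # 2-day rolling mean; the first value is NaN

print(daily)
print(smooth)
```

The rolling window here is 2 days only because the toy series is short. A 7-day window, as in the list above, behaves the same way but leaves the first six values as NaN.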
