Python Mastery — From Zero to AI Engineering

Lesson 11

Data Visualization — Matplotlib, Seaborn & Interactive Charts

32 min

Why Visualization Matters in Data Science

A chart can communicate in seconds what a table of numbers cannot communicate in minutes. But bad charts can also mislead — the same dataset can "show" opposite conclusions depending on axis limits, bin sizes, and color choices. This lesson teaches you to produce honest, clear, and beautiful visualizations.

The Python visualization ecosystem:

matplotlib   - foundational; seaborn and the pandas plotting API build on it
seaborn      - statistical graphics on top of matplotlib; integrates with DataFrames
plotly       - interactive charts for dashboards and web apps
altair       - declarative grammar-of-graphics (Vega-Lite based)
bokeh        - interactive plots for the browser, with streaming support

We focus on matplotlib + seaborn: together they cover most day-to-day data science plotting needs, and understanding them makes every other library easier to learn.

Matplotlib Architecture: The Artist Hierarchy

Matplotlib has two APIs. Most tutorials mix them up, creating confusion. Understanding the architecture makes both clear.

Every matplotlib plot is a tree of Artist objects:

Figure                  - the entire canvas (window, PNG, PDF, SVG)
├── Text (suptitle)     - figure-level title
└── Axes                - one plot region with its own coordinate system
    ├── Spine (×4)      - left, right, top, and bottom borders of the plot region
    ├── XAxis           - x-axis (ticks, tick labels, axis label)
    │   ├── XTick (×N)
    │   └── Text (axis label)
    ├── YAxis           - same structure
    ├── Title (Text)
    ├── Line2D (×N)     - line plots
    ├── PathCollection  - scatter plots (collection of markers)
    ├── Rectangle (×N)  - bar charts
    ├── AxesImage       - imshow, heatmaps
    └── Text (×N)       - annotations, labels

Two APIs, one hierarchy:

The pyplot interface (plt.plot(...), plt.title(...)) is a state machine that tracks the "current figure" and "current axes". Each call acts implicitly on the current Axes, which is usually the one most recently created or selected. This is convenient for one-off plots in a Jupyter cell but error-prone in scripts and functions, where a call can silently draw on the wrong Axes.

The object-oriented (OO) API (fig, ax = plt.subplots()) gives you explicit handles. ax.plot() operates on that specific Axes, not whatever happens to be current. Always use this in production code, functions, and any code with more than one subplot.

Matplotlib Object Model

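The Artist tree and the object-oriented API described above can be explored with a short sketch like the one below. It assumes only NumPy and matplotlib are installed; the Agg backend line is there so the code runs without a display and can be dropped in a notebook, and all variable names are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

# Explicit handles: one Figure containing two Axes side by side
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))

# ax.plot returns a list of Line2D artists; unpack the single line
(sine_line,) = ax_left.plot(x, np.sin(x), label="sin(x)")
ax_left.set_title("Line plot")
ax_left.legend()

# ax.scatter returns a PathCollection (one collection holding all markers)
points = ax_right.scatter(x[::10], np.cos(x[::10]))
ax_right.set_title("Scatter plot")

fig.suptitle("Two Axes in one Figure")

# Walk part of the Artist tree
print(type(fig).__name__)            # Figure
print(len(fig.axes))                 # 2
print(type(sine_line).__name__)      # Line2D
print(type(points).__name__)         # PathCollection
print(type(ax_left.xaxis).__name__)  # XAxis
print(sine_line in ax_left.lines)    # True
```

Because every call goes through ax_left or ax_right, the order of the calls does not matter: nothing depends on which Axes happens to be current.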

Core Chart Types with Deep Examples

Line Charts — Trends Over Time

Line Charts — Trends and Confidence Intervals

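One possible version of a trend line with a confidence band is sketched below. The data are synthetic (20 noisy repeats of a made-up upward trend), and the band is an approximate 95% interval computed as the mean plus or minus 1.96 standard errors, which assumes roughly normal noise.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
days = np.arange(30)

# Synthetic data: 20 noisy repeats of a linear upward trend (illustration only)
runs = 100 + 2.0 * days + rng.normal(0, 8, size=(20, days.size))

mean = runs.mean(axis=0)
sem = runs.std(axis=0, ddof=1) / np.sqrt(runs.shape[0])  # standard error of the mean
lower = mean - 1.96 * sem                                # approximate 95% interval
upper = mean + 1.96 * sem

fig, ax = plt.subplots(figsize=(7, 3.5))
ax.plot(days, mean, color="tab:blue", label="mean of 20 runs")
ax.fill_between(days, lower, upper, color="tab:blue", alpha=0.2,
                label="approx. 95% CI")
ax.set_xlabel("Day")
ax.set_ylabel("Value")
ax.set_title("Trend with confidence band")
ax.legend()
fig.tight_layout()
```

Drawing the band with a low alpha keeps the mean line readable while still showing how uncertain each point is.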

Distributions: Histograms, KDE, and Violin Plots

Understanding the shape of your data is the first step in any analysis. These charts answer: Is the distribution normal? Are there outliers? Are groups different?

Distribution Shapes — Histograms and KDE

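The sketch below overlays a kernel density estimate on a histogram of a skewed sample. To stay dependency-free it implements a small Gaussian KDE in NumPy rather than calling a library routine; gaussian_kde_1d is a hypothetical helper written for this example, the data are synthetic, and the bandwidth uses Silverman's rule of thumb.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average of Gaussian kernels centred on each sample, evaluated on grid."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernel_sum = np.exp(-0.5 * z**2).sum(axis=1)
    return kernel_sum / (samples.size * bandwidth * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
data = rng.lognormal(mean=0.0, sigma=0.5, size=500)  # synthetic right-skewed sample

# Silverman's rule of thumb: 0.9 * min(std, IQR / 1.34) * n^(-1/5)
q75, q25 = np.percentile(data, [75, 25])
bandwidth = 0.9 * min(data.std(ddof=1), (q75 - q25) / 1.34) * data.size ** (-1 / 5)

grid = np.linspace(data.min() - 3 * bandwidth, data.max() + 3 * bandwidth, 400)
density = gaussian_kde_1d(data, grid, bandwidth)

fig, ax = plt.subplots(figsize=(6, 3.5))
# density=True puts the histogram on the same scale as the KDE curve
ax.hist(data, bins=30, density=True, alpha=0.4, label="histogram (30 bins)")
ax.plot(grid, density, label="Gaussian KDE")
ax.axvline(np.median(data), linestyle="--", color="gray", label="median")
ax.set_xlabel("Value")
ax.set_ylabel("Density")
ax.legend()
fig.tight_layout()
```

The long right tail is visible in both the bars and the curve, which is the kind of shape a single summary statistic would hide.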

Bar Charts — Categorical Comparisons

Bar Chart Variants

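Two common bar chart variants, grouped bars and a sorted horizontal bar chart, might look like the sketch below. The category names and values are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Made-up values for four categories in two quarters
categories = ["A", "B", "C", "D"]
q1 = np.array([23, 45, 12, 37])
q2 = np.array([30, 41, 18, 33])

fig, (ax_grouped, ax_sorted) = plt.subplots(1, 2, figsize=(9, 3.5))

# Grouped bars: shift each series by half a bar width around the tick position
x = np.arange(len(categories))
width = 0.38
ax_grouped.bar(x - width / 2, q1, width, label="Q1")
ax_grouped.bar(x + width / 2, q2, width, label="Q2")
ax_grouped.set_xticks(x)
ax_grouped.set_xticklabels(categories)
ax_grouped.set_ylim(bottom=0)  # bar lengths encode value, so start at zero
ax_grouped.set_title("Grouped bars")
ax_grouped.legend()

# Sorted horizontal bars: easier to rank categories and to read long labels
order = np.argsort(q2)[::-1]  # indices from largest to smallest
sorted_labels = [categories[i] for i in order]
ax_sorted.barh(sorted_labels, q2[order], color="tab:orange")
ax_sorted.invert_yaxis()  # put the largest bar at the top
ax_sorted.set_title("Q2, sorted")

fig.tight_layout()
```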

Advanced Layout: GridSpec, Inset Axes, and Annotations

Advanced GridSpec Layout with Inset Axes

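A rough sketch of an uneven layout with an inset and an annotation follows. It uses Figure.add_gridspec and Axes.inset_axes; the damped sine curve and the panel contents are placeholders chosen only to give the layout something to show.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 500)
y = np.sin(x) * np.exp(-0.2 * x)  # placeholder signal: a damped sine wave

fig = plt.figure(figsize=(9, 5))
gs = fig.add_gridspec(2, 3, height_ratios=[2, 1], hspace=0.4)

ax_main = fig.add_subplot(gs[0, :])   # top row spans all three columns
ax_hist = fig.add_subplot(gs[1, 0])   # bottom left, one column
ax_cum = fig.add_subplot(gs[1, 1:])   # bottom right, two columns

ax_main.plot(x, y)
ax_main.set_title("Main panel")

# Inset axes in Axes-relative coordinates: [x0, y0, width, height]
ax_inset = ax_main.inset_axes([0.62, 0.5, 0.33, 0.42])
zoom = (x > 0.5) & (x < 2.5)
ax_inset.plot(x[zoom], y[zoom])
ax_inset.set_title("zoom on first peak", fontsize=8)

# Annotate the maximum with an arrow pointing at the data point
peak = int(np.argmax(y))
ax_main.annotate("peak", xy=(x[peak], y[peak]),
                 xytext=(x[peak] + 1.5, y[peak]),
                 arrowprops=dict(arrowstyle="->"))

ax_hist.hist(y, bins=20)
ax_hist.set_title("Distribution of y")
ax_cum.plot(x, np.cumsum(y) * (x[1] - x[0]))
ax_cum.set_title("Running integral of y")
```

Slicing the GridSpec (gs[0, :], gs[1, 1:]) is what lets a panel span several cells, which plt.subplots alone cannot express.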

Colormaps, Styling, and Tick Formatting

Choosing a colormap is a scientific decision, not just an aesthetic one: a poor colormap misrepresents magnitude.

Sequential:    viridis, plasma, inferno, magma - for ordered data (temperature, count)
               viridis is perceptually uniform and colorblind-safe; use it by default

Diverging:     coolwarm, RdBu, PiYG - for data with a meaningful center (correlation, z-score)
               Center the colormap at 0 with symmetric limits: vmin=-m, vmax=m, where m = max(|data|)

Qualitative:   tab10, Set2 (matplotlib), husl (seaborn palette) - for categorical data (max 8-10 categories)

NEVER USE:     jet, rainbow - not perceptually uniform; their uneven brightness creates false boundaries and misleads about magnitude

Colormaps, Formatting & Style

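The sketch below shows one way to center a diverging colormap at zero and to format tick labels. The z-score grid and the revenue figures are invented for the example, and the millions formatter is a hypothetical helper.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import FuncFormatter

rng = np.random.default_rng(1)
z = rng.normal(0.3, 1.0, size=(8, 8))  # synthetic z-scores, deliberately off-center

fig, (ax_map, ax_line) = plt.subplots(1, 2, figsize=(9, 3.5))

# Symmetric limits put 0 exactly at the midpoint (white) of the diverging colormap
limit = np.abs(z).max()
im = ax_map.imshow(z, cmap="RdBu_r", vmin=-limit, vmax=limit)
fig.colorbar(im, ax=ax_map, label="z-score")
ax_map.set_title("Diverging colormap centered at 0")

# Tick formatting: show raw dollar amounts as millions
years = [2021, 2022, 2023, 2024]
revenue = np.array([1.2e6, 1.8e6, 2.6e6, 3.1e6])  # made-up values
millions = FuncFormatter(lambda value, pos: f"${value / 1e6:.1f}M")
ax_line.plot(years, revenue, marker="o")
ax_line.yaxis.set_major_formatter(millions)
ax_line.set_xticks(years)  # avoid fractional years such as 2021.5
ax_line.set_title("Formatted ticks")

fig.tight_layout()
```

Without the symmetric limits, the white midpoint would sit at the middle of the data range (about 0.3 here), so weakly positive values would be colored as if they were negative.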

Seaborn: Statistical Graphics

Seaborn provides plot types that directly answer statistical questions — not just "what are the values" but "how are the distributions different?" and "are these variables correlated?"

Seaborn Statistical Visualization

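As a small illustration of those two questions, the sketch below draws a violin plot (how do the distributions differ?) and an annotated correlation heatmap (are these variables correlated?). It assumes pandas and seaborn are installed; the DataFrame, its column names, and the effect sizes are all invented.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(2)

# Synthetic data: two groups with different score distributions
df = pd.DataFrame({
    "group": np.repeat(["control", "treatment"], 100),
    "score": np.concatenate([rng.normal(50, 10, 100), rng.normal(56, 12, 100)]),
})
# A second variable that is correlated with score by construction
df["hours"] = 0.1 * df["score"] + rng.normal(0, 1, len(df))

fig, (ax_violin, ax_heat) = plt.subplots(1, 2, figsize=(9, 3.5))

# Seaborn functions accept an explicit Axes, so they fit the OO workflow
sns.violinplot(data=df, x="group", y="score", ax=ax_violin)
ax_violin.set_title("Score distribution by group")

corr = df[["score", "hours"]].corr()
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1, ax=ax_heat)
ax_heat.set_title("Correlation")

fig.tight_layout()
```

Passing ax= to each seaborn call keeps the figure layout under your control, exactly as with plain matplotlib.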

Real Project: EDA Dashboard

Complete EDA Dashboard


Chart Selection Guide

Question                     Best chart             Avoid
─────────────────────────────────────────────────────────────────────────
How does X change over time? Line chart             Bar chart (time ≥ 8 points)
How do categories compare?   Bar/lollipop chart     Pie (>4 categories)
What's my distribution?      Histogram + KDE        Bar chart of averages
How do 2+ groups compare?    Box/violin plot        Multiple histograms
How do two vars relate?      Scatter                Bar chart
Show part-to-whole?          Stacked bar (≤5)       Pie (>4 slices)
Show 3 variables?            Scatter(color+size)    3D plot (almost never)
High-dim feature matrix?     Heatmap                Anything else
Show raw data + summary?     Strip/swarm + box      Box alone (hides skew)
Symmetric matrix?            Heatmap (half only)    Full matrix (redundant)

Exercises

Exercise 1: Histogram Bin Size Effect

Generate 1000 points from a bimodal distribution (a mixture of two normals). Plot the same data with bin counts of 5, 15, 30, 100, and 500. Observe how the bin choice can hide or reveal the bimodality. Overlay a KDE on each panel, with the bandwidth chosen by Silverman's rule.

Exercise 1 — Bin Size Effect

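A possible starting point for this exercise is sketched below. It only draws the five histograms; the mixture parameters are arbitrary choices, and the KDE overlay is left for you to add.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)

# Bimodal sample: equal mixture of two normals (parameters chosen arbitrarily)
data = np.concatenate([rng.normal(-2, 0.8, 500), rng.normal(2, 0.8, 500)])

bin_counts = [5, 15, 30, 100, 500]
fig, axes = plt.subplots(1, len(bin_counts), figsize=(15, 3), sharey=True)

for ax, bins in zip(axes, bin_counts):
    # density=True keeps the panels comparable despite different bin widths
    ax.hist(data, bins=bins, density=True, color="tab:blue", alpha=0.7)
    ax.set_title(f"{bins} bins")
    # TODO: overlay a KDE here using a Silverman's-rule bandwidth

axes[0].set_ylabel("Density")
fig.tight_layout()
```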

Exercise 2: Truncated Y-axis (Deliberately Misleading Chart)

Plot a bar chart where the differences look huge because the y-axis starts at 99.5% instead of 0%. Then fix it so the axis starts at 0%. Observe how axis truncation creates false urgency. Write a rule for when truncating the y-axis is acceptable (spoiler: almost never for bar charts).

Exercise 3: Custom Colormap

Create a custom diverging colormap from #ef4444 (red) → white → #6366f1 (purple). Apply it to a correlation heatmap of 5 features from a random dataset. Ensure the color center is exactly at 0 using vmin/vmax.

Exercise 4: Animated Chart

Using matplotlib.animation.FuncAnimation, create an animation of a sine wave where the frequency increases from 0.5 Hz to 5 Hz over 50 frames. Save as a GIF. Note: FuncAnimation works in notebook environments; the playground saves frames instead.

Exercise 4 — Animation Frames

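One way to begin is to write the per-frame update function first and check a few frames as static plots, as in the sketch below. The names frame and update are illustrative; the FuncAnimation call is shown only as a comment because saving a GIF needs an animation writer such as Pillow.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np

frequencies = np.linspace(0.5, 5.0, 50)  # one frequency per frame, in Hz
t = np.linspace(0, 2, 1000)              # two seconds of signal

def frame(i):
    """Sine wave for frame i."""
    return np.sin(2 * np.pi * frequencies[i] * t)

# The animated line and its update function
fig, ax = plt.subplots(figsize=(6, 3))
(line,) = ax.plot(t, frame(0))
ax.set_ylim(-1.1, 1.1)
ax.set_xlabel("Time (s)")

def update(i):
    """Replace the y-data in place instead of re-plotting every frame."""
    line.set_ydata(frame(i))
    ax.set_title(f"frame {i}: {frequencies[i]:.2f} Hz")
    return (line,)

# In a notebook or script with a GIF writer available, something like:
#   from matplotlib.animation import FuncAnimation
#   anim = FuncAnimation(fig, update, frames=len(frequencies), interval=100)
#   anim.save("sine.gif", writer="pillow")

# Static check: draw the first, middle, and last frames side by side
fig_frames, axes = plt.subplots(1, 3, figsize=(12, 2.5), sharey=True)
for panel, i in zip(axes, [0, 24, 49]):
    panel.plot(t, frame(i))
    panel.set_title(f"frame {i}: {frequencies[i]:.2f} Hz")
fig_frames.tight_layout()
```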

Exercise 5: Bubble Chart

Create a bubble chart showing countries: x = GDP per capita, y = life expectancy, bubble size = population, color = continent. Use np.log to compress the population scale for bubble sizes. Include labels for the 5 most populous countries.

Exercise 6: Facet Grid

Given a dataset with 3 numerical features and 1 categorical feature (4 levels), create a 4×3 grid of histograms: one row per category, one column per feature. Use plt.subplots(4, 3, sharex="col") so x-axes align by feature. Add column titles and row titles using fig.text.

Exercise 6 — Facet Grid

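A rough sketch of the grid is given below. The category levels, feature names, and distributions are hypothetical, and the titles are positioned from each Axes' bounding box, which is one of several reasonable ways to place fig.text labels.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(7)
categories = ["north", "south", "east", "west"]  # hypothetical levels
features = ["height", "weight", "age"]           # hypothetical feature names

# Synthetic data: one (200, 3) array per category, with slightly shifted means
data = {
    cat: rng.normal(loc=[170 + 2 * k, 70 + k, 40], scale=[8, 10, 12], size=(200, 3))
    for k, cat in enumerate(categories)
}

fig, axes = plt.subplots(4, 3, figsize=(9, 8), sharex="col")
fig.subplots_adjust(left=0.12, top=0.92, hspace=0.25)

for r, cat in enumerate(categories):
    for c, feat in enumerate(features):
        axes[r, c].hist(data[cat][:, c], bins=20, color="tab:blue", alpha=0.8)

# Column titles: centered above the top Axes of each column (figure coordinates)
for c, feat in enumerate(features):
    box = axes[0, c].get_position()
    fig.text(box.x0 + box.width / 2, box.y1 + 0.01, feat,
             ha="center", va="bottom", fontweight="bold")

# Row titles: rotated, to the left of the first Axes of each row
for r, cat in enumerate(categories):
    box = axes[r, 0].get_position()
    fig.text(box.x0 - 0.07, box.y0 + box.height / 2, cat,
             ha="center", va="center", rotation=90, fontweight="bold")
```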

Exercise 7: Publication-Quality Figure

Reproduce a Tufte-style minimalist chart: sparse gridlines, no chart border (remove all 4 spines), direct labels instead of a legend, a reference line with a text annotation, and high contrast. Save at 300 DPI. The chart shows A/B test conversion rates with 95% confidence intervals over 14 days.

Exercise 8: Heatmap from Scratch

Without seaborn, implement a correlation heatmap from a NumPy matrix. Annotate every cell with its value, color the cells with a diverging colormap centered at 0, add row and column labels, and include a colorbar. Treat the diagonal (corr[i, i] = 1.0) as a special case and annotate it differently from the other cells.

Exercise 8 — Correlation Heatmap from Scratch

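One possible shape for a solution is sketched below using only NumPy and matplotlib. The feature names and the induced correlations are invented, and the choices for the diagonal annotation and for the text color threshold are just one option.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(3)
names = ["f1", "f2", "f3", "f4", "f5"]  # hypothetical feature names

# Synthetic features with some correlation induced by construction
X = rng.normal(size=(200, 5))
X[:, 1] += 0.8 * X[:, 0]  # f2 positively related to f1
X[:, 3] -= 0.6 * X[:, 2]  # f4 negatively related to f3
corr = np.corrcoef(X, rowvar=False)  # columns are variables

n = len(names)
fig, ax = plt.subplots(figsize=(5, 4.5))

# Fixed limits of -1 and 1 center the diverging colormap at 0
im = ax.imshow(corr, cmap="RdBu_r", vmin=-1, vmax=1)
fig.colorbar(im, ax=ax, label="Pearson r")

ax.set_xticks(range(n))
ax.set_xticklabels(names)
ax.set_yticks(range(n))
ax.set_yticklabels(names)

for i in range(n):
    for j in range(n):
        if i == j:
            label, weight = "1", "bold"  # diagonal: always 1, marked differently
        else:
            label, weight = f"{corr[i, j]:.2f}", "normal"
        # White text on strongly colored cells, black on pale ones
        color = "white" if abs(corr[i, j]) > 0.6 else "black"
        ax.text(j, i, label, ha="center", va="center",
                color=color, fontweight=weight)

ax.set_title("Correlation heatmap")
fig.tight_layout()
```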
