GadaaLabs
Data Analysis with Python
Lesson 7

Visualisation with Matplotlib & Seaborn

Visualisation is how you communicate analytical findings to others and how you discover things you would never see in a table. Python offers two complementary tools: Matplotlib for precise, low-level control over every element of a plot, and Seaborn for high-level statistical graphics that are beautiful by default and integrate directly with Pandas DataFrames.

The Matplotlib Object Model

Every Matplotlib plot has a hierarchy: a Figure contains one or more Axes; each Axes holds the actual plot elements (lines, bars, labels). Understanding this object model is essential for producing multi-panel figures:

python
import matplotlib.pyplot as plt
import numpy as np

# Explicit object-oriented API — always prefer this over plt.* functions
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(12, 8))

x = np.linspace(0, 2 * np.pi, 200)

axes[0, 0].plot(x, np.sin(x), color="#2196F3", linewidth=2)
axes[0, 0].set_title("Sine Wave")
axes[0, 0].set_xlabel("x")
axes[0, 0].set_ylabel("sin(x)")

axes[0, 1].plot(x, np.cos(x), color="#E91E63", linestyle="--")
axes[0, 1].set_title("Cosine Wave")

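# axes[1, 0] and axes[1, 1] are left empty in this example; fill them the same way
fig.suptitle("Trigonometric Functions", fontsize=16, fontweight="bold")
fig.tight_layout()
# Save through the Figure object, consistent with the object-oriented API
fig.savefig("trig_functions.png", dpi=150, bbox_inches="tight")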
plt.show()
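
The Figure and Axes hierarchy described above can also be inspected and traversed in code. The following is a minimal sketch, assuming the same 2×2 grid; the panel titles are placeholders and not part of the lesson's figure:

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(8, 6))

# For a 2x2 grid, plt.subplots returns a 2-D NumPy array of Axes
print(type(fig).__name__, axes.shape)

# axes.flat walks every panel in row-major order
for i, ax in enumerate(axes.flat):
    ax.set_title(f"Panel {i}")  # placeholder titles
    ax.grid(True, alpha=0.3)

# The Figure keeps a flat list of the Axes it owns,
# and each Axes points back to its parent Figure
n_axes = len(fig.axes)
same_parent = all(ax.figure is fig for ax in fig.axes)
print(n_axes, same_parent)

plt.close(fig)
```

Looping over `axes.flat` is a convenient way to apply shared settings (grids, label sizes) to every panel without indexing each one by row and column.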

Histograms — Distribution Shape

Histograms reveal the shape of a distribution: is it normal, skewed, bimodal, or truncated?

python
import seaborn as sns
import pandas as pd

df = sns.load_dataset("tips")

fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# Matplotlib histogram
axes[0].hist(df["total_bill"], bins=30, edgecolor="white", color="#42A5F5", alpha=0.8)
axes[0].set_xlabel("Total Bill ($)")
axes[0].set_ylabel("Count")
axes[0].set_title("Distribution of Total Bill (Matplotlib)")

# Seaborn histogram with KDE overlay
sns.histplot(data=df, x="total_bill", bins=30, kde=True, ax=axes[1])
axes[1].set_title("Distribution of Total Bill (Seaborn)")

fig.tight_layout()
plt.show()
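
The bin count has a large effect on how a histogram reads. The sketch below uses synthetic right-skewed data (an assumption, standing in for a column such as total_bill) to compare two fixed bin counts with NumPy's "auto" rule, and checks the skew numerically:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Synthetic right-skewed sample (illustrative only, not the tips data)
values = rng.gamma(shape=2.0, scale=10.0, size=500)

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, bins in zip(axes, [5, 30, "auto"]):
    # ax.hist returns the bar heights, the bin edges, and the drawn bars
    counts, edges, _ = ax.hist(values, bins=bins, edgecolor="white")
    ax.set_title(f"bins={bins!r} ({len(counts)} bars)")
    ax.set_xlabel("value")
axes[0].set_ylabel("Count")
fig.tight_layout()

# A right-skewed distribution has its mean pulled above its median
skewed_right = values.mean() > np.median(values)
print(skewed_right)
```

Too few bins can hide a second mode or a long tail; too many produce a noisy outline. Trying two or three settings before settling on one is usually worthwhile.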

Box Plots — Comparing Distributions

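Box plots show the median, the interquartile range (IQR), whiskers, and outliers. By default the whiskers extend to the most extreme data points that lie within 1.5×IQR of the quartiles, and anything beyond them is drawn as an individual outlier. Box plots are ideal for comparing a numeric variable across categories: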

python
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

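# Seaborn box plot
# seaborn 0.13+ deprecates palette without hue, so map hue to the x variable
# and suppress the redundant legend (legend=False needs seaborn 0.13 or newer)
sns.boxplot(data=df, x="day", y="total_bill", hue="day", palette="Set2", legend=False,
            order=["Thur","Fri","Sat","Sun"], ax=axes[0])
axes[0].set_title("Total Bill by Day")

# Violin plot: shows full distribution shape, not just quartiles
sns.violinplot(data=df, x="day", y="total_bill", hue="day", palette="Set2", legend=False,
               order=["Thur","Fri","Sat","Sun"], inner="quartile", ax=axes[1])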
axes[1].set_title("Total Bill by Day (Violin)")

fig.tight_layout()
plt.show()
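
The 1.5×IQR whisker rule can be reproduced by hand. This is a small worked sketch on a made-up sample (the numbers are illustrative, not taken from the tips data), using the same linear-interpolation percentiles as NumPy's default:

```python
import numpy as np

# Small made-up sample with one obviously large value
values = np.array([10, 12, 13, 15, 16, 18, 19, 21, 24, 60], dtype=float)

q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Fences: points beyond these are drawn as individual outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme data points inside the fences
inside = values[(values >= lower_fence) & (values <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(q1, median, q3, iqr)        # 13.5 17.0 20.5 7.0
print(lower_fence, upper_fence)   # 3.0 31.0
print(whisker_low, whisker_high)  # 10.0 24.0
print(outliers)                   # [60.]
```

Note that the whiskers stop at actual data points (10 and 24), not at the fences themselves (3 and 31).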

| Chart | Best for |
|---|---|
| Box plot | Comparing medians and spread; outlier detection |
| Violin plot | Comparing full distribution shapes |
| Strip/swarm plot | Small datasets; show individual points |
| Bar chart | Comparing point estimates (means/totals) |
| Histogram | Single distribution; shape and spread |
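
The strip plot row in the table can be illustrated by overlaying individual points on a box plot. The sketch below assumes a small synthetic dataset; the group and value columns are hypothetical names chosen for the example:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

rng = np.random.default_rng(1)
# Synthetic small dataset: 12 observations in each of three made-up groups
small = pd.DataFrame({
    "group": np.repeat(["A", "B", "C"], 12),
    "value": np.concatenate([rng.normal(loc, 2.0, 12) for loc in (10, 14, 12)]),
})

fig, ax = plt.subplots(figsize=(6, 4))
# Grey boxes give the summary; showfliers=False avoids drawing outliers twice
sns.boxplot(data=small, x="group", y="value", color="lightgrey",
            showfliers=False, ax=ax)
# Jittered points show every individual observation
sns.stripplot(data=small, x="group", y="value", color="black",
              size=4, jitter=0.15, ax=ax)
ax.set_title("Box plot with individual points")
fig.tight_layout()
```

With only a dozen points per group, showing the raw observations makes clear how little data sits behind each box.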

Scatter Plots — Relationships Between Variables

python
fig, ax = plt.subplots(figsize=(8, 6))

# Colour-encode a third variable using hue
sns.scatterplot(
    data=df,
    x="total_bill",
    y="tip",
    hue="time",
    style="smoker",
    size="size",
    sizes=(40, 200),
    alpha=0.7,
    palette="deep",
    ax=ax
)

# Add a regression line to show the trend
sns.regplot(data=df, x="total_bill", y="tip", scatter=False,
            color="grey", line_kws={"linestyle": "--"}, ax=ax)

ax.set_title("Tip vs Total Bill")
ax.set_xlabel("Total Bill ($)")
ax.set_ylabel("Tip ($)")
plt.show()
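
regplot draws the fitted line but does not report its coefficients. When the slope or the strength of the relationship is needed as a number, it can be computed separately. The following is a minimal sketch with NumPy on synthetic data; the 15% tipping rate and the noise level are assumptions made for the example:

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic bills and tips: roughly 15% of the bill plus noise
total_bill = rng.uniform(5, 50, size=200)
tip = 0.15 * total_bill + rng.normal(0, 1.0, size=200)

# A degree-1 polyfit returns (slope, intercept) of the least-squares line
slope, intercept = np.polyfit(total_bill, tip, deg=1)

# Pearson correlation coefficient between the two variables
r = np.corrcoef(total_bill, tip)[0, 1]

print(f"tip = {slope:.3f} * total_bill + {intercept:.2f}, r = {r:.2f}")
```

Reporting the slope and r alongside the scatter plot lets readers judge the trend without estimating it by eye.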

For quick pairwise scatter plots across many numeric columns:

python
iris = sns.load_dataset("iris")
sns.pairplot(iris, hue="species", diag_kind="kde", corner=True)
plt.show()

Heatmaps — Correlation and Pivot Tables

python
# Correlation heatmap
corr = df[["total_bill", "tip", "size"]].corr()

fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(
    corr,
    annot=True,
    fmt=".2f",
    cmap="coolwarm",
    vmin=-1, vmax=1,
    square=True,
    linewidths=0.5,
    ax=ax
)
ax.set_title("Correlation Matrix")
plt.show()

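# Pivot table heatmap, e.g. average tip by day and time
# observed=False keeps every day/time combination and avoids the pandas
# FutureWarning for categorical keys; combinations with no data show as blank cells
pivot = df.pivot_table(values="tip", index="day", columns="time",
                       aggfunc="mean", observed=False)

fig, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(pivot, annot=True, fmt=".2f", cmap="YlOrRd", ax=ax)
ax.set_title("Average Tip by Day and Time")
plt.show()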
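
A correlation matrix is symmetric, so half of the cells repeat the other half. One common refinement is to hide the upper triangle with the mask parameter of sns.heatmap. This is a sketch on synthetic columns (a, b, c, d are placeholder names):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

rng = np.random.default_rng(3)
base = rng.normal(size=300)
# Synthetic columns with built-in positive, zero, and negative correlation
data = pd.DataFrame({
    "a": base,
    "b": base * 0.8 + rng.normal(scale=0.5, size=300),
    "c": rng.normal(size=300),
    "d": -base + rng.normal(scale=1.0, size=300),
})
corr = data.corr()

# True marks cells to hide: everything strictly above the diagonal
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)

fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(corr, mask=mask, annot=True, fmt=".2f", cmap="coolwarm",
            vmin=-1, vmax=1, square=True, linewidths=0.5, ax=ax)
ax.set_title("Correlation Matrix (lower triangle)")
fig.tight_layout()
```

Keeping vmin=-1 and vmax=1 pins the diverging colour map so that zero correlation always maps to the neutral midpoint.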

Global Styling

Consistent visual style makes analysis reports look professional:

python
# Apply a Seaborn theme globally
sns.set_theme(style="whitegrid", palette="deep", font_scale=1.2)

# Or, as an alternative to set_theme, use a Matplotlib style sheet
# (this style name is available in Matplotlib 3.6 and newer)
plt.style.use("seaborn-v0_8-whitegrid")

# Custom palette
custom_colors = ["#264653", "#2A9D8F", "#E9C46A", "#F4A261", "#E76F51"]
sns.set_palette(custom_colors)

# Increase resolution for publication and hide the top and right spines
plt.rcParams.update({
    "figure.dpi":      150,
    "axes.spines.top": False,
    "axes.spines.right": False
})
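
Global settings affect every figure created afterwards. When a style should apply to a single chart only, Matplotlib's rc_context context manager scopes the change. This is a minimal sketch; the plotted values are arbitrary:

```python
import matplotlib.pyplot as plt

before = plt.rcParams["axes.spines.top"]

# plt.rc_context restores the previous rcParams when the block exits
with plt.rc_context({"axes.spines.top": False, "axes.spines.right": False}):
    fig, ax = plt.subplots(figsize=(5, 3))
    ax.plot([0, 1, 2], [0, 1, 4])
    ax.set_title("Styled inside the context only")
    inside = plt.rcParams["axes.spines.top"]

after = plt.rcParams["axes.spines.top"]

# The setting is back to its earlier value; the Axes created inside keeps its style
print(before, inside, after)
```

Seaborn offers an analogous context manager, sns.axes_style(...), for temporarily switching its own style presets.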

Summary

  • Matplotlib's object model (Figure → Axes → plot elements) gives full control; always use the explicit OO API (fig, ax = plt.subplots()) rather than plt.* convenience functions in production code.
  • Seaborn's statistical chart functions (histplot, boxplot, scatterplot) take a DataFrame through the data argument and a hue parameter for automatic colour-encoding by category; heatmap instead takes a 2-D table such as a correlation matrix or pivot table.
  • Histograms reveal distribution shape; box plots and violin plots compare distributions across categories; scatter plots expose bivariate relationships; heatmaps show correlation matrices and pivot tables.
  • Apply sns.set_theme() or plt.style.use() at the top of your notebook to establish a consistent visual identity across all charts.
  • Always call fig.tight_layout() before plt.show() or plt.savefig() to prevent label clipping.