
Multivariate Analysis (3+ Variables)

Go beyond X and Y. Analyze interactions using Color (Hue), Size, Shape, and Facets.

Data Visualization Advanced 60 min

🎨 Introduction: Mapping Variables to Aesthetics

We live in a 3D world, but our screens are 2D (an X axis and a Y axis). To visualize more than two variables, we map the extra variables to other **Aesthetics**:

  • Hue (Color): Good for Categories (Male vs Female).
  • Size: Good for Numbers (Population size, Confidence).
  • Style (Shape): Good for Categories (Circles vs Squares).
  • Facets (Rows/Cols): Splitting one big plot into multiple small plots.
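
As a rough illustration of the first three aesthetics in the list above, the sketch below maps five variables onto a single scatter plot. The small DataFrame is a hypothetical stand-in with tips-style column names (it is not the lesson's dataset), and the `Agg` backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import pandas as pd
import seaborn as sns

# Hypothetical stand-in data with tips-style columns (illustration only)
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29, 8.77, 26.88],
    "tip":        [1.01, 1.66, 3.50, 3.31, 3.61, 4.71, 2.00, 3.12],
    "sex":        ["Female", "Male", "Male", "Male", "Female", "Male", "Male", "Male"],
    "smoker":     ["No", "No", "Yes", "No", "Yes", "No", "Yes", "No"],
    "size":       [2, 3, 3, 2, 4, 4, 2, 4],
})

ax = sns.scatterplot(
    data=df,
    x="total_bill", y="tip",  # variables 1 and 2: position
    hue="sex",                # variable 3: color (categorical)
    size="size",              # variable 4: marker size (numeric)
    style="smoker",           # variable 5: marker shape (categorical)
)
ax.set_title("Five variables in one scatter plot")
```

Packing this many aesthetics into one plot can get hard to read, which is one reason to consider facets instead.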

🌈 Topic 1: The Power of `hue`

The easiest way to add a 3rd dimension is **Color**. When you pass `hue='ColumnName'`, Seaborn automatically colors the points or bars according to the groups in that column.

💻 Example: 3 Variables (Bill, Tip, Sex)

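import seaborn as sns
df = sns.load_dataset('tips')  # assumption: df is Seaborn's built-in "tips" dataset
sns.scatterplot(x='total_bill', y='tip', hue='sex', data=df)
# X=Bill, Y=Tip, Color=Sex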

πŸ–ΌοΈ Topic 2: Faceting with `relplot` & `catplot`

Too many colors in one plot quickly become messy. **Faceting** creates a grid of small plots instead. You can split the data by **Columns (`col`)** and **Rows (`row`)**.

✅ Why use Relplot/Catplot?

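`sns.scatterplot()` is an "Axes-level" function: it draws a single plot.
`sns.relplot()` is a "Figure-level" function that can create *many* scatterplots at once using `col=` and `row=`. `sns.catplot()` plays the same role for categorical plots (box, bar, violin).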
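
One way to see the Axes-level versus Figure-level distinction is to look at what each call returns. The sketch below uses a small hypothetical DataFrame with tips-style columns; the `Agg` backend line is only an assumption to let it run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in data (illustration only)
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29],
    "tip":        [1.01, 1.66, 3.50, 3.31, 3.61, 4.71],
    "time":       ["Lunch", "Lunch", "Lunch", "Dinner", "Dinner", "Dinner"],
})

# Axes-level: draws onto one matplotlib Axes and returns it
ax = sns.scatterplot(data=df, x="total_bill", y="tip")
print(isinstance(ax, plt.Axes))   # True

# Figure-level: creates its own figure and returns a FacetGrid
g = sns.relplot(data=df, x="total_bill", y="tip", col="time")
print(type(g).__name__)           # FacetGrid
print(g.axes.shape)               # (1, 2): one row, one subplot per value of "time"
```

Because `relplot` owns the whole figure, figure-wide settings go through the returned `FacetGrid` (for example `g.set_axis_labels(...)`) rather than a single Axes.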

💻 Example: 4 Variables!

sns.relplot(x='total_bill', y='tip', hue='smoker', col='time', data=df)
# Plots Bill vs Tip. Colors by Smoker. Creates 2 separate plots: one for Lunch, one for Dinner.
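
The same faceting idea carries over to categorical plots through `catplot`. The following is a minimal sketch, assuming a small hypothetical DataFrame with tips-style columns rather than the lesson's actual data; `kind="box"` is just one of the available plot kinds.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import pandas as pd
import seaborn as sns

# Hypothetical stand-in data (illustration only)
df = pd.DataFrame({
    "day":  ["Thur", "Thur", "Thur", "Fri", "Fri", "Fri",
             "Sat", "Sat", "Sat", "Sun", "Sun", "Sun"],
    "time": ["Lunch", "Lunch", "Lunch", "Lunch", "Dinner", "Dinner",
             "Dinner", "Dinner", "Dinner", "Dinner", "Dinner", "Dinner"],
    "sex":  ["Male", "Female", "Male", "Female", "Male", "Female",
             "Male", "Female", "Male", "Female", "Male", "Female"],
    "tip":  [2.0, 1.5, 3.0, 2.2, 3.5, 2.8, 4.0, 3.1, 2.5, 5.0, 3.6, 4.2],
})

# 4 variables: day (x), tip (y), sex (color), time (one subplot per value)
g = sns.catplot(data=df, x="day", y="tip", hue="sex", col="time", kind="box")
print(g.axes.shape)  # (1, 2)
```

Swapping `kind="box"` for `kind="bar"` or `kind="violin"` keeps the same facet layout and changes only how each group is drawn.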

🔥 Topic 3: Heatmaps (3 Variables)

A **Heatmap** is perfect for visualizing a matrix. To create one, you often need to **Pivot** your data first so you have an Index (Var 1), Columns (Var 2), and Values (Var 3).

💻 Example: Correlation or Pivot

# 1. Pivot the data (each cell holds the mean tip for that day/time pair; mean is the default aggfunc)
pivot = df.pivot_table(index='day', columns='time', values='tip')

# 2. Plot
sns.heatmap(pivot, annot=True, cmap='YlGnBu')
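
The pivot above covers one of the two matrix forms named in the example heading. For the other, a correlation matrix, a minimal sketch might look like the following. The DataFrame is a hypothetical stand-in with tips-style numeric columns, and the `Agg` backend line is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import pandas as pd
import seaborn as sns

# Hypothetical stand-in data (illustration only)
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29, 8.77, 26.88],
    "tip":        [1.01, 1.66, 3.50, 3.31, 3.61, 4.71, 2.00, 3.12],
    "size":       [2, 3, 3, 2, 4, 4, 2, 4],
})

# 1. Build the matrix: pairwise Pearson correlations of the numeric columns
corr = df[["total_bill", "tip", "size"]].corr()

# 2. Plot: rows and columns are variable names, color encodes the correlation
ax = sns.heatmap(corr, annot=True, cmap="YlGnBu")
```

Here no pivot is needed because `DataFrame.corr()` already returns a square matrix whose index and columns are the variable names.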

📚 Module Summary

  • Hue: Adds Color to separate groups (3rd variable).
  • Size/Style: Changes dot size or shape (4th variable).
  • Relplot: The master function for scatter/line plots with Faceting.
  • Catplot: The master function for categorical plots (box, bar, violin) with Faceting.
  • Heatmap: Visualizes intensity of values in a matrix grid.

🤔 Interview Q&A

Q: What is the difference between `scatterplot` and `relplot`?
A: `scatterplot` is an "Axes-level" function (it draws on one specific plot). `relplot` is a "Figure-level" function that manages the entire figure and lets you use `col` and `row` to create multiple subplots.

Q: How can you visualize five variables in a single figure?
A: X-axis (1), Y-axis (2), Hue/Color (3), Size (4), and Facet Columns (5). For example: `sns.relplot(x='total_bill', y='tip', hue='sex', size='size', col='day', data=df)`.

Q: What data format does a heatmap require?
A: It requires a Matrix form (like a correlation matrix or a Pivot Table) where the index and columns represent categories, and the cell values represent the magnitude (color intensity).
