Multivariate Analysis (3+ Variables)
Go beyond X and Y. Analyze interactions using Color (Hue), Size, Shape, and Facets.
Introduction: Mapping Variables to Aesthetics
We live in a 3D world, but our screens are 2D (the X and Y axes). To visualize more than 2 variables, we map data to other **Aesthetics**:
- Hue (Color): Good for Categories (Male vs Female).
- Size: Good for Numbers (Population size, Confidence).
- Style (Shape): Good for Categories (Circles vs Squares).
- Facets (Rows/Cols): Splitting one big plot into multiple small plots.
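As a rough sketch of how these aesthetics combine, the call below maps five columns onto a single scatter plot. The small DataFrame is a hypothetical stand-in that borrows the column names of the tips data used in this module, and the `Agg` backend is only an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run, no window needed
import pandas as pd
import seaborn as sns

# Hypothetical sample rows; column names mirror the tips data used in this module
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29, 8.77, 26.88],
    "tip":        [1.01, 1.66, 3.50, 3.31, 3.61, 4.71, 2.00, 3.12],
    "sex":        ["Female", "Male", "Male", "Male", "Female", "Male", "Male", "Male"],
    "smoker":     ["No", "No", "No", "Yes", "No", "Yes", "No", "Yes"],
    "size":       [2, 3, 3, 2, 4, 4, 2, 4],
})

# x and y carry two numbers; hue, size and style carry three more columns
ax = sns.scatterplot(
    data=df,
    x="total_bill", y="tip",
    hue="sex",       # color  -> category
    size="size",     # marker size -> number
    style="smoker",  # marker shape -> category
)
```

In practice, stacking every aesthetic on one plot can get hard to read, which is where faceting comes in.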
Topic 1: The Power of `hue`
The easiest way to add a 3rd dimension is **Color**. By adding `hue='ColumnName'`, Seaborn automatically colors the points or bars based on that group.
Example: 3 Variables (Bill, Tip, Sex)
sns.scatterplot(x='total_bill', y='tip', hue='sex', data=df)
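The same `hue` argument also works for bars, not only points. The sketch below assumes a tiny hand-made DataFrame (the values are illustrative, not taken from the real tips data) and uses `sns.barplot` to draw one bar per day, split by sex.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run, no window needed
import pandas as pd
import seaborn as sns

# Hypothetical sample rows: two observations per (day, sex) group
df = pd.DataFrame({
    "day": ["Thur", "Thur", "Thur", "Thur", "Fri", "Fri", "Fri", "Fri"],
    "sex": ["Female", "Female", "Male", "Male", "Female", "Female", "Male", "Male"],
    "tip": [2.0, 3.0, 2.5, 4.0, 3.0, 3.5, 1.5, 2.5],
})

# Each day gets one bar per sex; bar height is the mean tip of that group
ax = sns.barplot(data=df, x="day", y="tip", hue="sex")

# The legend lists the hue groups that seaborn colored separately
legend_labels = [t.get_text() for t in ax.get_legend().get_texts()]
print(legend_labels)
```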
Topic 2: Faceting with `relplot` & `catplot`
Sometimes too many colors in one plot become messy. **Faceting** creates a grid of small plots. You can split data by **Columns (`col`)** and **Rows (`row`)**.
Why use Relplot/Catplot?
`sns.scatterplot()` is a single plot.
`sns.relplot()` is a "Figure-level" function that can create *many* scatterplots at once using `col=` and `row=`.
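One way to see the difference is to look at what each function returns. In this sketch (with a hypothetical six-row DataFrame), `scatterplot` draws on the Axes it is given and returns it, while `relplot` builds its own figure and returns a `FacetGrid` holding one Axes per facet.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run, no window needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical sample rows with two values of "time"
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29],
    "tip":        [1.01, 1.66, 3.50, 3.31, 3.61, 4.71],
    "time":       ["Dinner", "Dinner", "Dinner", "Lunch", "Lunch", "Lunch"],
})

# Axes-level: draws onto the Axes we pass in and returns that same object
fig, ax = plt.subplots()
returned_ax = sns.scatterplot(data=df, x="total_bill", y="tip", ax=ax)
print(returned_ax is ax)

# Figure-level: creates its own figure and a grid of Axes, one per value of "time"
g = sns.relplot(data=df, x="total_bill", y="tip", col="time")
print(type(g).__name__)
print(g.axes.shape)  # (rows, columns) of the facet grid
```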
Example: 4 Variables!
sns.relplot(x='total_bill', y='tip', hue='smoker', col='time', data=df)
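`catplot` follows the same pattern for categorical plots. The following is a minimal sketch, assuming a hypothetical DataFrame with two observations per group; `kind="box"` is one of several options (bar and violin work the same way).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run, no window needed
import pandas as pd
import seaborn as sns

# Hypothetical sample rows: every (day, time, sex) combination appears twice
df = pd.DataFrame({
    "day":  ["Thur", "Fri"] * 8,
    "time": ["Lunch", "Lunch", "Dinner", "Dinner"] * 4,
    "sex":  ["Female"] * 8 + ["Male"] * 8,
    "tip":  [2.0, 3.0, 2.5, 4.0, 3.0, 3.5, 1.5, 2.5,
             2.2, 3.1, 2.8, 3.9, 3.3, 3.6, 1.8, 2.9],
})

# Four variables: day (x), tip (y), sex (hue), time (one facet column each)
g = sns.catplot(data=df, x="day", y="tip", hue="sex", col="time", kind="box")

# Each facet is titled with the value of "time" it shows
titles = {ax.get_title() for ax in g.axes.flat}
print(titles)
```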
Topic 3: Heatmaps (3 Variables)
A **Heatmap** is perfect for visualizing a matrix. To create one, you often need to **Pivot** your data first so you have an Index (Var 1), Columns (Var 2), and Values (Var 3).
Example: Correlation or Pivot
# 1. Pivot the data
pivot = df.pivot_table(index='day', columns='time', values='tip')
# 2. Plot
sns.heatmap(pivot, annot=True, cmap='YlGnBu')
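The heading mentions correlation as well as pivoting. A correlation matrix is already in matrix form, so no pivot is needed. The sketch below assumes a small hypothetical DataFrame of numeric columns; fixing `vmin` and `vmax` to the range of a correlation coefficient is a design choice, not a requirement.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run, no window needed
import pandas as pd
import seaborn as sns

# Hypothetical numeric sample; column names mirror the tips data used in this module
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29, 8.77, 26.88],
    "tip":        [1.01, 1.66, 3.50, 3.31, 3.61, 4.71, 2.00, 3.12],
    "size":       [2, 3, 3, 2, 4, 4, 2, 4],
})

# 1. Build the matrix: pairwise Pearson correlation between the columns
corr = df.corr()

# 2. Plot it; vmin/vmax pin the color scale to the full correlation range
ax = sns.heatmap(corr, annot=True, cmap="YlGnBu", vmin=-1, vmax=1)
```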
Module Summary
- Hue: Adds Color to separate groups (3rd variable).
- Size/Style: Changes dot size or shape (4th variable).
- Relplot: The master function for scatter/line plots with Faceting.
- Catplot: The master function for categorical plots (box, bar, violin) with Faceting.
- Heatmap: Visualizes intensity of values in a matrix grid.
Interview Q&A
`scatterplot` is an "Axes-level" function (draws on one specific plot). `relplot` is a "Figure-level" function that manages the entire figure and allows you to use `col` and `row` to create multiple subplots.
A single figure can show five variables: X-axis (1), Y-axis (2), Hue/Color (3), Size (4), and Facet Columns (5). For example: `sns.relplot(x='total_bill', y='tip', hue='sex', size='size', col='day', data=df)`.
A heatmap requires data in matrix form (like a correlation matrix or a Pivot Table), where the index and columns represent categories and the cell values represent the magnitude (color intensity).