DSPython

Bivariate Analysis (Numeric vs. Numeric)

Analyze relationships, correlations, and trends between two numerical variables.

Data Visualization · Intermediate · 45 min

📈 Introduction: Trends and Relationships

Numeric vs. Numeric Analysis is about finding relationships. As one number goes up, does the other go up (Positive Correlation), go down (Negative Correlation), or do nothing (No Correlation)?

Common questions include: "Do **Study Hours** affect **Marks**?" or "Does **Speed** affect **Mileage**?". We use Scatterplots and Correlation Matrices to answer these.

[Image of a scatter plot showing positive, negative, and no correlation]
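The three cases can also be reproduced numerically. The following is a minimal sketch on synthetic data (the arrays, the seed, and the helper name `pearson_r` are all made up for illustration), using NumPy's `np.corrcoef` to measure each relationship:

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the run is repeatable
x = rng.uniform(0, 10, size=500)
noise = rng.normal(0, 1, size=500)

y_positive = 2 * x + noise                 # rises as x rises
y_negative = -2 * x + noise                # falls as x rises
y_unrelated = rng.normal(0, 1, size=500)   # generated independently of x

def pearson_r(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

r_pos = pearson_r(x, y_positive)
r_neg = pearson_r(x, y_negative)
r_none = pearson_r(x, y_unrelated)
print(f"positive: {r_pos:.2f}, negative: {r_neg:.2f}, none: {r_none:.2f}")
```

With this setup the first value should land close to +1, the second close to -1, and the third near 0, though the exact numbers depend on the random draw.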

Topic 1: The Scatterplot: sns.scatterplot()

The **Scatterplot** is the fundamental chart for this analysis. It plots every single data point as a dot on an X-Y plane. It reveals the **Shape** and **Direction** of the relationship.

✅ What to look for:

  • **Linearity:** Do the dots form a straight line?
  • **Clusters:** Are there distinct groups of dots?
  • **Outliers:** Are there dots far away from the main group?

💻 Example: Total Bill vs Tip

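import matplotlib.pyplot as plt
import seaborn as sns
df = sns.load_dataset('tips')  # assumption: seaborn's bundled "tips" data, which has these columns
sns.scatterplot(x='total_bill', y='tip', data=df)
plt.title("Bill Amount vs Tip Given")
plt.show()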

🔥 Topic 2: Correlation Heatmap: df.corr()

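While plots show the shape, the **correlation coefficient (`r`)** condenses the relationship into a single number between **-1 and 1**: its sign gives the *direction* and its magnitude gives the *strength* of the linear relationship. By default, `df.corr()` computes Pearson's r.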

💡 The Correlation Coefficient (r):

  • **+1.0:** Perfect positive linear relationship (both go up together).
  • **0.0:** No linear relationship (a curved pattern can still be present).
  • **-1.0:** Perfect negative linear relationship (one goes up as the other goes down).
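To make the number less of a black box, here is a small sketch that computes Pearson's r by hand (covariance divided by the product of the two spreads) and checks it against `np.corrcoef`. The `hours` and `marks` values and the function name `pearson_r` are invented for illustration:

```python
import numpy as np

def pearson_r(x, y):
    """Pearson r: co-variation of x and y divided by the product of their spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()   # deviations from the mean of x
    dy = y - y.mean()   # deviations from the mean of y
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

hours = [1, 2, 3, 4, 5]        # made-up study hours
marks = [52, 55, 61, 70, 74]   # made-up marks

r_manual = pearson_r(hours, marks)
r_numpy = float(np.corrcoef(hours, marks)[0, 1])
print(round(r_manual, 3), round(r_numpy, 3))
```

Both values should agree up to floating-point rounding, and for this made-up data they sit close to +1 because marks rise almost in a straight line with hours.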

💻 Example: The Heatmap

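corr_matrix = df.corr(numeric_only=True)  # pairwise Pearson r for the numeric columns
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', vmin=-1, vmax=1)
# annot=True writes the numbers on the boxes; vmin/vmax pin the colors to the full range of r.
plt.show()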

📉 Topic 3: Trend Lines: sns.regplot()

Sometimes the dots are too scattered for the trend to be obvious. A **Regression Plot** draws a "Line of Best Fit" (a linear regression line, with a shaded confidence band by default) through the scatterplot, so the general trend is visible at a glance.

💻 Example: Adding a Trend Line

sns.regplot(x='total_bill', y='tip', data=df)  # scatter points plus a fitted straight line
plt.title("Trend of Tipping")
plt.show()
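The trend line in a regression plot is a straight line fitted by least squares. As a rough sketch of what that line represents (this uses `np.polyfit` as a stand-in, not seaborn's internal code, and the data is synthetic), the slope and intercept can be computed directly:

```python
import numpy as np

rng = np.random.default_rng(seed=1)
# Synthetic data: tips are roughly 15% of the bill plus a base amount and noise.
total_bill = rng.uniform(5, 50, size=200)
tip = 1.0 + 0.15 * total_bill + rng.normal(0, 0.5, size=200)

# A degree-1 polynomial fit is an ordinary least-squares straight line.
slope, intercept = np.polyfit(total_bill, tip, deg=1)

# Use the fitted line to estimate the tip on a bill of 30.
predicted_tip_30 = slope * 30 + intercept
print(f"tip = {slope:.2f} * total_bill + {intercept:.2f}")
```

The slope answers the practical question behind the plot: how much the tip changes, on average, for each extra unit on the bill.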

🧩 Topic 4: The Big Picture: sns.pairplot()

If you have 5 numeric columns, there are 10 distinct pairs, and drawing each scatterplot by hand is tedious. **Pairplot** does it all at once: it creates a grid of scatterplots for every pair of numerical columns, with each column's own distribution on the diagonal.
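To see how quickly the number of pairs grows, this short sketch lists every unordered pair of columns with `itertools.combinations`. The column names are hypothetical:

```python
from itertools import combinations

# Hypothetical numeric column names, for illustration only.
numeric_columns = ["age", "height", "weight", "income", "score"]

# Every unordered pair of distinct columns.
pairs = list(combinations(numeric_columns, 2))
print(len(pairs))  # 10
for left, right in pairs:
    print(f"{left} vs {right}")
```

Five columns give 10 pairs, and 10 columns would already give 45, which is why a single grid is easier than individual plots.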

**Pro Tip:** Add `hue` to a pairplot to see how categorical groups cluster across all dimensions instantly.

💻 Example: Everything at Once

sns.pairplot(df, hue='sex')  # hue must name a categorical column in df ('sex' in seaborn's tips data)
# Plots every numeric column against every other, colored by sex.
plt.show()

📚 Module Summary

  • Scatterplot: Shows the position of every data point (Shape of relationship).
  • Correlation (r): A number (-1 to 1) indicating the strength and direction of a linear relationship.
  • Heatmap: Visualizes the correlation matrix with colors.
  • Regplot: Draws a trend line through the scatter data.
  • Pairplot: A grid of scatterplots for the entire dataset.

🤔 Interview Q&A


**Q: What does a correlation of 0.0 tell you?**
A correlation of 0.0 implies no *linear* relationship. There can still be a strong non-linear relationship (like a U-shape) for which the correlation coefficient comes out as zero.
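A quick way to convince yourself of this is a perfect U-shape. In the sketch below (synthetic data, chosen to be symmetric around zero), `y` is completely determined by `x`, yet the correlation is essentially zero:

```python
import numpy as np

x = np.linspace(-5, 5, 101)   # symmetric around zero
y = x ** 2                    # perfect U-shape: y is fully determined by x

r = float(np.corrcoef(x, y)[0, 1])
print(f"r = {r:.3f}")
```

The left half of the U pulls the correlation down exactly as much as the right half pushes it up, so the two cancel. This is why a scatterplot should be checked alongside the number.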

**Q: Why pass `numeric_only=True` to `df.corr()`?**
In pandas 2.0 and later, `df.corr()` raises an error if the DataFrame contains text (string) columns; pandas 1.5 only warned. `numeric_only=True` tells it to skip non-numeric columns.
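A small sketch of the effect, assuming pandas 1.5 or later (older releases do not accept the `numeric_only` argument on `corr`). The DataFrame here is made up for illustration:

```python
import pandas as pd

# Made-up data with two numeric columns and one text column.
df = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.5, 3.0, 4.5, 6.5],
    "day": ["Thu", "Fri", "Sat", "Sun"],   # text column
})

# numeric_only=True leaves "day" out of the calculation.
corr_matrix = df.corr(numeric_only=True)
print(corr_matrix.columns.tolist())
```

The resulting matrix only has rows and columns for `total_bill` and `tip`; the text column does not appear at all.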

**Q: What is the difference between a scatterplot and a regplot?**
A scatterplot simply plots the points. A regplot (regression plot) plots the points and also fits a linear regression model, drawing the fitted line (with a confidence interval band) on top.

**Q: When would you use a Jointplot?**
A **Jointplot** is used when you want to see the scatterplot relationship between two variables and, at the same time, their individual distributions (histograms) along the top and right margins.
