ANOVA
Analysis of Variance — Comparing more than two groups at once 📊⚖️
1️⃣ What is ANOVA?
ANOVA (Analysis of Variance) is a statistical test used to compare the means of three or more groups to see if at least one of them is significantly different from the others.
🤔 Why not just use multiple t-tests?
If you compare Group A vs B, B vs C, and A vs C separately, your risk of making a Type I Error (False Positive) increases drastically. ANOVA does it all in one go with a single alpha level (e.g., 0.05).
2️⃣ The Logic: F-Statistic
ANOVA compares two types of variance. If the "Between" variance is much larger than the "Within" variance, the groups are likely different.
Variance Between Groups
How different are the group means from each other?
Variance Within Groups
How spread out is the data inside each group (noise)?
Large F-value → High probability that groups are different.
3️⃣ Visualizing Variance
Low F-score (Groups Overlap)
Means are close, noise is high. Fail to Reject H₀.
High F-score (Distinct Groups)
Means are far apart. Reject H₀ (Significant).
4️⃣ Case Study: Which Diet Works?
Comparing weight loss (kg) across 3 different diets.
The Data (Weight Loss in kg)
Grand Mean (Overall) = 11
Maths Breakdown 🧮
1. Variance Between (SSB)
How far are group means (5, 11, 17) from Grand Mean (11)?
Sum = 3*((5-11)² + (11-11)² + (17-11)²) = 3*(36+0+36) = 216
2. Variance Within (SSW)
Variation inside groups (e.g., 4,5,6 -> variance is small).
Sum of squared diffs inside groups = 2 + 2 + 2 = 6
3. F-Statistic
MS_Between = 216 / 2 = 108
MS_Within = 6 / 6 = 1
F = 108 / 1 = 108.0
import scipy.stats as stats # 1. The Data diet_a = [4, 5, 6] diet_b = [10, 11, 12] diet_c = [16, 17, 18] # 2. Perform One-Way ANOVA f_stat, p_val = stats.f_oneway(diet_a, diet_b, diet_c) print(f"F-statistic: {f_stat:.2f}") print(f"P-value: {p_val:.10f}")
> P-value: 0.0000001
Conclusion: p < 0.05. We reject H₀. There is a significant difference between the diets.
🔎 What happens after you reject H₀?
ANOVA only tells you that there is a difference, not where it is (e.g., is A different from B, or B from C?).
To find out, we use Post-Hoc Tests like Tukey's HSD (Honest Significant Difference).
Interview Checkpoint 🎯
1. When should I use ANOVA instead of t-test? ▼
2. What are the assumptions of ANOVA? ▼
2. Homogeneity of Variance: Variances in groups are roughly equal.
3. Independence: Observations are independent of each other.