Measures of Central Tendency
Understand Mean, Median, and Mode - the three pillars of data summarization and their applications in data science.
Central Tendency
Beginner Level
30 min
What are Measures of Central Tendency?
Definition
Measures of Central Tendency are statistical measures that identify a single value as representative of an entire dataset.
They answer: "What's the typical or central value?"
Why They Matter
- Summarize large datasets
- Compare different datasets
- Make data-driven decisions
- Understand data distribution
The Three Pillars of Central Tendency
- Mean (μ): Average
- Median (M): Middle Value
- Mode (Mo): Most Frequent
Data Science Insight:
Choosing the right measure can make or break your analysis. Each has strengths for different types of data and distributions.
Mean (Arithmetic Average)
Mean Formula
Mean = Σx / n
Where: Σx = Sum of all values, n = Number of values
Example: Data = [5, 7, 9, 12, 15]
Mean = (5 + 7 + 9 + 12 + 15) / 5 = 48 / 5 = 9.6
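The formula above can be sketched in a few lines of Python. The function name `arithmetic_mean` is a hypothetical helper chosen for illustration; in practice `statistics.mean` from the standard library does the same job.

```python
def arithmetic_mean(values):
    """Return the sum of all values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(values) / len(values)


data = [5, 7, 9, 12, 15]
result = arithmetic_mean(data)
print(result)  # 9.6  (48 / 5)
```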
When to Use Mean
- Data is normally distributed
- No extreme outliers
- Continuous numerical data
- Need all data points considered
When NOT to Use Mean
- Skewed distributions
- Presence of outliers
- Categorical/Ordinal data
- Income/salary data
Properties of Mean
- Uses all data points
- Affected by outliers
- Algebraic operations possible
- Balance point of data
Median (Middle Value)
Median Calculation
Step-by-Step Process:
Step 1: Sort the data
5, 7, 9, 12, 15
Step 2: Find middle position
Median position = (n + 1) / 2 (when n is odd)
For n=5: (5+1)/2 = 3rd position
Step 3: Identify median value
Position 1: 5
Position 2: 7
Position 3: 9 ← Median
Position 4: 12
Position 5: 15
Even Number of Values:
Data: [3, 5, 7, 9]
Middle values: 5 and 7 (positions 2 and 3)
Median = (5 + 7) / 2 = 6
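The step-by-step process, including the even-length case, can be sketched as follows. The name `median_by_position` is a hypothetical helper for illustration; the standard library's `statistics.median` behaves the same way for numeric data.

```python
def median_by_position(values):
    """Sort the data, then pick the middle value (or average the two middle values)."""
    if not values:
        raise ValueError("the median of an empty dataset is undefined")
    ordered = sorted(values)          # Step 1: sort the data
    n = len(ordered)
    mid = n // 2                      # Step 2: 0-based index of the middle
    if n % 2 == 1:
        # Odd n: 1-based position (n + 1) / 2 is 0-based index n // 2
        return ordered[mid]
    # Even n: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_by_position([5, 7, 9, 12, 15]))  # 9
print(median_by_position([3, 5, 7, 9]))       # 6.0
```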
When to Use Median
- Skewed distributions
- Income/salary data
- Presence of outliers
- Ordinal data (sometimes)
- Real estate prices
Data Science Applications
- Salary analysis (CEO salary skews mean)
- House price analysis
- Customer age analysis
- Outlier detection
Median vs Outliers
Data with Outlier
[5, 7, 9, 12, 150]
Mean = 36.6 (affected)
Median = 9 (robust)
Normal Data
[5, 7, 9, 12, 15]
Mean = 9.6
Median = 9
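To check the comparison above, here is a minimal sketch using Python's standard `statistics` module. The variable names are illustrative only.

```python
import statistics

with_outlier = [5, 7, 9, 12, 150]
without_outlier = [5, 7, 9, 12, 15]

for label, data in [("with outlier", with_outlier), ("without outlier", without_outlier)]:
    mean_value = statistics.mean(data)
    median_value = statistics.median(data)
    print(f"{label}: mean={mean_value}, median={median_value}")

# with outlier: mean=36.6, median=9
# without outlier: mean=9.6, median=9
```

Replacing 15 with 150 moves the mean from 9.6 to 36.6, while the median stays at 9.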
Mode (Most Frequent Value)
Mode Calculation
Example 1: Single Mode
Data: [5, 7, 9, 12, 9, 15]
Frequency: 5(1), 7(1), 9(2), 12(1), 15(1)
Mode = 9 (appears twice)
Example 2: Bimodal Distribution
Data: [5, 7, 5, 9, 9, 12]
Frequency: 5(2), 7(1), 9(2), 12(1)
Modes = 5 and 9 (both appear twice)
Example 3: No Mode
Data: [5, 7, 9, 12, 15]
All values appear once
No Mode (or each value is a mode)
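The three examples can be reproduced with a frequency count. This is a sketch, not the only way to do it: `find_modes` is a hypothetical helper, and it adopts the "no mode" convention when every value appears once. The standard library's `statistics.multimode` takes the other convention and returns every value in that case.

```python
from collections import Counter


def find_modes(values):
    """Return all values that share the highest frequency.

    Assumption: when no value repeats, report an empty list ("no mode").
    """
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []
    # Counter preserves first-seen order, so modes come back in data order
    return [value for value, count in counts.items() if count == top]


print(find_modes([5, 7, 9, 12, 9, 15]))  # [9]
print(find_modes([5, 7, 5, 9, 9, 12]))   # [5, 9]
print(find_modes([5, 7, 9, 12, 15]))     # []
```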
When to Use Mode
- Categorical data (Nominal)
- Customer preferences
- Survey responses
- Finding most common category
- Data with repeated values
Real-world Applications
- Most purchased product
- Most common customer rating
- Popular colors in fashion
- Frequent error types
Mode for Different Data Types
- Nominal Data: Only mode works
- Ordinal Data: Mode + Median
- Numerical Data: All three work
- Multiple Modes: Bimodal/Multimodal
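A short sketch of how this plays out in code. The color and survey samples below are made-up illustrative data, and the four-level rating scale is an assumption. For ordinal data, the labels are mapped to ranks first, and `statistics.median_low` is used so the result is always an actual category rather than an average of two labels.

```python
import statistics

# Nominal data (hypothetical sample): no order, so only the mode is meaningful
colors = ["red", "blue", "red", "green", "red", "blue"]
most_common_color = statistics.mode(colors)

# Ordinal data (hypothetical scale and sample): ordered labels, so median works too
scale = ["poor", "fair", "good", "excellent"]
rank = {label: position for position, label in enumerate(scale)}
responses = ["good", "fair", "excellent", "good", "poor", "good", "fair"]

median_rank = statistics.median_low([rank[r] for r in responses])
median_response = scale[median_rank]
most_common_response = statistics.mode(responses)

print(most_common_color)     # red
print(median_response)       # good
print(most_common_response)  # good
```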
Comparison: Mean vs Median vs Mode
Quick Comparison Table
| Aspect | Mean | Median | Mode |
|---|---|---|---|
| Definition | Average value | Middle value | Most frequent value |
| Effect of Outliers | Highly affected | Largely unaffected (robust) | Largely unaffected |
| Data Types | Quantitative only | Quantitative + Ordinal | All data types |
| Best For | Normal distributions | Skewed distributions | Categorical data |
| Formula | Σx / n | Middle value(s) | Highest frequency |
Decision Guide: When to Use Which?
Use MEAN when:
- Data is symmetric
- No outliers
- Need all data considered
- Further calculations needed
Use MEDIAN when:
- Data is skewed
- Outliers present
- Salary/income data
- Ordinal data
Use MODE when:
- Categorical data
- Finding most common
- Survey responses
- Nominal scales
Mean vs Median in Skewed Data
Left-Skewed (Negative Skew)
Typical order: Mean < Median < Mode
Right-Skewed (Positive Skew)
Typical order: Mode < Median < Mean
Remember: In skewed distributions, the mean gets pulled toward the tail, while the median stays closer to the middle. The orderings above are a rule of thumb for unimodal data, not a guarantee.
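The ordering can be checked on a small example. Both datasets below are made up for illustration: the first has a long right tail, and the second is its mirror image (each value subtracted from 16), which gives a long left tail.

```python
import statistics

# Hypothetical right-skewed sample: most values are small, a few are large
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]
# Mirror image of the sample above, so the long tail is on the left
left_skewed = [16 - x for x in right_skewed]


def summarize(data):
    """Return (mode, median, mean) for a numeric dataset."""
    return statistics.mode(data), statistics.median(data), statistics.mean(data)


r_mode, r_median, r_mean = summarize(right_skewed)
l_mode, l_median, l_mean = summarize(left_skewed)

print(r_mode, r_median, r_mean)  # 2 3.0 4.6   -> Mode < Median < Mean
print(l_mode, l_median, l_mean)  # 14 13.0 11.4 -> Mean < Median < Mode
```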
Data Science Applications
Salary Analysis
Problem: Company salary data with CEO earning $5M and employees earning $50k-$150k
Wrong Approach: Using Mean
Mean = $180k (misleading)
Correct Approach: Using Median
Median = $85k (representative)
Why: CEO salary is an outlier that skews mean
Customer Ratings
Problem: Product ratings on scale 1-5
Ratings: [5, 5, 4, 5, 3, 5, 4, 5, 2, 5]
Mean: 4.3
Median: 5
Mode: 5
Best Choice: Mode (5) shows most common customer experience
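The three values above can be verified with the standard `statistics` module. The frequency count at the end is an addition for illustration, showing how dominant the top rating is.

```python
import statistics
from collections import Counter

ratings = [5, 5, 4, 5, 3, 5, 4, 5, 2, 5]

rating_mean = statistics.mean(ratings)
rating_median = statistics.median(ratings)
rating_mode = statistics.mode(ratings)
rating_counts = Counter(ratings)

print(rating_mean)       # 4.3
print(rating_median)     # 5.0 (average of the two middle values, both 5)
print(rating_mode)       # 5
print(rating_counts[5])  # 6 of the 10 ratings are 5
```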
E-commerce Analysis
Scenario: Analyzing customer purchase amounts
Different Measures for Different Insights:
- Mean: Average revenue per customer
- Median: Typical customer spend
- Mode: Most common purchase amount
Insight: Use all three for comprehensive understanding
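As a rough sketch of reporting all three measures together, the helper below bundles them into one dictionary. The function name `describe_purchases` and the purchase amounts are hypothetical, chosen only to show how one large order separates the mean from the median.

```python
import statistics


def describe_purchases(amounts):
    """Return the mean, median, and mode(s) of a list of purchase amounts."""
    return {
        "mean": statistics.mean(amounts),         # average revenue per customer
        "median": statistics.median(amounts),     # typical customer spend
        "modes": statistics.multimode(amounts),   # most common purchase amount(s)
    }


# Hypothetical purchase amounts, with one unusually large order
purchases = [20, 25, 25, 30, 35, 40, 500]
report = describe_purchases(purchases)

print(report["median"])  # 30
print(report["modes"])   # [25]
print(report["mean"] > report["median"])  # True: the large order pulls the mean up
```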
Interview Questions Preview
Q: When would you use median instead of mean in salary analysis?
A: When the data is skewed or contains outliers (e.g., a CEO salary far higher than employee salaries), because the median is not pulled toward the extreme values.
Q: What does it mean if mean > median > mode?
A: It typically indicates a right-skewed distribution (positive skew).
Chapter Summary & Cheatsheet
Mean (Average)
Use for normal distributions, sensitive to outliers
μ = Σx / n
Median (Middle)
Use for skewed data, robust to outliers
Sort → Find middle value
Mode (Most Frequent)
Use for categorical data, finds most common
Highest frequency value
Quick Decision Guide
Normal data → Mean
Skewed data → Median
Categorical → Mode
Outliers present → Median
Salary data → Median
Ratings → Mode
Key Formulas
Mean
μ = Σx / n
Median
Middle(sorted_data)
Mode
Value with max(frequency)