DSPython

Measures of Central Tendency

Understand Mean, Median, and Mode - the three pillars of data summarization and their applications in data science.

Central Tendency | Beginner Level | 30 min

What Are Measures of Central Tendency?

Definition

Measures of Central Tendency are statistical measures that identify a single value as representative of an entire dataset.

They answer: "What's the typical or central value?"

Why They Matter

  • Summarize large datasets
  • Compare different datasets
  • Make data-driven decisions
  • Understand data distribution

The Three Pillars of Central Tendency

  • Mean (μ): the average
  • Median (M): the middle value
  • Mode (Mo): the most frequent value

Data Science Insight:
Choosing the right measure can make or break your analysis. Each has strengths for different types of data and distributions.

Mean (Arithmetic Average)

Mean Formula

Mean = Σx / n
Where: Σx = sum of all values, n = number of values

Example: Data = [5, 7, 9, 12, 15]

Mean = (5 + 7 + 9 + 12 + 15) / 5 = 48 / 5 = 9.6
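
The calculation above can be sketched in Python. This is a minimal illustration rather than part of the original lesson: `arithmetic_mean` is a hypothetical helper name, and the standard library's `statistics.mean` is included only as a cross-check.

```python
import statistics


def arithmetic_mean(values):
    """Return the sum of the values divided by the number of values."""
    if not values:
        raise ValueError("mean of an empty dataset is undefined")
    return sum(values) / len(values)


data = [5, 7, 9, 12, 15]
mean_value = arithmetic_mean(data)          # 48 / 5
library_mean = statistics.mean(data)        # same result from the standard library

print(mean_value)
print(library_mean)
```

Writing the helper by hand makes the Σx / n formula visible; in practice `statistics.mean` is usually the more convenient choice.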

When to Use Mean

  • Data is roughly symmetric (for example, normally distributed)
  • No extreme outliers
  • Continuous numerical data
  • Every data point should contribute to the result

When NOT to Use Mean

  • Skewed distributions
  • Presence of outliers
  • Categorical/ordinal data
  • Income/salary data (usually right-skewed)

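Properties of Mean

  • Uses all data points
  • Affected by outliers
  • Supports algebraic operations (for example, group means can be combined into an overall mean)
  • Acts as the balance point of the data: deviations from the mean sum to zero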
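
The balance-point property can be checked numerically: the deviations below the mean cancel the deviations above it. The snippet is a small sketch that reuses the lesson's example data; the variable names are illustrative.

```python
data = [5, 7, 9, 12, 15]
mean_value = sum(data) / len(data)

# Deviation of each value from the mean (negative below, positive above).
deviations = [x - mean_value for x in data]
total_deviation = sum(deviations)

# Floating-point arithmetic can leave a tiny rounding error, so compare
# against zero with a tolerance instead of using ==.
is_balanced = abs(total_deviation) < 1e-9
print(deviations)
print(is_balanced)
```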

Median (Middle Value)

Median Calculation

Step-by-Step Process:

Step 1: Sort the data
[5, 7, 9, 12, 15]
Step 2: Find the middle position
Median position = (n + 1) / 2
For n = 5: (5 + 1) / 2 = 3rd position
Step 3: Identify the median value
Position 1: 5
Position 2: 7
Position 3: 9 ← Median
Position 4: 12
Position 5: 15

Even Number of Values:

Data: [3, 5, 7, 9]
Middle values: 5 and 7 (positions 2 and 3)
→ Median = (5 + 7) / 2 = 6
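
The odd and even cases can be combined into one function. The following is a sketch under the lesson's definitions; the function name `median` is an illustrative choice, and the standard library offers `statistics.median` for the same purpose.

```python
def median(values):
    """Return the middle value, or the mean of the two middle values if n is even."""
    if not values:
        raise ValueError("median of an empty dataset is undefined")
    ordered = sorted(values)      # Step 1: sort the data
    n = len(ordered)
    mid = n // 2                  # zero-based index of the (upper) middle element
    if n % 2 == 1:
        # Odd n: position (n + 1) / 2 in one-based terms is index n // 2.
        return ordered[mid]
    # Even n: average the two values on either side of the centre.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_median = median([5, 7, 9, 12, 15])
even_median = median([3, 5, 7, 9])
print(odd_median, even_median)
```

Sorting a copy with `sorted` leaves the caller's list unchanged, which matters when the original order carries meaning (for example, time order).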

When to Use Median

  • Skewed distributions
  • Income/salary data
  • Presence of outliers
  • Ordinal data (sometimes)
  • Real estate prices

Data Science Applications

  • Salary analysis (CEO salary skews mean)
  • House price analysis
  • Customer age analysis
  • Outlier detection

Median vs Outliers

Data with Outlier: [5, 7, 9, 12, 150]
  • Mean = 36.6 (affected)
  • Median = 9 (robust)

Data without Outlier: [5, 7, 9, 12, 15]
  • Mean = 9.6
  • Median = 9
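
The comparison above can be reproduced with the standard library's `statistics` module. This is a minimal sketch; the variable names are illustrative.

```python
import statistics

without_outlier = [5, 7, 9, 12, 15]
with_outlier = [5, 7, 9, 12, 150]   # the last value is replaced by an outlier

for label, data in (("without outlier", without_outlier),
                    ("with outlier", with_outlier)):
    print(label, statistics.mean(data), statistics.median(data))

# How far does each measure move when the outlier is introduced?
mean_shift = statistics.mean(with_outlier) - statistics.mean(without_outlier)
median_shift = statistics.median(with_outlier) - statistics.median(without_outlier)
print(mean_shift, median_shift)
```

A single extreme value moves the mean by 27 while the median does not move at all, which is why the median is described as robust.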

Mode (Most Frequent Value)

Mode Calculation

Example 1: Single Mode

Data: [5, 7, 9, 12, 9, 15]
Frequency: 5(1), 7(1), 9(2), 12(1), 15(1)
Mode = 9 (appears twice)

Example 2: Bimodal Distribution

Data: [5, 7, 5, 9, 9, 12]
Frequency: 5(2), 7(1), 9(2), 12(1)
Modes = 5 and 9 (both appear twice)

Example 3: No Mode

Data: [5, 7, 9, 12, 15]
All values appear once
No Mode (or each value is a mode)
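
The three examples can be handled by one frequency count. The sketch below uses `collections.Counter`; `find_modes` is a hypothetical helper that adopts the "no mode" convention by returning an empty list when every value appears only once.

```python
from collections import Counter


def find_modes(values):
    """Return every most-frequent value, or [] if all values appear once."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []   # assumption: treat "all values unique" as having no mode
    # Keep every value that reaches the highest frequency (handles ties).
    return [value for value, count in counts.items() if count == highest]


single = find_modes([5, 7, 9, 12, 9, 15])
bimodal = find_modes([5, 7, 5, 9, 9, 12])
no_mode = find_modes([5, 7, 9, 12, 15])
print(single, bimodal, no_mode)
```

By contrast, `statistics.multimode` (Python 3.8+) returns every value when all of them appear once, which corresponds to the "each value is a mode" reading.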

When to Use Mode

  • Categorical data (Nominal)
  • Customer preferences
  • Survey responses
  • Finding most common category
  • Data with repeated values

Real-world Applications

  • Most purchased product
  • Most common customer rating
  • Popular colors in fashion
  • Frequent error types

Mode for Different Data Types

  • Nominal data: only the mode works
  • Ordinal data: mode and median
  • Numerical data: all three work
  • Multiple modes: a dataset can be bimodal or multimodal
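
To make the data-type distinction concrete, the sketch below finds the mode of nominal data and the median of ordinal data. The colour and size lists are made-up sample data, and mapping ordered categories to ranks is one possible approach, not the only one.

```python
import statistics
from collections import Counter

# Hypothetical nominal data: colours have no order, so only the mode applies.
colors = ["red", "blue", "green", "blue", "red", "blue"]
mode_color, mode_count = Counter(colors).most_common(1)[0]

# Hypothetical ordinal data: sizes have an order but no meaningful arithmetic.
size_order = ["S", "M", "L", "XL"]
sizes = ["M", "S", "L", "M", "XL", "M", "L"]

# Convert each size to its rank, then take the median rank.
# median_low always returns an actual data point, so it maps back to a label.
ranks = [size_order.index(size) for size in sizes]
median_size = size_order[statistics.median_low(ranks)]

print(mode_color, mode_count)
print(median_size)
```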

Comparison: Mean vs Median vs Mode

Quick Comparison Table

Aspect             | Mean                 | Median                 | Mode
Definition         | Average value        | Middle value           | Most frequent value
Effect of outliers | Highly affected      | Largely unaffected     | Largely unaffected
Data types         | Quantitative only    | Quantitative + ordinal | All data types
Best for           | Normal distributions | Skewed distributions   | Categorical data
Formula            | Σx / n               | Middle value(s)        | Highest frequency

Decision Guide: When to Use Which?

Use MEAN when:

  • Data is symmetric
  • No outliers
  • Need all data considered
  • Further calculations needed

Use MEDIAN when:

  • Data is skewed
  • Outliers present
  • Salary/income data
  • Ordinal data

Use MODE when:

  • Categorical data
  • Finding most common
  • Survey responses
  • Nominal scales
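
The guide above can be written down as a small lookup function. This is only a sketch of the rules of thumb in this lesson: `suggest_measure` and its parameters are hypothetical names, and real analyses often report more than one measure.

```python
def suggest_measure(data_type, skewed_or_outliers=False):
    """Suggest a measure of central tendency.

    data_type: "nominal", "ordinal", or "numerical" (labels assumed for this sketch).
    skewed_or_outliers: True if the numerical data is skewed or has outliers.
    """
    if data_type == "nominal":
        return "mode"
    if data_type == "ordinal":
        return "median"
    if data_type == "numerical":
        return "median" if skewed_or_outliers else "mean"
    raise ValueError(f"unknown data type: {data_type!r}")


print(suggest_measure("nominal"))                             # survey categories
print(suggest_measure("numerical"))                           # symmetric, no outliers
print(suggest_measure("numerical", skewed_or_outliers=True))  # e.g. salaries
```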

Mean vs Median in Skewed Data

Left-Skewed (Negative Skew)
Typical order: Mean < Median < Mode

Right-Skewed (Positive Skew)
Typical order: Mode < Median < Mean

Remember: In skewed distributions, the mean gets pulled toward the tail, while the median stays closer to the bulk of the data. These orderings are a rule of thumb for unimodal data and can fail for some distributions.
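
The orderings can be observed on small samples. Both datasets below are invented for illustration, one with a long right tail and one with a long left tail; they are not taken from the lesson.

```python
import statistics

# Hypothetical samples: most values cluster together with one long tail.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 20]
left_skewed = [1, 12, 16, 17, 18, 18, 19, 19, 19, 20]


def summarize(data):
    """Return (mean, median, mode) for a list of numbers."""
    return statistics.mean(data), statistics.median(data), statistics.mode(data)


r_mean, r_median, r_mode = summarize(right_skewed)
l_mean, l_median, l_mode = summarize(left_skewed)

print("right-skewed:", r_mode, r_median, r_mean)   # mode < median < mean
print("left-skewed: ", l_mean, l_median, l_mode)   # mean < median < mode
```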

Data Science Applications

Salary Analysis

Problem: Company salary data with the CEO earning $5M and employees earning $50k-$150k
Wrong Approach: Using the mean
Mean = $180k (misleading)
Correct Approach: Using the median
Median = $85k (representative)
Why: The CEO's salary is an outlier that pulls the mean upward

Customer Ratings

Problem: Product ratings on a scale of 1-5
Ratings: [5, 5, 4, 5, 3, 5, 4, 5, 2, 5]
Mean: 4.3
Median: 5
Mode: 5
Best Choice: The mode (5) shows the most common customer experience
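
The three figures for the ratings can be recomputed with the `statistics` module. The share of five-star ratings is an extra illustration added here to show what the mode captures.

```python
import statistics
from collections import Counter

ratings = [5, 5, 4, 5, 3, 5, 4, 5, 2, 5]

rating_mean = statistics.mean(ratings)       # 43 / 10
rating_median = statistics.median(ratings)   # average of the two middle values
rating_mode = statistics.mode(ratings)       # most frequent rating

# Fraction of customers who gave the modal rating.
mode_share = Counter(ratings)[rating_mode] / len(ratings)

print(rating_mean, rating_median, rating_mode, mode_share)
```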

E-commerce Analysis

Scenario: Analyzing customer purchase amounts
Different Measures for Different Insights:
  • Mean: Average revenue per customer
  • Median: Typical customer spend
  • Mode: Most common purchase amount
Insight: Use all three for comprehensive understanding

Interview Questions Preview

Q: When would you use median instead of mean in salary analysis?
A: When the data is skewed (e.g., the CEO's salary is much higher than the employees')

Q: What does it mean if mean > median > mode?
A: It typically indicates a right-skewed distribution (positive skew)

Chapter Summary & Cheatsheet

Mean (Average)

Use for normal distributions; sensitive to outliers

μ = Σx / n

Median (Middle)

Use for skewed data; robust to outliers

Sort → find the middle value

Mode (Most Frequent)

Use for categorical data; finds the most common value

Highest frequency value

Quick Decision Guide

  • Normal data → Mean
  • Skewed data → Median
  • Categorical → Mode
  • Outliers present → Median
  • Salary data → Median
  • Ratings → Mode

Key Formulas

Mean: μ = Σx / n
Median: middle value of the sorted data (average of the two middle values when n is even)
Mode: value with the highest frequency