Statistics for Data Science – Normal Distribution
Understand the MOST IMPORTANT distribution in data science: Bell curve properties, empirical rule, Z-scores, and applications in feature scaling, outlier detection, and confidence intervals.
🔥 The MOST IMPORTANT Distribution in Data Science
🎯 Why It's So Important
The Normal Distribution (Gaussian Distribution) is the foundation of statistical inference and machine learning. It appears everywhere in nature, business, and science due to the Central Limit Theorem.
- Universal: appears in many natural phenomena
- Mathematical simplicity: easy to work with analytically
- Central Limit Theorem: sample means become approximately normal (see the sketch below)
- ML foundation: basis for many machine learning algorithms
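The Central Limit Theorem claim is easy to check numerically. Below is a minimal sketch, assuming a deliberately non-normal source distribution (uniform) and illustrative sample sizes chosen only for this example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw many samples from a clearly non-normal distribution (uniform on [0, 1])
# and average each one: by the Central Limit Theorem the sample means
# pile up in a roughly normal, bell-shaped pattern.
sample_size = 50
n_samples = 10_000
sample_means = rng.uniform(0, 1, size=(n_samples, sample_size)).mean(axis=1)

# The means cluster around the population mean (0.5), with a standard
# deviation close to the theoretical sigma / sqrt(n) = (1/sqrt(12)) / sqrt(50).
print(sample_means.mean())                       # ~0.5
print(sample_means.std(), 1 / np.sqrt(12 * 50))  # both ~0.041
```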
📊 Where You'll Find Normal Distributions
- Human heights: most people are near the average height
- Exam scores: most scores fall around the class average
- Measurement errors: small errors are more common than large ones
- Stock returns: daily returns cluster around the mean
📚 8 Core Concepts
1. Bell Curve: the characteristic shape
2. Mean = Median = Mode: equality of the central tendency measures
3. Symmetry: perfect mirror image around the mean
4. Empirical Rule: the 68–95–99.7 rule
5. Z-score: standardization
6. Standard Normal: μ = 0, σ = 1
7. Skewness Intro: measure of asymmetry
8. Real Data Examples: practical applications
📐 3 Key Formulas
1. Z-score: Z = (x − μ) / σ
2. Empirical Rule: 68% within ±1σ, 95% within ±2σ, 99.7% within ±3σ
3. Standard Normal: Z = (X − μ)/σ ~ N(0, 1)
📌 Example
Class average = 60, σ = 10
Student score = 80
Z = (80 − 60) / 10 = +2
👉 Student scored 2 standard deviations above average
📊 Bell Curve & Properties
The Bell Curve
Also known as Gaussian curve or Normal curve
📌 Example: Exam Marks
Suppose marks of 100 students form a normal distribution.
- Most students score around 70 marks
- Very few students score below 40 or above 95
👉 This creates a bell-shaped curve with peak at average marks.
Mean = Median = Mode
In a perfectly normal distribution, all three measures of central tendency are identical.
Perfect Symmetry
The left half is a mirror image of the right half around the mean.
📌 Example
Heights (cm): 165, 168, 170, 170, 172
- Mean = (165+168+170+170+172)/5 = 169
- Median = 170
- Mode = 170
👉 In near-normal data, these values almost match.
📏 Effect of Standard Deviation
- Small σ: tall & skinny curve
- Medium σ: the typical bell
- Large σ: short & wide curve
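One quick way to see this effect numerically (a sketch; the σ values 5, 10, 20 and the mean of 70 are illustrative choices, not from the original example) is to compare the peak height and the width of the middle 95% of each curve:

```python
from scipy.stats import norm

# The peak of a normal curve has height 1 / (sigma * sqrt(2*pi)), so a
# smaller sigma gives a taller, skinnier bell and a larger sigma a
# shorter, wider one.
mu = 70
for sigma in (5, 10, 20):
    peak = norm.pdf(mu, loc=mu, scale=sigma)
    width_95 = norm.ppf(0.975, loc=mu, scale=sigma) - norm.ppf(0.025, loc=mu, scale=sigma)
    print(f"sigma={sigma:>2}: peak height={peak:.4f}, middle-95% width={width_95:.1f}")
```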
🎯 Empirical Rule (68–95–99.7)
The Golden Rule
For any normal distribution, data falls within these predictable ranges
- μ ± σ: about 68% of the data
- μ ± 2σ: about 95% of the data
- μ ± 3σ: about 99.7% of the data
📌 Example: Exam scores with mean = 75 and σ = 10
- 68% of students scored between 65 and 85
- 95% of students scored between 55 and 95
- 99.7% of students scored between 45 and 105
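A minimal simulation sketch of the rule, assuming (as the ranges above imply) exam scores with μ = 75 and σ = 10:

```python
import numpy as np

rng = np.random.default_rng(42)
mu, sigma = 75, 10                       # exam scores: mean 75, std dev 10
scores = rng.normal(mu, sigma, size=100_000)

# The fraction of scores inside mu ± k*sigma should be ~0.68, ~0.95, ~0.997.
for k in (1, 2, 3):
    inside = np.mean(np.abs(scores - mu) <= k * sigma)
    print(f"within ±{k}σ ({mu - k*sigma} to {mu + k*sigma}): {inside:.3f}")
```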
📐 Z-Scores & Standard Normal Distribution
Z-Score Formula
Z = (x − μ) / σ
Measures how many standard deviations a value is from the mean
📌 Example
If original marks are converted to Z-scores:
- Mean becomes 0
- Standard deviation becomes 1
👉 Now we can directly use Z-tables.
📊 Z-Score Interpretation
- Z = 0: value is exactly at the mean
- Z = +1.5: value is 1.5σ above the mean
- Z = −2: value is 2σ below the mean
- |Z| > 3: potential outlier (rare)
🎯 Standard Normal Distribution
The Standard Normal Distribution is a special normal distribution with mean μ = 0 and standard deviation σ = 1.
Key Benefit: Any normal distribution can be converted to standard normal using Z-scores. This allows us to use standard normal tables (Z-tables).
🧮 Z-Score Calculation Example
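A minimal sketch of the calculation, reusing the earlier exam numbers (class average 60, σ = 10, student score 80):

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

# Class average = 60, standard deviation = 10, student score = 80
z = z_score(80, mu=60, sigma=10)
print(z)  # 2.0 -> the student scored 2 standard deviations above average
```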
📉 Skewness Introduction
What is Skewness?
Measure of asymmetry in a distribution
Perfectly Normal: Skewness = 0 (symmetric)
Positive Skew: Right tail longer (mean > median)
Negative Skew: Left tail longer (mean < median)
📌 Example
If the salary distribution is symmetric (zero skew) with an average salary of ₹40,000:
- 50% of employees earn below ₹40,000
- 50% of employees earn above ₹40,000
👉 The distribution is perfectly balanced around the mean.
Three directions of skew:
- Positive Skew (+): right tail longer
- Zero Skew: symmetric
- Negative Skew (−): left tail longer
Three degrees of kurtosis (how peaked or flat the curve is):
- Platykurtic: flatter than the normal bell
- Mesokurtic: the typical normal peak
- Leptokurtic: sharper, more concentrated peak than normal
📌 Example: Income
Most people earn around ₹30,000
Few people earn ₹5,00,000+
👉 Right tail becomes longer → Positive Skew
📌 Example
- Platykurtic: Exam paper very easy → marks spread out
- Mesokurtic: Normal paper → typical bell curve
- Leptokurtic: Very tough paper → marks concentrated near mean
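Both skewness and kurtosis can be computed directly from data. A minimal sketch using scipy.stats, with illustrative (made-up) salary figures:

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(7)

# Right-skewed salaries: most around ₹30,000, with a long tail of high earners.
salaries = rng.lognormal(mean=np.log(30_000), sigma=0.5, size=10_000)
# A symmetric, roughly normal comparison sample.
symmetric = rng.normal(30_000, 5_000, size=10_000)

print(f"salaries:  skew={skew(salaries):+.2f}, excess kurtosis={kurtosis(salaries):+.2f}")
print(f"symmetric: skew={skew(symmetric):+.2f}, excess kurtosis={kurtosis(symmetric):+.2f}")
```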
🏢 Real-World Skewed Distributions
- Income: positive skew (few very high incomes)
- House prices: positive skew (few expensive houses)
- Age at death: negative skew (few very young deaths)
- Exam scores: often normal or slightly negatively skewed
🧠 Data Science Applications
Feature Scaling
Standardizing features to μ=0, σ=1 using Z-scores.
Outlier Detection
Using Z-scores to identify unusual values (|Z| > 3).
Confidence Intervals
Constructing intervals using normal distribution properties.
🤖 Machine Learning Algorithms Using Normal Distribution
- Linear Regression: assumes normally distributed errors
- Gaussian Naive Bayes: assumes features follow a normal distribution
- Gaussian Mixture Models: use multivariate normal distributions
- Anomaly Detection: based on deviation from normal patterns
🏢 Real-World Example: Feature Scaling for ML
Scenario: a dataset has two features on very different scales, e.g. annual income (large values) and age (small values).
After Standardization: both features have μ=0, σ=1. This prevents income from dominating age in distance-based algorithms like K-means or SVM.
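A minimal sketch of this standardization with made-up income and age values (the numbers are illustrative), using scikit-learn's StandardScaler, which applies the Z-score transform column by column:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two features on very different scales: annual income (₹) and age (years).
X = np.array([
    [300_000, 25],
    [450_000, 32],
    [600_000, 41],
    [1_200_000, 38],
], dtype=float)

# Z-score standardization: (x - mean) / std, computed per column.
X_scaled = StandardScaler().fit_transform(X)

print(X_scaled.mean(axis=0).round(2))  # ~[0. 0.]
print(X_scaled.std(axis=0).round(2))   # ~[1. 1.]
```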
⚠️ Outlier Detection Using Z-scores
Scenario: Credit card transaction amounts are normally distributed with μ = $50 and σ = $15.
Rule: Typically flag transactions with |Z| > 3 as potential outliers (beyond 99.7% of normal transactions).
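A minimal sketch of this rule applied to a few hypothetical transaction amounts:

```python
import numpy as np

mu, sigma = 50.0, 15.0                  # transaction amounts: mean $50, std dev $15
amounts = np.array([48.0, 72.0, 12.0, 110.0, 55.0])

z = (amounts - mu) / sigma              # Z-score of each transaction
flagged = np.abs(z) > 3                 # |Z| > 3  ->  potential outlier

for amount, zi, flag in zip(amounts, z, flagged):
    print(f"${amount:>6.2f}  Z={zi:+.2f}  {'OUTLIER' if flag else 'ok'}")
```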
📐 Key Formulas
Z-Score Formula
Z = (x − μ) / σ
Where:
x = individual value
μ = population mean
σ = population standard deviation
Empirical Rule
P(μ − σ ≤ X ≤ μ + σ) ≈ 0.68
P(μ − 2σ ≤ X ≤ μ + 2σ) ≈ 0.95
P(μ − 3σ ≤ X ≤ μ + 3σ) ≈ 0.997
For any normal distribution:
68% within ±1σ, 95% within ±2σ, 99.7% within ±3σ
Standard Normal
Z = (X − μ)/σ ~ N(0, 1)
Any normal distribution X can be standardized to Z
Z follows standard normal distribution
μ=0, σ=1
💡 Pro Tip: Memorize these critical Z-values: Z=1.96 gives 95% confidence, Z=2.576 gives 99% confidence. These are used constantly in hypothesis testing.
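For instance, a 95% confidence interval for a mean uses Z = 1.96. A minimal sketch with illustrative sample values (for small samples like this one, a t-critical value would normally replace 1.96, but the Z version shows the idea):

```python
import numpy as np

sample = np.array([72, 68, 75, 80, 71, 77, 69, 74, 73, 78], dtype=float)

mean = sample.mean()
se = sample.std(ddof=1) / np.sqrt(len(sample))   # standard error of the mean

z_critical = 1.96                                # 95% confidence (use 2.576 for 99%)
lower, upper = mean - z_critical * se, mean + z_critical * se
print(f"95% CI for the mean: ({lower:.1f}, {upper:.1f})")
```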
📌 Example: IQ Scores
IQ scores are normally distributed with:
- Mean (μ) = 100
- Standard Deviation (σ) = 15
- 68% people have IQ between 85 and 115
- 95% people have IQ between 70 and 130
- 99.7% people have IQ between 55 and 145
👉 Values outside this range are extremely rare.
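These percentages can be confirmed directly from the normal CDF; a minimal sketch using scipy:

```python
from scipy.stats import norm

mu, sigma = 100, 15  # IQ scores

# P(mu - k*sigma <= X <= mu + k*sigma) for k = 1, 2, 3
for k in (1, 2, 3):
    p = norm.cdf(mu + k * sigma, mu, sigma) - norm.cdf(mu - k * sigma, mu, sigma)
    print(f"P({mu - k*sigma} <= IQ <= {mu + k*sigma}) = {p:.4f}")
# -> ~0.6827, ~0.9545, ~0.9973
```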
✅ Chapter Summary
Core Purpose
MOST IMPORTANT distribution in data science.
8 Key Concepts
Bell curve, mean=median=mode, symmetry, empirical rule, Z-score, standard normal, skewness, real examples.
3 Key Formulas
Z = (x − μ) / σ plus empirical rule values.
Data Science Use
Feature scaling, outlier detection, confidence intervals.