SQL Window Functions
Calculate aggregates (like ranking, running totals, and moving averages) across a set of table rows related to the current row, without collapsing them like GROUP BY.
🖼️ Topic 1: Introduction to Window Functions
A **Window Function** performs a calculation across a set of related rows (the "window") without collapsing them. This is the key difference between **Window Functions** and standard **GROUP BY** aggregates.
Window functions allow you to calculate the average salary for a department and display that average **next to every employee in that department**, keeping the employee's detail row intact.
📝 Window Function Syntax:
FUNCTION_NAME() OVER ([PARTITION BY col] [ORDER BY col])
The OVER Clause Components:
PARTITION BY: Divides the rows into independent **groups** (e.g., grouping by Department). The function restarts its calculation for every new partition.ORDER BY: Specifies the **order of rows** *within* the partition. This is mandatory for ranking and sequential functions.
GROUP BY (Detail Lost)
Fewer Rows Returned (Detail Lost!)
Window Function (Detail Preserved)
All Rows Returned (Detail Preserved!)
🏅 Topic 2: Ranking Functions
Ranking functions assign a numerical rank to each row within a partition based on the ORDER BY clause.
Key Ranking Functions:
ROW_NUMBER(): Assigns a **unique sequential integer** (1, 2, 3...). It **never skips** numbers, even in ties.RANK(): Assigns the same rank to ties but **skips** the next number(s). (e.g., 1, 2, 2, **4**).DENSE_RANK(): Assigns the same rank to ties but **does not skip** numbers. (e.g., 1, 2, 2, **3**).
💻 Example: Ranking Salaries within Each Department
SELECT FirstName, Department, Salary, DENSE_RANK() OVER (PARTITION BY Department ORDER BY Salary DESC) AS DeptRankFROM Employees;
↔️ Topic 3: Positional Functions (LAG and LEAD)
These functions access data from a row **relative** to the current row, making them invaluable for **time-series analysis** (calculating growth, change, or sequential difference).
Positional Functions:
LAG(col, offset, default): Retrieves the column value from the row **before** the current row (e.g., value from the previous month).LEAD(col, offset, default): Retrieves the column value from the row **after** the current row (e.g., value for the next quarter).FIRST_VALUE(col): Returns the value of the **first row** in the window (e.g., the earliest sale date).
💻 Example: Calculating Sales Growth Month-over-Month
SELECT Month, Sales, LAG(Sales, 1, 0) OVER (ORDER BY Month) AS PreviousSalesFROM Monthly_Sales;
🧮 Topic 4: Aggregate Functions as Windows
Any standard aggregate function (SUM, AVG, MIN, MAX, COUNT) can be converted into a window function by adding the OVER clause.
This allows us to calculate **Running Totals** and **Moving Averages**—metrics that depend on the rows preceding the current row.
💻 Example: Calculating a Running Total
-- Calculate the cumulative sum of sales, ordered by DateSELECT Date, SaleAmount, SUM(SaleAmount) OVER (ORDER BY SaleDate) AS RunningTotalFROM Daily_Sales;
📏 Topic 5: Window Framing & NTILE()
The **Window Frame** defines the specific set of rows relative to the current row that the function should operate on. By default, it runs from the start of the partition to the current row.
Window Frame Syntax:
... OVER (ORDER BY Col ROWS BETWEEN N PRECEDING AND CURRENT ROW)
NTILE(N):
**NTILE(N)** divides the ordered result set into a specified number of **equal groups (N)** and assigns a group number to each row. This is highly useful for classifying data into quartiles (N=4), deciles (N=10), or any custom percentile groups.
📚 Module Summary
- OVER: Defines the window, prevents detail rows from collapsing.
- PARTITION BY: Creates independent groups for ranking/aggregation.
- LAG/LEAD: Accesses data from previous/next rows (sequential analysis).
- DENSE_RANK: Ranking function that does not skip numbers in case of ties.
- Running Total: Achieved by using
SUM() OVER (ORDER BY Col).
🤔 Interview Q&A
Tap on the questions below to reveal the answers.