DSPython Logo DSPython

SQL Window Functions

Calculate aggregates (like ranking, running totals, and moving averages) across a set of table rows related to the current row, without collapsing them like GROUP BY.

SQL Advanced 75 min

🖼️ Topic 1: Introduction to Window Functions

A **Window Function** performs a calculation across a set of related rows (the "window") without collapsing them. This is the key difference between **Window Functions** and standard **GROUP BY** aggregates.

Window functions allow you to calculate the average salary for a department and display that average **next to every employee in that department**, keeping the employee's detail row intact.

📝 Window Function Syntax:

FUNCTION_NAME() OVER ([PARTITION BY col] [ORDER BY col])

The OVER Clause Components:

  • PARTITION BY: Divides the rows into independent **groups** (e.g., grouping by Department). The function restarts its calculation for every new partition.
  • ORDER BY: Specifies the **order of rows** *within* the partition. This is mandatory for ranking and sequential functions.

GROUP BY (Detail Lost)

Fewer Rows Returned (Detail Lost!)

Window Function (Detail Preserved)

All Rows Returned (Detail Preserved!)

🏅 Topic 2: Ranking Functions

Ranking functions assign a numerical rank to each row within a partition based on the ORDER BY clause.

Key Ranking Functions:

  • ROW_NUMBER(): Assigns a **unique sequential integer** (1, 2, 3...). It **never skips** numbers, even in ties.
  • RANK(): Assigns the same rank to ties but **skips** the next number(s). (e.g., 1, 2, 2, **4**).
  • DENSE_RANK(): Assigns the same rank to ties but **does not skip** numbers. (e.g., 1, 2, 2, **3**).

💻 Example: Ranking Salaries within Each Department

SELECT
    FirstName, Department, Salary,
    DENSE_RANK() OVER (PARTITION BY Department ORDER BY Salary DESC) AS DeptRank
FROM Employees;

↔️ Topic 3: Positional Functions (LAG and LEAD)

These functions access data from a row **relative** to the current row, making them invaluable for **time-series analysis** (calculating growth, change, or sequential difference).

Positional Functions:

  • LAG(col, offset, default): Retrieves the column value from the row **before** the current row (e.g., value from the previous month).
  • LEAD(col, offset, default): Retrieves the column value from the row **after** the current row (e.g., value for the next quarter).
  • FIRST_VALUE(col): Returns the value of the **first row** in the window (e.g., the earliest sale date).

💻 Example: Calculating Sales Growth Month-over-Month

SELECT
    Month, Sales,
    LAG(Sales, 1, 0) OVER (ORDER BY Month) AS PreviousSales
FROM Monthly_Sales;

🧮 Topic 4: Aggregate Functions as Windows

Any standard aggregate function (SUM, AVG, MIN, MAX, COUNT) can be converted into a window function by adding the OVER clause.

This allows us to calculate **Running Totals** and **Moving Averages**—metrics that depend on the rows preceding the current row.

💻 Example: Calculating a Running Total

-- Calculate the cumulative sum of sales, ordered by Date
SELECT
    Date, SaleAmount,
    SUM(SaleAmount) OVER (ORDER BY SaleDate) AS RunningTotal
FROM Daily_Sales;

📏 Topic 5: Window Framing & NTILE()

The **Window Frame** defines the specific set of rows relative to the current row that the function should operate on. By default, it runs from the start of the partition to the current row.

Window Frame Syntax:

... OVER (ORDER BY Col ROWS BETWEEN N PRECEDING AND CURRENT ROW)

NTILE(N):

**NTILE(N)** divides the ordered result set into a specified number of **equal groups (N)** and assigns a group number to each row. This is highly useful for classifying data into quartiles (N=4), deciles (N=10), or any custom percentile groups.

📚 Module Summary

  • OVER: Defines the window, prevents detail rows from collapsing.
  • PARTITION BY: Creates independent groups for ranking/aggregation.
  • LAG/LEAD: Accesses data from previous/next rows (sequential analysis).
  • DENSE_RANK: Ranking function that does not skip numbers in case of ties.
  • Running Total: Achieved by using SUM() OVER (ORDER BY Col).

🤔 Interview Q&A

Tap on the questions below to reveal the answers.

GROUP BY collapses the detail rows into a single output row, losing individual record information. PARTITION BY divides the result set into groups but keeps all detail rows intact, adding the aggregate result to each row.

RANK() assigns the same rank to ties but **skips** the next number (e.g., 1, 2, 2, 4). DENSE_RANK() assigns the same rank to ties but **does not skip** the next number (e.g., 1, 2, 2, 3).

You use the LAG() function. You calculate the current month's sales minus LAG(Sales, 1) (the previous month's sales) over the entire transaction window, ordered by date.

It means the sum starts at the **very first row** of the partition and ends at the **current row**. This is the mechanism used to calculate a **Running Total** or **Cumulative Sum**.