DSPython Logo DSPython

SQL Aggregate Functions

Learn to summarize data using GROUP BY, SUM, AVG, and COUNT.

SQL Intermediate 45 min

🔢 Topic 1: Aggregate Functions

**Aggregate functions** perform a calculation on a set of rows and return a single summary value. They summarize data vertically across an entire column or a specific group.

📈 The 5 Core Aggregates:

  • COUNT(): Returns the number of rows/values.
  • SUM(): Returns the total sum of a numeric column.
  • AVG(): Returns the average value of a numeric column.
  • MIN(): Returns the smallest value.
  • MAX(): Returns the largest value.

COUNT Variations (Important Distinction):

  • COUNT(*): Counts **all rows**, including those with `NULL` values.
  • COUNT(ColumnName): Counts **only non-NULL** values in that specific column.
  • COUNT(DISTINCT ColumnName): Counts the number of **unique, non-NULL** values in that column.

💻 Example: COUNT vs COUNT DISTINCT

-- Total number of employees (all rows)
SELECT COUNT(*) FROM Employees;

-- Total number of unique departments
SELECT COUNT(DISTINCT Department) FROM Employees;

👥 Topic 2: The GROUP BY Clause

The **GROUP BY** clause is used to group identical data from one or more columns into summary rows. Instead of getting one total salary for the company, you can get the total salary **for each department**.

⚠️ The Golden Rule of Grouping:

Any non-aggregated column in the SELECT list (a column that is not wrapped in SUM(), AVG(), etc.) **MUST** appear in the GROUP BY clause.

💻 Example: Sum by Department

-- Get the average salary and employee count for each department
SELECT Department, AVG(Salary), COUNT(*)
FROM Employees
GROUP BY Department;

⚖️ Topic 3: HAVING vs WHERE

These two clauses both filter data, but their placement and use case are entirely different. This is the **most common topic** for SQL interviews.

1. WHERE Clause:

Filters **individual rows** *before* any grouping or aggregation takes place. **Cannot** use aggregate functions.

2. HAVING Clause:

Filters **grouped results** *after* the grouping and aggregation are complete. **Must** use aggregate functions in the condition.

💻 Example: Filtering Grouped Results

-- Get departments where the total salary is OVER 100000
SELECT Department, SUM(Salary)
FROM Employees
GROUP BY Department
HAVING SUM(Salary) > 100000;

🧠 Topic 4: SQL Logical Execution Order

The order in which you write the SQL keywords is different from the order in which the database **logically executes** the query. Understanding this flow is essential for debugging and writing efficient SQL.

[Image of SQL execution order flow chart]

📝 Execution Steps (The Database's Order):

  1. FROM: Determines the initial dataset (the source table).
  2. WHERE: Filters out individual rows (cannot use aggregates).
  3. GROUP BY: Groups the remaining rows based on the specified columns.
  4. HAVING: Filters out groups based on aggregate results.
  5. SELECT: Calculates the final expressions (aggregates, simple columns).
  6. ORDER BY: Sorts the final result set.

📚 Module Summary

  • Aggregate Functions: Summarize vertically (SUM, AVG, COUNT).
  • GROUP BY: Splits the data into groups to apply aggregates.
  • WHERE: Filters rows BEFORE grouping (no aggregates).
  • HAVING: Filters groups AFTER aggregation (must use aggregates).
  • Order: FROM -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY.

🤔 Interview Q&A

Tap on the questions below to reveal the answers.

WHERE filters individual rows before the grouping takes place, and cannot contain aggregate functions. HAVING filters the grouped results after aggregation, and its condition must include an aggregate function (e.g., HAVING SUM(Sales) > 1000).

No. Any column that is not wrapped in an aggregate function (like SUM() or COUNT()) must be included in the GROUP BY clause. This is the fundamental rule of SQL grouping.

COUNT(*) counts all rows in the result set, including rows where columns might contain NULL values. COUNT(ColumnName) only counts rows where the specific column has a non-NULL value.

The GROUP BY clause executes first. The database must group the rows and calculate the aggregates (like SUM(Salary)) before the SELECT clause can output those calculated results.