DSPython Logo DSPython

SQL Subqueries

Master the use of inner queries to retrieve data that will be used by the outer query in WHERE, FROM, and SELECT clauses.

SQL Intermediate / Advanced 75 min

💡 Introduction: Subqueries - The Nested Question

A **Subquery** is simply a **SELECT** query nested inside another SQL statement. It allows you to ask a question where the answer depends on the results of a preliminary, inner question.

For example, you cannot easily find employees *above* the average salary in a single step, because you don't know the average yet. A subquery solves this by calculating the average first.

🔢 Topic 1: Scalar Subqueries (Single Value)

A **Scalar Subquery** is one that returns exactly **one column and one row** (a single value). It is typically used in the **WHERE** clause alongside comparison operators (`>`, `=`, `<`).

✅ Use Case:

Filtering based on a calculated aggregate value (like the minimum, maximum, or average).

💻 Example: Employees Above Average Salary

SELECT Name, Salary
FROM Employees
WHERE Salary > (SELECT AVG(Salary) FROM Employees);

📋 Topic 2: Column Subqueries (List of Values)

A **Column Subquery** returns **multiple rows but only one column** (a list of values). It must be used with set operators like **IN**, **NOT IN**, or **EXISTS**.

⚠️ Key Operators:

The result set of the inner query is treated as a list that the outer query checks against.

💻 Example: Finding Customers Who Ordered

-- Find customer details for those who have an OrderID in the Orders table
SELECT Name, City
FROM Customers
WHERE CustomerID IN (SELECT CustomerID FROM Orders);

🔄 Topic 3: Correlated Subqueries

A **Correlated Subquery** is the most complex type. The inner query **depends on the row currently being processed by the outer query**. It is executed repeatedly, once for *every single row* of the outer table.

This allows for row-by-row filtering based on dynamic criteria (e.g., finding the biggest order *within* each customer's specific order history).

💻 Example: Finding Employees Earning the Most in Their Dept

-- Find the employee who earns the max salary within THEIR specific department
SELECT Name, Salary, Department
FROM Employees E1
WHERE Salary = (
    SELECT MAX(Salary)
    FROM Employees E2
    WHERE E2.Department = E1.Department -- Correlation/Dependency
);

🌟 Topic 4: WITH Clause (CTEs)

**Common Table Expressions (CTEs)**, defined using the **WITH** clause, are named temporary result sets that exist only for the duration of a single query.

CTEs are the modern, recommended alternative to deeply nested subqueries. They drastically improve **readability** and **performance**.

💻 Example: Using a CTE (High Earners)

WITH AvgSalary AS (
    SELECT AVG(Salary) AS avg_sal FROM Employees
)
SELECT E.Name, E.Salary
FROM Employees E, AvgSalary A
WHERE E.Salary > A.avg_sal; -- Cleaner and easier to read

📚 Module Summary

  • Scalar Subquery: Returns 1 value (1 row, 1 column). Used with =, >.
  • Column Subquery: Returns a list of values. Used with IN, NOT IN.
  • Correlated Subquery: Inner query depends on Outer query (slow).
  • CTEs (WITH): Temporary, named result sets used to simplify complex, multi-stage queries.

🤔 Interview Q&A

Tap on the questions below to reveal the answers.

A standard Subquery executes only **once** before the outer query begins. A Correlated Subquery executes **once for every single row** of the outer query, making it significantly slower on large datasets.

You should use a CTE when the subquery is complex or needs to be referenced multiple times in the main query. CTEs improve readability, make debugging easier, and allow for **recursive** queries.

No. If a Scalar Subquery returns more than one row, the main query will fail with an error. Scalar subqueries must always return a single value to be used with comparison operators (=, >, <).

The Correlated Subquery is often used with EXISTS. The outer query checks if the inner query (which depends on the outer row) returns *any* row. If it returns at least one row, EXISTS evaluates to TRUE.