DSPython Logo DSPython

SQL Set Operations

Master UNION, INTERSECT, and EXCEPT to combine and compare multiple query results with mathematical precision.

SQL Intermediate 60 min

🧠 Introduction: Combining Query Results

**Set Operations** allow you to combine results from multiple SELECT queries into a single result set. Just like mathematical set operations, SQL provides UNION, INTERSECT, and EXCEPT to merge, find common, or differentiate between query results.

Set operations require that all queries involved must have the **same number of columns** with **compatible data types** in corresponding positions. The column names in the final result come from the first SELECT statement.

πŸ“ Basic Set Operation Syntax:

SELECT column1, column2 FROM table1
UNION
SELECT column1, column2 FROM table2;

🎯 Topic 1: UNION & UNION ALL

**UNION** combines results from two or more SELECT statements and removes duplicate rows. **UNION ALL** does the same but includes all rows, including duplicates (which makes it faster as no duplicate elimination is needed).

βœ… When to Use:

β€’ Combine customer lists from different regions
β€’ Merge product catalogs from multiple suppliers
β€’ Aggregate similar data from different tables

πŸ’» Example: Combine Employee Lists

-- Get all employees from IT and Finance departments
SELECT FirstName, Department FROM Employees WHERE Department = 'IT'
UNION
SELECT FirstName, Department FROM Employees WHERE Department = 'Finance';

πŸ’» UNION vs UNION ALL

-- With UNION (removes duplicates): Returns 1 row
SELECT 'Apple' AS Fruit UNION SELECT 'Apple' AS Fruit;

-- With UNION ALL (keeps duplicates): Returns 2 rows
SELECT 'Apple' AS Fruit UNION ALL SELECT 'Apple' AS Fruit;

πŸ”„ Topic 2: INTERSECT

**INTERSECT** returns only the rows that are present in **both** result sets. It finds the common elements between two queries. Like UNION, it also removes duplicate rows from the final result.

INTERSECT is particularly useful for finding overlapping data, such as customers who bought products in multiple categories or employees working on multiple projects.

πŸ’» Example: Common Employees

-- Find employees who work in both IT and have salary > 60000
SELECT EmployeeID FROM Employees WHERE Department = 'IT'
INTERSECT
SELECT EmployeeID FROM Employees WHERE Salary > 60000;

⚑ Performance Tip:

INTERSECT can often be rewritten using INNER JOIN. However, INTERSECT is more readable when dealing with the same table or when you want to compare complete rows rather than specific columns.

πŸ—‚οΈ Topic 3: EXCEPT (or MINUS)

**EXCEPT** (called **MINUS** in some databases like Oracle) returns rows from the first query that are **not present** in the second query. It performs a set difference operation.

πŸ” Common Use Cases:

β€’ Find customers who haven't placed an order
β€’ Identify products never purchased
β€’ Discover employees not assigned to any project
β€’ Track changes between two data sets

πŸ’» Example: Unassigned Employees

-- Find departments that have no projects
SELECT DeptName FROM Departments
EXCEPT
SELECT Department FROM Projects;

πŸ’» Important Note:

-- EXCEPT is NOT commutative
(Query A EXCEPT Query B) β‰  (Query B EXCEPT Query A)
-- The order matters significantly!

πŸ”‘ Topic 4: Practical Considerations & Best Practices

Using set operations effectively requires understanding their limitations and optimization techniques.

1. Column Compatibility Rules:

β€’ Same number of columns in all SELECT statements
β€’ Corresponding columns must have compatible data types
β€’ Column names come from the first query
β€’ ORDER BY can only appear at the end of the entire statement

2. Performance Optimization:

β€’ Use UNION ALL instead of UNION when duplicates don't matter (faster)
β€’ Apply WHERE clauses in individual queries, not after UNION
β€’ Use INTERSECT/EXCEPT for clarity, but test against JOIN alternatives
β€’ Add appropriate indexes on columns used in WHERE clauses

3. Common Pitfalls:

β€’ Forgetting that UNION removes duplicates
β€’ Mismatched column counts or data types
β€’ Using ORDER BY in individual queries instead of at the end
β€’ Assuming EXCEPT is commutative (it's not!)

πŸ“Š Topic 5: Complex Set Operations

Set operations can be combined and nested to solve complex business problems. You can chain multiple operations together and use parentheses to control evaluation order.

πŸ’» Example: Complex Business Logic

-- Find employees in IT or Finance but not in HR
(
SELECT EmployeeID FROM Employees WHERE Department = 'IT'
UNION
SELECT EmployeeID FROM Employees WHERE Department = 'Finance'
)
EXCEPT
SELECT EmployeeID FROM Employees WHERE Department = 'HR';

πŸ’» Example: With ORDER BY

-- Combine and sort results
SELECT FirstName, Department FROM Employees WHERE Salary > 60000
UNION
SELECT FirstName, Department FROM Employees WHERE Department = 'IT'
ORDER BY Department, FirstName; -- Only one ORDER BY at the end!

πŸ“š Module Summary

  • UNION: Combines results, removes duplicates
  • UNION ALL: Combines results, keeps all rows (faster)
  • INTERSECT: Returns common rows from both queries
  • EXCEPT/MINUS: Returns rows from first query not in second
  • Requirements: Same column count & compatible types
  • ORDER BY: Can only appear at the end of entire statement

πŸ€” Interview Q&A

Tap on the questions below to reveal the answers.

UNION removes duplicate rows from the combined result set, while UNION ALL includes all rows, including duplicates. Use UNION ALL when you know there are no duplicates or when duplicates don't matter, as it's faster because it doesn't need to check for duplicates. Use UNION when you need distinct results and performance isn't critical.

EXCEPT compares complete rows between two result sets, while NOT IN checks if a value is not in a list. EXCEPT removes duplicates and handles NULL values differently (NULL = NULL returns UNKNOWN in EXCEPT). NOT IN can have issues with NULL values in the subquery. EXCEPT is generally more readable when comparing complete rows from different queries.

No, you cannot use ORDER BY in individual SELECT statements within a set operation (except in some databases with derived tables). The ORDER BY clause must appear only once at the very end of the entire set operation statement, and it applies to the final combined result set.

SQL will return an error. All SELECT statements in a set operation must have the same number of columns, and corresponding columns must have compatible data types. If data types are incompatible, you can use CAST or CONVERT functions to make them compatible. Column names come from the first SELECT statement.

πŸ€–
DSPython AI Assistant βœ–
πŸ‘‹ Hi! I’m your AI assistant. Paste your code here, I will find bugs for you.