CTE
Common Table Expression
Common Table Expressions (CTEs) are an important feature in SQL that allows users to name and use a temporary table within the scope of a specific query. CTEs are used in processing complex queries, often combined with SELECT, INSERT, UPDATE or DELETE statements to help increase clarity, readability and management of SQL code.
WITH cte_name (column1, column2, ...) AS (
-- CTE query
SELECT ...
FROM ...
WHERE ...
)
-- Using CTE
SELECT ...
FROM cte_name
WHERE ...
Compare CTE with another technique
CTE vs Sub-queries
Declare
Determination with the WITH
Determination clause in another query, enclosed in parentheses
Readability
Easy to read and understand, easy to maintain because it is separated into small components, separate from the main query
Can make the query complex, can cause confusion for the purpose of the entire paragraph code
Reusability
Can be reused multiple times within the same query, available until the next query is executed
Single use within the query, where declared
Application
Used in complex queries, divided into smaller components, easy to read and understand, manage code
Used in case of simple queries, no need to reuse code many times
CTEs vs Temp table
Declare
Defined with the WITH clause
Defined with the symbol '#'
Readability
Is a temporary result, not stored in the database
Is a physical object, created and temporarily stored in the database
Reusability and Constraint
Can be reused multiple times within the same query, available until the next query is executed
Allows indexing, editing, and can be accessed from multiple queries in the same session
Application
Use when users care about readability and reusability, ease of maintenance, do not need to store data outside the scope of the query
Use when need to store intermediate results for many query, have a need for indexing for better performance, or need to modify data in a table to serve different query purposes
Types of CTEs in SQL
Non-Recursive CTEs
Non-recursive CTEs are used to simplify complex queries, making it easier for users to read and understand the code. They act as temporary result sets, allowing users to logically break down and order different parts of a query, which can be referenced later in the same query.
Recursive CTEs
Recursive CTEs are a powerful tool that allows processing hierarchical or repetitive data structures, such as traversing a genealogy tree or finding the shortest path in a graph. Recursive CTEs are divided into two parts: the initial, non-recursive part (called the “anchor member”), and the recursive part (called the “recursive member”), both of which refer to themselves. CTE.
For example: a recursive CTE application searches for dependencies in a company's organizational structure:
CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY,
EmployeeName VARCHAR(255),
ManagerID INT
);
INSERT INTO Employees (EmployeeID, EmployeeName, ManagerID)
VALUES (1, 'CEO', NULL),
(2, 'VP of Sales', 1),
(3, 'Sales Manager', 2),
(4, 'Sales Representative', 3),
(5, 'VP of Engineering', 1),
(6, 'Engineering Manager', 5),
(7, 'Software Engineer', 6);
-- Create a recursive CTE to create an organization tree
WITH RecursiveOrg AS (
SELECT EmployeeID, EmployeeName, ManagerID, 0 AS Level
FROM Employees
WHERE ManagerID IS NULL
UNION ALL
SELECT E.EmployeeID, E.EmployeeName, E.ManagerID, R.Level + 1
FROM Employees AS E
JOIN RecursiveOrg AS R ON E.ManagerID = R.EmployeeID
)
-- Query organization tree
SELECT EmployeeID, EmployeeName, ManagerID, Level
FROM RecursiveOrg
ORDER BY Level, EmployeeID;
In there:
“Anchor members” include rows in the Employees table whose ManagerID is NULL. This corresponds to the CEO – which represents the starting point of the recursive process.
“Recursive member” contains all employees (including ranks) in the company organization – created by combining information in the Employees table and based on the ManagerID relationship to determine the rank for each staff.
CTE RecursiveOrg starts at the CEO and then iterates through the ranks of the organizational structure. The result is a table containing all employees with information about their rank in the organization.
Some advanced applications of CTEs in analysis
In this article, we will see an example with the following table and data information:
Information of tables:
CREATE TABLE sales_data (
order_id INT PRIMARY KEY,
order_date DATE,
customer_id VARCHAR(100),
total_amount DECIMAL(10, 2)
);
CREATE TABLE relationships (
user_id INT,
friend_id INT
);
CREATE TABLE text_data (
text_id INT PRIMARY KEY,
text_content TEXT
);
Data in the tables:
INSERT INTO sales_data (order_id, order_date, customer_id, total_amount)
VALUES
(1, '2023-01-15', 'Cust123', 100.50),
(2, '2023-01-18', 'Cust456', 75.25),
(3, '2023-02-10', 'Cust789', 150.00),
(4, '2023-02-15', 'Cust123', 200.75),
(5, '2023-03-05', 'Cust456', 50.00),
(6, '2023-03-20', 'Cust789', 125.50),
(7, '2023-04-10', 'Cust123', 90.25),
(8, '2023-04-15', 'Cust456', 300.00),
(9, '2023-05-02', 'Cust789', 75.75),
(10, '2023-05-20', 'Cust123', 150.00);
INSERT INTO relationships (user_id, friend_id)
VALUES
(1, 2),
(1, 3),
(2, 4),
(3, 5),
(4, 6),
(5, 7),
(6, 8),
(7, 9),
(8, 10),
(9, 10);
INSERT INTO text_data (text_id, text_content)
VALUES
(1, 'Common Table Expressions (CTE) is a powerful tool in SQL.'),
(2, 'CTE allows you to create a temporary table name and use it in a query.'),
(3, 'CTE can be used to build complex queries and calculations.'),
(4, 'Using CTE to analyze and transform text data is a common application.');
Clean and transform data
With CTEs, users can create complex data cleaning and transformation steps to cater to each data analysis need.
For example: Remove duplicate lines, replace null values, and calculate complex indices.
WITH CleanedSalesData AS (
-- CTE for data cleaning
SELECT
order_id,
customer_id,
TO_DATE(order_date, 'YYYY-MM-DD') AS cleaned_order_date,
CASE
WHEN total_amount < 0 THEN 0
ELSE total_amount
END AS cleaned_total_amount
FROM
sales_data
)
-- Main query uses cleaned data
SELECT
order_id,
customer_id,
cleaned_order_date,
cleaned_total_amount
FROM
CleanedSalesData
WHERE
cleaned_total_amount > 0;
Specifically, the CleanedSalesData CTE will do the following:
Use the TO_DATE function to normalize the value in the order_date column into a standard date format.
Besides, the CASE…WHEN clause ensures that the total_amount column value is never negative. If the value is less than 0, a value of 0 will be assigned.
Finally, in the main query, we will retrieve data from the CleanedSalesData CTE to retrieve rows with a cleaned_total_amount value greater than 0.
Time series analysis
Consider the sales_date table, with columns order_id, order_date, and total_amount
Problem to be solved: Use CTE to analyze total sales by month and average total monthly sales for the year.
WITH MonthlySalesData AS (
-- CTE for time series analysis
SELECT
EXTRACT(YEAR FROM order_date) AS year,
EXTRACT(MONTH FROM order_date) AS month,
SUM(total_amount) AS monthly_total
FROM
sales_data
GROUP BY
year,
month
),
AverageMonthlySales AS (
-- CTE to calculate the total average monthly sales for the year
SELECT
year,
AVG(monthly_total) AS average_monthly_sales
FROM
MonthlySalesData
GROUP BY
year
)
-- Main query uses analyzed data
SELECT
year,
month,
monthly_total,
average_monthly_sales
FROM
MonthlySalesData
JOIN
AverageMonthlySales ON MonthlySalesData.year = AverageMonthlySales.year
ORDER BY
year,
month;
Specifically:
CTE MonthlySalesData will analyze data by year and month, calculating total monthly sales.
CTE AverageMonthlySales will calculate the total average monthly sales for the year using data from MonthlySalesData.
Finally, in the main query, we will combine data from the above 2 CTEs to solve the above analysis need.
Analyze text strings
Consider the text_data table containing text segments.
Problem to solve: Use CTE to find words that appear multiple times in those paragraphs of text.
WITH WordCounts AS (
-- CTE to count word occurrences
SELECT
text_id,
word,
COUNT(*) AS count
FROM (
SELECT
text_id,
regexp_split_to_table(text_content, E'\\s+') AS word
FROM
text_data
) AS words
GROUP BY
text_id, word
)
-- The main query uses the analyzed data
SELECT
text_id,
word,
count
FROM
WordCounts
ORDER BY
text_id, count DESC;
Specifically:
CTE WordCounts has the function of counting words appearing in a paragraph of text combined with the regexp_split_to_table function to split paragraphs of text into individual words.
In the main query, we will retrieve data from the WordCounts CTE to get the result of the word occurrence count.
Last updated