CTE

Common Table Expression

Common Table Expressions (CTEs) are a SQL feature that lets you name a temporary result set and reference it within the scope of a single statement. CTEs are used to structure complex queries and can be combined with SELECT, INSERT, UPDATE or DELETE statements to improve the clarity, readability and maintainability of SQL code.

WITH cte_name (column1, column2, ...) AS (
    -- CTE query
    SELECT ...
    FROM ...
    WHERE ...
)

-- Using CTE 
SELECT ...
FROM cte_name
WHERE ...
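To make the template above concrete, here is a minimal sketch that fills it in and runs it. It assumes Python's built-in sqlite3 module and a hypothetical orders table; neither the table nor the customer_totals name comes from the examples later in this article.

```python
import sqlite3

# In-memory database with a small, made-up orders table (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id TEXT, total_amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Cust123", 100.50), (2, "Cust456", 75.25), (3, "Cust123", 200.75)],
)

basic_query = """
WITH customer_totals (customer_id, total_spent) AS (
    -- CTE query: one row per customer
    SELECT customer_id, SUM(total_amount)
    FROM orders
    GROUP BY customer_id
)
-- Using the CTE
SELECT customer_id, total_spent
FROM customer_totals
WHERE total_spent > 100
ORDER BY customer_id
"""
basic_rows = conn.execute(basic_query).fetchall()
print(basic_rows)
```

The column list after the CTE name is optional; when it is omitted, the CTE takes its column names from the inner SELECT.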

Comparing CTEs with other techniques

CTEs vs Sub-queries

| | CTEs | Sub-queries |
|---|---|---|
| Declaration | Defined with the WITH clause, before the main query | Defined inline within another query, enclosed in parentheses |
| Readability | Easy to read, understand and maintain, because the logic is split into small named components that sit apart from the main query | Can make the query harder to follow, because nested logic obscures the purpose of the statement as a whole |
| Reusability | Can be referenced multiple times within the same statement; no longer available once that statement finishes | Used once, at the point where it is written; the sub-query must be repeated to be reused |
| Application | Complex queries that benefit from being divided into smaller, readable, manageable components | Simple queries where the logic does not need to be reused |
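The reusability difference is easier to see side by side. The sketch below, which assumes Python's sqlite3 module and a hypothetical orders table, finds customers who spent more than the average customer. The CTE version names the per-customer totals once and references them twice; the sub-query version has to repeat the same aggregation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id TEXT, total_amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("A", 100.0), ("A", 50.0), ("B", 30.0), ("C", 60.0)],
)

# CTE: "totals" is defined once and referenced twice in the same statement.
cte_version = """
WITH totals AS (
    SELECT customer_id, SUM(total_amount) AS spent
    FROM orders
    GROUP BY customer_id
)
SELECT customer_id
FROM totals
WHERE spent > (SELECT AVG(spent) FROM totals)
ORDER BY customer_id
"""

# Sub-queries: the same aggregation has to be written out twice.
subquery_version = """
SELECT customer_id
FROM (SELECT customer_id, SUM(total_amount) AS spent
      FROM orders GROUP BY customer_id) AS t
WHERE spent > (
    SELECT AVG(spent)
    FROM (SELECT customer_id, SUM(total_amount) AS spent
          FROM orders GROUP BY customer_id) AS t2
)
ORDER BY customer_id
"""

cte_rows = conn.execute(cte_version).fetchall()
subquery_rows = conn.execute(subquery_version).fetchall()
print(cte_rows, subquery_rows)
```

Both statements return the same rows; the difference is only in how much logic is duplicated and how easy the statement is to read.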

CTEs vs Temp tables

| | CTEs | Temp tables |
|---|---|---|
| Declaration | Defined with the WITH clause | Created as a table, for example with a '#' name prefix in SQL Server or with CREATE TEMPORARY TABLE in PostgreSQL and MySQL |
| Storage | A temporary result set; not stored as an object in the database | A physical object, created and stored temporarily in the database |
| Reusability and constraints | Can be referenced multiple times within the same statement; not available to later statements | Can be indexed and modified, and can be accessed from multiple queries in the same session |
| Application | Use when readability, reusability within one statement and ease of maintenance matter, and the data does not need to exist outside the scope of the query | Use when intermediate results must be shared by many queries, when indexing is needed for better performance, or when the data must be modified to serve different queries |
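The scope difference can be checked directly. This is a minimal sketch assuming Python's sqlite3 module, where temporary tables are created with CREATE TEMP TABLE rather than the '#' prefix; the orders table and the big_orders names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, total_amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(1, 100.0), (2, 40.0), (3, 250.0)]
)

# A CTE exists only inside the statement that defines it.
cte_rows = conn.execute(
    "WITH big_orders AS (SELECT * FROM orders WHERE total_amount >= 100) "
    "SELECT order_id FROM big_orders ORDER BY order_id"
).fetchall()

# A later statement cannot see the CTE name.
try:
    conn.execute("SELECT * FROM big_orders")
    cte_visible_later = True
except sqlite3.OperationalError:  # SQLite reports "no such table: big_orders"
    cte_visible_later = False

# A temporary table is a real object: it lasts for the session,
# can be indexed, and can be modified.
conn.execute(
    "CREATE TEMP TABLE big_orders_tmp AS "
    "SELECT * FROM orders WHERE total_amount >= 100"
)
conn.execute("CREATE INDEX idx_big_orders_tmp ON big_orders_tmp (order_id)")
conn.execute(
    "UPDATE big_orders_tmp SET total_amount = total_amount * 2 WHERE order_id = 1"
)
temp_rows = conn.execute(
    "SELECT order_id, total_amount FROM big_orders_tmp ORDER BY order_id"
).fetchall()
print(cte_rows, cte_visible_later, temp_rows)
```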

Types of CTEs in SQL

Non-Recursive CTEs

Non-recursive CTEs are used to simplify complex queries, making it easier for users to read and understand the code. They act as temporary result sets, allowing users to logically break down and order different parts of a query, which can be referenced later in the same query.

Recursive CTEs

Recursive CTEs are a powerful tool for processing hierarchical or repetitive data structures, such as traversing a genealogy tree or finding the shortest path in a graph. A recursive CTE has two parts: the initial, non-recursive part (called the “anchor member”), and the recursive part (called the “recursive member”), which references the CTE itself. The recursion stops when the recursive member returns no new rows.
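The two parts are easiest to see in the smallest possible case. The sketch below assumes Python's sqlite3 module and a hypothetical CTE named counter that generates the numbers 1 to 5; the WHERE condition in the recursive member is what eventually makes it return no rows and stops the recursion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

counter_query = """
WITH RECURSIVE counter (n) AS (
    -- Anchor member: runs once and seeds the result
    SELECT 1
    UNION ALL
    -- Recursive member: reads the rows produced so far;
    -- it yields nothing once n reaches 5, which ends the recursion
    SELECT n + 1 FROM counter WHERE n < 5
)
SELECT n FROM counter ORDER BY n
"""
counter_rows = conn.execute(counter_query).fetchall()
print(counter_rows)
```

Without a stop condition of this kind, a recursive member that always produces rows would never terminate on its own.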

For example, the following recursive CTE walks the reporting hierarchy in a company's organizational structure:

CREATE TABLE Employees (
  EmployeeID INT PRIMARY KEY,
  EmployeeName VARCHAR(255),
  ManagerID INT
);
 
INSERT INTO Employees (EmployeeID, EmployeeName, ManagerID)
VALUES (1, 'CEO', NULL),
       (2, 'VP of Sales', 1),
       (3, 'Sales Manager', 2),
       (4, 'Sales Representative', 3),
       (5, 'VP of Engineering', 1),
       (6, 'Engineering Manager', 5),
       (7, 'Software Engineer', 6);
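-- Create a recursive CTE to build the organization tree.
-- PostgreSQL and MySQL require the RECURSIVE keyword; SQL Server does not accept it, so drop it there.
WITH RECURSIVE RecursiveOrg AS (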
  SELECT EmployeeID, EmployeeName, ManagerID, 0 AS Level
  FROM Employees
  WHERE ManagerID IS NULL
  UNION ALL
  SELECT E.EmployeeID, E.EmployeeName, E.ManagerID, R.Level + 1
  FROM Employees AS E
  JOIN RecursiveOrg AS R ON E.ManagerID = R.EmployeeID
)
-- Query organization tree
SELECT EmployeeID, EmployeeName, ManagerID, Level
FROM RecursiveOrg
ORDER BY Level, EmployeeID;

In this query:

The “anchor member” selects the rows in the Employees table whose ManagerID is NULL. This corresponds to the CEO, the starting point of the recursion, at Level 0.

The “recursive member” joins the Employees table to the rows already produced by RecursiveOrg on the ManagerID relationship, so each pass adds the employees who report to the previous level and assigns them Level + 1.

The RecursiveOrg CTE starts at the CEO and then iterates down through the levels of the organizational structure, stopping when a pass finds no further reports. The result is a table containing every employee together with their level in the organization.
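As a runnable variation on the query above, the following sketch loads the same Employees rows into SQLite through Python's sqlite3 module and carries one extra column along the recursion. The Path column is an addition for illustration and is not part of the original query; it shows how the recursive member can accumulate values from the level above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees ("
    "EmployeeID INTEGER PRIMARY KEY, EmployeeName TEXT, ManagerID INTEGER)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [
        (1, "CEO", None),
        (2, "VP of Sales", 1),
        (3, "Sales Manager", 2),
        (4, "Sales Representative", 3),
        (5, "VP of Engineering", 1),
        (6, "Engineering Manager", 5),
        (7, "Software Engineer", 6),
    ],
)

org_query = """
WITH RECURSIVE RecursiveOrg (EmployeeID, EmployeeName, Level, Path) AS (
    -- Anchor member: the employee with no manager
    SELECT EmployeeID, EmployeeName, 0, EmployeeName
    FROM Employees
    WHERE ManagerID IS NULL
    UNION ALL
    -- Recursive member: direct reports of rows found so far,
    -- extending the manager's path with the employee's own name
    SELECT E.EmployeeID, E.EmployeeName, R.Level + 1,
           R.Path || ' > ' || E.EmployeeName
    FROM Employees AS E
    JOIN RecursiveOrg AS R ON E.ManagerID = R.EmployeeID
)
SELECT EmployeeID, Level, Path
FROM RecursiveOrg
ORDER BY Level, EmployeeID
"""
org_rows = conn.execute(org_query).fetchall()
for employee_id, level, path in org_rows:
    print("  " * level + path)
```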

Some advanced applications of CTEs in analysis

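The examples in this section use the following tables and sample data. The queries use PostgreSQL functions such as TO_DATE and regexp_split_to_table, so other database systems may need equivalent functions.

Table definitions: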

CREATE TABLE sales_data (
    order_id INT PRIMARY KEY,
    order_date DATE,
    customer_id VARCHAR(100),
    total_amount DECIMAL(10, 2)
);
 
CREATE TABLE relationships (
    user_id INT,
    friend_id INT
);
 
CREATE TABLE text_data (
    text_id INT PRIMARY KEY,
    text_content TEXT
);

Sample data:

INSERT INTO sales_data (order_id, order_date, customer_id, total_amount)
VALUES
    (1, '2023-01-15', 'Cust123', 100.50),
    (2, '2023-01-18', 'Cust456', 75.25),
    (3, '2023-02-10', 'Cust789', 150.00),
    (4, '2023-02-15', 'Cust123', 200.75),
    (5, '2023-03-05', 'Cust456', 50.00),
    (6, '2023-03-20', 'Cust789', 125.50),
    (7, '2023-04-10', 'Cust123', 90.25),
    (8, '2023-04-15', 'Cust456', 300.00),
    (9, '2023-05-02', 'Cust789', 75.75),
    (10, '2023-05-20', 'Cust123', 150.00);
 
INSERT INTO relationships (user_id, friend_id)
VALUES
    (1, 2),
    (1, 3),
    (2, 4),
    (3, 5),
    (4, 6),
    (5, 7),
    (6, 8),
    (7, 9),
    (8, 10),
    (9, 10);
 
INSERT INTO text_data (text_id, text_content)
VALUES
    (1, 'Common Table Expressions (CTE) is a powerful tool in SQL.'),
    (2, 'CTE allows you to create a temporary table name and use it in a query.'),
    (3, 'CTE can be used to build complex queries and calculations.'),
    (4, 'Using CTE to analyze and transform text data is a common application.');

Clean and transform data

With CTEs, users can build multi-step data cleaning and transformation pipelines tailored to each analysis need.

For example, a CTE can remove duplicate rows, replace null values, or calculate derived metrics. The query below normalizes dates and replaces negative amounts:

WITH CleanedSalesData AS (
    -- CTE for data cleaning
    SELECT
        order_id,
        customer_id,
        -- TO_DATE takes text input in PostgreSQL, so the DATE column is cast first
        TO_DATE(CAST(order_date AS TEXT), 'YYYY-MM-DD') AS cleaned_order_date,
        CASE
            WHEN total_amount < 0 THEN 0
            ELSE total_amount
        END AS cleaned_total_amount
    FROM
        sales_data
)
-- Main query uses cleaned data
SELECT
    order_id,
    customer_id,
    cleaned_order_date,
    cleaned_total_amount
FROM
    CleanedSalesData
WHERE
    cleaned_total_amount > 0;

Specifically, the CleanedSalesData CTE does the following:

It uses the TO_DATE function to normalize the value in the order_date column into a standard date format. TO_DATE takes text input, so this step is most useful when dates arrive as strings.

The CASE…WHEN expression ensures that the total_amount value is never negative: any value less than 0 is replaced with 0.

Finally, the main query reads from the CleanedSalesData CTE and returns only the rows whose cleaned_total_amount is greater than 0, which also excludes the rows whose negative amounts were replaced with 0.
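The other cleaning steps mentioned earlier, removing duplicate rows and replacing null values, can be chained in the same way. The sketch below assumes Python's sqlite3 module and a hypothetical raw_sales table with deliberately dirty rows; the 'UNKNOWN' placeholder is an arbitrary choice for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_sales (order_id INTEGER, customer_id TEXT, total_amount REAL)"
)
conn.executemany(
    "INSERT INTO raw_sales VALUES (?, ?, ?)",
    [
        (1, "Cust123", 100.50),
        (1, "Cust123", 100.50),  # exact duplicate row
        (2, None, 75.25),        # missing customer
        (3, "Cust789", -20.00),  # negative amount
    ],
)

cleaning_query = """
WITH Deduplicated AS (
    -- Step 1: drop exact duplicate rows
    SELECT DISTINCT order_id, customer_id, total_amount
    FROM raw_sales
),
Cleaned AS (
    -- Step 2: replace NULL customers and clamp negative amounts to 0
    SELECT
        order_id,
        COALESCE(customer_id, 'UNKNOWN') AS customer_id,
        CASE WHEN total_amount < 0 THEN 0 ELSE total_amount END
            AS cleaned_total_amount
    FROM Deduplicated
)
SELECT order_id, customer_id, cleaned_total_amount
FROM Cleaned
ORDER BY order_id
"""
cleaned_rows = conn.execute(cleaning_query).fetchall()
print(cleaned_rows)
```

Splitting the work into one CTE per step keeps each rule visible on its own, and a later CTE can build on an earlier one by name.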

Time series analysis

Consider the sales_data table, with columns order_id, order_date, and total_amount.

Problem to be solved: Use CTEs to calculate total sales for each month and the average of those monthly totals for the year.

WITH MonthlySalesData AS (
    -- CTE for time series analysis
    SELECT
        EXTRACT(YEAR FROM order_date) AS year,
        EXTRACT(MONTH FROM order_date) AS month,
        SUM(total_amount) AS monthly_total
    FROM
        sales_data
    GROUP BY
        year,
        month
),
AverageMonthlySales AS (
    -- CTE to calculate the average monthly sales for each year
    SELECT
        year,
        AVG(monthly_total) AS average_monthly_sales
    FROM
        MonthlySalesData
    GROUP BY
        year
)
-- Main query uses analyzed data.
-- Columns are qualified with aliases because both CTEs expose a "year" column.
SELECT
    m.year,
    m.month,
    m.monthly_total,
    a.average_monthly_sales
FROM
    MonthlySalesData AS m
JOIN
    AverageMonthlySales AS a ON m.year = a.year
ORDER BY
    m.year,
    m.month;

Specifically:

The MonthlySalesData CTE groups the data by year and month and calculates the total sales for each month.

The AverageMonthlySales CTE calculates the average monthly sales for each year, using the monthly totals from MonthlySalesData.

Finally, the main query joins the two CTEs on year, so each month's total appears next to the yearly average of monthly sales.
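To check the numbers this produces on the sample data, here is a sketch of the same two-CTE structure ported to SQLite through Python's sqlite3 module. SQLite has no EXTRACT, so strftime is used instead as a substitute; that swap, and storing order_date as text, are assumptions of this sketch rather than part of the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_data (order_id INTEGER PRIMARY KEY, "
    "order_date TEXT, customer_id TEXT, total_amount REAL)"
)
conn.executemany(
    "INSERT INTO sales_data VALUES (?, ?, ?, ?)",
    [
        (1, "2023-01-15", "Cust123", 100.50),
        (2, "2023-01-18", "Cust456", 75.25),
        (3, "2023-02-10", "Cust789", 150.00),
        (4, "2023-02-15", "Cust123", 200.75),
        (5, "2023-03-05", "Cust456", 50.00),
        (6, "2023-03-20", "Cust789", 125.50),
        (7, "2023-04-10", "Cust123", 90.25),
        (8, "2023-04-15", "Cust456", 300.00),
        (9, "2023-05-02", "Cust789", 75.75),
        (10, "2023-05-20", "Cust123", 150.00),
    ],
)

monthly_query = """
WITH MonthlySalesData AS (
    -- strftime stands in for EXTRACT(YEAR/MONTH FROM ...) in SQLite
    SELECT strftime('%Y', order_date) AS year,
           strftime('%m', order_date) AS month,
           SUM(total_amount) AS monthly_total
    FROM sales_data
    GROUP BY strftime('%Y', order_date), strftime('%m', order_date)
),
AverageMonthlySales AS (
    SELECT year, AVG(monthly_total) AS average_monthly_sales
    FROM MonthlySalesData
    GROUP BY year
)
SELECT m.year, m.month, m.monthly_total, a.average_monthly_sales
FROM MonthlySalesData AS m
JOIN AverageMonthlySales AS a ON m.year = a.year
ORDER BY m.year, m.month
"""
monthly_rows = conn.execute(monthly_query).fetchall()
for year, month, total, average in monthly_rows:
    print(year, month, total, round(average, 2))
```

On this data the five monthly totals are 175.75, 350.75, 175.50, 390.25 and 225.75, and their average is 263.60.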

Analyze text strings

Consider the text_data table, which contains short passages of text.

Problem to solve: Use a CTE to count how often each word appears in each passage, so that words appearing multiple times can be identified.

WITH WordCounts AS (
    -- CTE to count word occurrences
    SELECT
        text_id,
        word,
        COUNT(*) AS count
    FROM (
        SELECT
            text_id,
            regexp_split_to_table(text_content, E'\\s+') AS word
        FROM
            text_data
    ) AS words
    GROUP BY
        text_id, word
)
-- The main query uses the analyzed data
SELECT
    text_id,
    word,
    count
FROM
    WordCounts
ORDER BY
    text_id, count DESC;

Specifically:

The WordCounts CTE uses the regexp_split_to_table function to split each passage into individual words on whitespace, then counts the occurrences of each word per text_id.

The main query reads from the WordCounts CTE and lists the word counts for each passage, with the most frequent words first.
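regexp_split_to_table is specific to PostgreSQL. On a database without it, one possible substitute is a recursive CTE that peels one word at a time off the front of the text. The sketch below assumes Python's sqlite3 module, words separated by single spaces, and two made-up rows of text; the Split CTE and the occurrences column are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE text_data (text_id INTEGER PRIMARY KEY, text_content TEXT)"
)
conn.executemany(
    "INSERT INTO text_data VALUES (?, ?)",
    [
        (1, "a cte is a named query"),
        (2, "cte names make long sql readable"),
    ],
)

word_query = """
WITH RECURSIVE Split (text_id, word, rest) AS (
    -- Anchor: no word yet; the trailing space ensures every word ends in a separator
    SELECT text_id, '', text_content || ' '
    FROM text_data
    UNION ALL
    -- Recursive member: take the text before the first space as the next word
    SELECT text_id,
           substr(rest, 1, instr(rest, ' ') - 1),
           substr(rest, instr(rest, ' ') + 1)
    FROM Split
    WHERE rest <> ''
),
WordCounts AS (
    SELECT text_id, lower(word) AS word, COUNT(*) AS occurrences
    FROM Split
    WHERE word <> ''
    GROUP BY text_id, lower(word)
)
SELECT text_id, word, occurrences
FROM WordCounts
WHERE occurrences > 1
ORDER BY text_id, word
"""
repeated_words = conn.execute(word_query).fetchall()
print(repeated_words)
```

The final WHERE keeps only words that occur more than once in a passage; punctuation is not stripped here, so real text would need an extra cleaning step before counting.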
