# CTE

Common Table Expressions (CTEs) are an important feature in SQL that allows users to name and use a temporary table within the scope of a specific query. CTEs are used in processing complex queries, often combined with SELECT, INSERT, UPDATE or DELETE statements to help increase clarity, readability and management of SQL code.

```sql
WITH cte_name (column1, column2, ...) AS (
    -- CTE query
    SELECT ...
    FROM ...
    WHERE ...
)

-- Using CTE 
SELECT ...
FROM cte_name
WHERE ...
```

## Compare CTE with another technique

### CTE vs Sub-queries

<table><thead><tr><th width="159"></th><th>CTEs</th><th>Sub-queries</th></tr></thead><tbody><tr><td>Declare</td><td>Determination with the WITH</td><td>Determination clause in another query, enclosed in parentheses</td></tr><tr><td>Readability</td><td>Easy to read and understand, easy to maintain because it is separated into small components, separate from the main query</td><td>Can make the query complex, can cause confusion for the purpose of the entire paragraph code</td></tr><tr><td>Reusability</td><td>Can be reused multiple times within the same query, available until the next query is executed</td><td>Single use within the query, where declared</td></tr><tr><td>Application</td><td>Used in complex queries, divided into smaller components, easy to read and understand, manage code </td><td>Used in case of simple queries, no need to reuse code many times</td></tr></tbody></table>

### CTEs vs Temp table

|                            | CTEs                                                                                                                                 | Temp Table                                                                                                                                                                       |
| -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Declare                    | Defined with the WITH clause                                                                                                         | Defined with the symbol '#'                                                                                                                                                      |
| Readability                | Is a temporary result, not stored in the database                                                                                    | Is a physical object, created and temporarily stored in the database                                                                                                             |
| Reusability and Constraint | Can be reused multiple times within the same query, available until the next query is executed                                       | Allows indexing, editing, and can be accessed from multiple queries in the same session                                                                                          |
| Application                | Use when users care about readability and reusability, ease of maintenance, do not need to store data outside the scope of the query | Use when need to store intermediate results for many query, have a need for indexing for better performance, or need to modify data in a table to serve different query purposes |

## Types of CTEs in SQL&#x20;

### Non-Recursive CTEs&#x20;

Non-recursive CTEs are used to simplify complex queries, making it easier for users to read and understand the code. They act as temporary result sets, allowing users to logically break down and order different parts of a query, which can be referenced later in the same query.

### Recursive CTEs&#x20;

Recursive CTEs are a powerful tool that allows processing hierarchical or repetitive data structures, such as traversing a genealogy tree or finding the shortest path in a graph. Recursive CTEs are divided into two parts: the initial, non-recursive part (called the “anchor member”), and the recursive part (called the “recursive member”), both of which refer to themselves. CTE.

**For example:** a recursive CTE application searches for dependencies in a company's organizational structure:

```sql
CREATE TABLE Employees (
  EmployeeID INT PRIMARY KEY,
  EmployeeName VARCHAR(255),
  ManagerID INT
);
 
INSERT INTO Employees (EmployeeID, EmployeeName, ManagerID)
VALUES (1, 'CEO', NULL),
       (2, 'VP of Sales', 1),
       (3, 'Sales Manager', 2),
       (4, 'Sales Representative', 3),
       (5, 'VP of Engineering', 1),
       (6, 'Engineering Manager', 5),
       (7, 'Software Engineer', 6);
-- Create a recursive CTE to create an organization tree
WITH RecursiveOrg AS (
  SELECT EmployeeID, EmployeeName, ManagerID, 0 AS Level
  FROM Employees
  WHERE ManagerID IS NULL
  UNION ALL
  SELECT E.EmployeeID, E.EmployeeName, E.ManagerID, R.Level + 1
  FROM Employees AS E
  JOIN RecursiveOrg AS R ON E.ManagerID = R.EmployeeID
)
-- Query organization tree
SELECT EmployeeID, EmployeeName, ManagerID, Level
FROM RecursiveOrg
ORDER BY Level, EmployeeID;
```

**In there:**

“Anchor members” include rows in the Employees table whose ManagerID is NULL. This corresponds to the CEO – which represents the starting point of the recursive process.

“Recursive member” contains all employees (including ranks) in the company organization – created by combining information in the Employees table and based on the ManagerID relationship to determine the rank for each staff.

CTE RecursiveOrg starts at the CEO and then iterates through the ranks of the organizational structure. The result is a table containing all employees with information about their rank in the organization.

## Some advanced applications of CTEs in analysis&#x20;

In this article, we will see an example with the following table and data information:

**Information of tables:**

```sql
CREATE TABLE sales_data (
    order_id INT PRIMARY KEY,
    order_date DATE,
    customer_id VARCHAR(100),
    total_amount DECIMAL(10, 2)
);
 
CREATE TABLE relationships (
    user_id INT,
    friend_id INT
);
 
CREATE TABLE text_data (
    text_id INT PRIMARY KEY,
    text_content TEXT
);
```

**Data in the tables:**

```sql
INSERT INTO sales_data (order_id, order_date, customer_id, total_amount)
VALUES
    (1, '2023-01-15', 'Cust123', 100.50),
    (2, '2023-01-18', 'Cust456', 75.25),
    (3, '2023-02-10', 'Cust789', 150.00),
    (4, '2023-02-15', 'Cust123', 200.75),
    (5, '2023-03-05', 'Cust456', 50.00),
    (6, '2023-03-20', 'Cust789', 125.50),
    (7, '2023-04-10', 'Cust123', 90.25),
    (8, '2023-04-15', 'Cust456', 300.00),
    (9, '2023-05-02', 'Cust789', 75.75),
    (10, '2023-05-20', 'Cust123', 150.00);
 
INSERT INTO relationships (user_id, friend_id)
VALUES
    (1, 2),
    (1, 3),
    (2, 4),
    (3, 5),
    (4, 6),
    (5, 7),
    (6, 8),
    (7, 9),
    (8, 10),
    (9, 10);
 
INSERT INTO text_data (text_id, text_content)
VALUES
    (1, 'Common Table Expressions (CTE) is a powerful tool in SQL.'),
    (2, 'CTE allows you to create a temporary table name and use it in a query.'),
    (3, 'CTE can be used to build complex queries and calculations.'),
    (4, 'Using CTE to analyze and transform text data is a common application.');
```

### Clean and transform data&#x20;

With CTEs, users can create complex data cleaning and transformation steps to cater to each data analysis need.

**For example:** Remove duplicate lines, replace null values, and calculate complex indices.

```sql
WITH CleanedSalesData AS (
    -- CTE for data cleaning
    SELECT
        order_id,
        customer_id,
        TO_DATE(order_date, 'YYYY-MM-DD') AS cleaned_order_date,
        CASE
            WHEN total_amount < 0 THEN 0
            ELSE total_amount
        END AS cleaned_total_amount
    FROM
        sales_data
)
-- Main query uses cleaned data
SELECT
    order_id,
    customer_id,
    cleaned_order_date,
    cleaned_total_amount
FROM
    CleanedSalesData
WHERE
    cleaned_total_amount > 0;
```

Specifically, the **CleanedSalesData** CTE will do the following:

Use the **TO\_DATE** function to normalize the value in the **order\_date** column into a standard date format.

Besides, the **CASE…WHEN** clause ensures that the **total\_amount** column value is never negative. If the value is less than 0, a value of 0 will be assigned.

Finally, in the main query, we will retrieve data from the **CleanedSalesData** CTE to retrieve rows with a **cleaned\_total\_amount** value greater than 0.

### Time series analysis&#x20;

Consider the **sales\_date** table, with columns **order\_id**, **order\_date**, and **total\_amount**

Problem to be solved: Use CTE to analyze total sales by month and average total monthly sales for the year.

```sql
WITH MonthlySalesData AS (
    -- CTE for time series analysis
    SELECT
        EXTRACT(YEAR FROM order_date) AS year,
        EXTRACT(MONTH FROM order_date) AS month,
        SUM(total_amount) AS monthly_total
    FROM
        sales_data
    GROUP BY
        year,
        month
),
AverageMonthlySales AS (
    -- CTE to calculate the total average monthly sales for the year
    SELECT
        year,
        AVG(monthly_total) AS average_monthly_sales
    FROM
        MonthlySalesData
    GROUP BY
        year
)
-- Main query uses analyzed data
SELECT
    year,
    month,
    monthly_total,
    average_monthly_sales
FROM
    MonthlySalesData
JOIN
    AverageMonthlySales ON MonthlySalesData.year = AverageMonthlySales.year
ORDER BY
    year,
    month;
```

Specifically:

CTE **MonthlySalesData** will analyze data by year and month, calculating total monthly sales.

CTE **AverageMonthlySales** will calculate the total average monthly sales for the year using data from **MonthlySalesData**.

Finally, in the main query, we will combine data from the above 2 CTEs to solve the above analysis need.

### Analyze text strings&#x20;

Consider the text\_data table containing text segments.

**Problem to solve:** Use CTE to find words that appear multiple times in those paragraphs of text.

```sql
WITH WordCounts AS (
    -- CTE to count word occurrences
    SELECT
        text_id,
        word,
        COUNT(*) AS count
    FROM (
        SELECT
            text_id,
            regexp_split_to_table(text_content, E'\\s+') AS word
        FROM
            text_data
    ) AS words
    GROUP BY
        text_id, word
)
-- The main query uses the analyzed data
SELECT
    text_id,
    word,
    count
FROM
    WordCounts
ORDER BY
    text_id, count DESC;
```

Specifically:

CTE **WordCounts** has the function of counting words appearing in a paragraph of text combined with the **regexp\_split\_to\_table** function to split paragraphs of text into individual words.

In the main query, we will retrieve data from the **WordCounts** CTE to get the result of the word occurrence count.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://huy312100.gitbook.io/software-development/dbms/advanced/cte.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
