CTE
Common Table Expression
Common Table Expressions (CTEs) are an important feature in SQL that allows users to name and use a temporary table within the scope of a specific query. CTEs are used in processing complex queries, often combined with SELECT, INSERT, UPDATE or DELETE statements to help increase clarity, readability and management of SQL code.
Compare CTE with another technique
CTE vs Sub-queries
Declare
Determination with the WITH
Determination clause in another query, enclosed in parentheses
Readability
Easy to read and understand, easy to maintain because it is separated into small components, separate from the main query
Can make the query complex, can cause confusion for the purpose of the entire paragraph code
Reusability
Can be reused multiple times within the same query, available until the next query is executed
Single use within the query, where declared
Application
Used in complex queries, divided into smaller components, easy to read and understand, manage code
Used in case of simple queries, no need to reuse code many times
CTEs vs Temp table
Declare
Defined with the WITH clause
Defined with the symbol '#'
Readability
Is a temporary result, not stored in the database
Is a physical object, created and temporarily stored in the database
Reusability and Constraint
Can be reused multiple times within the same query, available until the next query is executed
Allows indexing, editing, and can be accessed from multiple queries in the same session
Application
Use when users care about readability and reusability, ease of maintenance, do not need to store data outside the scope of the query
Use when need to store intermediate results for many query, have a need for indexing for better performance, or need to modify data in a table to serve different query purposes
Types of CTEs in SQL
Non-Recursive CTEs
Non-recursive CTEs are used to simplify complex queries, making it easier for users to read and understand the code. They act as temporary result sets, allowing users to logically break down and order different parts of a query, which can be referenced later in the same query.
Recursive CTEs
Recursive CTEs are a powerful tool that allows processing hierarchical or repetitive data structures, such as traversing a genealogy tree or finding the shortest path in a graph. Recursive CTEs are divided into two parts: the initial, non-recursive part (called the “anchor member”), and the recursive part (called the “recursive member”), both of which refer to themselves. CTE.
For example: a recursive CTE application searches for dependencies in a company's organizational structure:
In there:
“Anchor members” include rows in the Employees table whose ManagerID is NULL. This corresponds to the CEO – which represents the starting point of the recursive process.
“Recursive member” contains all employees (including ranks) in the company organization – created by combining information in the Employees table and based on the ManagerID relationship to determine the rank for each staff.
CTE RecursiveOrg starts at the CEO and then iterates through the ranks of the organizational structure. The result is a table containing all employees with information about their rank in the organization.
Some advanced applications of CTEs in analysis
In this article, we will see an example with the following table and data information:
Information of tables:
Data in the tables:
Clean and transform data
With CTEs, users can create complex data cleaning and transformation steps to cater to each data analysis need.
For example: Remove duplicate lines, replace null values, and calculate complex indices.
Specifically, the CleanedSalesData CTE will do the following:
Use the TO_DATE function to normalize the value in the order_date column into a standard date format.
Besides, the CASE…WHEN clause ensures that the total_amount column value is never negative. If the value is less than 0, a value of 0 will be assigned.
Finally, in the main query, we will retrieve data from the CleanedSalesData CTE to retrieve rows with a cleaned_total_amount value greater than 0.
Time series analysis
Consider the sales_date table, with columns order_id, order_date, and total_amount
Problem to be solved: Use CTE to analyze total sales by month and average total monthly sales for the year.
Specifically:
CTE MonthlySalesData will analyze data by year and month, calculating total monthly sales.
CTE AverageMonthlySales will calculate the total average monthly sales for the year using data from MonthlySalesData.
Finally, in the main query, we will combine data from the above 2 CTEs to solve the above analysis need.
Analyze text strings
Consider the text_data table containing text segments.
Problem to solve: Use CTE to find words that appear multiple times in those paragraphs of text.
Specifically:
CTE WordCounts has the function of counting words appearing in a paragraph of text combined with the regexp_split_to_table function to split paragraphs of text into individual words.
In the main query, we will retrieve data from the WordCounts CTE to get the result of the word occurrence count.
Last updated