Window function

Windows functions in SQL are used to perform row calculations relative to the current row. Unlike Aggregate Functions which calculates all rows, Windows Functions is used to calculate row by row. A Windows Function is defined when an OVER() clause follows the function call.

Syntax

SELECT columnname1,
{window_function}(columnname2)
OVER([PARTITION BY columnname1] [ORDER BY columnname3]) AS new_column
FROM table_name;

Specifically:

  • columnname1 is the first column name you want to select.

  • {window_function}is the name of an aggregate function such as sum, avg, count, row_number, rank, or dense_rank.

  • columnname2 is the name of the column to which you apply the window function.

  • columnname3 is the third column name, which will form the basis for the partition.

  • new_column is a label for a new column that you can apply using the AS keyword.

  • table_name is the name of the source table.

  • Partition by: Used to group related rows to perform calculations

  • Order By: Used to sort the rows in each partition

Window functions are a different type of query than basic SQL statements. Unlike SQL aggregate functions, you can use Window Functions to implement advanced query functions.

Types of Window Functions

Aggregate Functions

AVG()

Returns the average value

COUNT()

Count values

MAX()

Returns the maximum value

MIN()

Returns the smallest value

SUM()

Calculate the total values

Ranking Functions

RANK()

Ranks the values in ascending order but will return the same rank for the same values and ignore that rank Example: rank(): 1,1,3,4,5

DENSE_RANK()

Ranks the values in ascending order but will return the same rank for the same values and does not ignore the rank Example: dense_rank(): 1,1,2,3,4

ROW_NUMBER()

Rank the values in each partition in ascending order without regard to similarity. Example: row_number(): 1,2,3,4,5

CUME_DIST()

Calculate the proportion of values that are less than or equal to the current value

PERCENT_RANK()

(rank -1)/ (row-1) In which: + rank is the order of that value in ascending order (same values return the same rank) + row: total number of rows (considered in 1 partition)

Analytic Functions

FIRST_VALUE (expression)

Get the initial value in each partition

LAST_VALUE (expression)

Get the last value in each partition

LAG (expression, offset)

Sorts values in ascending order and returns non-ignored values. Where: offset: number of omitted values from top to bottom (If this parameter is omitted, the default is 1)

LEAD (expression, offset)

Sorts values in descending order and returns non-ignored values. Where: offset: number of omitted values from top to bottom (If this parameter is omitted, the default is 1)

For example:

Query the list of customers with total monthly revenue ranked from high to low in the system for each month.

Prepare dataset

You can use the CREATE TABLE command to create a new table in SQL. Below is a sample dataset that this tutorial will use to define some window functions:

Order Date

Category

Color

Sale Price

Quantity

08-11-2016

Phones

Black

907.152

6

12-06-2016

Binders

Green

18.504

3

11-10-2015

Appliances

Yellow

114.9

5

11-10-2015

Tables

Brown

1706.184

9

09-06-2014

Phones

Red

911.424

4

09-06-2014

Paper

White

15.552

3

09-06-2014

Binders

Black

407.976

3

09-06-2014

Appliances

Yellow

68.81

5

09-06-2014

Binders

Green

2.544

3

09-06-2014

Storage

Orange

665.88

6

09-06-2014

Storage

Orange

55.5

2

15-04-2017

Phones

Black

213.48

3

05-12-2016

Binders

Green

22.72

4

22-11-2015

Appliances

Green

60.34

7

22-11-2015

Chairs

Dark Brown

71.372

2

13-05-2014

Furniture

Orange

190.92

5

Explain in detail the

Sum function Suppose you want to calculate the total sales for each value in a directory column. Here's how you can do this:

SELECT category, color,
sum(sale_price)
OVER (order by category) AS total_sales
FROM sahil.sample;

In the code above, the SQL command retrieves categories and colors from the original dataset. The sum function adds the sale_price column. It does this by category because the OVER clause determines the order by category column. The final result is as follows:

How to use window function Avg()

Like the sum function, you can average each row of data using the avg function. Instead of a total, you'll have a column containing average revenue.

SELECT category, color,
avg(sale_price)
OVER (order by category) AS avg_sales
FROM sahil.sample;
Hàm tổng hợp trong SQL

How to use window function Count()

Similar to the sum and avg functions, the window function count in SQL is quite simple and works like the other two functions. When you switch to the count function, you will get the total number of each value in the new column. Here's how you can calculate the total:

SELECT category, color,
count(category)
OVER (order by category) AS item_count
FROM sahil.sample;
Hàm Count() trong SQL

Window Function Row_Number()

row_number() works a little differently than the window functions mentioned above. The row_number() function assigns a row number to each row, depending on the order of the clause. The starting row number is 1. row_number assigns a corresponding value to each row until the end.

Here is the basic structure of a row_number() function:

SELECT category, color,
row_number()
OVER (order by category) AS item_number
FROM sahil.sample;
Hàm row_number trong SQL

But what happens if you want to assign separate row numbers to each item in the catalog? The above syntax sets a rotating serial number, regardless of the items stored in the catalog.

For example, equipment lists need to be numbered separately… You can use the partition function to perform this simple but practical task. The partition keyword assigns a specified number of rows to each item in the catalog.

SELECT category, color,
row_number()
OVER (partition by category order by category) AS item_number
FROM sahil.sample;
Cách dùng window function sql

Rank() and Dense_Rank() functions

The rank() function works differently than the row_number() function. You need to specify the column name in order by function, to use it as the basis for determining the function value. For example, in the code below, you can use the color column in the order by function. This query then uses that order to assign a ranking value to each row.

You can use the code syntax below to pass a ranking function in SQL:

SELECT category, color,
rank()
OVER (order by color) AS item_rank
FROM sahil.sample;
Hàm Rank trong SQL

The order by function classifies the color folder, while the rank function ranks each color. However, all similar color values share the same rating, while other colors have their own ratings. Black appears 3 times in the dataset; Instead of assigning a rating value of 1, 2, 3, black items are rated 1.

However, Brown would be 4, not 2. The Rank function ignores the values and assigns chronological values to different items. If you want to assign a more meaningful ranking value, you can use the dense_rank() function.

The dense_rank function does not ignore any rank value in the order by function. For example, the first 3 color entries will have rank 1. However, the next color (Brown) will not have rank 4, but rank 2, which is the next chronological order in the numbered category. The dense_rank function is a more practical window function because it assigns a meaningful value to all list items.

Here's how you can use the dense_rank function in SQL:

SELECT category, color,
dense_rank()
OVER (order by color) AS item_rank
FROM sahil.sample;
dense_rank() trong SQL

Last updated