Window function
Last updated
Last updated
Windows functions in SQL are used to perform row calculations relative to the current row. Unlike Aggregate Functions which calculates all rows, Windows Functions is used to calculate row by row. A Windows Function is defined when an OVER() clause follows the function call.
Specifically:
columnname1 is the first column name you want to select.
{window_function}is the name of an aggregate function such as sum, avg, count, row_number, rank, or dense_rank.
columnname2 is the name of the column to which you apply the window function.
columnname3 is the third column name, which will form the basis for the partition.
new_column is a label for a new column that you can apply using the AS keyword.
table_name is the name of the source table.
Partition by: Used to group related rows to perform calculations
Order By: Used to sort the rows in each partition
Window functions are a different type of query than basic SQL statements. Unlike SQL aggregate functions, you can use Window Functions to implement advanced query functions.
Aggregate Functions
AVG()
Returns the average value
COUNT()
Count values
MAX()
Returns the maximum value
MIN()
Returns the smallest value
SUM()
Calculate the total values
Ranking Functions
RANK()
Ranks the values in ascending order but will return the same rank for the same values and ignore that rank Example: rank(): 1,1,3,4,5
DENSE_RANK()
Ranks the values in ascending order but will return the same rank for the same values and does not ignore the rank Example: dense_rank(): 1,1,2,3,4
ROW_NUMBER()
Rank the values in each partition in ascending order without regard to similarity. Example: row_number(): 1,2,3,4,5
CUME_DIST()
Calculate the proportion of values that are less than or equal to the current value
PERCENT_RANK()
(rank -1)/ (row-1) In which: + rank is the order of that value in ascending order (same values return the same rank) + row: total number of rows (considered in 1 partition)
Analytic Functions
FIRST_VALUE (expression)
Get the initial value in each partition
LAST_VALUE (expression)
Get the last value in each partition
LAG (expression, offset)
Sorts values in ascending order and returns non-ignored values. Where: offset: number of omitted values from top to bottom (If this parameter is omitted, the default is 1)
LEAD (expression, offset)
Sorts values in descending order and returns non-ignored values. Where: offset: number of omitted values from top to bottom (If this parameter is omitted, the default is 1)
For example:
Query the list of customers with total monthly revenue ranked from high to low in the system for each month.
You can use the CREATE TABLE command to create a new table in SQL. Below is a sample dataset that this tutorial will use to define some window functions:
Order Date
Category
Color
Sale Price
Quantity
08-11-2016
Phones
Black
907.152
6
12-06-2016
Binders
Green
18.504
3
11-10-2015
Appliances
Yellow
114.9
5
11-10-2015
Tables
Brown
1706.184
9
09-06-2014
Phones
Red
911.424
4
09-06-2014
Paper
White
15.552
3
09-06-2014
Binders
Black
407.976
3
09-06-2014
Appliances
Yellow
68.81
5
09-06-2014
Binders
Green
2.544
3
09-06-2014
Storage
Orange
665.88
6
09-06-2014
Storage
Orange
55.5
2
15-04-2017
Phones
Black
213.48
3
05-12-2016
Binders
Green
22.72
4
22-11-2015
Appliances
Green
60.34
7
22-11-2015
Chairs
Dark Brown
71.372
2
13-05-2014
Furniture
Orange
190.92
5
Sum function Suppose you want to calculate the total sales for each value in a directory column. Here's how you can do this:
In the code above, the SQL command retrieves categories and colors from the original dataset. The sum function adds the sale_price column. It does this by category because the OVER clause determines the order by category column. The final result is as follows:
Like the sum function, you can average each row of data using the avg function. Instead of a total, you'll have a column containing average revenue.
Similar to the sum and avg functions, the window function count in SQL is quite simple and works like the other two functions. When you switch to the count function, you will get the total number of each value in the new column. Here's how you can calculate the total:
row_number() works a little differently than the window functions mentioned above. The row_number() function assigns a row number to each row, depending on the order of the clause. The starting row number is 1. row_number assigns a corresponding value to each row until the end.
Here is the basic structure of a row_number() function:
But what happens if you want to assign separate row numbers to each item in the catalog? The above syntax sets a rotating serial number, regardless of the items stored in the catalog.
For example, equipment lists need to be numbered separately… You can use the partition function to perform this simple but practical task. The partition keyword assigns a specified number of rows to each item in the catalog.
The rank() function works differently than the row_number() function. You need to specify the column name in order by function, to use it as the basis for determining the function value. For example, in the code below, you can use the color column in the order by function. This query then uses that order to assign a ranking value to each row.
You can use the code syntax below to pass a ranking function in SQL:
The order by function classifies the color folder, while the rank function ranks each color. However, all similar color values share the same rating, while other colors have their own ratings. Black appears 3 times in the dataset; Instead of assigning a rating value of 1, 2, 3, black items are rated 1.
However, Brown would be 4, not 2. The Rank function ignores the values and assigns chronological values to different items. If you want to assign a more meaningful ranking value, you can use the dense_rank() function.
The dense_rank function does not ignore any rank value in the order by function. For example, the first 3 color entries will have rank 1. However, the next color (Brown) will not have rank 4, but rank 2, which is the next chronological order in the numbered category. The dense_rank function is a more practical window function because it assigns a meaningful value to all list items.
Here's how you can use the dense_rank function in SQL: