Window functions are supported in the Spark SQL operator of FineDataLink.
Window functions are online analytical processing (OLAP) functions.
Window functions share similar functionality with Spark SQL aggregate functions. For details, see Spark SQL Aggregate Function. Both can be used to conduct statistical analysis on a specified group of rows (known as a window), yet they differ in several ways.
Window functions return a value for each row from a query as they have specified the window size, which may change as the current row changes, enabling them to operate on a set of rows in the sliding window. In contrast, aggregate functions return only one row because they group query rows into a single result row.
Aggregation functions fail to extract detailed data of columns other than the grouping column along with the aggregate value simultaneously whereas window functions can easily achieve this.
PARTITION BY: Divides data into partitions by the specified dimension. If PARTITION BY is omitted, query rows are ungrouped.
ORDER BY: Sorts data in each partition by the specified field. If ORDER BY is omitted, partition rows are unordered.
ROWS BETWEEN: Specifies both window endpoints. If ROWS BETWEEN is omitted, the default range is from the start point to the current row.
PRECEDING: Refers to the rows before the current row.
FOLLOWING: Refers to the rows after the current row.
CURRENT ROW: Refers to the current row.
UNBOUNDED: Refers to the start point.
UNBOUNDED PRECEDING: Specifies that the window starts at the first partition row.
UNBOUNDED FOLLOWING: Specifies that the window ends at the last partition row.
The source table is shown in the following figure.
Use the following statement in the Spark SQL operator.
select Order ID,Product ID,Unit Price, SUM(Unit Price) OVER(PARTITION BY Product ID ORDER BY Order ID) AS t1,SUM(Unit Price) OVER(PARTITION BY Product ID ORDER BY Order ID ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS t2,SUM(Unit Price) OVER(PARTITION BY Product ID) AS t3,SUM(Unit Price) OVER(PARTITION BY Product ID ORDER BY Order ID ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS t4,SUM(Unit Price) OVER(PARTITION BY Product ID ORDER BY Order ID ROWS BETWEEN 2 PRECEDING AND 1 FOLLOWING) AS t5,SUM(Unit Price) OVER(PARTITION BY Product ID ORDER BY Order ID ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS t6 from DB Table Input
The following table describes the statement.
Statement
Meaning
SUM(Unit Price) OVER(PARTITION BY Product ID ORDER BY Order ID) AS t1
Sums the Unit Price values from the start point to the current row within each Product ID group.
SUM(Unit Price) OVER(PARTITION BY Product ID ORDER BY Order ID ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS t2
Sums the Unit Price values from the start point to the current row within each Product ID group. The result is the same as t1.
SUM(Unit Price) OVER(PARTITION BY Product ID) AS t3
Sums all Unit Price values within each Product ID group.
SUM(Unit Price) OVER(PARTITION BY Product ID ORDER BY Order ID ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS t4
Sums the Unit Price values on the current row and its three preceding rows within each Product ID group.
SUM(Unit Price) OVER(PARTITION BY Product ID order by Order ID ROWS BETWEEN 2 PRECEDING AND 1 FOLLOWING) AS t5
Sums the Unit Price values on the current row, its two preceding rows, and its one following row within each Product ID group.
SUM(Unit Price) OVER(PARTITION BY Product ID ORDER BY Order ID ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS t6
Sums the Unit Price values on the current row and all following rows within each Product ID group.
The usage of AVERAGE is similar to that of SUM. The following is a statement example.
select Order ID ,Product ID , Unit Price , avg(Unit Price) OVER(PARTITION BY Product ID ORDER BY Order ID) AS t1 from DB Table Input
The usage of MAX is similar to that of SUM.
The usage of MIN is similar to that of SUM.
NTILE(N): Divides the data rows in each partition into N buckets and returns the number of the bucket to which the current row belongs.
select Order ID, Product ID, Unit Price, NTILE(2) OVER(PARTITION BY Product ID ORDER BY Order ID) AS t1,NTILE(3) OVER(PARTITION BY Product ID ORDER BY Order ID) AS t2, NTILE(4) OVER(PARTITION BY Product ID ORDER BY Order ID) AS t3 from DB Table Input
The execution result is shown in the following figure.
NTILE(2) OVER(PARTITION BY Product ID ORDER BY Order ID) AS t1
Divides data within each Product ID group into two buckets.
NTILE(3) OVER(PARTITION BY Product ID ORDER BY Order ID) AS t2
Divides data within each Product ID group into three buckets.
NTILE(4) OVER(PARTITION BY Product ID ORDER BY Order ID) AS t3
Divides data within each Product ID group into four buckets.
ROW_NUMBER(): Returns the number of the current row within its partition. Rows numbers range from 1 to the number of partition rows.
Example:
select Order ID, Product ID, Unit Price, ROW_NUMBER() OVER(PARTITION BY Product ID ORDER BY Unit Price asc) AS t1 from DB Table Input
RANK(): Returns the rank of the current row within its partition, with gaps. Peers are considered ties and receive the same rank.
select Order ID, Product ID, Unit Price, RANK() OVER(PARTITION BY Product ID ORDER BY Unit Price desc) AS RANK from DB Table Input4
DENSE_RANK(): Returns the rank of the current row within its partition, without gaps. Peers are considered ties and receive the same rank.
select Order ID, Product ID, Unit Price, DENSE_RANK() OVER(PARTITION BY Product ID ORDER BY Unit Price desc) AS DENSE_RANK from DB Table Input5
The result is shown in the following figure.
LAG(Column name, N, Default value): Returns the column value from the row that precedes the current row by N row(s) within its partition.
N is optional and defaults to 1.
If there is no such row, the return value is Default value, or NULL if the default value is missing.
select Order ID, Product ID, Unit Price, ROW_NUMBER() OVER(PARTITION BY Product ID ORDER BY Unit Price) AS ROW_NUMBER,LAG(Unit Price,1,10) OVER(PARTITION BY Product ID ORDER BY Unit Price) AS last_1_time, LAG(Unit Price,1) OVER(PARTITION BY Product ID ORDER BY Unit Price) AS last_2_time from DB Table Input6
Description
LAG(Unit Price,1,10) OVER(PARTITION BY Product ID ORDER BY Unit Price) AS last_1_time
Returns the Unit Price value on the preceding row of the current row within each Product ID group, with a default value of 10.
LAG(Unit Price,1) OVER(PARTITION BY Product ID ORDER BY Unit Price) AS last_2_time
Returns the Unit Price value on the preceding row of the current row within each Product ID group, with no default value.
LEAD(Column name, N, Default value): Returns the column value from the row that follows the current row by N row(s) within its partition.
select Order ID, Product ID, Unit Price,LEAD(Unit Price,1,10) OVER(PARTITION BY Product ID ORDER BY Unit Price) AS last_1_time, LEAD(Unit Price,1) OVER(PARTITION BY Product ID ORDER BY Unit Price) AS last_2_time from DB Table Input7
Explanation:
LEAD(Unit Price,1,10) OVER(PARTITION BY Product ID ORDER BY Unit Price) AS last_1_time
Returns the Unit Price value on the following row of the current row within each Product ID group, with a default value of 10.
LEAD(Unit Price,1) OVER(PARTITION BY Product ID ORDER BY Unit Price) AS last_2_time
Returns the Unit Price value on the following row of the current row within each Product ID group, with no default value.
FIRST_VALUE(Column): Returns the Column value on the first row within the partition after sorting.
select Order ID, Product ID, Unit Price, FIRST_VALUE(Unit Price) OVER(PARTITION BY Product ID ORDER BY Order ID) AS FIRST_VALUE from DB Table Input8
LAST_VALUE(Column): Returns the Column value on the last row within the partition after sorting.
select Order ID, Product ID, Unit Price, LAST_VALUE(Unit Price) OVER(PARTITION BY Product ID ORDER BY Order ID) AS LAST_VALUE from DB Table Input9
滑鼠選中內容,快速回饋問題
滑鼠選中存在疑惑的內容,即可快速回饋問題,我們將會跟進處理。
不再提示
10s後關閉
Submitted successfully
Network busy