Spark SQL Aggregate Function

  • Last update: January 23, 2025
  • Overview

    This document introduces the syntax of the aggregate functions in Spark SQL.

    COUNT

    The source table content is shown in the following figure.

     1.pngcount(*): Counts the number of rows retrieved, including rows with null values.                        

    You can use the following statement in Spark SQL to obtain the number of retrieved rows, as shown in the following figure.

    select count(*) from DB Table Input

    2.png

    iconNote:
    Specify the data table and the field to be queried in SQL Statement by selecting them instead of entering their names.

    count(Expression): Returns the number of non-null values of the Expression field.

    You can use the following statement in Spark SQL to obtain the number of non-null values of the Ship City field, as shown in the following figure.

    select count(Ship City) from DB Table Input-1

    3.png

    count(DISTINCT Expression 1 [, Expression2]): Returns the number of rows with different non-null expression values.

    You can use the statement in Spark SQL to obtain the number of unique non-null values of the Ship City field, as shown in the following figure.

    select count(distinct Ship City) from DB Table Input-2

    4.png

    SUM

    The source table content is shown in the following figure.

    6.png

    sum(Column): Returns the sum of all values in the specified column.

    You can use the following statement in Spark SQL to obtain each shipper's freight sum, as shown in the following figure.

    select Shipper,sum(Freight
    from DB Table Input-3
    group by Shipper

    5.png

    1737602889622793.png

    iconNote:
    Specify the data table and the field to be queried in SQL Statement by selecting them instead of entering their names.

    sum (DISTINCT Column): Returns the sum of unique values in the specified column.

    You can use the following statement in Spark SQL to obtain the sum of unique Freight values of each shipper, as shown in the following figure.

    select Shipper,sum(distinct Freight)
    from DB Table Input-4
    group by Shipper

     8.png


    The effect is shown in the following figure.

    9.png

    AVG

    The source table content is shown in the following figure.

    10.png


    avg(Column): Returns the mean of all values in the specified column.

    You can use the following statement in Spark SQL to obtain the freight mean of each shipper, as shown in the following figure.

    select Shipper, avg(Freight)
    from DB Table Input-5
    group by Shipper
    iconNote:
    Specify the data table and the field to be queried in SQL Statement by selecting them instead of entering their names.

    11.png

    avg(DISTINCT Column): Returns the mean of unique values in the specified column.

    You can use the following statement in Spark SQL to obtain the mean of unique Freight values of each shipper, as shown in the following figure.

    select Shipper, avg(distinct Freight)
    from DB Table Input-6
    group by Shipper

    12.png

    MIN

    The source table content is shown in the following figure.

    13.png

    min(Column): Returns the minimum value in the specified column.

    You can use the following statement in Spark SQL to obtain the minimum freight, as shown in the following figure.

    select min() from
    iconNote:
    Specify the data table and the field to be queried in SQL Statement by selecting them instead of entering their names.


    MAX

    The source table content is shown in the following figure.

    13.png

    max (Column): Returns the maximum value in the specified column.

    You can use the following statement in Spark SQL to obtain the maximum freight, as shown in the following figure.

    select max(Freight) from DB Table Input-8
    iconNote:
    Specify the data table and the field to be queried in SQL Statement by selecting them instead of entering their names.

    17.png

    VARIANCE

    The source table content is shown in the following figure.

    16.png

    variance(Column): Returns the sample variance of the values in the specified column, commonly used in statistics.

    The result of the following statement is shown in the following figure. 

    select variance(Unit Price),variance(Cost Price) from DB Table Input-9

    17.png

    var_pop(Column): Returns the population variance of the values in the specified column.

    The result of the following statement is shown in the following figure. 

    select var_samp(Unit Price),var_samp(Cost Price) from DB Table Input-10

    18.png

    var_samp(col): Returns the sample variance of the values in the specified column.

    The result of the following statement is shown in the following figure. 

    select var_samp(Unit Price),var_samp(Cost Price) from DB Table Input-11

    STDDEV_POP

    The source table content is shown in the following figure.

    16.png

    stddev_pop(Column): Returns the population standard deviation of the values in the specified column.

    The result of the following statement is shown in the following figure. 

    select stddev_pop(Unit Price),stddev_pop(Cost Price) from DB Table Input-12

    20.png

    stddev_samp(Column): Returns the sample standard deviation of the values in the specified column.

    The result of the following statement is shown in the following figure. 

    select stddev_samp(Unit Price),stddev_samp(Cost Price) from DB Table Input-13

    COVAR_POP

    The source table content is shown in the following figure.

    16.png

    covar_pop(Column 1, Column 2): Returns the population covariance of the values in the specified column.

    The result of the following statement is shown in the following figure. 

    select covar_pop(Unit Price) ,(Cost Price) from DB Table Input-14

    covar_samp (col 1, col 2): Returns the sample covariance of the values in the specified column.

    The result of the following statement is shown in the following figure. 

    select covar_samp(Unit Price,Cost Price) from DB Table Input-15

    CORR

    The source table content is shown in the following figure.

    16.png

    corr(Column 1, Column 2): Returns the correlation coefficient between values in the two specified columns.

    The result of the following statement is shown in the following figure. 

    select corr(Unit Price,Cost Price) from DB Table Input-16

    PERCENTILE

    The source table content is shown in the following figure.

    16.png

    percentile(Column, Percentile): Returns an estimate for the specified nearest-rank percentile of values in the specified column.

    The column must be an integer column. The Percentile value should be within [0, 1]. For example, percentile(Unit Price, 0.5) returns the median of values in the Unit Price column.

    The result of the following statement is shown in the following figure. 

    select percentile(Unit Price,0.5) from DB Table Input-17

    collect_list

    This function is mainly used to generate arrays.

    The syntax is collect_list (Expression), where the Expression value is the field name. 

    Example: For instance, if you want to transform a JSON object column into an array column, you can combine concat_ws and collect_list.

    The content of the source object is shown in the following figure. (The table name is JSON Generation.)

    You can use the following statement to transform JSON_DATA data into array-format data, as shown in the following figure. 

    SELECT CONCAT("[",concat_ws(",",collect_list(JSON_DATA)),"]") as a from JSON Generation].
    iconNote:

    Specify the table in SQL Statement by clicking the button besides Input Source.

    22.png


     


    附件列表


    主题: Data Development - Scheduled Task
    Previous
    Next
    • Helpful
    • Not helpful
    • Only read

    滑鼠選中內容,快速回饋問題

    滑鼠選中存在疑惑的內容,即可快速回饋問題,我們將會跟進處理。

    不再提示

    10s後關閉

    Get
    Help
    Online Support
    Professional technical support is provided to quickly help you solve problems.
    Online support is available from 9:00-12:00 and 13:30-17:30 on weekdays.
    Page Feedback
    You can provide suggestions and feedback for the current web page.
    Pre-Sales Consultation
    Business Consultation
    Business: international@fanruan.com
    Support: support@fanruan.com
    Page Feedback
    *Problem Type
    Cannot be empty
    Problem Description
    0/1000
    Cannot be empty

    Submitted successfully

    Network busy