Common Spark SQL Syntax

  • Last update: February 11, 2025
  • Overview

    This document introduces common Spark SQL syntax.

    Common Operator

    Operator

    Meaning

    Detail

    =

    Equal operator

    A=B. Returns TRUE if the A value is equal to the B value. Returns FALSE otherwise.

    In FineDataLink, you can use Data Filtering instead.

    <> 

    Not equal operator

    A <> B. Returns NULL if the A value is NULL or the B value is NULL. Returns TRUE if the A value is unequal to the B value. Returns FALSE otherwise.

    In FineDataLink, you can use Data Filtering instead.

    Less than operator

    A < B. Returns NULL if the A value is NULL or the B value is NULL. Returns TRUE if the A value is less than the B value. Returns FALSE otherwise.

    In FineDataLink, you can use Data Filtering instead.

    <=

    Less than or equal operator

    A <= B. Returns NULL if the A value is NULL or the B value is NULL. Returns TRUE if the A value is less than or equal to the B value. Returns FALSE otherwise.

    In FineDataLink, you can use Data Filtering instead.

    >=

    Greater than or equal operator

    A >= B. Returns NULL if the A value is NULL or the B value is NULL. Returns TRUE if the A value is greater than or equal to the B value. Returns FALSE otherwise.

    In FineDataLink, you can use Data Filtering instead.

    IS NULL

    NULL value test

    A IS NULL. Returns TRUE if the A value is NULL. Returns FALSE otherwise.

    In FineDataLink, you can use Data Filtering instead.

    IS NOT NULL

    NOT NULL value test

    A IS NOT NULL. Returns FALSE if the A value is NULL. Returns TRUE otherwise.

    In FineDataLink, you can use Data Filtering instead.

    LIKE

    Fuzzy matching

    Syntax one: A LIKE B 

    Syntax two: LIKE (A, B) 

    Expression type: string 

    Return value type: boolean or NULL 

    Returns NULL if either the A value or the B value is NULL. 

    Returns TRUE if the A value matches the regular syntax of the expression B. Returns FALSE otherwise. With LIKE you can only use the percent sign (%) and the underscore (_) as the wildcard character in the expression B, where % matches any number of characters, even zero characters, and _ matches exactly one character. (You can use multiple underscores, with each representing a character.) 

    For example, select Football like %ba from Test and select like (Football, _otba%) from Test.

    FineDataLink makes fuzzy querying and filtering easier by visualizing the process, improving efficiency. For details, seeFuzzy Field Querying/Filtering.

    AND

    Logical AND

    A AND B.

    Returns TRUE if both the A value and the B value are TRUE. Returns FALSE otherwise.

    OR

    Logical OR

    A OR B.

    Returns TRUE if either the A value or the B value is TRUE. Returns FALSE otherwise.


    AS - Column Adding/Field Renaming

    Example:

    • Add a Type column with the Test value as the column value. You can use the following SQL statement. select Test AS Type from Input Source

    • Add a Mark column with the column value as 1 for data with a math score greater than 90 and as 0 otherwise. You can use the following SQL statement. select Math, if(Math >90,1,0) AS Mark from Input Source

    For a simpler solution, see Adding a Column Using the IF Function.

    Enclose the Chinese column name with single quotation marks (’‘). For example, select Test as Field.

    GROUP BY - Data Deduplication

    For details, see Data Deduplication.

    Sorting

    ORDER BY: global sorting

    SORT BY, DISTRIBUTE BY, and CLUSTER BY: partial sorting

    For details, see Data Sorting.

    LIMIT - Constraining the Number of Returned Rows

    Example:

    select * from DB Table Input limit 3

    2.png

    BETWEEN ... AND - Constraining the Number of Returned Rows

     

    Description: Returns records within a given range. It is applied to numeric, string, and date columns.

    Example:

    select * FROM DB Table Input where Product ID between 2 and 3

    3.png



    JOIN and UNION

    • LEFT JOIN

    • LEFT OUTER JOIN

    • LEFT SEMI JOIN

    • INNER JOIN or JOIN

    • RIGHT JOIN

    • RIGHT OUTER JOIN

    • FULL JOIN

    • FULL OUTER JOIN

    • UNION

    • UNION ALL

    For details, see JOIN and UNION.


    HAVING - Conditional Filtering After Grouping

    Filters records based on the condition in the HAVING clause after the GROUP BY clause groups data.

    Example:

    select Product ID FROM DB Table Input group by Product ID having Product ID > 400

    1739268118289984.png

    PIVOT - Row-to-Column Conversion

    You are advised to use the Row-to-Column operator.

    Stack - Column-to-Row Conversion

    You are advised to use the Column-to-Row operator.

    to_json and struct Functions

    The to_json and struct functions are supported in 4.1.6.4 and later versions.

    Difference Between the to_json Function and the JSON Generation operator

    The JSON Generation operator can convert a two-dimensional table into a JSON object or convert it into a two-dimensional table with multiple rows and a single column (where the value on each row is a JSON object). However, it cannot create a two-dimensional table containing a combination of plain text, JSON objects, and JSON arrays. The to_json function can be used more flexibly.

    The JSON Generation operator can be used with the API Output operator to generate JSON data and then output it to the API. The to_json function can be used to construct JSON fields for MongoDB Output.

    to_json Function

    The to_json function can convert data to a JSON string.

    SQL Statement in Spark SQL

    Result

    SELECT to_json(named_struct('a', 1, 'b',   2));

    {"a":1,"b":2}

    SELECT to_json(map('a', named_struct('b',   1)));

    {"a":{"b":1}}

    SELECT to_json(array(map('a', 1)));

    [{"a":1}]

    To convert text to a JSON array:

    SELECT to_json(split('a,b,c', ',')) AS   jsonArray

    ["a", "b",   "c"]

    struct Function

    The struct function is used to concatenate multiple columns or fields into a structured data object.

    Example one: Concatenate multiple fields of a two-dimensional table into a JSON object.

    5.png 

    Example two: Concatenate multiple fields of a two-dimensional table into a JSON object array through struct, to_json, collect_list, concat_ws, concat, and group by.For details, see MongoDB Output.

     


    附件列表


    主题: Data Development - Scheduled Task
    Previous
    Next
    • Helpful
    • Not helpful
    • Only read

    滑鼠選中內容,快速回饋問題

    滑鼠選中存在疑惑的內容,即可快速回饋問題,我們將會跟進處理。

    不再提示

    10s後關閉

    Get
    Help
    Online Support
    Professional technical support is provided to quickly help you solve problems.
    Online support is available from 9:00-12:00 and 13:30-17:30 on weekdays.
    Page Feedback
    You can provide suggestions and feedback for the current web page.
    Pre-Sales Consultation
    Business Consultation
    Business: international@fanruan.com
    Support: support@fanruan.com
    Page Feedback
    *Problem Type
    Cannot be empty
    Problem Description
    0/1000
    Cannot be empty

    Submitted successfully

    Network busy