Spark SQL

  • Last update: January 17, 2025
  • Overview

    Version

    FineDataLink Version

    Functional Change

    4.1.6.4

    Supported to_json and struct functions. For details, see Common SparkSQL Syntax.

    4.1.7.3

    Optimized the interaction.

    Application Scenario

    • The common practice of processing data from non-relational databases, such as data from files and APIs, is to store data in the database and then process it with SQL statements, which leads to redundant data in the target database. However, by using the Spark SQL operator, you can perform operations such as left joins, right joins, union all, grouping, summary, and sorting on data from various sources within FineDataLink.

    • Additionally, you can use the Spark SQL operator to handle various data processing needs that the visual operators in Data Transformation fail to meet.

    • When authentication is required to connect to the API data source, you can use Spark SQL encryption functions and variables to obtain tokens.

    Function Description

    You can use the Spark SQL operator to query and process data output by the upstream node/operator and output the result data to the downstream node/operator.

    The Spark SQL operator can be used as an input operator, where you can use parameters or constants for encryption and authentication.

    1.png

    Function Description

    1. Spark SQL can be used as an input operator,  where you can use encryption functions and parameters.

    Reference a parameter in a format similar to select '${cyctime}' as date.

    2.png

    2. The Spark SQL operator supports standard SQL statements, and the usage is consistent with that of SQL in Configuration Method under Data Synchronization > Data Source. If you want to use statements such as UPDATE, use the SQL Script node.

    3. For details about the common syntax and introduction of Spark SQL, see Spark SQL Syntax Overview.

    4. The setting items on the Spark SQL setting page are described in the following table.

    Tab Page

    Setting Item

    Introduction

    Configuration

    Input Source

    The preceding operator of the Spark SQL operator serves as the input table by default, with its name being the input table name. The Spark SQL operator supports two types of preceding operators, including: Data input operators: such as DB Table Input, API Input, and Dataset Input Connection and transformation operators: such as Data Association, Row to Column, Column to Row, and JSON Parsing.

    SQL Statement

    You need to enter the statement.

    The auto-completion feature is available when you enter the SQL statement. For example, when you type the table name followed by a period (.), the field names will be suggested.

    3.png

    If there are many fields, you can scroll through field names (shown in the red box in the following figure) to select fields.

    4.png

    Data Preview

    /

    After finishing the configuration, you can click the Data Preview tab to preview data after being processed by Spark SQL.

    5. On the Data Preview tab page, at most 5000 rows of data are sampled for the calculation to avoid server overload due to the excessive data volume. All the selected data participates in the calculation during actual execution.

    Example

    You want to associate data from the SOrder file and the SCustomer database table and filter data.

    You can download the example data:

    Creating a Task

    Create a scheduled task and drag a Data Transformation node onto the design page, as shown in the following figure. 

    5.png


    Configuring the Data Input

    Click the Data Transformation node to enter the editing page. Upload the local Excel file to the FineDataLink system, as shown in the following figure. For details, see Function Description of File Input.

    6.png

    Add a DB Table Input operator and use the SQL statement to fetch all the data from the SCustomer table, as shown in the following figure. 

    7.png

    Associating Data

    Add a Data Association operator and set Join Method to Left Join to combine the SOrder table and the SCustomer table based on the Customer ID field, as shown in the following figure.

    8.png

    Configuring the Spark SQL Operator

    Add a Spark SQL operator and connect it to the Data Association operator as the downstream operator. Enter the following statement in SQL Statement.

    select * FROM Data Association where Data Association.Fax is not null and Data Association.Ship City = 'Beijing'

    iconNote:
    You cannot copy and paste the above statement. The input table name is the name of the preceding operator (which is Data Association in this example). You can specify the input table by clicking the upper button or entering the table name and pressing the Enter key.

    9.png

    Note the requirement of writing the SQL statement, as shown in the following figure.

    10.png

    Click Data Preview, as shown in the following figure.

    11.png

    Configuring the Field Setting Operator

    Add a Field Setting operator, where you can delete and modify the field name and the data type, as shown in the following figure.

    12.png

    Configuring the DB Table Output Operator

    Drag another DB Table Output operator to the design page and connect it to the Spark SQL operator as the downstream operator.

    Click the DB Table Output operator to configure it, as shown in the following figure. 

    13.png

    Running the Task

    1. Click the Save button in the upper right corner. Click Run in the upper right corner, as shown in the following figure. After successful execution, a message indicating successful execution will be displayed in Log

    14.png

    You can see that the out_order table has been added to the database, as shown in the following figure.

    15.png

    附件列表


    主题: Data Development - Scheduled Task
    • Helpful
    • Not helpful
    • Only read

    滑鼠選中內容,快速回饋問題

    滑鼠選中存在疑惑的內容,即可快速回饋問題,我們將會跟進處理。

    不再提示

    9s后關閉

    Get
    Help
    Online Support
    Professional technical support is provided to quickly help you solve problems.
    Online support is available from 9:00-12:00 and 13:30-17:30 on weekdays.
    Page Feedback
    You can provide suggestions and feedback for the current web page.
    Pre-Sales Consultation
    Business Consultation
    Business: international@fanruan.com
    Support: support@fanruan.com
    Page Feedback
    *Problem Type
    Cannot be empty
    Problem Description
    0/1000
    Cannot be empty

    Submitted successfully

    Network busy