Overview
Version
FineDataLink Version | Functional Change |
4.1.6.4 | Supported the to_json and struct functions. For details, see Common SparkSQL Syntax. |
4.1.7.3 | Optimized the interaction. |
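As a quick illustration of the functions added in 4.1.6.4, the following sketch combines two columns into a JSON string; the column names and values here are hypothetical.

```sql
-- struct builds a struct from named columns; to_json serializes it to a JSON string.
-- Output: {"id":1,"name":"Tom"}
SELECT to_json(struct(1 AS id, 'Tom' AS name)) AS json_col
```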
Application Scenario
When processing data from non-relational sources, such as files and APIs, the common practice is to load the data into a database first and then process it with SQL statements, which leaves redundant data in the target database. With the Spark SQL operator, you can instead perform operations such as left join, right join, union all, grouping, aggregation, and sorting on data from various sources directly within FineDataLink.
Additionally, you can use the Spark SQL operator to handle data processing needs that the visual operators in Data Transformation cannot meet.
When authentication is required to connect to an API data source, you can use Spark SQL encryption functions and variables to obtain tokens.
Function Description
You can use the Spark SQL operator to query and process data output by the upstream node/operator and output the result data to the downstream node/operator.
1. The Spark SQL operator can be used as an input operator, where you can use encryption functions, parameters, and constants for authentication. See the sketch after this list.
You can reference a parameter in a format such as select '${cyctime}' as date.
2. The Spark SQL operator supports standard SQL statements, and the usage is consistent with that of SQL in Configuration Method under Data Synchronization > Data Source. To run statements such as UPDATE, use the SQL Script node instead.
3. For details about the common syntax and introduction of Spark SQL, see Spark SQL Syntax Overview.
4. The setting items on the Spark SQL setting page are described in the following table.
Tab Page | Setting Item | Introduction |
Configuration | Input Source | The operator preceding the Spark SQL operator serves as the input table by default, with its name used as the input table name. Two types of preceding operators are supported: data input operators (such as DB Table Input, API Input, and Dataset Input) and connection/transformation operators (such as Data Association, Row to Column, Column to Row, and JSON Parsing). |
Configuration | SQL Statement | Enter the SQL statement. Auto-completion is available as you type. For example, when you type the table name followed by a period (.), field names are suggested. If there are many fields, you can scroll through the field names (shown in the red box in the following figure) to select one. |
Data Preview | / | After finishing the configuration, you can click the Data Preview tab to preview the data processed by Spark SQL. |
5. On the Data Preview tab page, at most 5,000 rows of data are sampled for calculation to avoid server overload caused by excessive data volume. During actual execution, all the selected data participates in the calculation.
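As an illustration of note 1 above, the following sketch uses the Spark SQL operator as an input operator to build an authentication token with the built-in md5 and concat functions and the ${cyctime} parameter. The secret value and the signing scheme are hypothetical; adapt them to whatever your API actually expects.

```sql
-- No upstream operator is needed when Spark SQL serves as an input operator.
-- 'my-app-secret' is a hypothetical secret; ${cyctime} is the task parameter
-- referenced in the format described in note 1.
SELECT
    md5(concat('my-app-secret', '${cyctime}')) AS token,
    '${cyctime}' AS date
```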
Example
You want to associate data from the SOrder file with the SCustomer database table and filter the data.
You can download the example data:
Creating a Task
Create a scheduled task and drag a Data Transformation node onto the design page, as shown in the following figure.
Configuring the Data Input
Click the Data Transformation node to enter the editing page. Upload the local Excel file to the FineDataLink system, as shown in the following figure. For details, see Function Description of File Input.
Add a DB Table Input operator and use an SQL statement to fetch all the data from the SCustomer table, as shown in the following figure.
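The fetch statement can be as simple as the following sketch; qualify the table name with a schema if your data connection requires it.

```sql
-- Fetch all rows from the SCustomer table.
SELECT * FROM SCustomer
```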
Associating Data
Add a Data Association operator and set Join Method to Left Join to combine the SOrder table and the SCustomer table based on the Customer ID field, as shown in the following figure.
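For reference, the visual Left Join configured above corresponds roughly to the following Spark SQL. The `Customer ID` field name comes from the example data; this sketch only illustrates the join logic, not what the operator generates internally.

```sql
-- Keep every SOrder row and attach the matching SCustomer row, if any.
SELECT *
FROM SOrder
LEFT JOIN SCustomer
    ON SOrder.`Customer ID` = SCustomer.`Customer ID`
```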
Configuring the Spark SQL Operator
Add a Spark SQL operator and connect it to the Data Association operator as the downstream operator. Enter the following statement in SQL Statement.
SELECT *
FROM `Data Association`
WHERE `Data Association`.`Fax` IS NOT NULL
  AND `Data Association`.`Ship City` = 'Beijing'

Note the requirements for writing the SQL statement: identifiers containing spaces, such as the operator name Data Association, must be wrapped in backticks, as shown in the following figure.
Click Data Preview, as shown in the following figure.
Configuring the Field Setting Operator
Add a Field Setting operator, where you can delete fields and modify field names and data types, as shown in the following figure.
Configuring the DB Table Output Operator
Add a DB Table Output operator to the design page and connect it downstream of the Field Setting operator.
Click the DB Table Output operator to configure it, as shown in the following figure.
Running the Task
Click the Save button in the upper right corner, and then click Run, as shown in the following figure. After successful execution, a message indicating successful execution will be displayed in Log.
You can see that the out_order table has been added to the database, as shown in the following figure.