This article explains the unique terms of FineDataLink to help you use the product.
FDL provides function modules including Data Development, Data Pipeline, Data Service, Task O&M, and others to meet a series of needs such as data synchronization, processing, and cleaning.
Data Development: Develop and arrange tasks through SQL statements or visual operations.
Data Pipeline: Synchronize data in real time with high performance when the data volume is large or the table structure is standard.
Data Service: Encapsulate the processed, standardized data as standard APIs and release them for external systems to call.
Data Development
Usually used with Task O&M, Data Development can define the development and scheduling attributes of periodic scheduled tasks. It provides a visual development page, helping you easily build offline data warehouses and ensuring efficient and stable data production.
Data Development provides various nodes for you to choose from according to business needs; many of them support periodic task scheduling.
A folder is used to store data development tasks; one folder can store multiple scheduled tasks.
Scheduled Task defines the operations performed on data. For example:
Synchronize data from MySQL to Oracle through a task using the Data Synchronization node.
Parse API data and store them in a specified database through Data Transformation, Loop Container, and other nodes.
A scheduled task can consist of a single data node or process node, or be a combination of multiple nodes.
A node is a basic unit of a scheduled task. You can determine the execution order of nodes by connecting them with lines or curves, thus forming a complete scheduled task. Nodes in a task run in turn according to the dependencies.
ETL processing can be carried out in the Data Transformation node. With encapsulated visual functions, you can achieve data cleaning and loading efficiently.
A data flow refers to the flow of data between input and output operators, focusing on the processing of each row of records and each column of data. Various operators are available to complete data input, output, transformation, and other operations.
A data flow (a Data Transformation node) provides only the following three types of operators and does not contain combinatorial or process-type operators:
An example of input operators: DB Table Input.
An example of processing operators: Data Association.
An example of output operators: DB Table Output.
Terms involved in the Data Transformation node are explained in the following table.
Each entry below gives the operator's classification, function point, type, positioning, function description, and initial version.
Jiandaoyun Input
Obtain data from specified Jiandaoyun forms.
Delete the data rows that exist in the target table but have been deleted in the input source by comparing field values. It includes:
Physical Deletion: The data is actually deleted.
Logical Deletion: The data is not deleted; only deletion identifiers are added.
Initial version: 3.2
Note: This operator has been removed since Version 4.0.18.
Advanced Data Source Control - NoSQL
Join
Data Association
Advanced
Supports cross-database and cross-source joins.
The join methods include:
Left Join
Right Join
Inner Join
Full Outer Join
These join methods are consistent with how database tables are joined. You can get the join results by defining the association fields and conditions. The operator requires at least two inputs and produces only one output.
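FDL configures these joins visually rather than in code. As a rough analogy only, the sketch below shows the same four join types with pandas; the library choice, sample data, and column names are illustrative assumptions, not FDL's API.

import pandas as pd

# Two hypothetical inputs for the join; data and column names are illustrative only.
orders = pd.DataFrame({"order_id": [1, 2, 3], "customer_id": [10, 11, 12]})
customers = pd.DataFrame({"customer_id": [10, 11, 13], "name": ["Ann", "Bob", "Cid"]})

# The four join methods listed above, expressed with pandas.merge.
left_join = orders.merge(customers, on="customer_id", how="left")
right_join = orders.merge(customers, on="customer_id", how="right")
inner_join = orders.merge(customers, on="customer_id", how="inner")
full_outer_join = orders.merge(customers, on="customer_id", how="outer")

print(inner_join)  # keeps only customer_id values present in both inputs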
Data Comparison
Select the original table and the target table.
Configure the logical primary key.
Configure the comparison field.
Set the identifier.
Union All
Merge multiple tables by rows, and output a combined table.
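As an illustration of the row-wise merge, here is a minimal pandas sketch; pandas and the sample tables are assumptions for illustration, not the operator's implementation.

import pandas as pd

# Union All merges tables by rows and outputs one combined table.
q1 = pd.DataFrame({"region": ["North"], "sales": [100]})
q2 = pd.DataFrame({"region": ["South"], "sales": [80]})
combined = pd.concat([q1, q2], ignore_index=True)  # two rows, same columns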
Transformation
Field Setting
For adjusting the field name and type.
Set Column: Select and delete fields.
Modify Column: Modify the field name and type.
Column to Row
For changing the row-and-column structure of a data table to convert between one-dimensional and two-dimensional tables.
Convert columns in the input data table to rows.
Column to Row (also known as unpivoting): Convert one-row multi-column data into multi-row one-column data. Usually, the converted column is named by the value of a cell to identify the original data.
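For intuition, a minimal pandas sketch of unpivoting follows; the library and the sample month columns are illustrative assumptions, not the operator itself.

import pandas as pd

# Two-dimensional layout: one row per product, one column per month.
wide = pd.DataFrame({"product": ["A"], "Jan": [10], "Feb": [12]})

# Column to Row (unpivot): month names become values in a new "month" column.
long = wide.melt(id_vars="product", var_name="month", value_name="amount")
# product | month | amount
#   A     | Jan   | 10
#   A     | Feb   | 12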
Row to Column
For converting the rows in the data table to columns.
Convert rows in the input data table to columns.
Row to Column (also known as pivoting): Convert multi-row one-column data into one-row multi-column data. Usually, the converted column is named by the classified value of a column, and multiple rows of data corresponding to this value are displayed on a row.
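The reverse direction can be sketched the same way in pandas; again, the data and column names are assumptions for illustration.

import pandas as pd

# One-dimensional layout: multiple rows per product, one "month" column.
long = pd.DataFrame({"product": ["A", "A"], "month": ["Jan", "Feb"], "amount": [10, 12]})

# Row to Column (pivot): the classified values of "month" become column names.
wide = long.pivot(index="product", columns="month", values="amount").reset_index()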
JSON Parsing
For parsing JSON data and outputting it in row-and-column format.
Obtain JSON data output by the upstream node, parse them into row-and-column format data, and output parsed data to the downstream node.
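A minimal Python sketch of the idea, assuming nested JSON records and using pandas.json_normalize; the data shape and library are illustrative, not FDL internals.

import pandas as pd

# JSON data as it might arrive from an upstream node; the structure is illustrative.
records = [
    {"id": 1, "user": {"name": "Ann", "city": "Paris"}},
    {"id": 2, "user": {"name": "Bob", "city": "Rome"}},
]

# Flatten the nested objects into row-and-column format.
table = pd.json_normalize(records)
# Columns: id, user.name, user.city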
XML Parsing
For specifying the parsing strategy and parsing the input XML data into row-and-column format data.
Specify the parsing strategy and parse the input XML data into row-and-column format data.
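A minimal Python sketch of parsing XML into rows and columns using the standard library; the element names are assumptions for illustration.

import xml.etree.ElementTree as ET

# Illustrative XML input; the element names are assumptions.
xml_text = """
<orders>
  <order><id>1</id><amount>25.0</amount></order>
  <order><id>2</id><amount>40.5</amount></order>
</orders>
"""

root = ET.fromstring(xml_text)
rows = [
    {"id": order.findtext("id"), "amount": float(order.findtext("amount"))}
    for order in root.findall("order")
]
# rows now holds row-and-column style data: [{"id": "1", "amount": 25.0}, ...]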
JSON Generation
For selecting fields to generate JSON objects.
Select fields and convert table data into multiple JSON objects, which can be nested.
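To make the idea concrete, a small Python sketch follows; the field names and nesting are assumptions for illustration.

import json

# Table rows represented as a list of dicts; field names are illustrative.
rows = [
    {"id": 1, "name": "Ann", "city": "Paris"},
    {"id": 2, "name": "Bob", "city": "Rome"},
]

# Select fields and generate one (optionally nested) JSON object per row.
objects = [{"id": r["id"], "profile": {"name": r["name"], "city": r["city"]}} for r in rows]
print(json.dumps(objects, indent=2))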
New Calculation Column
For generating new columns through calculation.
Support formula calculation or logical mapping of constants, parameters, and other fields, and place the results into a new column for subsequent operations or output.
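A minimal pandas sketch of both cases, formula calculation and logical mapping; the columns and rules are illustrative assumptions, not FDL configuration.

import pandas as pd

df = pd.DataFrame({"price": [10.0, 20.0], "quantity": [3, 5]})

# New column from a formula over existing fields.
df["total"] = df["price"] * df["quantity"]

# New column from a logical mapping.
df["size"] = df["quantity"].map(lambda q: "bulk" if q >= 5 else "small")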
Data Filtering
For filtering eligible data records.
Filter eligible data records.
Group Summary
It refers to merging the same data into one group based on conditions and summarizing and calculating data based on the grouped data.
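A minimal pandas sketch of grouping and summarizing; the grouping key and aggregations are illustrative assumptions.

import pandas as pd

df = pd.DataFrame({"region": ["North", "North", "South"], "sales": [100, 50, 80]})

# Merge rows with the same "region" into one group, then summarize "sales".
summary = df.groupby("region")["sales"].agg(["sum", "mean"]).reset_index()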
Field-to-Column Splitting
For splitting field values according to specific rules (delimiters), where the split values form a new column.
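A minimal pandas sketch of delimiter-based splitting; the field name and delimiter are illustrative assumptions.

import pandas as pd

df = pd.DataFrame({"full_name": ["Ann Lee", "Bob Ray"]})

# Split one field by a delimiter; the split values form new columns.
df[["first_name", "last_name"]] = df["full_name"].str.split(" ", expand=True)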
Laboratory
Spark SQL
For improving scenario coverage by providing flexible structured data processing.
With the built-in Spark calculation engine and Spark SQL operators, you can obtain the data output by the upstream node, query and process them using Spark SQL, and output them to the downstream node.
Initial version: 3.6
It can be used as an input operator since V4.0.17.
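As a rough sketch of the concept (not FDL's embedded engine or API), the following uses plain PySpark to register upstream data as a view and query it with Spark SQL; all names are illustrative.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark_sql_operator_sketch").getOrCreate()

# Data handed over by an upstream node (illustrative).
upstream = spark.createDataFrame([(1, "Ann"), (2, "Bob")], ["id", "name"])
upstream.createOrReplaceTempView("upstream")

# Query and process the data with Spark SQL, then pass the result downstream.
result = spark.sql("SELECT id, UPPER(name) AS name FROM upstream WHERE id > 1")
result.show()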
Python
For running Python scripts to process complex data.
A step flow is composed of nodes.
Functional Boundary
A step flow, also called workflow, is the arrangement of steps. Each step is relatively independent and runs in the specified order without the flow of data rows.
Each step is a closed loop from input to output.
The terms involved in a step flow are explained in the following table.
General
Data Synchronization
For completing data synchronization (between input and output) quickly. Data transformation is not supported during data synchronization.
Supports multiple data fetching methods, such as fetching data by API, SQL statement, and file. Memory computation is not required because no data processing occurs during synchronization; a conceptual sketch appears after the scenario list below.
It applies to the following scenarios:
Rapid synchronization of data tables is needed.
The calculation is finished during data retrieval, and no calculation or transformation is needed in the process.
The target database has strong computing ability or the data volume is large. Synchronize data to the target database and then use SQL statements for further development.
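Conceptually, the node moves rows from a source to a target without transforming them in between. The sketch below imitates that with pandas and SQLAlchemy; the drivers, connection strings, and table name are placeholders, and this is not how FDL implements the node.

import pandas as pd
from sqlalchemy import create_engine

# Placeholders: connection strings, drivers, and the table name are assumptions.
source = create_engine("mysql+pymysql://user:pass@source-host/sales_db")
target = create_engine("oracle+oracledb://user:pass@target-host/?service_name=orcl")

# Fetch by SQL statement, then write straight to the target with no in-memory processing.
df = pd.read_sql("SELECT * FROM sales", source)
df.to_sql("sales", target, if_exists="append", index=False)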
Data Transformation
For transforming and processing data between the input and output nodes.
Note: The Data Transformation node can be a node in a step flow.
Complex data processing such as data association, transformation, and cleaning between table input and output nodes is supported based on data synchronization.
Data Transformation is a data flow in essence but relies on the memory computing engine since it involves data processing. It is suitable for data development with small data volumes (ten million records and below), and the computing performance is related to memory configuration.
Script
SQL Script
For issuing SQL statements to a specified relational database for execution.
Write SQL statements to perform operations on tables and data such as creation, update, deletion, read, association, and summary.
Shell Script
For running the executable Shell script on a remote machine through the remote SSH connection.
Run the executable Shell script on a remote machine through a configured SSH connection.
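A hedged Python sketch of the same idea using paramiko; the host, credentials, and script path are placeholders, and FDL itself configures this through its SSH connection rather than code.

import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect("remote-host", username="deploy", password="secret")  # placeholders

# Run an executable Shell script on the remote machine and collect its output.
stdin, stdout, stderr = client.exec_command("bash /opt/scripts/cleanup.sh")
print(stdout.read().decode())
client.close()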
Python Script
For running Python scripts as an extension of data development tasks. For example, you can use Python programs to independently process data in files that cannot be read currently.
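For example, a Python Script node might run a standalone script like the sketch below to clean a delimited file that other nodes cannot read directly; the file paths and delimiter are assumptions.

import csv

# Read a pipe-delimited export, drop blank rows, and write a clean CSV.
with open("/data/raw/export.psv", newline="") as f:
    rows = [r for r in csv.reader(f, delimiter="|") if any(cell.strip() for cell in r)]

with open("/data/clean/export.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)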
Bat Script
For running the executable Batch script on a remote machine through the remote connection.
Process
Conditional Branch
For conducting conditional judgment in a step flow.
Task Call
For calling other tasks to complete the cross-task arrangement.
Virtual Node
Null operation.
A virtual node is a null operation and functions as a middle point to connect multiple upstream nodes and multiple downstream nodes. It can be used for process design.
Loop Container
For the loop execution of multiple nodes.
Parameter Assignment
Output the read data as the parameter for use by downstream nodes.
Notification
Notifications can be sent through email, SMS, platform, enterprise WeChat (group robot notification and App notification), and DingTalk.
You can customize the notification content.
Initial versions: V3.6, V4.0.1, and V4.0.3.
Connection
Execution Judgment on Connection
Right-click a connection line in a step flow and configure the execution condition, with options including Execute Unconditionally, Execute on Success, and Execute on Failure.
Right-click a node in a step flow and click Execution Judgment. You can customize how multiple execution conditions take effect (All or Anyone) to flexibly control the dependencies between nodes in the task.
Others
Remark
You can customize the content and format.
Field Mapping allows you to view and modify the mapping between fields in the source table and the target table, setting the data write rules for the target table.
Concurrency refers to the number of scheduled tasks and pipeline tasks running simultaneously in FineDataLink.
To ensure high performance of concurrent transmission, the number of CPU threads should be slightly greater than twice the number of concurrent tasks. For example, with 8 concurrent tasks, roughly 16 to 20 CPU threads are appropriate.
Each time a scheduled task runs, an instance will be generated, which can be viewed in Running Record.
When a task is running, the log will record the time when the instance starts to be generated.
If you have set the execution frequency for a scheduled task, the instance generation may start slightly later than the set time. For example, if the task is set to run at 11:00:00 every day, the instance generation may start at 11:00:02.
Data Pipeline provides a real-time data synchronization function that allows you to perform table-level and database-level synchronization. Data changes in some or all tables in the source database can be synchronized to the target database in real time to keep data consistent.
In the process of real-time synchronization, data from the source database are temporarily stored in Data Pipeline for efficiently writing data to the target database.
Therefore, it is necessary to configure a middleware for temporary data storage before setting up a data pipeline task. FineDataLink uses Kafka as a middleware for data synchronization to temporarily store and transmit data.
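To illustrate the role of the middleware (not FDL's internal implementation), the sketch below uses the kafka-python library: change records are buffered in a topic by a producer and later read by a consumer that writes them to the target. The topic name, servers, and message format are assumptions.

from kafka import KafkaConsumer, KafkaProducer

# Source side: buffer a change record in Kafka (placeholder topic and servers).
producer = KafkaProducer(bootstrap_servers="kafka-host:9092")
producer.send("pipeline.source_db.orders", b'{"op": "update", "id": 42}')
producer.flush()

# Target side: read buffered records and apply them to the target database.
consumer = KafkaConsumer(
    "pipeline.source_db.orders",
    bootstrap_servers="kafka-host:9092",
    auto_offset_reset="earliest",
)
for message in consumer:
    print(message.value)  # a change record waiting to be written to the target
    break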
Definition of Dirty Data in Scheduled Task
1. Data that cannot be written into the target database because of field mismatches (such as length or type mismatches, missing fields, or violations of non-null constraints).
2. Data with conflicting primary keys when Strategy for Primary Key Conflict in Write Method is set to "Record as Dirty Data If Same Primary Key Exists".
Definition of Dirty Data in Pipeline Task
The dirty data in pipeline tasks refer to data that cannot be written into the target database because of field mismatches (such as length or type mismatches, missing fields, or violations of non-null constraints).
Note: A primary key conflict in a pipeline task does not produce dirty data, because the new data overwrites the old data.
Data Service enables the one-click release of processed data as APIs, facilitating cross-domain data transmission and sharing.
APPCode is a unique API authentication method of FineDataLink and can be regarded as a long-term valid token. Once set, it takes effect on the APIs of the specified application. To access such an API, you need to specify the APPCode value in the Authorization request header in the format APPCode + space + APPCode value.
For details, see Binding API to Application.
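A minimal sketch of calling a released API with APPCode authentication, using the requests library; the URL and the APPCode value are placeholders.

import requests

url = "https://fdl.example.com/api/v1/sales"               # placeholder API address
headers = {"Authorization": "APPCode your-appcode-value"}  # "APPCode" + space + value

response = requests.get(url, headers=headers, timeout=30)
print(response.status_code, response.json())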