Overview
Version Description
FineDataLink Version | Functional Change
---|---
4.0.5 | /
4.0.7 | Optimized the user interface of the pipeline task list.
4.0.27 |
4.0.29 | Allowed copying and pasting tasks to a specified folder.
4.1.2 |
4.1.5.3 |
4.1.6.3 | Prohibited multiple users from editing the same pipeline task simultaneously.
4.1.6.5 | 1. Divided Sync Object into Group Table and Ordinary Table on the Real-Time Statistics and Historical Statistics tab pages. 2. Displayed the logs of the sub-tables of the group table on the Running Log tab page.
Function Description
After configuring pipeline tasks on the FineDataLink platform, you can monitor real-time data synchronization, view the task running status, and quickly view and process dirty data.
Task List
You can view and edit all pipeline tasks within your permission scope in the task list.
Pipeline Task Display Format/Location Description
1. Pipeline tasks can be placed in folders. Click a folder to see all pipeline tasks within it. You can adjust the display format of pipeline tasks (card style or list style).
2. You can move a pipeline task to a specified folder, as shown in the following figure.
Renaming/Moving/Copying and Pasting/Exporting/Deleting Pipeline Tasks
Click the button on the right side of a pipeline task to modify the name, move or copy and paste the task to a specified path, as well as delete or export the task, as shown in the following figure.
1. In FineDataLink 4.1.6.5 and later versions, you can drag and drop pipeline tasks/folders to change their positions.
All users can manually drag and drop pipeline tasks/folders on which they have permission.
The new order takes effect on all users' pages.
For example, a user sees pipeline tasks 5, 1, and 7 from top to bottom, while an admin sees pipeline tasks 5, 6, 1, 9, and 7. If the user drags pipeline task 7 between tasks 5 and 1, task 7 appears between tasks 6 and 1 on the admin's page: the moved task is placed next to the lower of the two target nodes. (In this scenario, pipeline tasks 5 and 1 are the target nodes, and task 1 is the lower one.) The same rule applies to folders.
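The reordering rule described above can be modeled with a short sketch. This is purely illustrative: `move_item`, the sample task numbers, and the drop-position handling are assumptions for explanation, not FineDataLink's actual implementation.

```python
def move_item(global_order, visible, item, drop_index):
    """Model of the shared-reorder rule: the moved item is anchored to the
    'lower target node' (the visible neighbour below the drop position in the
    mover's own view) and inserted just above that neighbour globally."""
    visible = [x for x in visible if x != item]
    lower = visible[drop_index]              # neighbour below the drop position
    order = [x for x in global_order if x != item]
    order.insert(order.index(lower), item)   # place just above the lower node
    return order

# The user sees [5, 1, 7]; the admin sees the global order [5, 6, 1, 9, 7].
# The user drops task 7 between 5 and 1, so the lower target node is task 1:
result = move_item([5, 6, 1, 9, 7], [5, 1, 7], 7, 1)
assert result == [5, 6, 7, 1, 9]   # task 7 now sits between 6 and 1 globally
```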
2. In FineDataLink 4.1.6.5 and later versions, the Move to button is added.
You can move a task to another folder by clicking the Move to button. After being moved to another folder, the task will be located at the end of the new folder. Moving a task to its original folder is not allowed. The new task location works on all users' pages.
Pausing/Editing Pipeline Tasks
Click a pipeline task in the task list to see whether the task is running. You can also stop the task manually or enter the task editing page, as shown in the following figure.
Display Content of Task Running Status |
---|
Operation status |
Last start time of the task |
Source data read time / Target data read time |
Source data type / Target data type |
Starting/Pausing the task |
Editing the task |
For pipeline tasks in the running, paused, aborted, draft, or to be started status, you can click the Edit button to modify or view the configuration items. The content that can be modified is shown in the following table.

If a task is paused, you can add or remove group tables, as well as add or remove tables within existing group tables.
Configuration Page | Function | Running | Paused/Aborted | Draft | To Be Started
---|---|---|---|---|---
Source Selection (Configuration Page) | Data Source Type | View | View | View and Edit | View and Edit
 | Data Source and Data Connection | View | View and Edit | View and Edit |
 | Data Source Permission Detection | Use | Use | Use |
 | Read Mode | View | View and Edit | View and Edit |
 | Synchronization Type | Switching between full and incremental synchronization is not allowed. You can modify Incremental Sync Start Point if you select Incremental Sync Only. | | |
 | Sync Object | Adding or removing sync objects is allowed. 1. Adding sync objects: the added table will be synchronized according to the selected synchronization type. 2. Removing sync objects: if you remove a sync object and save the modification, all related information of this object will also be removed, and the corresponding table will not be synchronized when the task is started. | | |
Target Selection (Configuration Page) | Data Destination Type | View | | |
 | Data Destination and Data Connection | | | |
 | Pattern | | | |
 | Data Deletion Strategy | | | |
 | Mark Timestamp During Synchronization | | | |
 | Synchronize Source Table Structure Change (DDL) | View and Edit. You can adjust the DDL synchronization status. | | |
Table Field Mapping (Configuration Page) | Field Mapping | You can only view field mappings of tables already in synchronization, but cannot edit or adjust them. | | |
Pipeline Control (Configuration Page) | Dirty Data Threshold | View and Edit | | |
 | Retry After Failure | View and Edit | | |
 | Result Notification | View and Edit | | |
Statistics Log
Viewing Real-Time Synchronization Status from the Time Dimension
Click Activity Management to view the historical synchronization trends of the current task.

Description of the Current Sync Phase column: if all tables in a group are in incremental synchronization, the group table is in incremental synchronization. If any table in the group is in full synchronization, the group table is in full synchronization.
Real-Time Statistics
Click Real-Time Statistics to view the task's cumulative statistics: the total amount of data read and output since the task first started, the amount of pending data, and the real-time rates of reading and outputting data.

If resynchronization is used, the reading and writing of a single table will be recalculated. The resynchronized data volume will not be accumulated to the total read and output volume of Real-Time Statistics, but will be included in Historical Statistics.
The details are as follows:
1. Total Read = Pending Data + Total Output + Amount of Dirty Data
You can view the delay time of writing all tasks or a single data table in the Pending Data module.
Write Delay Time = Message Read Time - Message Write Time

The amounts of dirty data, pending data, and data read and output of a group table are the sums of the corresponding indicators of its sub-tables:
For example, group table A contains sub-tables A1 and A2:
Dirty data amount of group table A = dirty data generated when writing sub-table A1 fails + dirty data generated when writing sub-table A2 fails
Pending data of group table A = pending data of sub-table A1 + pending data of sub-table A2
Amount of data read in group table A = data read (additions, deletions, and modifications) from sub-table A1 + data read from sub-table A2
Amount of data output in group table A = data written (additions, deletions, and modifications) for sub-table A1 + data written for sub-table A2
2. Time of Reading Message and Time of Writing Message can be viewed by hovering the cursor over the icon, as shown in the following figure.
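The Total Read identity from item 1 and the group-table sums above can be illustrated with a minimal sketch. The `TableStats` structure and `group_stats` helper are hypothetical names used for explanation, not part of FineDataLink.

```python
from dataclasses import dataclass

@dataclass
class TableStats:
    """Per-sub-table counters, as shown on the Real-Time Statistics tab."""
    read: int      # rows read (additions, deletions, modifications)
    output: int    # rows successfully written to the target
    pending: int   # rows read but not yet written
    dirty: int     # rows that failed to be written

def group_stats(sub_tables):
    """Each indicator of a group table is the sum over its sub-tables."""
    return TableStats(
        read=sum(t.read for t in sub_tables),
        output=sum(t.output for t in sub_tables),
        pending=sum(t.pending for t in sub_tables),
        dirty=sum(t.dirty for t in sub_tables),
    )

# Group table A with sub-tables A1 and A2
a1 = TableStats(read=1000, output=990, pending=8, dirty=2)
a2 = TableStats(read=500, output=495, pending=5, dirty=0)
a = group_stats([a1, a2])

# Total Read = Pending Data + Total Output + Amount of Dirty Data,
# which holds for each sub-table and for the group as a whole:
assert a.read == a.pending + a.output + a.dirty   # 1500 == 13 + 1485 + 2
```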
Historical Statistics
In FineDataLink 4.1.2 and later versions, you can click Historical Statistics to view the real-time synchronization status of the Last 2 Hours, Last 24 Hours, Last 3 Days, Last 7 Days, or Last 15 Days.
Total Read, Pending Data, Total Output, and the real-time rate of reading and outputting data (rows/s) are displayed, as shown in the following figure.
You can also customize the time interval to view the data synchronization status of the selected interval, as shown in the following figure.
Viewing Synchronization Status from the Data Table Dimension
You can view the synchronization status of all source tables in the Sync Object module, as shown in the following figure.
No. | Function | Description
---|---|---
1 | Source Data Table | Displays the name of the data table.
2 | Time of Reading and Writing the Data Table | Displayed on the Real-Time Statistics tab page: Time of Reading Message and Time of Writing Message.
3 | Sync Status | Displays the current synchronization status of the table: incremental synchronization or full synchronization.
4 | Dirty Data | Note: If the primary key update of data from value A to value B fails, there will be two rows of dirty data: data with primary key A will be deleted, and data with primary key B will be updated or inserted. The data will be exported according to the structure of the target table as configured in the field mapping.
5 | Pending Data | Displays the amount of pending data.
6 | Read and Write Statistics | Displays the number of inserted, updated, and deleted data rows, as well as the read/write speed. Note: Read and Write Statistics can only be viewed on the Historical Statistics tab page.
7 | View Log | Displays the task running log of a single table.
8 | Resync | Full resynchronization of single tables or the entire pipeline task is supported. Full synchronization will be re-executed after the target table is cleared, and incremental synchronization will be performed after full synchronization is completed. If the task is fully resynchronized, the log statistics will be reset (including input rows and output rows). Note: If Logical Deletion is turned on, the target table will be cleared and rewritten during resynchronization. The rewriting uses the insert logic. A prompt will appear: "Data of logical deletion generated during task operation will be cleared."
Click Dirty Data to filter source tables with dirty data. Click a specified data table to view the error details, which helps with subsequent processing. Error types and fault causes are shown in the following table.
Error Type | Fault Cause | Fault Details |
---|---|---|
Unmatched Field Type | The data type of the <Column_name> field is unmatched. | The expected data type of one or more <Column_name> field(s) does not match the actual data type, or differs from the data type received before. Update the field type of the target table. For example, a string value is found in a Boolean field, or a null value is found in a non-nullable field. |
Field Length Exceeded | The length of the data exceeds the length of the <Column_name> field. | The data size is larger than the size of the <Column_name> field. Update the field type of the target table. For example, a string with a length of 1000 is found in the VARCHAR(255) field. |
Missing Target Field | The <Column_name> field does not exist. Create related fields first. | One or more <Column_name> field(s) specified in the field mapping do not exist in the target table. Create related fields first. |
Missing Target Table | The <Table_name> target table is missing. | The <Table_name> target table specified in the field mapping does not exist. Create related tables first. |
Invalid Data Destination | Failed to connect to the <Data_connection_name> data destination. | Failed to connect to the <Data_connection_name> data destination. Check whether the network is connected, the account and password are correct, and the required account permissions are assigned. |
Missing Write Permission | No write permission is assigned on the <Table_name> target table. | No write permission is assigned on the <Table_name> target table. Adjust related permission and try again. |
An error occurred during the full synchronization | An error occurred during the full synchronization | Error details |
Other Exception | Other | Stack details |
Processing Dirty Data

Note on displaying and processing dirty data in group tables: a group table cannot be processed as a whole. Dirty data can only be processed table by table within the group.
Processing a Single Piece of Dirty Data
To process a single piece of dirty data, select the dirty data of a specified data table, and click Retry or Ignore, as shown in the following figure.
Processing Dirty Data in Batches
To process all dirty data in the current task or dirty data of a specified table in batches, tick a single table or multiple tables of the specified source database to skip, retry, or resynchronize the dirty data, as shown in the following figure.
Processing Method | Description |
---|---|
Skip Dirty Data | For a single table and specified multiple tables, if you click Skip Dirty Data, the cached dirty data will be deleted and cannot be retrieved. Meanwhile, these data rows will be removed from the dirty data rows in log statistics. |
Retry Dirty Data | For a single table or specified multiple tables, if you click Retry Dirty Data, the cached dirty data will be resubmitted and the data volume statistics will be updated. Note: Retry Dirty Data is not supported for dirty data generated in the full volume synchronization phase. |
Resync | Full resynchronization of single tables or the entire pipeline task is supported. Full synchronization will be re-executed after the target table is cleared, and incremental synchronization will be performed after full synchronization is completed. If the task is fully resynchronized, the log statistics will be reset (including input rows and output rows). Note: Retry Dirty Data is not supported for dirty data generated in the full volume synchronization phase. |

1. During the full volume synchronization phase, retrying, skipping, or resynchronizing dirty data of the pipeline task is not supported.
To process the dirty data generated in the full volume synchronization phase, you can export the details and manually adjust the existing records or insert new records, as shown in the following figure.

2. If the data source is Kafka, resynchronization is not supported. To resynchronize expired dirty data, you can only rerun the entire table or manually insert dirty data and ignore it in the pipeline.
3. If the task is paused or aborted when resynchronization is triggered, the resynchronization logic will be executed at the next startup.
Recording logic of event-based dirty data:
Suppose the table with primary key A generates dirty data at time point t1. If a write to primary key A after t1 succeeds, the historical dirty data of primary key A will be cleared. If the write fails, only the latest dirty data of primary key A will be retained.
If primary key A is updated to primary key B at time point t1, the pipeline task disassembles the update into two events and processes them sequentially at the target end: deleting primary key A and inserting primary key B.
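A minimal model of this recording logic is sketched below. The function names and the dictionary-based store are illustrative assumptions, not FineDataLink's actual data structures.

```python
def record_dirty(store, pk, event):
    """On a failed write, keep only the latest dirty record per primary key."""
    store[pk] = event

def record_success(store, pk):
    """A later successful write clears the historical dirty data for that key."""
    store.pop(pk, None)

def primary_key_update(old_pk, new_pk, row):
    """A primary-key update is split into two events processed in order:
    delete the old key, then insert the new key."""
    return [("delete", old_pk, None), ("insert", new_pk, row)]

store = {}
record_dirty(store, "A", {"err": "first failure"})
record_dirty(store, "A", {"err": "second failure"})   # replaces the first
assert store["A"]["err"] == "second failure"          # only the latest kept

record_success(store, "A")                            # history cleared
assert "A" not in store

events = primary_key_update("A", "B", {"v": 1})
assert [e[0] for e in events] == ["delete", "insert"]
```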
Note:
Constraints of batch loading mode at the output end
If the output end is in batch loading mode, a large batch containing multiple pieces of data is generally submitted to the target end at one time. Such data sources may not be able to identify which data in a failed batch is dirty. The behavior of the target ends that currently support batch loading is analyzed as follows:
1. GaussDB 200 supports two writing modes, copy and parallel loading:
The copy mode:
In the full synchronization phase, if a single batch fails to be submitted, the data of this batch will instead be written through the JDBC API (a single batch currently includes 1,024 records). JDBC writing can obtain specific error information and display the details of dirty data, but its performance is poor.
In the incremental synchronization phase, if a single batch fails to be submitted, the entire batch will be recorded as dirty data (the maximum size of a single batch is currently 5 MB, approximately 10,000 rows; the exact number of rows depends on the size of each record).
Parallel loading:
In parallel loading, all the required data in a single task is submitted at one time. Parallel loading provides an API for querying specific error information, so FineDataLink can display the details of dirty data.
2. HDFS writing mode of Hive:
Data is written directly to HDFS files at one time. Detailed error information cannot be obtained, so dirty data management is not supported.
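The copy-mode fallback described for GaussDB 200 (submit the whole batch first, then replay the batch row by row over JDBC on failure so each failing row can be reported as dirty data) can be sketched as follows. `copy_write` and `jdbc_write` are hypothetical stand-ins for the real writers.

```python
BATCH_SIZE = 1024  # single-batch size in the full synchronization phase (per the text)

def write_batch_with_fallback(batch, copy_write, jdbc_write):
    """Try the fast copy submit first; if the whole batch is rejected, replay
    it row by row over JDBC so per-row errors (dirty data) can be collected.
    Sketch only: copy_write/jdbc_write are assumed callables."""
    try:
        copy_write(batch)
        return []                      # whole batch accepted, no dirty data
    except Exception:
        dirty = []
        for row in batch:
            try:
                jdbc_write(row)        # slower, but yields per-row errors
            except Exception as err:
                dirty.append((row, str(err)))
        return dirty

# Usage sketch: a copy failure degrades to row-by-row JDBC writes.
def copy_write(batch):
    raise RuntimeError("copy rejected")

def jdbc_write(row):
    if row["id"] == 2:
        raise ValueError("field length exceeded")

dirty = write_batch_with_fallback([{"id": 1}, {"id": 2}], copy_write, jdbc_write)
assert [row["id"] for row, _ in dirty] == [2]   # only the failing row is dirty
```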
Running Log
Visualized Log
1. Click Running Log to see the historical running logs of the current pipeline task, including Time, Level, Classification, and Description, as shown in the following figure.
2. Logs can also be searched for or filtered, as shown in the following figure.

You can filter logs based on tasks, tables, types, and generation time.
Click the Filter button. The drop-down list of Log Classification is shown in the following figure.
3. If there is no running log currently, the tab page will be displayed as below.
Log Level
Logs are divided into four levels: BASIC INFO, INFO, WARN, and ERROR.
BASIC INFO is the basic level.
Click Details next to each log to view the specific description, as shown in the following figure.