Single Pipeline Task O&M - FineDataLink Help Document

Last update: October 28, 2024

Overview

Version Description

FineDataLink Version	Functional Change
4.0.5	/
4.0.7	Optimized the user interface of the pipeline task list.
4.0.27	Optimized the prompts of pipeline tasks. Allowed selecting data tables to be synchronized in batches quickly. Allowed managing pipeline tasks based on folders. Allowed skipping, retrying, and resynchronizing dirty data in single tables and multiple tables. Optimized the prompts of task logs.
4.0.29	Allowed copying and pasting tasks to a specified folder.
4.1.2	Allowed customizing historical statistics to view the synchronization status. For details, see the section "Viewing Real-Time Synchronization Status from the Time Dimension." Allowed viewing the configuration status of running tasks. For details, see the chapter "Task List." Displayed the delay time of writing the pending data. Allowed viewing Time of Reading Message and Time of Writing Message. For details, see the section "Real-Time Statistics." Optimized the display methods of dirty data. For details, see the section "Viewing Synchronization Status from the Data Table Dimension." Optimized the processing methods of dirty data. For details, see the section "Processing Dirty Data."
4.1.5.3	Prompted the user on the page if there is no running log currently. Optimized the filtering button of Running Log. Marked the Level column of Running Log with different colors. Optimized the Classification column.
4.1.6.3	Prohibited multiple users from editing the same pipeline task simultaneously.
4.1.6.5	Allowed reordering pipeline tasks/folders with drag-and-drop operations. Added the Move to option to the button next to each task.
4.1.8.1	1. Divided Sync Object into Group Table and Ordinary Table on the Real-Time Statistics and Historical Statistics tab pages. 2. Displayed the logs of the sub-tables of the group table on the Running Log tab page.

Function Description

After configuring pipeline tasks on the FineDataLink platform, you can monitor real-time data synchronization, view the task running status, and quickly view and process dirty data.

Task List

You can view and edit all pipeline tasks within your permission scope in the task list.

Pipeline Task Display Format/Location Description

1. Pipeline tasks can be placed in folders. Click a folder to see all pipeline tasks within it. You can adjust the display format of pipeline tasks (card style or list style).

2. You can move a pipeline task to a specified folder, as shown in the following figure.

Renaming/Moving/Copying and Pasting/Exporting/Deleting Pipeline Tasks

Click the button on the right side of a pipeline task to rename the task, move or copy and paste it to a specified path, or delete or export it, as shown in the following figure.

1. In FineDataLink 4.1.6.5 and later versions, you can drag and drop pipeline tasks/folders to change their positions.

All users can manually drag and drop pipeline tasks/folders on which they have permission.
The new order works on all users' pages.
For example, suppose a user sees pipeline tasks 5, 1, and 7 (from top to bottom), while an admin sees pipeline tasks 5, 6, 1, 9, and 7. If the user drags pipeline task 7 between pipeline tasks 5 and 1, pipeline task 7 is placed next to the lower target node, so it appears between pipeline tasks 6 and 1 on the admin's page. (In this scenario, pipeline tasks 5 and 1 are the target nodes, and pipeline task 1 is the lower one.) The same rule applies to folders.

2. In FineDataLink 4.1.6.5 and later versions, the Move to button is added.

You can move a task to another folder by clicking the Move to button. After being moved to another folder, the task will be located at the end of the new folder. Moving a task to its original folder is not allowed. The new task location works on all users' pages.
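
The ordering rules above can be illustrated with a minimal Python sketch. It assumes the platform keeps one shared order and filters it per user; the function names (`place_above`, `visible_to`, `move_to`) are hypothetical and are not part of FineDataLink.

```python
def place_above(global_order, moved, lower_target):
    """Return a new shared order with `moved` directly above `lower_target`."""
    order = [t for t in global_order if t != moved]
    order.insert(order.index(lower_target), moved)
    return order


def visible_to(global_order, permitted):
    """Filter the shared order down to the tasks a user has permission on."""
    return [t for t in global_order if t in permitted]


def move_to(folders, task, target_folder):
    """Move `task` to the end of `target_folder`; the original folder is rejected."""
    source = next(name for name, tasks in folders.items() if task in tasks)
    if source == target_folder:
        raise ValueError("Moving a task to its original folder is not allowed")
    folders[source].remove(task)
    folders[target_folder].append(task)


# Drag-and-drop example from the text: the user drops task 7 between 5 and 1.
admin_view = [5, 6, 1, 9, 7]
new_order = place_above(admin_view, moved=7, lower_target=1)
print(new_order)                         # [5, 6, 7, 1, 9]
print(visible_to(new_order, {5, 1, 7}))  # [5, 7, 1]

# Move to example: the task lands at the end of the new folder.
folders = {"finance": ["t1", "t2"], "sales": ["t3"]}
move_to(folders, "t1", "sales")
print(folders)  # {'finance': ['t2'], 'sales': ['t3', 't1']}
```

Because every user's page is a filtered view of the same order in this sketch, one move changes what all users see.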

Pausing/Editing Pipeline Tasks

Click a pipeline task in the task list to see whether the task is running. You can also stop the task manually or enter the task editing page, as shown in the following figure.

Display Content of Task Running Status
Operation status
Last start time of the task
Source data read time
Target data read time
Source data type
Target data type
Starting/Pausing the task
Editing the task

For pipeline tasks in the running, paused, aborted, draft, or to be started status, you can click the Edit button to modify or view the configuration items. The content that can be modified is shown in the following table.

Note:

If a task is paused, you can add or remove group tables, as well as add or remove tables within existing group tables.

Function		Running	Paused/Aborted	Draft	To Be Started
Source Selection (Configuration Page)	Data Source Type	View	View	View and Edit	View and Edit
	Data Source and Data Connection		View	View and Edit	View and Edit
	Data Source Permission Detection		Use	Use	Use
	Read Mode		View	View and Edit	View and Edit
	Synchronization Type		Switching between full and incremental synchronization is not allowed. You can modify Incremental Sync Start Point if you select Incremental Sync Only.
	Sync Object		Adding or removing sync objects is allowed. 1. Adding sync objects: The added table will be synchronized according to the selected synchronization type. For Full + Incremental Synchronization, the added tables require full synchronization. Incremental synchronization will be suspended in the background until the full sync is finished. For Incremental Sync Only: If Incremental Sync Start Point is modified, all tables (including the added tables) will be synchronized according to the specified start point. If Incremental Sync Start Point is not modified, the added table will be synchronized according to the built-in breakpoint of the task. 2. Removing sync objects: If you remove a sync object and save the modification, all related information of this object will also be removed. The corresponding table will not be synchronized when the task is started.
Target Selection (Configuration Page)	Data Destination Type		View
	Data Destination and Data Connection
	Pattern
	Data Deletion Strategy
	Mark Timestamp During Synchronization
	Synchronize Source Table Structure Change (DDL)		View and Edit You can adjust the DDL synchronization status.
Table Field Mapping (Configuration Page)	Field Mapping		The added tables can be configured with field mappings. You can only view field mappings of tables already in synchronization, but cannot edit or adjust them.
Pipeline Control (Configuration Page)	Dirty Data Threshold		View and Edit
	Retry After Failure		View and Edit
	Result Notification		View and Edit
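
As a rough illustration of the Sync Object row in the table above, the following Python sketch maps the synchronization type to how a newly added table is synchronized. The function name and the returned labels are hypothetical and only restate the rule described in the table.

```python
def plan_for_added_table(sync_type, start_point_modified=False):
    """Describe how a table added to an existing task is synchronized.

    sync_type is "full_plus_incremental" or "incremental_only"
    (labels invented for this sketch).
    """
    if sync_type == "full_plus_incremental":
        # The added table needs a full sync; incremental sync is suspended
        # in the background until the full sync is finished.
        return "full_then_incremental"
    if sync_type == "incremental_only":
        if start_point_modified:
            # All tables, including the added one, use the specified start point.
            return "from_new_start_point"
        # Otherwise the added table follows the built-in breakpoint of the task.
        return "from_builtin_breakpoint"
    raise ValueError(f"unknown synchronization type: {sync_type}")


print(plan_for_added_table("incremental_only"))  # from_builtin_breakpoint
```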

Statistics Log

Viewing Real-Time Synchronization Status from the Time Dimension

Click Activity Management to view the historical synchronization trends of the current task.

Note:

1. On Real-Time Statistics and Historical Statistics tab pages, Sync Object is divided into Group Table and Ordinary Table. Group tables can be expanded to show the sub-tables of the group.

2. Current Sync Phase column: If all tables in the group are in incremental synchronization, the group table is shown as being in incremental synchronization. If at least one table in the group is in full synchronization, the group table is shown as being in full synchronization.
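
The rule for the Current Sync Phase column can be expressed as a one-line check. This is a sketch only; the phase labels "full" and "incremental" are placeholders chosen for the example.

```python
def group_sync_phase(sub_table_phases):
    """A group is in full sync if any sub-table is; otherwise it is incremental."""
    return "full" if any(p == "full" for p in sub_table_phases) else "incremental"


print(group_sync_phase(["incremental", "full", "incremental"]))  # full
print(group_sync_phase(["incremental", "incremental"]))          # incremental
```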

Real-Time Statistics

Click Real-Time Statistics to view the cumulative statistics of the task: the total amount of data read and output since the task first started, the amount of pending data, and the real-time rates of reading and outputting data.

Note:

If resynchronization is used, the reading and writing of a single table will be recalculated. The resynchronized data volume will not be accumulated to the total read and output volume of Real-Time Statistics, but will be included in Historical Statistics.

The details are as follows:

1. Total Read = Pending Data + Total Output + Amount of Dirty Data

You can view the write delay time of all tasks or of a single data table in the Pending Data module.

Write Delay Time = Time of Reading Message - Time of Writing Message

Note:

The amounts of dirty data, pending data, data read, and data output of a group table are the sums of the corresponding indicators of the sub-tables in the group.

For example, group table A contains sub-tables A1 and A2:

Dirty data amount of group table A: Dirty data generated when writing from sub-table A1 to table A fails + Dirty data generated when writing from sub-table A2 to table A fails

Pending data of group table A: Pending data from sub-table A1 + Pending data from sub-table A2

The amount of data read in group table A: The amount of data read (addition, deletion, and modification) from sub-table A1 + The amount of data read (addition, deletion, and modification) from sub-table A2

The amount of data output in group table A: The amount of data written (addition, deletion, and modification) from sub-table A1 + The amount of data written (addition, deletion, and modification) from sub-table A2

2. Time of Reading Message and Time of Writing Message can be viewed by hovering the cursor over the icon, as shown in the following figure.
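
The formulas above (Total Read, write delay, and the group-table sums) can be checked with a small Python sketch. The class and function names are hypothetical, and the numbers are invented sample values.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TableStats:
    pending: int  # rows read but not yet written
    output: int   # rows written to the target
    dirty: int    # rows that failed to write

    @property
    def total_read(self):
        # Total Read = Pending Data + Total Output + Amount of Dirty Data
        return self.pending + self.output + self.dirty


def group_stats(sub_tables):
    """Each indicator of a group table is the sum over its sub-tables."""
    return TableStats(
        pending=sum(t.pending for t in sub_tables),
        output=sum(t.output for t in sub_tables),
        dirty=sum(t.dirty for t in sub_tables),
    )


def write_delay(read_time, write_time):
    """Write delay = time of reading message - time of writing message."""
    return read_time - write_time


a1 = TableStats(pending=10, output=85, dirty=5)
a2 = TableStats(pending=0, output=40, dirty=0)
group_a = group_stats([a1, a2])
print(group_a.total_read)  # 140

delay = write_delay(datetime(2024, 10, 28, 12, 0, 30), datetime(2024, 10, 28, 12, 0, 0))
print(delay)  # 0:00:30
```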

Historical Statistics

In FineDataLink 4.1.2 and later versions, you can click Historical Statistics and select a time range (Last 2 Hours, Last 24 Hours, Last 3 Days, Last 7 Days, or Last 15 Days) to view the synchronization status in that range.

Total Read, Pending Data, Total Output, and the real-time rate of reading and outputting data (rows/s) are displayed, as shown in the following figure.

You can also customize the time interval to view the data synchronization status of the selected interval, as shown in the following figure.
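
For readers who post-process exported statistics, the preset ranges can be turned into concrete intervals as sketched below. The preset names come from the page above; the `window` helper is hypothetical and only shows how a preset maps to a start and end time.

```python
from datetime import datetime, timedelta

PRESETS = {
    "Last 2 Hours": timedelta(hours=2),
    "Last 24 Hours": timedelta(hours=24),
    "Last 3 Days": timedelta(days=3),
    "Last 7 Days": timedelta(days=7),
    "Last 15 Days": timedelta(days=15),
}


def window(preset, now):
    """Return the (start, end) interval covered by a preset, ending at `now`."""
    return now - PRESETS[preset], now


start, end = window("Last 3 Days", datetime(2024, 10, 28, 12, 0))
print(start, end)  # 2024-10-25 12:00:00 2024-10-28 12:00:00
```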

Viewing Synchronization Status from the Data Table Dimension

You can view the synchronization status of all source tables in the Sync Object module, as shown in the following figure.

	Function	Description
1	Source Data Table	Displays the name of the data table
2	Time of Reading and Writing the Data Table	Displays Time of Reading Message and Time of Writing Message. Shown on the Real-Time Statistics tab page.
3	Sync Status	Displays the current synchronization status of the table: incremental synchronization or full synchronization
4	Dirty Data	Displays the number of dirty data rows. Click X Row(s) of Dirty Data to view the details of dirty data by error type, time, and keywords. Click Export to export dirty data. Note: If updating a primary key from value A to value B fails, two rows of dirty data are generated: one for deleting the data with primary key A, and one for updating or inserting the data with primary key B. The data is exported according to the structure of the target table as configured in the field mapping.
5	Pending Data	Displays the amount of pending data
6	Read and Write Statistics	Displays the number of inserted, updated, and deleted rows, and the read/write speed. Note: Read and Write Statistics can only be viewed on the Historical Statistics tab page.
7	View Log	Displays the task running log of a single table
8	Resync	Full resynchronization of single tables or the entire pipeline task is supported. Full synchronization will be re-executed after the target table is cleared, and incremental synchronization will be performed after full synchronization is completed. If the task is fully resynchronized, the log statistics will be reset (including input rows and output rows). Note: If Logical Deletion is turned on, the target table will be cleared and rewritten during resynchronization. The rewriting uses the insert logic. A prompt will appear: "Data of logical deletion generated during task operation will be cleared."

Click Dirty Data to filter the source table with dirty data. Click a specified data table to view the error details, which will be useful for subsequent processing. Error Type and Fault Cause are shown in the following table.

Error Type	Fault Cause	Fault Details
Unmatched Field Type	The data type of the <*Column_name*> field is unmatched.	The expected data type of one or more <*Column_name*> field(s) does not match the actual data type, or is different from the data type received before. For example, a string value is found in a Boolean field, or a null value is found in a non-nullable field. Update the field type of the target table.
Field Length Exceeded	The length of the data exceeds the length of the <*Column_name*> field.	The data size is larger than the size of the <*Column_name*> field. For example, a string with a length of 1000 is found in a VARCHAR(255) field. Update the field type of the target table.
Missing Target Field	The <*Column_name*> field does not exist. Create related fields first.	One or more <*Column_name*> field(s) specified in the field mapping do not exist in the target table. Create related fields first.
Missing Target Table	The <*Table_name*> target table is missing.	The <*Table_name*> target table specified in the field mapping does not exist. Create related tables first.
Invalid Data Destination	Failed to connect the <*Data_connection_name*> data destination.	Failed to connect the <*Data_connection_name*> data destination. Check whether the network is connected, the account password is correct, and the account permission is assigned.
Missing Write Permission	No write permission is assigned on the <*Table_name*> target table.	No write permission is assigned on the <*Table_name*> target table. Adjust related permission and try again.
An error occurred during the full synchronization	An error occurred during the full synchronization	Error details
Other Exception	Other	Stack details

Processing Dirty Data

Note:

Displaying and handling dirty data in group tables: A group table cannot currently be processed as a whole. Dirty data can only be processed for each table within the group.

Processing a Single Piece of Dirty Data

To process a single piece of dirty data, select the dirty data of a specified data table, and click Retry or Ignore, as shown in the following figure.

Processing Dirty Data in Batches

To process all dirty data in the current task, or the dirty data of specified tables, in batches, select one or more tables of the specified source database and then skip, retry, or resynchronize the dirty data, as shown in the following figure.

Processing Method	Description
Skip Dirty Data	For a single table or multiple specified tables, clicking Skip Dirty Data deletes the cached dirty data, which cannot be retrieved afterwards. These rows are also removed from the dirty data rows in log statistics.
Retry Dirty Data	For a single table or multiple specified tables, clicking Retry Dirty Data resubmits the cached dirty data and updates the data volume statistics. Note: Retry Dirty Data is not supported for dirty data generated in the full synchronization phase.
Resync	Full resynchronization of single tables or the entire pipeline task is supported. Full synchronization will be re-executed after the target table is cleared, and incremental synchronization will be performed after full synchronization is completed. If the task is fully resynchronized, the log statistics will be reset (including input rows and output rows). Note: 1. If Logical Deletion is turned on, the target table will be cleared and rewritten during resynchronization. The rewriting uses the insert logic. A prompt will appear: "Data of logical deletion generated during task operation will be cleared." 2. Retry Dirty Data is not supported for dirty data generated in the full synchronization phase.
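
The difference between skipping and retrying can be sketched with a toy in-memory cache. This is an illustrative model only, not FineDataLink's implementation: the `DirtyRow` and `DirtyCache` classes and the `write` callback are hypothetical, and task-level restrictions (such as the task still being in the full synchronization phase) are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class DirtyRow:
    table: str
    payload: dict
    phase: str  # "full" or "incremental": the phase that produced the row


@dataclass
class DirtyCache:
    rows: list = field(default_factory=list)

    def skip(self, tables):
        """Delete cached dirty rows of the given tables; they cannot be retrieved."""
        before = len(self.rows)
        self.rows = [r for r in self.rows if r.table not in tables]
        return before - len(self.rows)

    def retry(self, tables, write):
        """Resubmit incremental-phase rows; rows written successfully leave the cache."""
        still_dirty, resubmitted = [], 0
        for r in self.rows:
            if r.table in tables and r.phase == "incremental":
                resubmitted += 1
                if write(r):
                    continue  # the write succeeded, so the row is no longer dirty
            still_dirty.append(r)
        self.rows = still_dirty
        return resubmitted


cache = DirtyCache([
    DirtyRow("orders", {"id": 1}, "incremental"),
    DirtyRow("orders", {"id": 2}, "full"),
    DirtyRow("users", {"id": 9}, "incremental"),
])
retried = cache.retry({"orders"}, write=lambda row: True)
skipped = cache.skip({"users"})
print(retried, skipped, len(cache.rows))  # 1 1 1
```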

Note:

1. During the full synchronization phase, retrying, skipping, or resynchronizing dirty data of the pipeline task is not supported.

To process the dirty data generated in the full synchronization phase, you can export the details and manually adjust the existing records or insert new records, as shown in the following figure.

2. If the data source is Kafka, resynchronization is not supported. To resynchronize expired dirty data, you can only rerun the entire table, or manually insert the dirty data and ignore it in the pipeline.

3. If the task is paused or aborted, the resynchronization logic is executed the next time the task is started.

Recording logic of event-based dirty data:

Suppose the record with primary key A generates dirty data at time point t1. If a later write to primary key A succeeds, the historical dirty data of primary key A is cleared. If the later write fails, only the latest dirty data of primary key A is retained.
If primary key A is updated to primary key B at time point t1, the update is split into two events that are processed sequentially at the target end: deleting primary key A and inserting primary key B.
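
The recording logic above can be sketched as a dictionary keyed by primary key. The `DirtyLog` class and `split_pk_update` function are hypothetical names used only for this illustration.

```python
class DirtyLog:
    """Keeps at most one dirty event per primary key (the latest failure)."""

    def __init__(self):
        self.latest = {}  # primary key -> latest failed event

    def record(self, key, event, ok):
        if ok:
            # A successful write clears the historical dirty data of this key.
            self.latest.pop(key, None)
        else:
            # A failed write replaces any older dirty data of this key.
            self.latest[key] = event


def split_pk_update(old_key, new_key, row):
    """A primary key update becomes two ordered events: delete old, insert new."""
    return [("delete", old_key, None), ("insert", new_key, row)]


log = DirtyLog()
log.record("A", "update#1", ok=False)
log.record("A", "update#2", ok=False)
print(log.latest)  # {'A': 'update#2'}
log.record("A", "update#3", ok=True)
print(log.latest)  # {}

events = split_pk_update("A", "B", {"name": "x"})
print(events)  # [('delete', 'A', None), ('insert', 'B', {'name': 'x'})]
```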

Note:

Constraints of batch loading mode at the output end

If the output end is in the batch loading mode, data is generally submitted to the target end in large batches, each containing multiple records. In this case, the target end may be unable to identify which records in a batch are dirty. The target ends that currently support batch loading behave as follows:

1. GaussDB 200 supports two writing modes, copy and parallel loading:

The copy mode:

In the full synchronization phase, if a single batch fails to be submitted, the data of this batch is written through the JDBC API instead (a single batch currently contains 1024 records). Writing through JDBC returns specific error information, so the details of dirty data can be displayed, but the performance is poor.
In the incremental synchronization phase, if a single batch fails to be submitted, the data of the entire batch is recorded as dirty data (a single batch is currently at most 5 MB, which is roughly up to 10,000 rows; the exact number of rows depends on the size of each record).

Parallel loading:

In parallel loading, all the required data in a single task is submitted at one time. Parallel loading provides an API for querying specific error information, which FineDataLink can use to query the details of dirty data.

2. HDFS writing mode of Hive:

The data is directly written to the HDFS file at one time. Detailed error information cannot be obtained. Dirty data management is not supported.
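
The copy-mode behavior described for GaussDB 200 (row-by-row fallback in the full phase, whole-batch dirty marking in the incremental phase) can be sketched as follows. The writer callbacks `copy_write` and `jdbc_write` are stand-ins invented for this example and do not correspond to real driver calls.

```python
def submit_batch(batch, phase, copy_write, jdbc_write):
    """Return (row, reason) pairs recorded as dirty data for this batch."""
    try:
        copy_write(batch)  # fast path: submit the whole batch at once
        return []
    except Exception as batch_error:
        if phase == "incremental":
            # The failing row cannot be identified, so the whole batch is dirty.
            return [(row, str(batch_error)) for row in batch]
        # Full phase: fall back to row-by-row writes to find the dirty rows.
        dirty = []
        for row in batch:
            try:
                jdbc_write(row)
            except Exception as row_error:
                dirty.append((row, str(row_error)))
        return dirty


# Stand-in writers: the batch always fails, and single rows fail when too long.
def failing_copy(batch):
    raise RuntimeError("batch rejected")


def strict_row_write(row):
    if len(row) > 3:
        raise ValueError("value too long")


rows = ["ab", "abcdef", "c"]
print(submit_batch(rows, "full", failing_copy, strict_row_write))
# [('abcdef', 'value too long')]
print(len(submit_batch(rows, "incremental", failing_copy, strict_row_write)))  # 3
```

The sketch makes the trade-off visible: the row-by-row path yields precise error details at the cost of one write per record, while the incremental path keeps throughput but over-reports dirty rows.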

Running Log

Visualized Log

1. Click Running Log to see the historical running logs of the current pipeline task, including Time, Level, Classification, and Description, as shown in the following figure.

2. Logs can also be searched for or filtered, as shown in the following figure.

Note:

You can filter logs based on tasks, tables, types, and generation time.

Click the Filter button. The drop-down list of Log Classification is shown in the following figure.

3. If there is no running log currently, the tab page will be displayed as below.

Log Level

Logs are divided into four levels: BASIC INFO, INFO, WARN, and ERROR.

BASIC INFO is the basic level.

Click Details next to each log to view the specific description, as shown in the following figure.
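
If you export or copy log entries for offline analysis, the same kind of filtering (by level, table, and generation time) can be reproduced with a short Python sketch. The `LogEntry` fields mirror the columns named above; the sample entries and the classification values are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime

LEVELS = ("BASIC INFO", "INFO", "WARN", "ERROR")  # the four levels named above


@dataclass
class LogEntry:
    time: datetime
    level: str
    classification: str  # placeholder values; see the Filter drop-down for real ones
    table: str
    description: str


def filter_logs(entries, levels=None, table=None, start=None, end=None):
    """Keep entries matching every filter that is set (None means no filter)."""
    result = []
    for e in entries:
        if levels is not None and e.level not in levels:
            continue
        if table is not None and e.table != table:
            continue
        if start is not None and e.time < start:
            continue
        if end is not None and e.time > end:
            continue
        result.append(e)
    return result


logs = [
    LogEntry(datetime(2024, 10, 28, 9, 0), "INFO", "sync", "orders", "batch written"),
    LogEntry(datetime(2024, 10, 28, 9, 5), "ERROR", "sync", "orders", "write failed"),
    LogEntry(datetime(2024, 10, 28, 9, 7), "WARN", "sync", "users", "slow read"),
]
errors = filter_logs(logs, levels={"WARN", "ERROR"}, table="orders")
print([e.description for e in errors])  # ['write failed']
```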
