This document introduces how to use the Data Pipeline function.
Step one (required): Prepare the FineDataLink project.
For details about deploying the FineDataLink project, see FineDataLink Deployment Method Selection.
Step two (required): Register the corresponding function point.
To use the Data Pipeline function, you must register the corresponding function point. For details, see Registration Introduction.
Step three (required): Prepare the data source.
For details about the data sources supported by pipeline tasks, see Types of Data Sources Supported by Data Pipeline.
You need the Use permission on the relevant data connections. You can create the data connections yourself (for details, see Data Source Creation and Management), or contact the admin to assign you the Use permission on them (for details, see Overview of Data Connection Permission).
1. For details about a complete example of configuring a pipeline task, see Pipeline Task Example.
2. If multiple tables in one MySQL, SQL Server, or Oracle database require real-time synchronization, you are advised to synchronize all of them with a single pipeline task to avoid overloading the database.
Step One: Database Environment Preparation (Required)
Grant the account configured in the data connection used in the pipeline task the necessary permission to perform the required operations on the database. For details, see Overview of Database Environment Preparation.
Step Two: Pipeline Task Environment Preparation (Required)
Deploy Kafka (an open-source event streaming platform) as the middleware. For details, see Kafka Deployment - ZooKeeper Mode and Transmission Queue Configuration. (Only the super admin of the FineDataLink project can configure the transmission queue.)
1. You are advised to deploy Kafka on a Linux system. (While Kafka can also be deployed on a Windows system, its performance will be limited. This deployment method is only suitable for demonstration purposes and not recommended in production environments.) Additionally, you can deploy Kafka and FineDataLink on different servers.
2. Before restarting Kafka, you need to manually pause the pipeline tasks. After restarting Kafka, you need to manually restart the pipeline tasks. Otherwise, exceptions will occur in pipeline tasks.
3. Kafka clusters are not supported by FineDataLink currently.
4. The Data Pipeline function requires Kafka as middleware to achieve real-time synchronization, which generates one topic for each source table. For example, if there are 1000 source tables, Kafka will generate 1000 topics.
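Because each source table corresponds to exactly one topic (note 4 above), the topic load a pipeline task places on Kafka can be estimated directly from the table count. The sketch below is illustrative only; the per-topic partition count is a hypothetical planning input, not a FineDataLink setting.

```python
# Rough capacity sketch. Assumption (from the note above): FineDataLink
# creates one Kafka topic per source table. partitions_per_topic is a
# hypothetical planning parameter for estimating broker load.
def estimate_kafka_load(source_table_count: int, partitions_per_topic: int = 1):
    """Return (topics, total partitions) Kafka must host for one pipeline task."""
    topics = source_table_count              # one topic per source table
    partitions = topics * partitions_per_topic
    return topics, partitions

# 1000 source tables -> 1000 topics
print(estimate_kafka_load(1000))  # -> (1000, 1000)
```

A quick estimate like this can help decide how much disk and memory to budget for the Kafka server before starting large pipeline tasks.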
Step Three: Pipeline Task Permission Assignment (Optional)
Grant permission on Data Pipeline to users who are not super admins.
Grant users who need to create pipeline tasks in a specific folder the Management permission on that folder.
For details, see Pipeline Task Management Permission.
Step Four: Pipeline Task Configuration (Required)
For details, see the following documents in order.
1. Pipeline Task Configuration - Source Selection
1. You are not advised to use the Data Pipeline function to synchronize fields of longtext type, as this may cause issues with Kafka and affect execution efficiency.
2. Ensure the names of source table fields contain no spaces. Otherwise, an error will occur during task startup.
2. Pipeline Task Configuration - Target Selection
3. Pipeline Task Configuration - Table Field Mapping
4. Pipeline Task Configuration - Pipeline Control
Note the following about the Table Dirty Data Threshold setting:
In FineDataLink versions earlier than V4.2.1.1, the configured dirty data threshold applies to the entire task. In V4.2.1.1 and later versions, this setting item is adjusted to Table Dirty Data Threshold and applies per table, so a threshold originally configured for the whole task may be too large for an individual table. Therefore, after upgrading FineDataLink to V4.2.1.1 or a later version, you are advised to appropriately lower the existing threshold to keep dirty data under effective control.
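As a hedged rule of thumb (not an official FineDataLink formula), one way to pick a new per-table threshold after upgrading is to spread the old task-level threshold across the task's tables:

```python
# Illustrative only: after upgrading to V4.2.1.1+, the dirty data threshold
# applies per table rather than per task. A simple starting point is to divide
# the old task-level threshold by the number of tables in the task.
def suggest_table_threshold(task_threshold: int, table_count: int) -> int:
    """Suggest a per-table dirty data threshold (at least 1)."""
    return max(1, task_threshold // table_count)

print(suggest_table_threshold(10000, 100))  # -> 100
```

Treat the result as a starting point and adjust it per table based on how much dirty data each table actually produces.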
When you create or copy a pipeline task:
1. In the Source Selection step:
If you set Synchronization Type to Full + Incremental Synchronization, the task synchronizes all existing data first and then continuously synchronizes the changes.
When the task runs for the first time, it performs full synchronization followed by incremental synchronization. If the task is interrupted or paused and then restarted, it resumes incremental synchronization from the breakpoint (provided that the full synchronization of all tables has been completed).
If you set Synchronization Type to Incremental Sync Only, see Pipeline Task Configuration - Source Selection for instructions.
2. If you select an existing table as the target table and its structure (table name and field names) is the same as that of the source table, the task will empty the target table and write the full data during the first execution, and then perform incremental synchronization.
When you pause the pipeline task and enter the editing page:
1. If you add a source table, the added table will be synchronized according to the selected synchronization type.
If you set Synchronization Type to Full + Incremental Synchronization, the added table undergoes full synchronization first, while its incremental synchronization is suspended in the background and starts once the full synchronization of the added table is finished.
If you set Synchronization Type to Incremental Sync Only:
If Incremental Sync Start Point is modified, all tables (including the added table) will be synchronized according to the specified start point.
If Incremental Sync Start Point is not modified, the added table will be synchronized according to the built-in breakpoint of the task.
2. If you remove a source table and save the modification, all related information of the removed table will be deleted. The corresponding table will not be synchronized if the task starts.
In FineDataLink of V4.2.6.4 and later versions, when a table is removed from a pipeline task, the dirty data generated during the synchronization of that table will be cleaned up.
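The behavior for added tables described above can be sketched as a small decision function (the function and parameter names are illustrative, not FineDataLink APIs):

```python
# Hedged sketch of the added-table synchronization rules described above.
def start_point_for_added_table(sync_type: str, start_point_modified: bool) -> str:
    """Return where synchronization of a newly added table starts."""
    if sync_type == "Full + Incremental Synchronization":
        # Full sync runs first; incremental sync starts once it finishes.
        return "full sync first, then incremental"
    if sync_type == "Incremental Sync Only":
        # Depends on whether Incremental Sync Start Point was modified.
        return ("specified start point" if start_point_modified
                else "built-in task breakpoint")
    raise ValueError(f"unknown synchronization type: {sync_type}")

print(start_point_for_added_table("Incremental Sync Only", False))
# -> built-in task breakpoint
```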
When you process dirty data:
1. Retry Dirty Data: If you retry dirty data of a single table or the selected tables, the cached dirty data will be resubmitted, and the dirty data volume statistics will be updated.
2. Resync: If you click Resync, the task will empty the target table and execute full synchronization again, and perform incremental synchronization after full synchronization is completed.
If you enable Synchronization Source Table Structure Change and set Data Deletion Strategy to Logical Deletion at Target End:
For details, see Data Pipeline - Synchronizing Source Table Structure Changes and Setting the Data Deletion Strategy in Pipeline Task Configuration - Target Selection.
If you enable Retry After Failure:
Logic Description for Retry After Failure:
If the full synchronization is not completed, the synchronization will restart from the beginning. If the full synchronization is completed, the incremental synchronization will resume from the last breakpoint. In other words, breakpoints do not exist in the full synchronization phase but only exist in the incremental synchronization phase.
The retry count is reset once the pipeline task reruns successfully.
1. If the project contains pipeline tasks, you are not advised to use the kill -9 pid command to close the project, as this may cause pipeline task exceptions. You are advised to use the kill pid command instead. For details, see Stopping/Restarting/Starting a FineDataLink Project.
2. A pipeline task cannot be edited by multiple users simultaneously in FineDataLink of V4.1.6.3 and later versions. For details, see Edit Lock for Tasks Against Concurrent Editing.
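The reason `kill <pid>` is preferred over `kill -9 <pid>` in note 1 above is that `kill` sends SIGTERM, which a process can trap to clean up before exiting, while SIGKILL (`-9`) cannot be trapped. The sketch below demonstrates this on Linux with a stand-in child process (illustrative only, not the FineDataLink process itself):

```python
import signal
import subprocess
import sys

# The child stands in for a process with cleanup work (e.g. pipeline state to
# flush). It traps SIGTERM and exits gracefully; SIGKILL would skip the handler.
child_code = """
import signal, sys, time

def on_term(signum, frame):
    print('cleaning up pipeline state')   # cleanup runs only for SIGTERM
    sys.exit(0)

signal.signal(signal.SIGTERM, on_term)
print('running', flush=True)
time.sleep(30)
"""
child = subprocess.Popen([sys.executable, "-c", child_code],
                         stdout=subprocess.PIPE, text=True)
assert child.stdout.readline().strip() == "running"  # child is up, handler set
child.send_signal(signal.SIGTERM)                    # what `kill <pid>` sends
remaining = child.stdout.read()
child.wait(timeout=10)
print(remaining.strip())  # -> cleaning up pipeline state
```

With `kill -9`, the handler would never run and any in-flight pipeline state could be lost, which is why note 1 warns against it.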
Rename, move, copy, export, and delete a pipeline task. For details, see Task List.
Migrate a pipeline task. For details, see Pipeline Task Import and Export.
View the task running status and logs, process dirty data, and edit the pipeline task (add/delete a table, modify the data connection, or modify setting items such as Synchronization Source Table Structure Change, Table Dirty Data Threshold, Retry After Failure, and Result Notification). For details, see Single Pipeline Task Management.
Manage all pipeline tasks in a unified manner, for example, stopping and deleting tasks, checking the task running status and synchronization performance, monitoring and processing exceptions, and pausing tasks in batches. For details, see Pipeline Task O&M - Task Management.
Check the pipeline task concurrency, add/delete tables after the pipeline task starts running, and handle unrunnable tasks containing uneditable configuration to enable execution. For details, see Data Pipeline O&M Guide.
View tables in the FineDB database that record information about pipeline tasks. For details, see Data Pipeline.
Grant View, Edit, and Authorization permission on pipeline tasks. For details, see Pipeline Task Management Permission and Pipeline Task Authorization Permission.
Troubleshoot Data Pipeline issues. For details, see Data Pipeline FAQ and Data Pipeline Troubleshooting.
View information that cannot be seen through Pipeline Task O&M in the FineDataLink project, such as the pipeline task editor, the edit time, and breakpoint information. For details, see BI Dashboard Displaying Pipeline Task Information.
Manage the mapping relationships between source tables and target tables. (A pipeline task may involve real-time synchronization of hundreds of tables, where some source and target tables have different names and multiple source tables may be synchronized to the same target table, making the mappings difficult to manage, for example, during task refactoring.) The source tables in pipeline tasks are recorded in the fine_dp_pipeline_task table in the FineDB database. For details, see Exporting the Information on the Source Table and Corresponding Target Table from the Pipeline Tasks.
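A source-to-target mapping export can be sketched as a simple query against the fine_dp_pipeline_task table. Note that the column names below and the use of an in-memory SQLite database are stand-in assumptions for illustration; FineDB's real schema and connection method may differ.

```python
import sqlite3

# Hypothetical sketch: export source -> target table mappings for one task.
# Column names (task_name, source_table, target_table) are assumptions, and
# SQLite stands in for FineDB here so the example is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE fine_dp_pipeline_task
                (task_name TEXT, source_table TEXT, target_table TEXT)""")
conn.executemany(
    "INSERT INTO fine_dp_pipeline_task VALUES (?, ?, ?)",
    [("sync_orders", "orders", "ods_orders"),
     ("sync_orders", "order_items", "ods_orders")],  # two sources, one target
)
mapping = conn.execute(
    "SELECT source_table, target_table FROM fine_dp_pipeline_task "
    "WHERE task_name = ?", ("sync_orders",)
).fetchall()
print(mapping)  # -> [('orders', 'ods_orders'), ('order_items', 'ods_orders')]
```

Exporting the mapping this way makes cases such as many-to-one synchronization (several source tables writing to one target table) visible at a glance.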