Overview
Application Scenario
When building data warehouses and intermediate databases, enterprises with large data volumes find it difficult to use Data Synchronization to batch-synchronize incremental data on a schedule with high performance. If data is written by clearing the target table before each write, issues such as temporary unavailability of the target table and prolonged extraction time may occur.
Therefore, enterprises expect to achieve high-performance real-time data synchronization in scenarios with large data volumes or standard table structures.
Function Description
Data Pipeline supports real-time full and incremental synchronization of single-table, multi-table, and whole-database data from data sources, as well as real-time synchronization of data from multiple source tables of the same structure to one table. You can configure real-time synchronization tasks based on data connections, as shown in the following figure.

Implementation Principle
FDL monitors changes in the database logs at the source end of the data pipeline and uses Kafka as middleware to temporarily store the incremental data from the source database, thereby achieving real-time data writing to the target end.
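The log-based mechanism above can be sketched as follows. This is a minimal illustrative model, not FDL's actual implementation: it assumes change events have already been drained from the Kafka staging topic into a list, and the `ChangeEvent` shape and in-memory target table are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical change-event shape; FDL's internal event format is not public.
@dataclass
class ChangeEvent:
    op: str                      # "insert", "update", or "delete"
    key: int                     # primary-key value of the affected row
    row: Optional[dict] = None   # full row image for insert/update

def apply_events(target: dict, events: list) -> dict:
    """Replay log-derived change events onto a target table keyed by primary key."""
    for ev in events:
        if ev.op in ("insert", "update"):
            target[ev.key] = ev.row          # upsert the new row image
        elif ev.op == "delete":
            target.pop(ev.key, None)         # remove the row if present
    return target

# Events as they might be drained from the Kafka staging topic:
events = [
    ChangeEvent("insert", 1, {"id": 1, "name": "a"}),
    ChangeEvent("update", 1, {"id": 1, "name": "b"}),
    ChangeEvent("delete", 2),
]
table = {2: {"id": 2, "name": "old"}}
apply_events(table, events)
print(table)  # {1: {'id': 1, 'name': 'b'}}
```

In the real pipeline, the target is a database table and events arrive continuously; the replay order shown here is what guarantees the target converges to the source state.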

Description of Resuming from Breakpoint
A failed pipeline task can resume from the breakpoint. If full data synchronization has not finished, synchronization restarts from the beginning; if full data synchronization has finished, synchronization resumes from the breakpoint.
Example of Resuming from Breakpoint:
The pipeline task started reading data on March 21, stopped reading on March 23, and was restarted on March 27. After the restart, the data generated between March 23 and March 27 was synchronized.
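The resume rules above can be sketched as follows. This is an illustrative model under the assumption that the breakpoint is a saved offset into an ordered change stream; FDL's actual checkpoint format is internal.

```python
def sync(source_rows, checkpoint, full_done):
    """Resume rules described above: restart from scratch if the full phase
    never finished, otherwise continue from the saved breakpoint."""
    start = checkpoint if full_done else 0
    synced = source_rows[start:]
    new_checkpoint = len(source_rows)   # the breakpoint advances to the end
    return synced, new_checkpoint

# Rows tagged by arrival date (hypothetical data, mirroring the example above).
rows = ["03-21", "03-22", "03-23", "03-25", "03-27"]

# Full phase finished; the task stopped after reading through 03-23 (offset 3).
synced, ckpt = sync(rows, checkpoint=3, full_done=True)
print(synced)  # ['03-25', '03-27'] — only data after the breakpoint
```

With `full_done=False`, the same call would return the entire list, matching the rule that an unfinished full synchronization starts over.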
Restriction on Use
Pipeline tasks are supported only in independently deployed environments.
Pipeline tasks do not support synchronizing views and indexes.
Function Description

| Function | Description |
| --- | --- |
| Data Source | Data Pipeline supports multiple data sources, which can form various source-target combinations for real-time data synchronization. For the data sources supported by Data Pipeline, see the "Types of Data Sources Supported by Data Pipeline" section of Data Sources Supported by FineDataLink. |
| Synchronization Scenario | **Synchronization objects:**<br>• Single/Multiple Tables: Real-time synchronization of single tables and multiple tables is supported.<br>• Entire Database: Multiple tables from multiple databases can be configured at once. In a single task, you can select up to 5,000 tables; no additional selections are allowed once the limit is reached.<br>• Multiple Tables to One Table: In FineDataLink 4.1.8.1 and later versions, data from multiple source tables of the same structure can be synchronized to one table in real time. The intersection of all source table fields is taken as the fields for the group table.<br>**Synchronization types** (for details, see Pipeline Task Configuration):<br>• Full + Incremental Synchronization: All inventory data is synchronized first, and then changes (insertions, deletions, and updates) are synchronized continuously.<br>• Incremental Synchronization: Only incremental data is synchronized, starting from the specified start point when the task runs for the first time. |
| Task Configuration | 1. Before configuring a task, you need to prepare the database environment and the pipeline task environment.<br>2. In FDL, configuring pipeline tasks is simple and requires no coding. Highlights of each configuration step are as follows:<br>• Data Source Selection<br>• Data Destination Selection: You can configure the target table for physical deletion (data is actually deleted) or logical deletion (data is marked as deleted without being removed). You can use the Mark Timestamp When Syncing function to record when data is added or updated in the database (local time of the database). You can choose whether to enable the Synchronize the Source Table Structure Change function, and you can set Synchronization with No Primary Key.<br>• Table Field Mapping Setting: The target table can be either an existing table or an automatically created table. You can modify table names and table creation methods in batches, and filter tables and fields by the following conditions: whether the target table configuration has exceptions, whether the target table has a primary key, table creation method, whether the target table fields are mapped, and whether there are exceptions in the target table fields. In FineDataLink 4.1.9.3 and later versions, you can uniformly customize field type mapping rules for multiple pipeline tasks under the same data connection. In FineDataLink 4.0.18 and later versions, when target tables are automatically created, case conversion and automatic case correction of table names and field names are supported. For details, see General Configuration.<br>• Pipeline Control: You can set Dirty Data Threshold; the pipeline task terminates automatically when the threshold is reached. You can set Retry After Failure so that a task interrupted by network fluctuations or other issues restarts automatically. When a task encounters an issue, you can receive notifications through SMS, mail, platform messages, DingTalk group bots, Lark group bots, and WeCom group bots. You can set the log level for pipeline tasks to suit your log-viewing and debugging needs, and print finer-grained detailed logs for review. |
| Task O&M | You can modify, copy, rename, move, import/export, and delete pipeline tasks. You can monitor pipeline task status, view read/write statistics, check logs, and handle dirty data. You can start and pause pipeline tasks in batches. A dirty data list is provided; after data synchronization is completed, you can correct dirty data in batches and synchronize the corrected data separately. |
| Other | In FineDataLink 4.1.9.3 and later versions, you can restore and manage deleted tasks in Recycle Bin. |
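As a rough illustration of the physical versus logical deletion options listed under Data Destination Selection, the sketch below models both behaviors on an in-memory table. The field names `_deleted` and `_updated_at` are hypothetical, and the timestamp only mirrors the idea of Mark Timestamp When Syncing (FDL records the database's local time, not the client's).

```python
from datetime import datetime

def delete_row(table, key, logical=True):
    """Physical deletion removes the row; logical deletion keeps it but
    flags it as deleted and stamps the modification time."""
    if logical:
        row = table.get(key)
        if row is not None:
            row["_deleted"] = True                       # hypothetical flag column
            row["_updated_at"] = datetime.now().isoformat()
    else:
        table.pop(key, None)                             # physical: row is gone
    return table

table = {1: {"id": 1}, 2: {"id": 2}}
delete_row(table, 1, logical=True)    # row 1 stays, flagged as deleted
delete_row(table, 2, logical=False)   # row 2 is physically removed
print(sorted(table))  # [1]
```

Logical deletion trades storage for auditability: downstream consumers can still see that a row existed and when it was removed.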
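The Dirty Data Threshold behavior under Pipeline Control can be sketched as follows. This is a minimal model assuming each row that fails to write counts as one dirty row; FDL's actual accounting and termination mechanics are internal.

```python
def run_with_controls(batches, dirty_threshold):
    """Write rows batch by batch; terminate the run as soon as the
    accumulated dirty-row count reaches the configured threshold."""
    dirty = 0
    written = []
    for batch in batches:
        for row in batch:
            if row.get("bad"):          # a row that fails to write is dirty data
                dirty += 1
                if dirty >= dirty_threshold:
                    return written, "terminated: dirty data threshold reached"
            else:
                written.append(row)
    return written, "completed"

written, status = run_with_controls(
    [[{"id": 1}, {"bad": True}], [{"bad": True}, {"id": 4}]],
    dirty_threshold=2,
)
print(status)  # terminated: dirty data threshold reached
```

Terminating early rather than silently skipping bad rows is what lets the operator inspect the dirty data list, correct it in batches, and resynchronize, as described under Task O&M.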