Overview of Data Pipeline - FineDataLink Help Document

Last update: August 20, 2024

Overview

Application Scenario

When constructing data warehouses and intermediate databases, enterprises with large data volumes find it difficult to use Data Synchronization to synchronize incremental data in scheduled batches with high performance. If the target table is cleared before each write, the target table may be temporarily unavailable and extraction may take a long time.

Therefore, enterprises expect high-performance real-time data synchronization when the data volume is large or the table structure is standard.

Function Description

Data Pipeline supports real-time full and incremental synchronization of single-table, multi-table, and whole-database data from data sources, as well as real-time synchronization of data from multiple source tables of the same structure to one table. You can configure real-time synchronization tasks based on data connections, as shown in the following figure.

Implementation Principle

FineDataLink (FDL) monitors changes in the database logs at the source end of the data pipeline and uses Kafka as middleware to temporarily store the incremental data from the source database. The buffered data is then written to the target end in real time.
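The flow described above (log monitoring, temporary storage in middleware, writing to the target) can be sketched roughly as follows. This is only an illustrative model, not FDL's implementation: an in-memory deque stands in for Kafka, a plain dictionary stands in for the target table, and the event format and the names capture_changes and apply_changes are assumptions made for this example.

```python
from collections import deque


def capture_changes(log_entries, buffer):
    """Stand-in for log monitoring: push each change event into the buffer."""
    for entry in log_entries:
        buffer.append(entry)


def apply_changes(buffer, target):
    """Drain the buffer in order and apply each event to the target table."""
    while buffer:
        event = buffer.popleft()
        key = event["key"]
        if event["op"] in ("insert", "update"):
            target[key] = event["row"]
        elif event["op"] == "delete":
            target.pop(key, None)
    return target


# Hypothetical change events read from a source database log.
source_log = [
    {"op": "insert", "key": 1, "row": {"name": "a"}},
    {"op": "insert", "key": 2, "row": {"name": "b"}},
    {"op": "update", "key": 1, "row": {"name": "a2"}},
    {"op": "delete", "key": 2, "row": None},
]

buffer = deque()    # stands in for the Kafka middleware (assumption)
target_table = {}   # stands in for the target end, keyed by primary key

capture_changes(source_log, buffer)
apply_changes(buffer, target_table)
print(target_table)  # {1: {'name': 'a2'}}
```

Buffering the events between capture and write decouples the two ends, so the reader of the source log does not have to wait for the target to finish each write.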

Description of Resuming from Breakpoint

A failed pipeline task can resume from its breakpoint. If full synchronization had not finished when the task stopped, synchronization restarts from the beginning. If full synchronization had finished, synchronization resumes from the breakpoint.

Example of Resuming from Breakpoint:

A pipeline task started reading data on March 21, stopped reading data on March 23, and was restarted on March 27. After the restart, the data from March 23 to March 27 is synchronized.
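The resume rule can be expressed as a small decision function. The sketch below is a simplified model under stated assumptions: the function name resume_point and its parameters are hypothetical, and positions are represented as plain date labels rather than real log offsets.

```python
def resume_point(full_sync_done, breakpoint, task_start):
    """Return the position from which a restarted pipeline task continues."""
    # Full data not yet synchronized: synchronization starts over.
    if not full_sync_done:
        return task_start
    # Full data already synchronized: continue from the breakpoint.
    return breakpoint


# The example above: reading started on March 21 and stopped on March 23.
print(resume_point(True, "March 23", "March 21"))   # March 23
print(resume_point(False, "March 23", "March 21"))  # March 21
```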

Restriction on Use

Note:

Pipeline tasks are only supported in an independent deployment environment.

Pipeline tasks do not support synchronizing views or indexes.

Function Description

Function

Description

Data Source

Data Pipeline supports multiple data sources, which can form various source-target combinations for real-time data synchronization. For data sources supported by Data Pipeline, see the section "Types of Data Sources Supported by Data Pipeline" of Data Sources Supported by FineDataLink.

Synchronization Scenario

Description of synchronization objects:

Synchronization Object	Description
Single/Multiple Tables	Real-time synchronization of single tables and multiple tables is supported.
Entire Database	Multiple tables from multiple databases can be configured in a single instance at once. Note: In a single task, you can select up to 5,000 tables. No additional selections are allowed once the limit is reached.
Multiple Tables to One Table	In FineDataLink 4.1.8.1 and later versions, the data from multiple source tables of the same structure can be synchronized to one table in real time. The intersection of all source table fields is taken as the fields for the group table.
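As an illustration of how the field intersection works for Multiple Tables to One Table, the sketch below computes the fields shared by every source table. The table names, field names, and the helper group_table_fields are hypothetical and are not part of FineDataLink.

```python
def group_table_fields(source_tables):
    """Return the fields common to all source tables, in the first table's order."""
    tables = list(source_tables.values())
    if not tables:
        return []
    common = set(tables[0])
    for fields in tables[1:]:
        common &= set(fields)
    # Preserve the field order of the first source table.
    return [field for field in tables[0] if field in common]


# Hypothetical source tables and their fields.
sources = {
    "orders_2023": ["id", "amount", "created_at", "region"],
    "orders_2024": ["id", "amount", "created_at", "channel"],
}
print(group_table_fields(sources))  # ['id', 'amount', 'created_at']
```

In this example, region and channel are dropped because each exists in only one source table.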

Description of synchronization types:

For details, see Pipeline Task Configuration.

Full + Incremental Synchronization: First synchronize all existing data, and then continuously synchronize subsequent changes (inserts, deletes, and updates).
Incremental Synchronization: Only synchronize the incremental data. The incremental data will be synchronized starting from the specified start point when the task runs for the first time.
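The difference between the two synchronization types can be modeled as follows. This is a minimal sketch, assuming that change events carry a numeric position (pos) and that the target is a dictionary keyed by primary key. The function run_sync, the mode strings, and the event format are all hypothetical.

```python
def run_sync(mode, existing_rows, change_log, start_point=0):
    """Build the target table for the given synchronization type."""
    target = {}
    if mode == "full+incremental":
        # Copy all existing data first, then apply every later change.
        target.update(existing_rows)
        changes = change_log
    elif mode == "incremental":
        # Skip existing data; apply only changes from the start point onward.
        changes = [c for c in change_log if c["pos"] >= start_point]
    else:
        raise ValueError(f"unknown mode: {mode}")
    for change in changes:
        if change["op"] == "delete":
            target.pop(change["key"], None)
        else:  # insert or update
            target[change["key"]] = change["value"]
    return target


existing = {1: "a", 2: "b"}
log = [
    {"pos": 10, "op": "update", "key": 1, "value": "a2"},
    {"pos": 11, "op": "insert", "key": 3, "value": "c"},
    {"pos": 12, "op": "delete", "key": 2, "value": None},
]
print(run_sync("full+incremental", existing, log))         # {1: 'a2', 3: 'c'}
print(run_sync("incremental", existing, log, start_point=11))  # {3: 'c'}
```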

Task Configuration

1. Before task configuration, you need to prepare the database environment and pipeline task environment.

2. Configuring pipeline tasks in FDL requires no coding. The configuration steps and their main functions are as follows:

Step	Highlight
Data Source Selection	You can search for the table name and then select the source table by clicking the table name. You can use the Quick Selection function to select tables in batches.
Data Destination Selection	You can configure the target table to perform physical deletion (actual data deletion) or logical deletion (marking data as deleted without actually deleting it). You can use the Mark Timestamp When Syncing function to record when the data is added or updated in the database (local time of the database). You can choose whether to enable the Synchronize the Source Table Structure Change function. You can set Synchronization with No Primary Key.
Table Field Mapping Setting	You can choose the target table to be either an existing table or an automatically created table. You can modify table names and table creation methods in batches. You can filter tables and fields with the following conditions: whether the target table configuration has exceptions, whether the target table has a primary key, table creation method, whether the target table fields are mapped, and whether there are exceptions in the target table fields. Note: In FineDataLink 4.1.9.3 and later versions, you can uniformly customize field type mapping rules for multiple pipeline tasks under the same data connection. In FineDataLink 4.0.18 and later versions, when target tables are automatically created tables, case conversion and automatic case correction of table names and field names are supported. For details, see General Configuration.
Pipeline Control	You can set Dirty Data Threshold, and the pipeline task terminates automatically when the threshold is reached. You can set Retry After Failure so that a pipeline task interrupted by network fluctuations or other causes restarts automatically. When a task encounters an issue, you can receive notifications through the following channels: SMS, email, platform messages, DingTalk group bot, Lark group bot, and WeCom group bot. You can set the log level for pipeline tasks to suit your log viewing and debugging needs, and you can output more detailed, finer-grained logs for review.
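The Dirty Data Threshold behavior in the Pipeline Control step can be pictured with the sketch below. It is an assumption-laden illustration rather than FDL's actual logic: a row counts as dirty here when it cannot be converted to the expected types, and the names write_rows and DirtyDataThresholdReached are hypothetical.

```python
class DirtyDataThresholdReached(Exception):
    """Raised when the number of dirty rows reaches the configured threshold."""


def write_rows(rows, dirty_threshold):
    """Write valid rows and collect dirty rows; stop once the threshold is reached."""
    written, dirty = [], []
    for row in rows:
        try:
            written.append({"id": int(row["id"]), "amount": float(row["amount"])})
        except (KeyError, TypeError, ValueError):
            dirty.append(row)
            if len(dirty) >= dirty_threshold:
                raise DirtyDataThresholdReached(
                    f"{len(dirty)} dirty rows reached threshold {dirty_threshold}"
                )
    return written, dirty


# Hypothetical incoming rows; two of them cannot be converted.
rows = [
    {"id": "1", "amount": "9.5"},
    {"id": "x", "amount": "2"},
    {"id": "3", "amount": "oops"},
    {"id": "4", "amount": "1"},
]

written, dirty = write_rows(rows, dirty_threshold=5)
print(len(written), len(dirty))  # 2 2

try:
    write_rows(rows, dirty_threshold=2)
except DirtyDataThresholdReached as exc:
    print("task terminated:", exc)
```

Collecting the dirty rows instead of discarding them matches the idea of a dirty data list that can be reviewed and corrected later.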

Task O&M

You can modify, copy, rename, move, import/export, and delete pipeline tasks.
You can monitor the pipeline task status, view read/write statistics, check logs, and handle dirty data.
You can start and pause pipeline tasks in batches.
The dirty data list is provided. After data synchronization is completed, you can perform batch correction for the dirty data and synchronize the corrected data separately.

Other

In FineDataLink 4.1.9.3 and later versions, you can restore and manage deleted tasks in Recycle Bin.
