When building data warehouses and intermediate databases, enterprises often handle data volumes too large for Data Synchronization to synchronize incremental data in regular batches with high performance. Clearing the target table before each write can leave the target table temporarily unavailable and prolong the extraction time.
Enterprises therefore need high-performance real-time data synchronization when the data volume is large or the table structure is standard.
Data Pipeline supports real-time full and incremental synchronization of single tables, multiple tables, and entire databases, as well as synchronization of multiple source tables of the same structure into one table. You can configure real-time synchronization tasks based on data connections, as shown in the following figure.
FDL monitors changes in the database logs at the source end of the data pipeline and uses Kafka as middleware to temporarily store the incremental data from the source database, which enables real-time data writing to the target end.
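The flow described above (read changes from the source log, buffer them, then write them to the target) can be sketched as follows. This is a minimal illustration, not FDL's implementation: an in-memory deque stands in for Kafka, and the names ChangeEvent, capture, and apply_buffered are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeEvent:
    """One change read from the source database log (hypothetical shape)."""
    op: str                     # "insert", "update", or "delete"
    key: int                    # primary key of the affected row
    row: Optional[dict] = None  # new row contents; None for deletes


def capture(log_entries, buffer):
    """Append change events from the source log to the buffer.

    The buffer plays the role of the middleware: it holds incremental
    changes until the target side is ready to consume them.
    """
    for entry in log_entries:
        buffer.append(entry)


def apply_buffered(buffer, target):
    """Drain the buffer in arrival order and apply each change to the target."""
    while buffer:
        event = buffer.popleft()
        if event.op == "delete":
            target.pop(event.key, None)
        else:
            # Inserts and updates both end with the latest row under the key.
            target[event.key] = event.row
    return target


buffer = deque()
capture(
    [
        ChangeEvent("insert", 1, {"name": "a"}),
        ChangeEvent("update", 1, {"name": "b"}),
        ChangeEvent("insert", 2, {"name": "c"}),
        ChangeEvent("delete", 2),
    ],
    buffer,
)
target_table = apply_buffered(buffer, {})
```

Because the changes are buffered rather than written directly, the source can keep producing changes while the target is briefly slow or unavailable.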
A failed pipeline task can resume from the breakpoint. If the full data had not been completely synchronized when the task failed, synchronization restarts from the beginning. If the full data had been synchronized, synchronization resumes from the breakpoint.
Example of Resuming from Breakpoint:
A pipeline task started reading data on March 21, stopped reading data on March 23, and was restarted on March 27. After the restart, the data from March 23 to March 27 was synchronized.
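The resume rule above can be expressed as a small decision function. This is a sketch under assumptions: TaskState, its fields, and resume_plan are hypothetical names, and the breakpoint is represented as a plain label rather than a real log position.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskState:
    """State saved for a pipeline task when it stops (hypothetical shape)."""
    full_sync_done: bool              # has the full-data phase finished?
    breakpoint: Optional[str] = None  # last recorded position in the source log


def resume_plan(state):
    """Return (phase, start_position) for a restarted task.

    If the full data was not completely synchronized, start over from the
    beginning; otherwise continue incremental reading from the breakpoint.
    """
    if not state.full_sync_done:
        return ("full", None)
    return ("incremental", state.breakpoint)


# The example above: full data already synchronized, reading stopped on March 23.
plan_after_stop = resume_plan(TaskState(full_sync_done=True, breakpoint="March 23"))

# A task that failed during the full-data phase starts again from the beginning.
plan_after_early_failure = resume_plan(TaskState(full_sync_done=False))
```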
Pipeline tasks are only supported in an independent deployment environment.
Synchronizing views and indexes is not supported by pipeline tasks.
Data Pipeline supports multiple data sources, which can form various source-target combinations for real-time data synchronization. For data sources supported by Data Pipeline, see the section "Types of Data Sources Supported by Data Pipeline" of Data Sources Supported by FineDataLink.
Synchronization Scenario
Description of synchronization objects:
Real-time synchronization of single tables and multiple tables is supported.
Entire Database
Multiple tables from multiple databases can be configured in a single instance at once.
Multiple Tables to One Table
In FineDataLink 4.1.8.1 and later versions, the data from multiple source tables of the same structure can be synchronized to one table in real time. The intersection of all source table fields is taken as the fields for the group table.
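The field rule for the group table can be illustrated with a short sketch. The function name group_table_fields and the sample table names are hypothetical, and keeping the field order of the first source table is an assumption made for the example.

```python
def group_table_fields(source_tables):
    """Return the fields shared by every source table.

    source_tables maps a table name to its list of field names. The result
    keeps the order in which the fields appear in the first table.
    """
    field_lists = list(source_tables.values())
    if not field_lists:
        return []
    common = set(field_lists[0]).intersection(*field_lists[1:])
    return [field for field in field_lists[0] if field in common]


fields = group_table_fields(
    {
        "orders_2023": ["id", "amount", "region"],
        "orders_2024": ["id", "amount", "channel"],
    }
)
```

Here only id and amount appear in both source tables, so they are the fields of the group table; region and channel are left out.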
Description of synchronization types:
For details, see Pipeline Task Configuration.
Full + Incremental Synchronization: First synchronize all existing data, and then continuously synchronize the changes (inserts, deletes, and updates).
Incremental Synchronization: Only synchronize the incremental data. The incremental data will be synchronized starting from the specified start point when the task runs for the first time.
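The difference between the two synchronization types can be sketched as follows. This is an illustration only: the synchronize function, the mode strings, and the integer log positions are assumptions, not FDL configuration values.

```python
def synchronize(snapshot, changes, mode, start_point=0):
    """Build the target table contents for one of the two synchronization types.

    snapshot: existing source rows as {key: row}
    changes: iterable of (position, op, key, row) tuples in log order
    mode: "full+incremental" or "incremental"
    start_point: first log position to apply in incremental mode
    """
    if mode == "full+incremental":
        target = dict(snapshot)  # copy all existing data first
    elif mode == "incremental":
        target = {}              # existing data is not copied
    else:
        raise ValueError(f"unknown mode: {mode}")

    for position, op, key, row in changes:
        if mode == "incremental" and position < start_point:
            continue  # changes before the specified start point are skipped
        if op == "delete":
            target.pop(key, None)
        else:
            target[key] = row
    return target


snapshot = {1: "old-1", 2: "old-2"}
changes = [
    (10, "update", 1, "new-1"),
    (11, "insert", 3, "new-3"),
    (12, "delete", 2, None),
]
full_result = synchronize(snapshot, changes, "full+incremental")
incremental_result = synchronize(snapshot, changes, "incremental", start_point=11)
```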
Task Configuration
1. Before task configuration, you need to prepare the database environment and pipeline task environment.
2. Configuring pipeline tasks in FDL is simple and requires no coding. The available functions are as follows:
You can search for the table name and then select the source table by clicking the table name.
You can use the Quick Selection function to select tables in batches.
Data Destination Selection
You can configure the target table to perform physical deletion (actual data deletion) or logical deletion (marking data as deleted without actually deleting it).
You can use the Mark Timestamp When Syncing function to record when the data is added or updated in the database (local time of the database).
You can choose whether to enable the Synchronize the Source Table Structure Change function.
You can set Synchronization with No Primary Key.
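The difference between physical and logical deletion on the target table can be sketched as follows. The function apply_delete and the is_deleted flag field are hypothetical names chosen for the example, not the identifiers FDL uses.

```python
def apply_delete(target, key, logical, flag="is_deleted"):
    """Apply a source-side delete to the target table.

    target: target rows as {key: row_dict}
    logical: True marks the row as deleted; False removes the row
    flag: name of the marker field (an assumption for this sketch)
    """
    if key not in target:
        return target
    if logical:
        target[key][flag] = True  # the row stays, only the marker changes
    else:
        del target[key]           # the row is actually removed
    return target


physical = apply_delete({1: {"name": "a"}, 2: {"name": "b"}}, 1, logical=False)
logical = apply_delete({1: {"name": "a"}, 2: {"name": "b"}}, 1, logical=True)
```

With logical deletion the deleted row remains queryable on the target end, so downstream queries need to filter on the marker field.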
Table Field Mapping Setting
You can choose the target table to be either an existing table or an automatically created table.
You can modify table names and table creation methods in batches.
You can filter tables and fields with the following conditions: whether the target table configuration has exceptions, whether the target table has a primary key, table creation method, whether the target table fields are mapped, and whether there are exceptions in the target table fields.
In FineDataLink 4.1.9.3 and later versions, you can uniformly customize field type mapping rules for multiple pipeline tasks under the same data connection.
In FineDataLink 4.0.18 and later versions, when target tables are created automatically, case conversion and automatic case correction of table names and field names are supported. For details, see General Configuration.
Pipeline Control
You can set Dirty Data Threshold, and the pipeline task will automatically terminate when the threshold is reached.
You can set Retry After Failure so that a pipeline task interrupted by network fluctuations or other causes restarts automatically.
When a task encounters an issue, you can receive notifications through the following channels: SMS, mail, platform messages, DingTalk group bot, Lark group bot, and WeCom group bot.
You can set the log level for pipeline tasks to meet your needs for viewing logs and debugging tasks. You can also print more detailed, finer-grained logs for review.
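The Dirty Data Threshold behavior described above can be sketched as follows. This is a minimal illustration under assumptions: the names write_rows and DirtyDataThresholdReached are hypothetical, and a row counts as dirty here when the write callback raises ValueError.

```python
class DirtyDataThresholdReached(Exception):
    """Raised to terminate the task once the dirty-row count hits the threshold."""


def write_rows(rows, write, threshold):
    """Write rows one by one, collecting dirty rows instead of failing at once.

    write: callable that raises ValueError for a row it cannot write
    threshold: number of dirty rows at which the task terminates
    Returns (written_count, dirty_rows) if the threshold is never reached.
    """
    dirty = []
    written = 0
    for row in rows:
        try:
            write(row)
            written += 1
        except ValueError as exc:
            dirty.append((row, str(exc)))
            if len(dirty) >= threshold:
                raise DirtyDataThresholdReached(
                    f"{len(dirty)} dirty rows reached the threshold of {threshold}"
                )
    return written, dirty


target_rows = []


def write_to_target(row):
    """Example writer: rejects rows with a missing amount."""
    if row["amount"] is None:
        raise ValueError("amount is null")
    target_rows.append(row)


sample_rows = [
    {"id": 1, "amount": 10},
    {"id": 2, "amount": None},
    {"id": 3, "amount": 30},
    {"id": 4, "amount": None},
]
written, dirty = write_rows(sample_rows, write_to_target, threshold=3)
```

With a threshold of 3, the two dirty rows are recorded and the remaining rows are still written; with a threshold of 2, the same input would terminate the task at the second dirty row.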
Task O&M
You can modify, copy, rename, move, import/export, and delete pipeline tasks.
You can monitor the pipeline task status, view read/write statistics, check logs, and handle dirty data.
You can start and pause pipeline tasks in batches.
The dirty data list is provided. After data synchronization is completed, you can perform batch correction for the dirty data and synchronize the corrected data separately.
Other
In FineDataLink 4.1.9.3 and later versions, you can restore and manage deleted tasks in Recycle Bin.