This document introduces how to use the Data Pipeline function.
Step one (required): Prepare the FineDataLink project.
For details about deploying the FineDataLink project, see FineDataLink Deployment Method Selection.
Step two (required): Register the corresponding function point.
To use the Data Pipeline function, you must register the corresponding function point. For details, see Registration Introduction.
Step three (required): Prepare the data source.
For details about the data sources supported by pipeline tasks, see Types of Data Sources Supported by Data Pipeline.
You need the Use permission on the relevant data connections. You can create the data connections yourself (for details, see Data Source Creation and Management), or contact the admin to assign you the Use permission on them (for details, see Overview of Data Connection Permission).
1. For details about a complete example of configuring a pipeline task, see Pipeline Task Example.
2. If multiple tables in one MySQL, SQL Server, or Oracle database require real-time synchronization, you are advised to synchronize all of them with a single pipeline task to avoid overloading the database.
Step One: Database Environment Preparation (Required)
Grant the account configured in the data connection used in the pipeline task the necessary permission to perform the required operations on the database. For details, see Overview of Database Environment Preparation.
Step Two: Pipeline Task Environment Preparation (Required)
Deploy Kafka (an open-source event streaming platform) as the middleware. For details, see Kafka Deployment - ZooKeeper Mode and Transmission Queue Configuration. (Only the super admin of the FineDataLink project can configure the transmission queue.)
1. You are advised to deploy Kafka on a Linux system. (While Kafka can also be deployed on a Windows system, its performance will be limited. This deployment method is only suitable for demonstration purposes and not recommended in production environments.) Additionally, you can deploy Kafka and FineDataLink on different servers.
2. Before restarting Kafka, you need to manually pause the pipeline tasks. After restarting Kafka, you need to manually restart the pipeline tasks. Otherwise, exceptions will occur in pipeline tasks.
3. Kafka clusters are not supported by FineDataLink currently.
4. The Data Pipeline function requires Kafka as middleware to achieve real-time synchronization, which generates one topic for each source table. For example, if there are 1000 source tables, Kafka will generate 1000 topics.
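Because each source table corresponds to exactly one topic (note 4 above), the topic load a pipeline task places on Kafka can be estimated directly from the table count. The sketch below is illustrative only; the per-topic partition count is a hypothetical planning input, not a FineDataLink setting.

```python
# Rough capacity sketch. Assumption (from the note above): FineDataLink
# creates one Kafka topic per source table. partitions_per_topic is a
# hypothetical planning parameter for estimating broker load.
def estimate_kafka_load(source_table_count: int, partitions_per_topic: int = 1):
    """Return (topics, total partitions) Kafka must host for one pipeline task."""
    topics = source_table_count              # one topic per source table
    partitions = topics * partitions_per_topic
    return topics, partitions

# 1000 source tables -> 1000 topics
print(estimate_kafka_load(1000))  # -> (1000, 1000)
```

A quick estimate like this can help decide how much disk and memory to budget for the Kafka server before starting large pipeline tasks.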
Step Three: Pipeline Task Permission Assignment (Optional)
Grant permission on Data Pipeline to users who are not super admins.
Grant users who need to create pipeline tasks in a specific folder the Management permission on that folder.
For details, see Pipeline Task Management Permission.
Step Four: Pipeline Task Configuration (Required)
For details, see the following documents in order.
1. Pipeline Task Configuration - Source Selection
1. You are not advised to use the Data Pipeline function to synchronize fields of longtext type, as this may cause issues with Kafka and affect execution efficiency.
2. Ensure the names of source table fields contain no spaces. Otherwise, an error will occur during task startup.
2. Pipeline Task Configuration - Target Selection
3. Pipeline Task Configuration - Table Field Mapping
4. Pipeline Task Configuration - Pipeline Control
Note the following about the Table Dirty Data Threshold setting:
In FineDataLink versions earlier than V4.2.1.1, the configured dirty data threshold applies to the entire task. In V4.2.1.1 and later versions, this setting item is adjusted to Table Dirty Data Threshold and applies per table, so a threshold originally configured for the whole task may be too large for an individual table. Therefore, after upgrading FineDataLink to V4.2.1.1 or a later version, you are advised to appropriately lower the existing threshold to keep dirty data under effective control.
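As a hedged rule of thumb (not an official FineDataLink formula), one way to pick a new per-table threshold after upgrading is to spread the old task-level threshold across the task's tables:

```python
# Illustrative only: after upgrading to V4.2.1.1+, the dirty data threshold
# applies per table rather than per task. A simple starting point is to divide
# the old task-level threshold by the number of tables in the task.
def suggest_table_threshold(task_threshold: int, table_count: int) -> int:
    """Suggest a per-table dirty data threshold (at least 1)."""
    return max(1, task_threshold // table_count)

print(suggest_table_threshold(10000, 100))  # -> 100
```

Treat the result as a starting point and adjust it per table based on how much dirty data each table actually produces.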
When you create or copy a pipeline task:
1. In the Source Selection step:
If you set Synchronization Type to Full + Incremental Synchronization, the task synchronizes all existing data first and then continuously synchronizes the changes.
When the task runs for the first time, it performs full synchronization followed by incremental synchronization. If the task is interrupted or paused and then restarted, it resumes incremental synchronization from the breakpoint (provided that the full synchronization of all tables has been completed).
If you set Synchronization Type to Incremental Sync Only, see Pipeline Task Configuration - Source Selection for instructions.
2. If you select an existing table as the target table and its structure (table name and field names) is the same as that of the source table, the task will empty the target table and write the full data during the first execution, and then perform incremental synchronization.
When you pause the pipeline task and enter the editing page:
1. If you add a source table, the added table will be synchronized according to the selected synchronization type.
If you set Synchronization Type to Full + Incremental Synchronization, the added table undergoes full synchronization first, while its incremental synchronization is suspended in the background and starts once the full synchronization of the added table is finished.
If you set Synchronization Type to Incremental Sync Only:
If Incremental Sync Start Point is modified, all tables (including the added table) will be synchronized according to the specified start point.
If Incremental Sync Start Point is not modified, the added table will be synchronized according to the built-in breakpoint of the task.
2. If you remove a source table and save the modification, all related information of the removed table will be deleted. The corresponding table will not be synchronized if the task starts.
In FineDataLink of V4.2.6.4 and later versions, when a table is removed from a pipeline task, the dirty data generated during the synchronization of that table will be cleaned up.
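The behavior for added tables described above can be sketched as a small decision function (the function and parameter names are illustrative, not FineDataLink APIs):

```python
# Hedged sketch of the added-table synchronization rules described above.
def start_point_for_added_table(sync_type: str, start_point_modified: bool) -> str:
    """Return where synchronization of a newly added table starts."""
    if sync_type == "Full + Incremental Synchronization":
        # Full sync runs first; incremental sync starts once it finishes.
        return "full sync first, then incremental"
    if sync_type == "Incremental Sync Only":
        # Depends on whether Incremental Sync Start Point was modified.
        return ("specified start point" if start_point_modified
                else "built-in task breakpoint")
    raise ValueError(f"unknown synchronization type: {sync_type}")

print(start_point_for_added_table("Incremental Sync Only", False))
# -> built-in task breakpoint
```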
When you process dirty data:
1. Retry Dirty Data: If you retry dirty data of a single table or the selected tables, the cached dirty data will be resubmitted, and the dirty data volume statistics will be updated.
2. Resync: If you click Resync, the task will empty the target table and execute full synchronization again, and perform incremental synchronization after full synchronization is completed.
If you enable Synchronization Source Table Structure Change and set Data Deletion Strategy to Logical Deletion at Target End:
For details, see Data Pipeline - Synchronizing Source Table Structure Changes and Setting the Data Deletion Strategy in Pipeline Task Configuration - Target Selection.
If you enable Retry After Failure:
Logic Description for Retry After Failure:
If the full synchronization is not completed, the synchronization will restart from the beginning. If the full synchronization is completed, the incremental synchronization will resume from the last breakpoint. In other words, breakpoints do not exist in the full synchronization phase but only exist in the incremental synchronization phase.
The retry count is reset once the pipeline task reruns successfully.
1. If the project contains pipeline tasks, you are not advised to use the kill -9 pid command to close the project, as this may cause pipeline task exceptions. You are advised to use the kill pid command instead. For details, see Stopping/Restarting/Starting a FineDataLink Project.
2. A pipeline task cannot be edited by multiple users simultaneously in FineDataLink of V4.1.6.3 and later versions. For details, see Edit Lock for Tasks Against Concurrent Editing.
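The reason `kill <pid>` is preferred over `kill -9 <pid>` in note 1 above is that `kill` sends SIGTERM, which a process can trap to clean up before exiting, while SIGKILL (`-9`) cannot be trapped. The sketch below demonstrates this on Linux with a stand-in child process (illustrative only, not the FineDataLink process itself):

```python
import signal
import subprocess
import sys

# The child stands in for a process with cleanup work (e.g. pipeline state to
# flush). It traps SIGTERM and exits gracefully; SIGKILL would skip the handler.
child_code = """
import signal, sys, time

def on_term(signum, frame):
    print('cleaning up pipeline state')   # cleanup runs only for SIGTERM
    sys.exit(0)

signal.signal(signal.SIGTERM, on_term)
print('running', flush=True)
time.sleep(30)
"""
child = subprocess.Popen([sys.executable, "-c", child_code],
                         stdout=subprocess.PIPE, text=True)
assert child.stdout.readline().strip() == "running"  # child is up, handler set
child.send_signal(signal.SIGTERM)                    # what `kill <pid>` sends
remaining = child.stdout.read()
child.wait(timeout=10)
print(remaining.strip())  # -> cleaning up pipeline state
```

With `kill -9`, the handler would never run and any in-flight pipeline state could be lost, which is why note 1 warns against it.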
Rename, move, copy, export, and delete a pipeline task. For details, see Task List.
Migrate a pipeline task. For details, see Pipeline Task Import and Export.
View the task running status and logs, process dirty data, and edit the pipeline task (add/delete a table, modify the data connection, or modify setting items such as Synchronization Source Table Structure Change, Table Dirty Data Threshold, Retry After Failure, and Result Notification). For details, see Single Pipeline Task Management.
Manage all pipeline tasks in a unified manner, for example, stopping and deleting tasks, checking the task running status and synchronization performance, monitoring and processing exceptions, and pausing tasks in batches. For details, see Pipeline Task O&M - Task Management.
Check the pipeline task concurrency, add/delete tables after the pipeline task starts running, and handle unrunnable tasks containing uneditable configuration to enable execution. For details, see Data Pipeline O&M Guide.
View tables in the FineDB database that record information about pipeline tasks. For details, see Data Pipeline.
Grant View, Edit, and Authorization permission on pipeline tasks. For details, see Pipeline Task Management Permission and Pipeline Task Authorization Permission.
Troubleshoot Data Pipeline issues. For details, see Data Pipeline FAQ and Data Pipeline Troubleshooting.
View information that cannot be seen through Pipeline Task O&M in the FineDataLink project, such as the pipeline task editor, the edit time, and breakpoint information. For details, see BI Dashboard Displaying Pipeline Task Information.
Manage the mapping relationships between source tables and target tables. (A pipeline task may involve real-time synchronization of hundreds of tables, where some source and target tables have different names and multiple source tables may be synchronized to the same target table, making the mappings difficult to manage, for example, during task refactoring.) The source tables in pipeline tasks are recorded in the fine_dp_pipeline_task table in the FineDB database. For details, see Exporting the Information on the Source Table and Corresponding Target Table from the Pipeline Tasks.
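A source-to-target mapping export can be sketched as a simple query against the fine_dp_pipeline_task table. Note that the column names below and the use of an in-memory SQLite database are stand-in assumptions for illustration; FineDB's real schema and connection method may differ.

```python
import sqlite3

# Hypothetical sketch: export source -> target table mappings for one task.
# Column names (task_name, source_table, target_table) are assumptions, and
# SQLite stands in for FineDB here so the example is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE fine_dp_pipeline_task
                (task_name TEXT, source_table TEXT, target_table TEXT)""")
conn.executemany(
    "INSERT INTO fine_dp_pipeline_task VALUES (?, ?, ?)",
    [("sync_orders", "orders", "ods_orders"),
     ("sync_orders", "order_items", "ods_orders")],  # two sources, one target
)
mapping = conn.execute(
    "SELECT source_table, target_table FROM fine_dp_pipeline_task "
    "WHERE task_name = ?", ("sync_orders",)
).fetchall()
print(mapping)  # -> [('orders', 'ods_orders'), ('order_items', 'ods_orders')]
```

Exporting the mapping this way makes cases such as many-to-one synchronization (several source tables writing to one target table) visible at a glance.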