Pipeline Task Configuration - Target Selection - FineDataLink Help Document

Last update: September 06, 2024

Overview

Version

FineDataLink Version	Functional Change
4.0.5	/
4.0.6	Supported the writing of data into the MySQL database.
4.0.7	Supported the writing of data into the SQL Server database.
4.0.8	Supported the writing of data into the GaussDB 200 database.
4.0.9	Supported the writing of data into Oracle and PostgreSQL databases.
4.0.14	Supported logical deletion and the display of inbound timestamps.
4.0.15	Supported the writing of data into Greenplum and Greenplum (Parallel Loading) databases.
4.0.17	Supported connection to the MySQL server for data read/write and synchronization of source table structure changes.
4.0.24	Supported the writing of data into the StarRocks database.
4.0.28	Pipeline Task supported the writing of data into the TiDB, ClickHouse, and Amazon Redshift databases.
4.0.29	Optimized the data loading logic when Greenplum and Greenplum (Parallel Loading) databases were used as the target end of synchronization: data was first loaded through COPY Loading, data that failed to be loaded was loaded again through JDBC loading, and data that still failed to be loaded was recorded as dirty data.
4.1.1	Supported data synchronization into Oracle, Greenplum, and SQL Server databases without a primary key. For details, see the section "Setting Synchronization Without a Primary Key" of this document.
4.1.3	Supported batch deletion of data to be written into the StarRocks database to improve synchronization efficiency.
4.1.7.2	Supported the Logical Deletion at Target End function if the target table was an existing table that contained the _fdl_marked_deleted field.
4.1.7.2	Supported the Mark Timestamp During Synchronization function if the target table was an existing table that contained the _fdl_update_timestamp field.
4.1.7.3	Supported the writing of data into the SAP HANA database.
4.1.8.1	Pipeline Task no longer supported synchronizing DDL changes in group tables.
4.1.8.2	Pipeline Task supported the writing of data into the ShenTong database.
4.1.11.2	Supported data synchronization to the YMatrix database without a primary key.

Function Description

In Target Selection, you need to specify where the data synchronized in real time is stored and decide whether to add a timestamp field (indicating the time of data changes) to the target table and whether to synchronize changes in the source table structure to the target table, as shown in the following figure.

Notes

When a pipeline task writes data into a Greenplum or Greenplum (Parallel Loading) database, it uses the COPY Loading method, which requires an fdl_temp schema in the target database to store temporary tables. Ensure that the database user has permission to create schemas in the database. For details, see Greenplum Data Connection.

In FineDataLink 4.0.29 and later releases, the data loading logic for Greenplum and Greenplum (Parallel Loading) target databases is optimized: data is first loaded through COPY Loading, data that fails to be loaded is loaded again through JDBC loading, and data that still fails to be loaded is recorded as dirty data.
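The loading order described above works as a three-stage fallback. The following Python sketch is a simplified illustration only, not FineDataLink's implementation: `copy_load` and `jdbc_load` are hypothetical callables standing in for a bulk COPY load and a row-by-row JDBC insert, and rows are modelled as plain dictionaries.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


def load_with_fallback(
    rows: List[Row],
    copy_load: Callable[[List[Row]], List[Row]],
    jdbc_load: Callable[[Row], bool],
) -> Tuple[int, List[Row]]:
    """Load rows via COPY first, retry failures via JDBC, and collect the rest as dirty data.

    copy_load (hypothetical): bulk-loads a batch and returns the rows it could not load.
    jdbc_load (hypothetical): loads a single row and returns True on success.
    Returns the number of loaded rows and the list of dirty rows.
    """
    # Stage 1: bulk-load the whole batch with COPY.
    copy_failures = copy_load(rows)

    # Stage 2: retry each row rejected by COPY through JDBC.
    # Stage 3: rows that still fail are kept as dirty data.
    dirty: List[Row] = [row for row in copy_failures if not jdbc_load(row)]

    return len(rows) - len(dirty), dirty


if __name__ == "__main__":
    # Made-up batch: row 2 fails COPY only, row 3 fails both COPY and JDBC.
    batch = [{"id": 1}, {"id": 2, "copy_fails": True}, {"id": 3, "copy_fails": True, "jdbc_fails": True}]
    loaded, dirty = load_with_fallback(
        batch,
        copy_load=lambda rs: [r for r in rs if r.get("copy_fails")],
        jdbc_load=lambda r: not r.get("jdbc_fails"),
    )
    print(loaded, dirty)
```

The design keeps the fast bulk path for the common case and pays the per-row JDBC cost only for the rows COPY rejected.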

Procedure

Configuring the Data Source and the Database

Specify the location where the data synchronized in real time is stored. For target ends supported by Data Pipeline, see Data Sources Supported by FineDataLink.

If the target table has the same structure (table name and field names) as the source table, the target table is emptied and the full data is written into it during the first synchronization. Incremental data is synchronized afterwards.
If the target database has no table matching the source table, a new table is created in the target database.

Setting the Data Deletion Strategy

1. The two deletion methods are described as follows.

Physical Deletion at Target End: If data is deleted from the source table, the corresponding data is deleted from the target table.
Logical Deletion at Target End: A boolean field named _fdl_marked_deleted (default value: false) is added to the target table. If a record is deleted from the source table, the corresponding record in the target table is not physically deleted; instead, its _fdl_marked_deleted value is changed to true.
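To make the difference between the two methods concrete, the Python sketch below applies one source-side delete event to a target table. It is an illustration under stated assumptions, not FineDataLink code: the target table is modelled as a hypothetical in-memory dictionary keyed by primary key, and `apply_source_delete` is an invented helper name.

```python
from typing import Any, Dict

# Hypothetical model of a target table: {primary_key: row_dict}.
Table = Dict[Any, Dict[str, Any]]


def apply_source_delete(target: Table, pk: Any, logical: bool) -> None:
    """Apply a delete event from the source table to the target table."""
    if pk not in target:
        return  # nothing to delete at the target end
    if logical:
        # Logical Deletion at Target End: keep the record and flag it as deleted.
        target[pk]["_fdl_marked_deleted"] = True
    else:
        # Physical Deletion at Target End: remove the record.
        del target[pk]


if __name__ == "__main__":
    table = {
        1: {"name": "a", "_fdl_marked_deleted": False},
        2: {"name": "b", "_fdl_marked_deleted": False},
    }
    apply_source_delete(table, 1, logical=True)   # record 1 stays, flagged true
    apply_source_delete(table, 2, logical=False)  # record 2 disappears
    print(table)
```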

Note:

For FineDataLink 4.1.7.2 and later releases, you can use Logical Deletion at Target End if the target table is an existing table that contains the _fdl_marked_deleted field.

For FineDataLink 4.1.7.2 and later releases, if the target table is an existing table that contains the _fdl_marked_deleted field and you do not enable Logical Deletion at Target End, this field will be filled with null values.

The logical deletion function is described as follows.

Target Table	Logical Deletion Description
Auto Created Table	A _fdl_marked_deleted field is added to the target table to mark the deletion status during synchronization.
Existing Table Without a _fdl_marked_deleted Field	If Synchronization Type is set to Full + Incremental Synchronization: A _fdl_marked_deleted field is added to the target table to mark the deletion status during synchronization. If Synchronization Type is set to Incremental Sync Only: A _fdl_marked_deleted field is added to the target table during synchronization; its value is false for historical data and true for deleted data.
Existing Table with a _fdl_marked_deleted Field	If Synchronization Type is set to Full + Incremental Synchronization: The existing _fdl_marked_deleted field is used to mark the deletion status during synchronization. If Synchronization Type is set to Incremental Sync Only: During synchronization, the _fdl_marked_deleted value is changed to true for deleted data and remains unchanged for historical data.

2. Notes:

If you enable logical deletion in FineDataLink 4.0.23, the target table is cleared and fully synchronized, which greatly improves data synchronization efficiency.
If a record to be synchronized has the same primary key as a record in the target table, it is not inserted as a new record. Instead, the existing record is updated, and if it was previously marked as deleted, it is marked as undeleted.
GaussDB databases and PostgreSQL databases of version 9.4 and earlier do not support Logical Deletion at Target End.
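The primary-key rule in the second note behaves like an upsert that also clears the deletion flag. The Python sketch below models it on a hypothetical in-memory table keyed by primary key; `apply_source_upsert` is an invented name used only for illustration.

```python
from typing import Any, Dict

# Hypothetical model of a target table: {primary_key: row_dict}.
Table = Dict[Any, Dict[str, Any]]


def apply_source_upsert(target: Table, pk: Any, values: Dict[str, Any]) -> None:
    """Write a source record into a target table that uses logical deletion.

    If a record with the same primary key exists, it is updated in place instead
    of inserting a duplicate, and any earlier deletion mark is cleared.
    """
    row = target.setdefault(pk, {})  # existing record, or a new empty one
    row.update(values)
    row["_fdl_marked_deleted"] = False  # new or revived records count as undeleted


if __name__ == "__main__":
    # Record 10257 was previously marked as deleted; the same key arrives again.
    table = {10257: {"amount": 5, "_fdl_marked_deleted": True}}
    apply_source_upsert(table, 10257, {"amount": 7})
    print(table)
```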

Marking Timestamp During Synchronization

(Optional) Select Mark Timestamp During Synchronization. A long integer field named _fdl_update_timestamp is added to all target tables to record, as a millisecond-level timestamp, when each record is inserted into or updated in the database (based on the database time).
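Because _fdl_update_timestamp is stored as a long integer, a consumer of the target table usually converts it before display. The Python sketch below assumes the value counts milliseconds since the Unix epoch (1970-01-01 00:00:00 UTC); this is an assumption, so verify how your target database defines its time before relying on it. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_fdl_timestamp(ms: int) -> datetime:
    """Convert an _fdl_update_timestamp value (assumed epoch milliseconds) to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


def to_fdl_timestamp(moment: datetime) -> int:
    """Convert a timezone-aware datetime to epoch milliseconds, flooring to whole milliseconds."""
    # timedelta // timedelta yields an exact int, avoiding float rounding.
    return (moment - EPOCH) // timedelta(milliseconds=1)


if __name__ == "__main__":
    print(from_fdl_timestamp(1725580800000).isoformat())
```

Integer timedelta arithmetic is used instead of `datetime.timestamp() * 1000` so that the millisecond value round-trips exactly.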

Note:

The settings of Logical Deletion at Target End and Mark Timestamp During Synchronization take effect for all data tables in the pipeline task by default and cannot be configured for a specified table.
Timestamp synchronization is performed after all data is synchronized. If the data volume is large, the timestamp may temporarily be empty during synchronization.

The synchronization logic after you enable Mark Timestamp During Synchronization is as follows.

Target Table	Description
Auto Created Table	A _fdl_update_timestamp field is added to the target table and filled with timestamps during synchronization.
Existing Table Without a _fdl_update_timestamp Field	If Synchronization Type is set to Full + Incremental Synchronization: A _fdl_update_timestamp field is added to the target table and filled with timestamps during synchronization. If Synchronization Type is set to Incremental Sync Only: A _fdl_update_timestamp field is added to the target table and filled with timestamps; the timestamps of new data are updated during synchronization.
Existing Table with a _fdl_update_timestamp Field (supported in FineDataLink 4.1.7.2 and later releases)	If Synchronization Type is set to Full + Incremental Synchronization: Timestamps in the _fdl_update_timestamp column are updated during synchronization. If Synchronization Type is set to Incremental Sync Only: Timestamps in the _fdl_update_timestamp column are updated for new data during synchronization and left unchanged for historical data.

If Mark Timestamp During Synchronization is not enabled and the target table contains the _fdl_update_timestamp field, the field will be filled with null values.

Here is an example:

Assume that you select Mark Timestamp During Synchronization and Logical Deletion at Target End in a pipeline task, and then delete the record whose Order ID is 10257 from the Order table in the source database, as shown in the following figure.

After synchronization, the record whose Order ID is 10257 is not deleted from the target table. Its _fdl_marked_deleted value is changed to true, and its _fdl_update_timestamp value is updated to a timestamp recording the deletion, as shown in the following figure.
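The example above can be replayed in a small Python sketch. All field values other than Order ID 10257 (the customer names, the neighbouring order 10256, and the timestamps) are hypothetical, and `apply_delete_with_marking` and `active_orders` are invented helpers, not FineDataLink functions. The second helper shows how a downstream consumer might hide logically deleted rows; a null flag (as written when logical deletion is disabled) is treated as not deleted.

```python
from typing import Any, Dict, List

# Hypothetical model of a target table: {primary_key: row_dict}.
Table = Dict[Any, Dict[str, Any]]


def apply_delete_with_marking(target: Table, pk: Any, now_ms: int) -> None:
    """Delete event with both Logical Deletion at Target End and
    Mark Timestamp During Synchronization enabled."""
    row = target.get(pk)
    if row is None:
        return
    row["_fdl_marked_deleted"] = True     # keep the record, flag it as deleted
    row["_fdl_update_timestamp"] = now_ms  # record when the deletion was applied


def active_orders(target: Table) -> List[Any]:
    """Keys of records not marked as deleted (None/null counts as not deleted)."""
    return [pk for pk, row in target.items() if not row.get("_fdl_marked_deleted")]


# Made-up sample data for illustration.
orders: Table = {
    10256: {"customer": "A", "_fdl_marked_deleted": False, "_fdl_update_timestamp": 1725580800000},
    10257: {"customer": "B", "_fdl_marked_deleted": False, "_fdl_update_timestamp": 1725580800000},
}
apply_delete_with_marking(orders, 10257, now_ms=1725584400000)
print(orders[10257], active_orders(orders))
```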

Note:

You can configure Logical Deletion at Target End and Mark Timestamp During Synchronization only when creating a pipeline task or editing a temporarily saved task. The configuration of running or paused tasks cannot be modified.

Synchronizing Source Table Structure Change

During real-time data synchronization using pipeline tasks, the source table structure may change due to business adjustments, such as adding or deleting tables and fields, renaming fields, and changing field types. You may want these changes to be synchronized to the target tables automatically.

For details, see Data Pipeline - Synchronizing Source Table Structure Change.

Note:

If you select a group table in Source Selection, the Synchronize Source Table Structure Change function is unavailable.

If Synchronize Source Table Structure Change is disabled and the source table structure changes, the pipeline task behaves as follows. In all cases, the field mapping configuration and the target table structure are not changed, and all structure changes are ignored.

Source-End Operation	Target-End Execution Logic	Target Table Data
Delete a table.	Continue synchronizing other tables.	No new data is written into the deleted table in subsequent synchronization.
Rename a table.	Handle it as the deletion of the table with the original name and continue synchronizing other tables.	No new data is written into the table with the original name in subsequent synchronization.
Delete a field.	Continue synchronizing other fields.	The deleted field is filled with null values in subsequent synchronization.
Add a field.	Ignore the new field and continue synchronizing other fields.	No changes are made. Data is synchronized following the original field configuration.
Rename a field.	Handle it as the deletion of the field with the original name and continue synchronizing other fields.	The field with the original name is filled with null values in subsequent synchronization.
Change the field type.	Do not synchronize the type change, and record data with unmatched types as dirty data.	No changes are made. Data is synchronized following the original field configuration.
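The field-level rows of the table above amount to projecting each incoming source record onto the original field mapping. The Python sketch below is a simplified model, not FineDataLink's logic: `project_row` is a hypothetical helper, and comparing Python types stands in for the database column-type check the real task performs.

```python
from typing import Any, Dict, Optional


def project_row(source_row: Dict[str, Any], mapping: Dict[str, type]) -> Optional[Dict[str, Any]]:
    """Map a source record onto the original field mapping.

    mapping: original target field name -> expected Python type (an assumption
    standing in for database column types). Returns None for a dirty record.
    """
    target_row: Dict[str, Any] = {}
    for field, expected_type in mapping.items():
        # Deleted or renamed source fields are missing, so they become None (null).
        value = source_row.get(field)
        if value is not None and not isinstance(value, expected_type):
            return None  # type no longer matches: treat the record as dirty data
        target_row[field] = value
    # Fields that exist only in the source (newly added ones) are never read, so they are ignored.
    return target_row


if __name__ == "__main__":
    mapping = {"id": int, "name": str, "city": str}
    # "city" was renamed to "town" and "email" was added at the source end.
    print(project_row({"id": 1, "name": "A", "town": "X", "email": "a@x"}, mapping))
```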

Setting Synchronization Without a Primary Key

Business tables in some data sources have no primary key, and when you use such tables as source tables, you may be unsure which fields to mark as logical primary keys. In this case, you can enable Synchronization Without a Primary Key to synchronize data in real time without marking a primary key.

Note that if Synchronization Without a Primary Key is enabled, all available fields in the target table are automatically configured as logical primary keys in the pipeline task. These logical primary keys serve as the identifier fields for updates and deletions.

Because all comparable fields in the target table are used for matching, an operation on one record in the source table is applied to all identical records in the target table. For example, if one record in the source table is modified, all records in the target table identical to the original record are modified.

For details on target-end primary key setting and synchronization logic, see Pipeline Task Configuration - Table Field Mapping.

In FineDataLink 4.1.1 and later releases, the Synchronization Without Primary Key function is available when you select an Oracle, Greenplum, or SQL Server data source in Target Selection.

In FineDataLink 4.1.11.2 and later releases, the Synchronization Without Primary Key function is available when you select a YMatrix data source in Target Selection.

A prompt pops up after you enable Synchronization Without Primary Key, as shown in the following figure.

Scenario Description:

Assume that the source end contains only five identical records.

After the first synchronization, the target end contains five identical records.
If you delete one or more of these identical source records, all identical target records are deleted.
If you add one or more records identical to the existing ones at the source end, the identical target records remain unchanged.
If you modify one of these identical source records, all identical target records are modified.
If you modify several of these source records, the first modification is applied to all identical target records, and the other modified source records are synchronized to the target end as new records.
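The scenario above can be modelled by treating each whole record as its own key. The Python sketch below is a mental model under stated assumptions, not FineDataLink's implementation: `NoPrimaryKeyTarget` is a hypothetical class, records are tuples, and the rule that an update with no matching target record is written as new data is inferred from the last case in the list.

```python
from typing import List, Tuple

Record = Tuple[object, ...]


class NoPrimaryKeyTarget:
    """Hypothetical target table in which every field acts as part of the logical primary key."""

    def __init__(self, rows: List[Record]) -> None:
        self.rows = list(rows)  # the first synchronization copies every source record

    def insert(self, new: Record) -> None:
        # A record identical to existing ones leaves the target unchanged.
        if new not in self.rows:
            self.rows.append(new)

    def delete(self, old: Record) -> None:
        # Deleting one source record removes every identical target record.
        self.rows = [r for r in self.rows if r != old]

    def update(self, old: Record, new: Record) -> None:
        if old in self.rows:
            # The change is applied to every identical target record.
            self.rows = [new if r == old else r for r in self.rows]
        else:
            # No identical record is left (an earlier change already replaced them),
            # so the modified record is written as new data.
            self.insert(new)


if __name__ == "__main__":
    target = NoPrimaryKeyTarget([("a", 1)] * 5)
    target.update(("a", 1), ("b", 2))  # first change: all five rows become ("b", 2)
    target.update(("a", 1), ("c", 3))  # second change: written as a new record
    print(target.rows)
```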


If you have enabled Synchronize Source Table Structure Change:

Adding a field to the source table: The change is synchronized, but the field added to the target table is not marked as a physical primary key or a logical primary key.
Deleting a field from the source table: If the target-end field corresponding to the deleted field is set as a logical primary key, the task is aborted with an error. If it is set as a physical primary key, the task is aborted with an error indicating that the primary key cannot be empty.
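The two deletion rules above act as a pre-check before a source-side field deletion is applied. The Python sketch below is illustrative only: `TaskAborted` and `check_source_field_deletion` are hypothetical names, and the error messages are invented, not FineDataLink's actual log text.

```python
from typing import Set


class TaskAborted(RuntimeError):
    """Hypothetical stand-in for the error that stops the pipeline task."""


def check_source_field_deletion(field: str, logical_pks: Set[str], physical_pks: Set[str]) -> None:
    """Validate a source-side field deletion while structure-change sync is enabled."""
    if field in physical_pks:
        # A physical primary key column cannot be left empty.
        raise TaskAborted(f"{field}: the primary key cannot be empty")
    if field in logical_pks:
        # A logical primary key is needed to identify records for updates and deletions.
        raise TaskAborted(f"{field}: a logical primary key field was deleted")
    # Fields that are not primary keys pass this check.


if __name__ == "__main__":
    check_source_field_deletion("note", logical_pks={"id"}, physical_pks=set())  # passes
    try:
        check_source_field_deletion("id", logical_pks={"id"}, physical_pks=set())
    except TaskAborted as err:
        print("aborted:", err)
```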

Subsequent Operations

For details, see Pipeline Task Configuration - Table Field Mapping.
