Pipeline Task Configuration - Source Selection - FineDataLink Help Document

Last update: February 18, 2025

Overview

Version

FineDataLink Version	Functional Change
4.1.4	Supported data reading from the SAP HANA and Db2 databases.
4.1.7.2	Blocked the _fdl_update_timestamp and _fdl_marked_deleted fields in source tables during real-time data synchronization.
4.1.8.1	Supported N:1 synchronization in pipeline tasks, allowing you to select multiple source tables with the same structure to generate a group table in the Source Selection step.

Function Description

You can configure items such as the database table to be synchronized and synchronization type in Source Selection, as shown in the following figure.

Prerequisite

1. Ensure that you have completed the following preparations.

Step One: Data Source Configuration

Select the source and target databases as needed. For details about databases supported by Data Pipeline, see Types of Data Sources Supported by Data Pipeline.

Establish data connections to source and target databases in Data Connection Management, so that you can configure the source and target databases of pipeline tasks by selecting the data source names. For details, see Data Connection Configuration.

Step Two: Database Environment Preparation

Grant the account configured in the data connection used in the pipeline task the necessary permission to perform the required operations on the database. For details, see Overview of Database Environment Preparation.

Step Three: Pipeline Task Environment Preparation

Deploy Kafka (an open-source event streaming platform) as the middleware. For details, see Kafka Deployment and Transmission Queue Configuration.

Step Four: Pipeline Task Permission Assignment

Grant the permission to use Data Pipeline to users who are not super admins. For details, see Pipeline Task Management Permission.

2. Click Data Pipeline and create a pipeline task, as shown in the following figure.

Procedure

Source Selection

1. For details about source databases supported by pipeline tasks, see Types of Data Sources Supported by Data Pipeline.

Note:

In V4.0.29 and later versions, fields with the following data types from the Oracle database are automatically blocked during synchronization: BLOB, CLOB, NCLOB, LONG, RAW, LONG RAW, and BFILE.
In V4.1.7.2 and later versions, the _fdl_update_timestamp and _fdl_marked_deleted fields in source tables will be blocked during real-time data synchronization.

2. Click Data Source Permission Detection to check whether the account configured in the data connection has permission to read the database log, as shown in the following figure.
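The field blocking described in the note above can be pictured as a simple filter over a source table's columns. The following is a minimal sketch, not FineDataLink's implementation: the function name synchronizable_fields, the (name, data_type) tuple shape, and the exact-match comparison of field names are all assumptions made for illustration. Only the blocked type names and field names come from the note.

```python
# Oracle data types that the note lists as blocked (V4.0.29 and later).
BLOCKED_ORACLE_TYPES = {"BLOB", "CLOB", "NCLOB", "LONG", "RAW", "LONG RAW", "BFILE"}

# Field names that the note lists as blocked during real-time
# synchronization (V4.1.7.2 and later).
BLOCKED_FIELD_NAMES = {"_fdl_update_timestamp", "_fdl_marked_deleted"}


def synchronizable_fields(fields, source_is_oracle):
    """Return the (name, data_type) pairs that survive the blocking rules.

    `fields` is a list of (name, data_type) tuples. This shape is
    hypothetical and chosen only for the example.
    """
    kept = []
    for name, data_type in fields:
        # Assumption: field names are compared exactly (case-sensitive).
        if name in BLOCKED_FIELD_NAMES:
            continue
        # The type-based rule in the note applies to Oracle sources only.
        if source_is_oracle and data_type.upper() in BLOCKED_ORACLE_TYPES:
            continue
        kept.append((name, data_type))
    return kept
```

Running a table definition through such a filter before creating a task is one way to predict which columns will not reach the target.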

Read Mode

The available read mode varies with the source database.

Synchronization Type

Note:

When the volume of existing data is large, that data is typically loaded with a dedicated high-speed method or imported in multiple batches. In this case, you can select Incremental Sync Only in the pipeline task to synchronize incremental data continuously once all existing data has been loaded.

Full + Incremental Synchronization

All existing data is synchronized first, and subsequent changes are then synchronized continuously. When the task runs for the first time, it performs full synchronization followed by incremental synchronization. If the task is interrupted or paused and then restarted, it resumes incremental synchronization from the breakpoint, provided that full synchronization has been completed for all tables.
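The run and restart behavior described above can be summarized as a small decision function. This is an illustrative sketch only: TaskState, phases_for_run, and the phase labels are hypothetical names, and the branch for a restart before full synchronization has finished is an assumption, because the text does not describe that case.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskState:
    has_run_before: bool       # False on the very first run of the task
    full_sync_completed: bool  # True once full sync has finished for all tables


def phases_for_run(state: TaskState) -> List[str]:
    """Return the ordered phases a run goes through under this sketch."""
    if not state.has_run_before:
        # First run: full synchronization, then incremental synchronization.
        return ["full", "incremental"]
    if state.full_sync_completed:
        # Restart after interruption or pause: resume from the breakpoint.
        return ["incremental_from_breakpoint"]
    # ASSUMPTION: the text does not say what happens when a task restarts
    # before full synchronization has completed. Starting over is a guess.
    return ["full", "incremental"]
```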

Incremental Sync Only

Incremental Sync Start Point

Description

If you select Task Startup Time as Incremental Sync Start Point, the task startup time will be used as the parsing start time.

When you import historical data using the recommended approach to the target data source without any filtering conditions, the entire historical data will be imported, and you can set the incremental synchronization start point to the task execution start time.

1. The task only synchronizes incremental data. When it runs for the first time, it synchronizes the incremental data since the set start time.

2. Supported data sources include MySQL, Oracle, SQL Server, and PostgreSQL.

3. After configuration, the incremental synchronization start time is in the yyyy-MM-dd HH:mm:ss.000 format with millisecond-level precision, based on the database time zone.

Note:

For the PostgreSQL and SAP HANA data sources, only Task Startup Time can be selected as Incremental Sync Start Point.

If you select Custom Time as Incremental Sync Start Point, you must specify the incremental start time, which is required and defaults to blank. You can specify the time with second-level precision.

The earliest available time is the first recorded timestamp in the database logs.

When you import historical data using the recommended approach to the target data source with the time-based filtering condition, you can set the incremental synchronization start point to the earliest selectable time in the filter.
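The start point rules above (the two options, the restriction for PostgreSQL and SAP HANA, the earliest available time, and the displayed format) can be sketched as a small validation helper. Everything named here is hypothetical: resolve_start_point, the mode strings, and the parameters are not part of FineDataLink. The sketch also assumes that all datetime values are naive and already expressed in the database time zone, and that the trailing .000 is literal because times are entered with second-level precision.

```python
from datetime import datetime

# Per the note, these sources accept only Task Startup Time.
STARTUP_TIME_ONLY_SOURCES = {"PostgreSQL", "SAP HANA"}


def format_start_time(moment: datetime) -> str:
    """Render a time as yyyy-MM-dd HH:mm:ss.000 (sub-second part dropped)."""
    return moment.strftime("%Y-%m-%d %H:%M:%S") + ".000"


def resolve_start_point(source, mode, task_startup,
                        custom_time=None, earliest_log_time=None):
    """Return the formatted incremental start time, or raise ValueError.

    mode is "task_startup" or "custom" (hypothetical labels).
    """
    if mode == "task_startup":
        return format_start_time(task_startup)
    if mode != "custom":
        raise ValueError("unknown start point mode: %r" % (mode,))
    if source in STARTUP_TIME_ONLY_SOURCES:
        raise ValueError("%s supports only Task Startup Time" % source)
    if custom_time is None:
        # Custom Time is required and has no default value.
        raise ValueError("a custom start time must be specified")
    if earliest_log_time is not None and custom_time < earliest_log_time:
        # The earliest available time is the first timestamp in the logs.
        raise ValueError("custom time is earlier than the database logs")
    return format_start_time(custom_time)
```

For example, a MySQL task whose historical data was imported with a time filter could pass the earliest time of that filter as custom_time.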

Synchronization Object

You can select the tables or databases to be synchronized in real time.

Quick Selection

Quick Selection enables batch selection of multiple tables, streamlining the process of selecting source tables, as shown in the following figure.

Ordinary Table

Scenario: You want to synchronize data from one table to another in real time.

Find the table to be synchronized in Existing Table.

Group Table

Scenario: In device data collection, you want to synchronize data in real time from multiple source tables (from different databases) with the same structure to a single table in the data warehouse.

1. You can synchronize data from multiple source tables with the same structure to a single target table (namely a group table), using the intersection of all source table fields as the fields for the group table, as shown in the following figure.

Note:

Multiple group tables can be generated.
When identifying the field intersection, the system only requires matching field names. If the data type of a field in the group table differs across subtables, the system will select a type compatible with all data types of this field as the data type.

Scenario one: If the subtables share all of their fields (a complete intersection), the group table will be named after the first subtable you selected.

Scenario two: If no intersection fields are found, the error message "The selected subtables have no field intersection and a group table cannot be generated. You are advised to cancel the operation and check the subtable structure." will be displayed, as shown in the following figure.

Group tables cannot be generated in this scenario.

Scenario three: If some fields are beyond the intersection, the warning message "The existing subtable fields beyond the intersection will not be synchronized. You are advised to cancel the operation and check the subtable structure." will be displayed. The name of the generated group table is the name of the first subtable you selected, as shown in the following figure.

In this scenario, the generated group table only contains the intersection fields, and normal real-time synchronization can be realized.
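The three scenarios above amount to computing the intersection of field names across the selected subtables and classifying the result. The following is a minimal sketch under stated assumptions: plan_group_table and its return shape are hypothetical, field names are matched exactly, and the selection of a compatible data type for each field is left out.

```python
def plan_group_table(subtables):
    """Classify a group table request by the field-name intersection.

    `subtables` is a list of (table_name, field_names) pairs in the order
    the tables were selected. This shape is hypothetical.
    """
    if not subtables:
        raise ValueError("at least one subtable is required")

    first_name, first_fields = subtables[0]
    common = set(first_fields)  # fields present in every subtable
    union = set(first_fields)   # fields present in any subtable
    for _, fields in subtables[1:]:
        common &= set(fields)
        union |= set(fields)

    if not common:
        # Scenario two: no intersection, so no group table can be generated.
        return {"status": "error", "group_table": None,
                "fields": [], "not_synchronized": sorted(union)}

    beyond = union - common
    # Scenario one ("ok") or scenario three ("warning"). In both cases the
    # group table takes the name of the first selected subtable.
    return {"status": "warning" if beyond else "ok",
            "group_table": first_name,
            "fields": sorted(common),
            "not_synchronized": sorted(beyond)}
```

Under this sketch, the fields listed in not_synchronized are the ones the warning message refers to as beyond the intersection.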

2. The following figure shows the Synchronization Object area after a group table is generated.

Note:

If only group tables are displayed, ordinary tables still exist but are grayed out with a message saying "No ordinary table."

You can rename the group tables, but duplicate names are not allowed. Once the task is started, you cannot rename the group tables. You can delete the group tables. Once a group table is deleted, the subtables will return to ordinary tables.

Description one: When group tables and ordinary tables coexist, ordinary tables can be moved to group tables, as shown in the following figure.

Description two: For existing group tables, you can select subtables to return them to Ordinary Table or Existing Table, as shown in the following figure.

Both of the above operations trigger intersection field validation. The following table lists the possible outcomes.

Scenario	Description
The intersection fields remain unchanged, and no fields exist beyond the intersection in the subtables.	The group table update will be successful.
The intersection fields remain unchanged, but fields beyond the intersection exist in the subtables.	The warning message "The existing subtable fields beyond the intersection will not be synchronized. You are advised to cancel the operation and check the subtable structure." will be displayed. If you click Continue Generation, the generated group table only contains the intersection fields, and normal real-time synchronization can be realized.
The intersection fields have been changed (added, removed, or changed in type), the pipeline task has not yet been run, and the warning message "This operation will change the original group table field. You are advised to cancel the operation and check the subtable structure." is displayed.	If fields beyond the intersection exist in abnormal subtables, clicking Continue Generation will generate a group table with a yellow exclamation mark. The following figure shows the warning message. If no such fields exist, the group table will be generated without any warnings.
The intersection fields have been changed (added, removed, or changed in type) and reduced to zero, and the pipeline task has not yet been run.	The group table update will fail.
The intersection fields have been changed (added, removed, or changed in type), and the pipeline task has been run.	The group table update will fail, and the message "This operation will change the original group table field. The task has been run and the operation is not allowed. You are advised to check the subtable structure." will be displayed.
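The validation outcomes in the table above can be read as a decision function over four conditions. This sketch is illustrative only: validate_group_table_update, its boolean parameters, and the "ok"/"warn"/"fail" labels are hypothetical, while the message strings are quoted from the table. The yellow exclamation mark shown after Continue Generation is not modeled.

```python
MSG_BEYOND = ("The existing subtable fields beyond the intersection will not be "
              "synchronized. You are advised to cancel the operation and check "
              "the subtable structure.")
MSG_CHANGE = ("This operation will change the original group table field. You "
              "are advised to cancel the operation and check the subtable "
              "structure.")
MSG_CHANGE_AFTER_RUN = ("This operation will change the original group table "
                        "field. The task has been run and the operation is not "
                        "allowed. You are advised to check the subtable "
                        "structure.")


def validate_group_table_update(intersection_changed, intersection_empty,
                                task_has_run, has_fields_beyond_intersection):
    """Return (outcome, message) for moving subtables in or out of a group."""
    if intersection_changed and task_has_run:
        # Row 5: changing the fields of a task that has already run fails.
        return ("fail", MSG_CHANGE_AFTER_RUN)
    if intersection_changed and intersection_empty:
        # Row 4: the intersection was reduced to zero fields.
        return ("fail", None)
    if intersection_changed:
        # Row 3: the update can proceed after the user confirms the warning.
        return ("warn", MSG_CHANGE)
    if has_fields_beyond_intersection:
        # Row 2: only the intersection fields are synchronized.
        return ("warn", MSG_BEYOND)
    # Row 1: nothing changed and no extra fields exist.
    return ("ok", None)
```

The branches are ordered so that the two failure rows are checked before the warning rows, which keeps the result unambiguous when several conditions hold at once.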

Subsequent Operations

For details, see Pipeline Task Configuration - Target Selection.
