Pipeline Task Configuration - Source Selection

  • Last update: February 18, 2025
  • Overview

    Version

    FineDataLink Version
    Functional Change

    4.1.4

    Supported data reading from the SAP HANA and Db2 databases.

    4.1.7.2

    Blocked the _fdl_update_timestamp and _fdl_marked_deleted fields in source tables during real-time data synchronization.

    4.1.8.1

    Supported N:1 synchronization in pipeline tasks, allowing you to select multiple source tables with the same structure to generate a group table in the Source Selection step.

    Function Description

    You can configure items such as the database table to be synchronized and synchronization type in Source Selection, as shown in the following figure.

    2025-01-24_09-44-30.png

    Prerequisite

    1. Ensure that you have completed the following preparations.

    Procedure
    Step One: Data Source Configuration

    Select the source and target databases as needed. For details about databases supported by Data Pipeline, see Types of Data Sources Supported by Data Pipeline.

    Establish data connections to source and target databases in Data Connection Management, so that you can configure the source and target databases of pipeline tasks by selecting the data source names. For details, see Data Connection Configuration.

    Step Two: Database Environment Preparation

    Grant the account configured in the data connection used in the pipeline task the necessary permission to perform the required operations on the database. For details, see Overview of Database Environment Preparation.

    Step Three: Pipeline Task Environment Preparation

    Deploy Kafka (an open-source event streaming platform) as the middleware. For details, see Kafka Deployment and Transmission Queue Configuration.

    Step Four: Pipeline Task Permission Assignment

    Grant the permission to use Data Pipeline to users who are not super admins. For details, see Pipeline Task Management Permission.

    2. Click Data Pipeline and create a pipeline task, as shown in the following figure.

    2025-01-24_09-38-29.png

    Procedure

    2025-01-24_09-44-30 copy.png

    Source Selection

    1. For details about source databases supported by pipeline tasks, see Types of Data Sources Supported by Data Pipeline.

    Note:
    1. In V4.0.29 and later versions, fields with the following data types from the Oracle database are automatically blocked during synchronization: BLOB, CLOB, NCLOB, LONG, RAW, LONG RAW, and BFILE.

    2. In V4.1.7.2 and later versions, the _fdl_update_timestamp and _fdl_marked_deleted fields in source tables will be blocked during real-time data synchronization.

    2. Click Data Source Permission Detection to check whether the account configured in the data connection has permission to read the database log, as shown in the following figure.

    2025-01-24_09-45-03.png
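
The field-blocking rules in the note above can be pictured as a simple column filter. The following is a minimal sketch under stated assumptions, not FineDataLink's implementation: the function name `syncable_columns`, the `(field_name, data_type)` representation, and the `"oracle"` / `"mysql"` labels are hypothetical, field names are matched exactly, and the version conditions are ignored.

```python
# Oracle data types listed in the note as automatically blocked.
BLOCKED_ORACLE_TYPES = {"BLOB", "CLOB", "NCLOB", "LONG", "RAW", "LONG RAW", "BFILE"}

# Field names blocked during real-time data synchronization.
BLOCKED_FIELD_NAMES = {"_fdl_update_timestamp", "_fdl_marked_deleted"}


def syncable_columns(columns, source_type):
    """Return the names of the columns that would still be synchronized.

    columns: list of (field_name, data_type) pairs (hypothetical representation).
    source_type: label of the source database, e.g. "oracle" (hypothetical).
    """
    kept = []
    for name, data_type in columns:
        # Rule 2: the two _fdl_ fields are blocked for any source.
        if name in BLOCKED_FIELD_NAMES:
            continue
        # Rule 1: the listed data types are blocked for Oracle sources only.
        if source_type == "oracle" and data_type.upper() in BLOCKED_ORACLE_TYPES:
            continue
        kept.append(name)
    return kept


example = [
    ("ID", "NUMBER"),
    ("PHOTO", "BLOB"),
    ("NOTES", "CLOB"),
    ("_fdl_update_timestamp", "TIMESTAMP"),
]
print(syncable_columns(example, "oracle"))  # ['ID']
```

Running a check like this against a source table's column list before creating the task can help explain why a target table ends up with fewer fields than its source.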

    Read Mode

    The read mode varies with the source database.

    Synchronization Type

    Note:
    In scenarios with a large volume of existing data, the data should typically be loaded using specific high-speed methods or imported in multiple batches. In this case, you can select Incremental Synchronization Only in Pipeline Task to synchronize incremental data continuously after all existing data has been synchronized.

    Full + Incremental Synchronization

    All existing data is synchronized first, and subsequent changes are then synchronized continuously. When the task runs for the first time, it performs full synchronization followed by incremental synchronization. If the task is interrupted or paused and then restarted, it resumes incremental synchronization from the breakpoint (provided that the full synchronization for all tables has been completed).

    Incremental Sync Only

    2025-01-24_09-45-50.png

    Incremental Sync Start Point
    Description
    If you select Task Startup Time as Incremental Sync Start Point, the task startup time will be used as the parsing start time.

    When you import historical data to the target data source using the recommended approach without any filtering conditions, all historical data is imported, and you can set the incremental synchronization start point to the task execution start time.

    1. The task only synchronizes incremental data. When it runs for the first time, it synchronizes the incremental data since the set start time.

    2. Supported data sources include MySQL, Oracle, SQL Server, and PostgreSQL.

    3. After configuration, the incremental synchronization start time is in the yyyy-MM-dd HH:mm:ss.000 format with millisecond-level precision, based on the database time zone.

    Note:
    For the PostgreSQL and SAP HANA data sources, only Task Startup Time can be selected as Incremental Sync Start Point.

    If you select Custom Time as Incremental Sync Start Point, you must specify the incremental start time. This setting is required and is blank by default. You can specify the time with second-level precision.

    The earliest available time is the first recorded timestamp in the database logs.

    When you import historical data using the recommended approach to the target data source with the time-based filtering condition, you can set the incremental synchronization start point to the earliest selectable time in the filter.
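
The start-point rules above (second-level input, display in the yyyy-MM-dd HH:mm:ss.000 format, and no start time earlier than the first log timestamp) can be sketched as follows. This is an illustration only: the function `format_start_point` and its parameters are hypothetical, and it assumes that a time entered to the second is shown with a zero millisecond part and that both values are already expressed in the database time zone.

```python
from datetime import datetime


def format_start_point(custom_time, earliest_log_time):
    """Validate a custom start time and render it as yyyy-MM-dd HH:mm:ss.000.

    Both arguments are naive datetimes in the database time zone (assumption).
    """
    # Custom Time is specified to the second, so drop any sub-second part.
    start = custom_time.replace(microsecond=0)
    # The earliest available time is the first timestamp in the database logs.
    if start < earliest_log_time:
        raise ValueError("start time is earlier than the first log timestamp")
    return start.strftime("%Y-%m-%d %H:%M:%S") + ".000"


earliest = datetime(2025, 1, 20, 0, 0, 0)
print(format_start_point(datetime(2025, 1, 24, 9, 45, 50), earliest))
# 2025-01-24 09:45:50.000
```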

    Synchronization Object

    You can select the tables or databases to be synchronized in real time.

    Quick Selection

    Quick Selection enables batch selection of multiple tables, streamlining the process of selecting source tables, as shown in the following figure.

    2025-01-24_09-46-35.png

    Ordinary Table

    Scenario: You want to synchronize data from one table to another in real time.

    Find the table to be synchronized in Existing Table.

    Group Table

    Scenario: In device data collection, you want to synchronize data in real time from multiple source tables (from different databases) with the same structure to a single table in the data warehouse.

    1. You can synchronize data from multiple source tables with the same structure to a single target table (namely a group table), using the intersection of all source table fields as the fields for the group table, as shown in the following figure.

    Note:
    1. Multiple group tables can be generated.

    2. When identifying the field intersection, the system only requires matching field names. If the data type of a field in the group table differs across subtables, the system will select a type compatible with all data types of this field as the data type.

    Scenario one: If the fields in the subtables have a complete intersection, the group table will be named after the first subtable you selected.

    Scenario two: If no intersection fields are found, the error message "The selected subtables have no field intersection and a group table cannot be generated. You are advised to cancel the operation and check the subtable structure." will be displayed, as shown in the following figure.

    Group tables cannot be generated in this scenario.

    2025-01-24_09-57-30.png

    Scenario three: If some fields are beyond the intersection, the warning message "The existing subtable fields beyond the intersection will not be synchronized. You are advised to cancel the operation and check the subtable structure." will be displayed. The name of the generated group table is the name of the first subtable you selected, as shown in the following figure.

    In this scenario, the generated group table contains only the intersection fields, and real-time synchronization works normally.

    2025-01-24_10-01-13.png

    2. The following figure shows the Synchronization Object area after a group table is generated.

    Note:
    If only group tables are displayed, ordinary tables still exist but are grayed out with a message saying "No ordinary table."

    2025-01-24_10-03-03.png

    You can rename the group tables, but duplicate names are not allowed. Once the task is started, you cannot rename the group tables. You can delete the group tables. Once a group table is deleted, the subtables will return to ordinary tables.

    2025-01-24_10-05-12.png

    Description one: When group tables and ordinary tables coexist, ordinary tables can be moved to group tables, as shown in the following figure.

    2025-01-24_10-09-50.png

    Description two: For existing group tables, you can select subtables to return them to Ordinary Table or Existing Table, as shown in the following figure.

    2025-01-24_10-12-30.png

    Both the above scenarios will trigger intersection field validation. For details, see the following table.

    Scenario
    Description

    The intersection fields remain unchanged, and no fields exist beyond the intersection in the subtables.

    The group table update will be successful.

    The intersection fields remain unchanged, but fields beyond the intersection exist in the subtables.

    The warning message "The existing subtable fields beyond the intersection will not be synchronized. You are advised to cancel the operation and check the subtable structure." will be displayed. If you click Continue Generation, the generated group table contains only the intersection fields, and real-time synchronization works normally.

    The intersection fields have been changed (added, removed, or changed in type), and the pipeline task has not yet been run.

    The warning message "This operation will change the original group table field. You are advised to cancel the operation and check the subtable structure." will be displayed.

    2025-01-24_10-16-47.png

    If fields beyond the intersection exist in abnormal subtables, clicking Continue Generation will generate a group table with a yellow exclamation mark. The following figure shows the warning message.

    2025-01-24_10-14-42.png

    If no such fields exist, the group table will be generated without any warnings.

    The intersection fields have been changed (added, removed, or changed in type) and reduced to zero, and the pipeline task has not yet been run.

    The group table update will fail.

    The intersection fields have been changed (added, removed, or changed in type), and the pipeline task has been run.

    The group table update will fail, and the message "This operation will change the original group table field. The task has been run and the operation is not allowed. You are advised to check the subtable structure." will be displayed.
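
The group table rules described in this section (generation by field intersection, then validation whenever subtables are moved in or out) can be summarized in a short sketch. It is not the product's implementation: the function names, the result labels, and the dictionary of subtable field lists are hypothetical, and only field names are compared, so a change in field type alone is not detected here.

```python
def field_intersection(subtables):
    """Return the field names shared by every subtable (matched by name only)."""
    field_sets = [set(fields) for fields in subtables.values()]
    return set.intersection(*field_sets) if field_sets else set()


def check_generation(subtables):
    """Classify a first-time group table generation into the three scenarios."""
    common = field_intersection(subtables)
    if not common:
        # Scenario two: no intersection, so no group table can be generated.
        return "error_no_intersection"
    all_fields = {f for fields in subtables.values() for f in fields}
    if all_fields - common:
        # Scenario three: fields beyond the intersection are not synchronized.
        return "warn_extra_fields"
    # Scenario one: the subtables have a complete intersection.
    return "ok"


def check_update(old_common, subtables, task_has_run):
    """Classify an update of an existing group table."""
    new_common = field_intersection(subtables)
    if new_common == old_common:
        # Intersection unchanged: success, or a warning about extra fields.
        return check_generation(subtables)
    if task_has_run or not new_common:
        # Changed after the task has run, or reduced to zero: update fails.
        return "fail"
    # Changed before the first run: allowed, with a warning.
    return "warn_fields_changed"


tables = {
    "device_a": ["id", "ts", "temp"],
    "device_b": ["id", "ts", "temp", "humidity"],
}
print(check_generation(tables))  # warn_extra_fields

common = field_intersection(tables)
tables["device_c"] = ["id", "ts"]  # moving this subtable in shrinks the intersection
print(check_update(common, tables, task_has_run=False))  # warn_fields_changed
print(check_update(common, tables, task_has_run=True))   # fail
```

Comparing subtable structures this way before grouping them shows in advance which fields would be dropped from the group table.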

    Subsequent Operations
