Single Pipeline Task O&M

  • Last update: October 28, 2024
  • Overview

    Version Description

    FineDataLink Version

    Functional Change

    4.0.5

    /

    4.0.7

    Optimized the user interface of the pipeline task list.

    4.0.27

    • Optimized the prompts of pipeline tasks.

    • Allowed selecting data tables to be synchronized in batches quickly.

    • Allowed managing pipeline tasks based on folders.

    • Allowed skipping, retrying, and resynchronizing dirty data in single tables and multiple tables.

    • Optimized the prompts of task logs.

    4.0.29

    Allowed copying and pasting tasks to a specified folder.

    4.1.2

    • Allowed customizing historical statistics to view the synchronization status. For details, see the section "Viewing Real-Time Synchronization Status from the Time Dimension."

    • Allowed viewing the configuration status of running tasks. For details, see the section "Task List."

    • Displayed the delay time of writing the pending data. Allowed viewing Time of Reading Message and Time of Writing Message. For details, see the section "Real-Time Statistics."

    • Optimized the display methods of dirty data. For details, see the section "Viewing Synchronization Status from the Data Table Dimension."

    • Optimized the processing methods of dirty data. For details, see the section "Processing Dirty Data."

    4.1.5.3

    • Added a prompt on the page when there is no running log.

    • Optimized the filtering button of Running Log.

    • Marked the Level column of Running Log with different colors. Optimized the Classification column.

    4.1.6.3

    Prohibited multiple users from editing the same pipeline task simultaneously.

    4.1.6.5

    • Allowed reordering pipeline tasks/folders with drag-and-drop operations.

    • Added the Move to option to the  button next to each task.

    4.1.8.1

    1. Divided Sync Object into Group Table and Ordinary Table on the Real-Time Statistics and Historical Statistics tab pages.

    2. Displayed the logs of the sub-tables of the group table on the Running Log tab page.

    Function Description

    After configuring pipeline tasks on the FineDataLink platform, you can monitor real-time data synchronization, view the task running status, and quickly view and process dirty data.

    Task List

    You can view and edit all pipeline tasks within your permission scope in the task list.

    Pipeline Task Display Format/Location Description

    1. Pipeline tasks can be placed in folders. Click a folder to see all pipeline tasks within it. You can adjust the display format of pipeline tasks (card style or list style).

    2. You can move a pipeline task to a specified folder, as shown in the following figure.

    Renaming/Moving/Copying and Pasting/Exporting/Deleting Pipeline Tasks

    Click the  button on the right side of a pipeline task to rename it, move it or copy and paste it to a specified path, or delete or export it, as shown in the following figure.

    1. In FineDataLink 4.1.6.5 and later versions, you can drag and drop pipeline tasks/folders to change their positions.

    • All users can manually drag and drop the pipeline tasks/folders they have permission on.

    • The new order takes effect on all users' pages.

    • For example, a user sees pipeline tasks 5, 1, and 7 (top to bottom), while an admin sees pipeline tasks 5, 6, 1, 9, and 7. If the user drops pipeline task 7 between pipeline tasks 5 and 1, the admin sees pipeline task 7 between pipeline tasks 6 and 1, because a dropped item is placed next to the lower target node. (Here, pipeline tasks 5 and 1 are the target nodes, and pipeline task 1 is the lower one.) The same rule applies to folders.

    2. In FineDataLink 4.1.6.5 and later versions, the Move to button is added.

    You can move a task to another folder by clicking Move to. A moved task is placed at the end of the target folder. Moving a task to its original folder is not allowed. The new location takes effect on all users' pages.
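    The drag-and-drop and Move to rules above can be sketched as follows. This is a minimal illustration, not FineDataLink's implementation: it assumes one shared global order of task IDs, with each user seeing only the tasks they have permission on. The functions `visible_order`, `drop_between`, and `move_to_folder` and the folder names are hypothetical.

```python
from typing import Dict, List, Set


def visible_order(global_order: List[int], visible: Set[int]) -> List[int]:
    """Return the tasks one user can see, in the shared global order."""
    return [t for t in global_order if t in visible]


def drop_between(global_order: List[int], task: int,
                 upper: int, lower: int) -> List[int]:
    """Drop `task` between two adjacent tasks the user sees (hypothetical).

    The task lands directly above the lower target node in the global
    order, so users who see extra tasks between `upper` and `lower`
    find it next to `lower`.
    """
    if task in (upper, lower):
        raise ValueError("a task cannot be dropped next to itself")
    order = [t for t in global_order if t != task]
    order.insert(order.index(lower), task)
    return order


def move_to_folder(folders: Dict[str, List[int]], task: int, target: str) -> None:
    """Move a task to the end of another folder, modifying `folders` in place."""
    source = next(name for name, tasks in folders.items() if task in tasks)
    if source == target:
        raise ValueError("moving a task to its original folder is not allowed")
    folders[source].remove(task)
    folders[target].append(task)


# The example from the text: the admin sees 5, 6, 1, 9, 7; the user sees 5, 1, 7.
new_order = drop_between([5, 6, 1, 9, 7], task=7, upper=5, lower=1)
print(new_order)                            # [5, 6, 7, 1, 9]
print(visible_order(new_order, {5, 1, 7}))  # [5, 7, 1]

folders = {"Sales": [1, 2], "Finance": [3]}
move_to_folder(folders, 1, "Finance")
print(folders)  # {'Sales': [2], 'Finance': [3, 1]}
```

    Keeping a single global order and filtering it per user is what makes the new position take effect on every user's page at once.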

    Pausing/Editing Pipeline Tasks

    Click a pipeline task in the task list to see whether it is running. You can also pause the task manually or open the task editing page, as shown in the following figure.


    Display Content of Task Running Status

    • Operation status

    • Last start time of the task

    • Source data read time

    • Target data read time

    • Source data type

    • Target data type

    • Starting/Pausing the task

    • Editing the task

    For pipeline tasks in the Running, Paused, Aborted, Draft, or To Be Started status, you can click the Edit button to view or modify configuration items. The items that can be modified in each status are shown in the following table.

    Note:

    If a task is paused, you can add or remove group tables, as well as add or remove tables within existing group tables.

    Function


    Running

    Paused/Aborted

    Draft

    To Be Started

    Source Selection (Configuration Page)

    Data Source Type

    View

    View

    View and Edit

    View and Edit

    Data Source and Data Connection

    View

    View and Edit

    View and Edit

    Data Source Permission Detection

    Use

    Use

    Use

    Read Mode

    View

    View and Edit

    View and Edit

    Synchronization Type

    Switching between full and incremental synchronization is not allowed.

    You can modify Incremental Sync Start Point if you select Incremental Sync Only.

    Sync Object

    Adding or removing sync objects is allowed.

    1. Adding sync objects: The added table will be synchronized according to the selected synchronization type.

    • For Full + Incremental Synchronization, the added tables require full synchronization. Incremental synchronization will be suspended in the background until the full sync is finished.

    • For Incremental Sync Only:

    • If Incremental Sync Start Point is modified, all tables (including the added tables) will be synchronized according to the specified start point. 

    • If Incremental Sync Start Point is not modified, the added table will be synchronized according to the built-in breakpoint of the task.

    2. Removing sync objects: If you remove a sync object and save the modification, all related information of this object will also be removed. The corresponding table will not be synchronized when the task is started.

    Target Selection (Configuration Page)

    Data Destination Type

    View

    Data Destination and Data Connection

    Pattern

    Data Deletion Strategy

    Mark Timestamp During Synchronization

    Synchronize Source Table Structure Change (DDL)

    View and Edit

    You can adjust the DDL synchronization status.

    Table Field Mapping (Configuration Page)

    Field Mapping

    The added tables can be configured with field mappings.

    You can only view field mappings of tables already in synchronization, but cannot edit or adjust them.

    Pipeline Control (Configuration Page)

    Dirty Data Threshold

    View and Edit

    Retry After Failure

    View and Edit

    Result Notification

    View and Edit
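    The Sync Object rules in the table above decide where a table added to an existing task starts synchronizing. A minimal sketch, assuming a hypothetical `plan_added_table` helper that models the start point and breakpoint as opaque strings; it only restates the documented rules and is not FineDataLink's implementation.

```python
from enum import Enum
from typing import Optional


class SyncType(Enum):
    FULL_PLUS_INCREMENTAL = "Full + Incremental Synchronization"
    INCREMENTAL_ONLY = "Incremental Sync Only"


def plan_added_table(sync_type: SyncType,
                     modified_start_point: Optional[str],
                     task_breakpoint: str) -> str:
    """Describe how a newly added sync object is synchronized (hypothetical)."""
    if sync_type is SyncType.FULL_PLUS_INCREMENTAL:
        # The added table is loaded in full first; its incremental sync
        # waits in the background until the full load finishes.
        return "full sync, then incremental sync"
    if modified_start_point is not None:
        # A modified Incremental Sync Start Point applies to every table
        # in the task, including the newly added one.
        return f"incremental sync from {modified_start_point} (all tables)"
    # Otherwise the added table follows the task's built-in breakpoint.
    return f"incremental sync from {task_breakpoint}"


print(plan_added_table(SyncType.INCREMENTAL_ONLY, None, "breakpoint-42"))
# incremental sync from breakpoint-42
```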

    Statistics Log

    Viewing Real-Time Synchronization Status from the Time Dimension

    Click Activity Management to view the historical synchronization trends of the current task.

    Note:

    1. On the Real-Time Statistics and Historical Statistics tab pages, Sync Object is divided into Group Table and Ordinary Table. A group table can be expanded to show its sub-tables.

    2. Current Sync Phase of a group table: if all tables in the group are in incremental synchronization, the group table is in incremental synchronization; if any table in the group is in full synchronization, the group table is in full synchronization.
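    The group-table rule for Current Sync Phase amounts to a simple aggregation over sub-tables. A minimal sketch; the function name `group_sync_phase` and the phase label strings are hypothetical.

```python
from typing import Dict

FULL = "Full Sync"
INCREMENTAL = "Incremental Sync"


def group_sync_phase(sub_table_phases: Dict[str, str]) -> str:
    """Derive a group table's Current Sync Phase from its sub-tables."""
    # Any sub-table still in full sync keeps the whole group in full sync.
    if any(phase == FULL for phase in sub_table_phases.values()):
        return FULL
    return INCREMENTAL


print(group_sync_phase({"A1": INCREMENTAL, "A2": INCREMENTAL}))  # Incremental Sync
print(group_sync_phase({"A1": INCREMENTAL, "A2": FULL}))         # Full Sync
```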


    Real-Time Statistics

    Click Real-Time Statistics to view the cumulative statistics of the task, namely the total amount of data read and output since the task first started, the amount of pending data, and the real-time read and output rates.

    Note:

    If resynchronization is used, the reading and writing of a single table will be recalculated. The resynchronized data volume will not be accumulated to the total read and output volume of Real-Time Statistics, but will be included in Historical Statistics.

    The details are as follows:

    1. Total Read = Pending Data + Total Output + Amount of Dirty Data

    You can view the write delay of the entire task or of a single data table in the Pending Data module.

    Write Delay Time = Message Read Time - Message Write Time

    Note:

    The amounts of dirty data, pending data, data read, and data output of a group table are the sums of the corresponding indicators of its sub-tables.

    For example, group table A contains sub-tables A1 and A2:

    Dirty data of group table A = (dirty data from sub-table A1 that failed to be written to table A) + (dirty data from sub-table A2 that failed to be written to table A)

    Pending data of group table A = Pending data from sub-table A1 + Pending data from sub-table A2

    Data read in group table A = Data read (insertions, deletions, and updates) from sub-table A1 + Data read (insertions, deletions, and updates) from sub-table A2

    Data output in group table A = Data written (insertions, deletions, and updates) from sub-table A1 + Data written (insertions, deletions, and updates) from sub-table A2

    2. Time of Reading Message and Time of Writing Message can be viewed by hovering the cursor over the  icon, as shown in the following figure.
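    The formulas above can be checked with a small sketch. It assumes a hypothetical `TableStats` record and derives Pending Data from the identity Total Read = Pending Data + Total Output + Amount of Dirty Data, sums sub-table indicators for a group table, and computes Write Delay Time = Message Read Time - Message Write Time. The numbers are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable


@dataclass
class TableStats:
    total_read: int
    total_output: int
    dirty: int

    @property
    def pending(self) -> int:
        # Total Read = Pending Data + Total Output + Amount of Dirty Data
        return self.total_read - self.total_output - self.dirty


def group_stats(sub_tables: Iterable[TableStats]) -> TableStats:
    """Group-table indicators are the sums of the sub-table indicators."""
    subs = list(sub_tables)
    return TableStats(
        total_read=sum(s.total_read for s in subs),
        total_output=sum(s.total_output for s in subs),
        dirty=sum(s.dirty for s in subs),
    )


def write_delay(message_read_time: datetime,
                message_write_time: datetime) -> timedelta:
    # Write Delay Time = Message Read Time - Message Write Time
    return message_read_time - message_write_time


a1 = TableStats(total_read=1000, total_output=950, dirty=10)
a2 = TableStats(total_read=500, total_output=480, dirty=0)
group_a = group_stats([a1, a2])
print(group_a.pending)  # 60 (1500 - 1430 - 10)
print(write_delay(datetime(2024, 10, 28, 10, 0, 30),
                  datetime(2024, 10, 28, 10, 0, 0)))  # 0:00:30
```

    Because the identity is linear, the group's pending data computed from the summed totals equals the sum of the sub-tables' pending data (40 + 20 here).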

    Historical Statistics

    In FineDataLink 4.1.2 and later versions, you can click Historical Statistics and select Last 2 Hours, Last 24 Hours, Last 3 Days, Last 7 Days, or Last 15 Days to view the synchronization status for that period.

    Total Read, Pending Data, Total Output, and the real-time rate of reading and outputting data (rows/s) are displayed, as shown in the following figure.

    You can also customize the time interval to view the data synchronization status of the selected interval, as shown in the following figure.

    Viewing Synchronization Status from the Data Table Dimension

    You can view the synchronization status of all source tables in the Sync Object module, as shown in the following figure.


    1. Source Data Table: Displays the name of the data table.

    2. Time of Reading and Writing the Data Table: Displays Time of Reading Message and Time of Writing Message on the Real-Time Statistics tab page.

    3. Sync Status: Displays the current synchronization status of the table, either incremental synchronization or full synchronization.

    4. Dirty Data:

    • Displays the number of dirty data rows.

    • Click X Row(s) of Dirty Data to view the details of dirty data by error type, time, and keyword.

    • Click Export to export dirty data.

    Note:

    If a primary key update from value A to value B fails, two rows of dirty data are generated: one for deleting the row with primary key A, and one for updating or inserting the row with primary key B.

    Exported data follows the structure of the target table as configured in the field mapping.

    5. Pending Data: Displays the amount of pending data.

    6. Read and Write Statistics: Displays the numbers of inserted, updated, and deleted rows, as well as the read/write speed.

    Note:

    Read and Write Statistics can only be viewed on the Historical Statistics tab page.

    7. View Log: Displays the running log of a single table.

    8. Resync: Full resynchronization of a single table or the entire pipeline task is supported. The target table is cleared and full synchronization is re-executed; incremental synchronization starts after full synchronization completes.

    If the task is fully resynchronized, the log statistics (including input rows and output rows) are reset.

    Note:

    If Logical Deletion is enabled, the target table is cleared and rewritten during resynchronization using insert logic, and the following prompt appears: "Data of logical deletion generated during task operation will be cleared."

    Click Dirty Data to filter the source tables that contain dirty data. Click a data table to view its error details, which helps with subsequent processing. The error types, fault causes, and fault details are described below.

    • Unmatched Field Type

    Fault Cause: The data type of the <Column_name> field is unmatched.

    Fault Details: The expected data type of one or more <Column_name> fields does not match the actual data type, or differs from the data type received previously. Update the field type of the target table. For example, a string value is found in a Boolean field, or a null value is found in a non-nullable field.

    • Field Length Exceeded

    Fault Cause: The length of the data exceeds the length of the <Column_name> field.

    Fault Details: The data size is larger than the size of the <Column_name> field. Update the field type of the target table. For example, a string with a length of 1000 is found in a VARCHAR(255) field.

    • Missing Target Field

    Fault Cause: The <Column_name> field does not exist. Create related fields first.

    Fault Details: One or more <Column_name> fields specified in the field mapping do not exist in the target table. Create related fields first.

    • Missing Target Table

    Fault Cause: The <Table_name> target table is missing.

    Fault Details: The <Table_name> target table specified in the field mapping does not exist. Create related tables first.

    • Invalid Data Destination

    Fault Cause: Failed to connect to the <Data_connection_name> data destination.

    Fault Details: Failed to connect to the <Data_connection_name> data destination. Check whether the network is connected, the account password is correct, and the account has the required permissions.

    • Missing Write Permission

    Fault Cause: No write permission is assigned on the <Table_name> target table.

    Fault Details: No write permission is assigned on the <Table_name> target table. Adjust the related permissions and try again.

    • An error occurred during the full synchronization

    Fault Cause: An error occurred during the full synchronization.

    Fault Details: Error details

    • Other Exception

    Fault Cause: Other

    Fault Details: Stack details
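    To illustrate how a failed value maps onto the first few error types, here is a minimal sketch. The `ColumnSpec` record and `classify_value` function are hypothetical; they model a target column with a Python type, a nullability flag, and an optional maximum length, and do not reflect how FineDataLink actually detects errors.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnSpec:
    name: str
    py_type: type                      # expected type of the target column
    nullable: bool = True
    max_length: Optional[int] = None   # e.g. 255 for VARCHAR(255)


def classify_value(value: object, column: Optional[ColumnSpec]) -> Optional[str]:
    """Return the matching Error Type for one value, or None if it fits."""
    if column is None:
        # The field in the field mapping does not exist in the target table.
        return "Missing Target Field"
    if value is None:
        return None if column.nullable else "Unmatched Field Type"
    if not isinstance(value, column.py_type):
        return "Unmatched Field Type"
    if (column.max_length is not None and isinstance(value, str)
            and len(value) > column.max_length):
        return "Field Length Exceeded"
    return None


flag = ColumnSpec("is_active", bool, nullable=False)
name = ColumnSpec("name", str, max_length=255)
print(classify_value("yes", flag))       # Unmatched Field Type
print(classify_value(None, flag))        # Unmatched Field Type
print(classify_value("x" * 1000, name))  # Field Length Exceeded
print(classify_value("ok", name))        # None
print(classify_value(1, None))           # Missing Target Field
```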

    Processing Dirty Data

    Note:

    Dirty data in a group table cannot currently be processed for the group as a whole. Process it separately for each table in the group.

    Processing a Single Piece of Dirty Data

    To process a single piece of dirty data, select the dirty data of a specified data table, and click Retry or Ignore, as shown in the following figure.

    Processing Dirty Data in Batches

    To process all dirty data in the current task, or the dirty data of specific tables, in batches, select one or more tables of the source database and then skip, retry, or resynchronize their dirty data, as shown in the following figure.

    • Skip Dirty Data

    For a single table or multiple specified tables, clicking Skip Dirty Data deletes the cached dirty data, which cannot be retrieved afterwards. These rows are also removed from the dirty data rows in the log statistics.

    • Retry Dirty Data

    For a single table or multiple specified tables, clicking Retry Dirty Data resubmits the cached dirty data and updates the data volume statistics.

    Note:

    Retry Dirty Data is not supported for dirty data generated in the full synchronization phase.

    • Resync

    Full resynchronization of a single table or the entire pipeline task is supported. The target table is cleared and full synchronization is re-executed; incremental synchronization starts after full synchronization completes.

    If the task is fully resynchronized, the log statistics (including input rows and output rows) are reset.

    Note:

    If Logical Deletion is enabled, the target table is cleared and rewritten during resynchronization using insert logic, and the following prompt appears: "Data of logical deletion generated during task operation will be cleared."


    Note:

    1. Retrying, skipping, or resynchronizing dirty data is not supported while the pipeline task is in the full synchronization phase.

    To process dirty data generated in the full synchronization phase, export the details and manually adjust the existing records or insert new records in the target table, as shown in the following figure.

    2. If the data source is Kafka, resynchronization is not supported. To recover expired dirty data, you can only rerun the entire table, or manually insert the dirty data into the target and then ignore it in the pipeline.

    3. If the task is paused or aborted, resynchronization is executed the next time the task starts.
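    The Skip and Retry behavior and the full-phase restrictions above can be modeled as operations on a per-table cache of dirty rows. A minimal sketch, assuming hypothetical `DirtyRow` and `TableDirtyCache` classes and a caller-supplied `write` function that returns True on success; it also assumes that rows failing again on retry stay cached, which the text does not state explicitly.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class DirtyRow:
    payload: dict
    phase: str  # phase that produced the row: "full" or "incremental"


@dataclass
class TableDirtyCache:
    task_phase: str = "incremental"  # current phase of the pipeline task
    rows: List[DirtyRow] = field(default_factory=list)

    @property
    def dirty_count(self) -> int:
        # Dirty rows reported in the log statistics.
        return len(self.rows)

    def _require_incremental_task(self) -> None:
        if self.task_phase == "full":
            raise RuntimeError("dirty data cannot be processed during full sync")

    def skip(self) -> int:
        """Skip Dirty Data: drop cached rows for good (not retrievable)."""
        self._require_incremental_task()
        skipped = len(self.rows)
        self.rows.clear()
        return skipped

    def retry(self, write: Callable[[dict], bool]) -> int:
        """Retry Dirty Data: resubmit cached rows; failures stay cached."""
        self._require_incremental_task()
        if any(r.phase == "full" for r in self.rows):
            raise RuntimeError("retry is not supported for full-sync dirty data; "
                               "export it and fix the target table manually")
        still_dirty = [r for r in self.rows if not write(r.payload)]
        written = len(self.rows) - len(still_dirty)
        self.rows = still_dirty
        return written


cache = TableDirtyCache(rows=[DirtyRow({"id": 1}, "incremental"),
                              DirtyRow({"id": 2}, "incremental")])
print(cache.retry(lambda row: row["id"] == 1))  # 1
print(cache.dirty_count)                        # 1
print(cache.skip())                             # 1
print(cache.dirty_count)                        # 0
```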

    Recording logic of event-based dirty data:

    • A row with primary key A generates dirty data at time point t1. If subsequent writes to primary key A after t1 succeed, the historical dirty data of primary key A is cleared. If they fail, only the latest dirty data of primary key A is retained.

    • If primary key A is updated to primary key B at time point t1, the pipeline task splits the change into two events and applies them to the target end in order: deleting primary key A, then inserting primary key B.
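    The recording logic above keeps at most one dirty record per primary key. A minimal sketch, assuming a hypothetical `DirtyDataLog` keyed by primary key and a hypothetical `split_update` helper that turns a primary-key update into a delete event followed by an insert event.

```python
from typing import Dict, List, Optional, Tuple

Event = Tuple[str, object, Optional[dict]]  # (operation, primary key, row)


def split_update(old_key: object, new_key: object, row: dict) -> List[Event]:
    """A primary-key update becomes a delete of the old key, then an insert."""
    if old_key == new_key:
        return [("update", new_key, row)]
    return [("delete", old_key, None), ("insert", new_key, row)]


class DirtyDataLog:
    """Keeps only the latest dirty record for each primary key."""

    def __init__(self) -> None:
        self.latest: Dict[object, Tuple[int, Event]] = {}

    def record(self, ts: int, event: Event, succeeded: bool) -> None:
        key = event[1]
        if succeeded:
            # A later successful write clears the key's historical dirty data.
            self.latest.pop(key, None)
        else:
            # A later failure replaces older dirty data for the same key.
            self.latest[key] = (ts, event)


log = DirtyDataLog()
log.record(1, ("update", "A", {"v": 1}), succeeded=False)  # t1: dirty
log.record(2, ("update", "A", {"v": 2}), succeeded=False)  # only latest kept
print(log.latest["A"][0])  # 2
log.record(3, ("update", "A", {"v": 3}), succeeded=True)
print("A" in log.latest)   # False
print(split_update("A", "B", {"id": "B"}))
# [('delete', 'A', None), ('insert', 'B', {'id': 'B'})]
```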

    Note:

    Constraints of batch loading mode at the output end

    In batch loading mode, the output end usually submits a large batch of records to the target end in a single operation, so the target end may be unable to identify which records in the batch are dirty. The existing target ends that support batch loading behave as follows:

    1. GaussDB 200 supports two writing modes: copy and parallel loading.

    The copy mode:

    • In the full synchronization phase, if a batch fails to be submitted, the records in that batch are rewritten through the JDBC API instead (a single batch currently contains 1,024 records). Writing through JDBC returns specific error information, so the details of dirty data can be displayed, but performance is poor.

    • In the incremental synchronization phase, if a batch fails to be submitted, the entire batch is recorded as dirty data (the maximum size of a single batch is currently 5 MB, roughly 10,000 rows; the exact row count depends on the size of each record).

    Parallel loading:

    • In parallel loading, all the required data of a single task is submitted at one time. Parallel loading provides an API for querying specific error information, so FineDataLink can query the details of dirty data.

    2. Hive (HDFS writing method):

    Data is written directly to the HDFS file in a single operation. Detailed error information cannot be obtained, so dirty data management is not supported.
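    The copy-mode behavior above (row-by-row fallback in the full phase, whole batch marked dirty in the incremental phase) can be sketched as follows. This is an illustration under stated assumptions: `bulk_write` and `row_write` are hypothetical stand-ins for the bulk path and the JDBC path, and the 5 MB incremental batch limit is not modeled.

```python
from typing import Callable, List, Sequence, Tuple

Row = dict
DirtyRecord = Tuple[Row, str]  # (row, error message)


def chunks(rows: Sequence[Row], size: int = 1024) -> List[Sequence[Row]]:
    """Split rows into batches; 1,024 mirrors the copy-mode batch size above."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def load_batch(batch: Sequence[Row],
               bulk_write: Callable[[Sequence[Row]], bool],
               row_write: Callable[[Row], Tuple[bool, str]],
               phase: str) -> List[DirtyRecord]:
    """Write one batch and return the rows recorded as dirty data."""
    if bulk_write(batch):
        return []
    if phase == "full":
        # Full phase: retry the failed batch row by row (the JDBC fallback)
        # so that each dirty row comes with its own error message.
        dirty: List[DirtyRecord] = []
        for row in batch:
            ok, error = row_write(row)
            if not ok:
                dirty.append((row, error))
        return dirty
    # Incremental phase: the whole failed batch is recorded as dirty data.
    return [(row, "batch submission failed") for row in batch]


def bulk_write(batch: Sequence[Row]) -> bool:
    # Hypothetical target: one over-long value fails the whole batch.
    return all(len(r["name"]) <= 255 for r in batch)


def row_write(row: Row) -> Tuple[bool, str]:
    ok = len(row["name"]) <= 255
    return ok, "" if ok else "Field Length Exceeded"


rows = [{"id": i, "name": "x" * (300 if i == 3 else 5)} for i in range(5)]
print(len(load_batch(rows, bulk_write, row_write, "full")))         # 1
print(len(load_batch(rows, bulk_write, row_write, "incremental")))  # 5
print([len(b) for b in chunks([{"id": i} for i in range(2050)])])   # [1024, 1024, 2]
```

    The trade-off matches the text: the row-by-row path pinpoints the one bad record but issues one write per row, while the incremental path stays fast at the cost of marking every row in the failed batch as dirty.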

    Running Log

    Visualized Log

    1. Click Running Log to see the historical running logs of the current pipeline task, including Time, Level, Classification, and Description, as shown in the following figure.

    2. Logs can also be searched for or filtered, as shown in the following figure.

    Note:

    You can filter logs based on tasks, tables, types, and generation time.

    Click the Filter button. The drop-down list of Log Classification is shown in the following figure.

    3. If there are no running logs, the tab page is displayed as shown in the following figure.
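    Searching and filtering running logs can be thought of as applying optional predicates to each entry. A minimal sketch for one task's logs, assuming a hypothetical `LogEntry` record with the Time, Level, Classification, and Description columns plus an optional table name; the classification values used here are invented.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass
class LogEntry:
    time: datetime
    level: str             # BASIC INFO, INFO, WARN, or ERROR
    classification: str
    table: Optional[str]   # None for task-level logs
    description: str


def filter_logs(entries: Iterable[LogEntry],
                table: Optional[str] = None,
                classification: Optional[str] = None,
                start: Optional[datetime] = None,
                end: Optional[datetime] = None,
                keyword: Optional[str] = None) -> List[LogEntry]:
    """Apply the given filters; a filter left as None matches everything."""
    result = []
    for e in entries:
        if table is not None and e.table != table:
            continue
        if classification is not None and e.classification != classification:
            continue
        if start is not None and e.time < start:
            continue
        if end is not None and e.time > end:
            continue
        if keyword is not None and keyword.lower() not in e.description.lower():
            continue
        result.append(e)
    return result


logs = [
    LogEntry(datetime(2024, 10, 28, 9, 0), "INFO", "Task Start", None, "Task started"),
    LogEntry(datetime(2024, 10, 28, 9, 5), "ERROR", "Dirty Data", "orders",
             "Field length exceeded"),
]
hits = filter_logs(logs, table="orders", keyword="length")
print([h.level for h in hits])  # ['ERROR']
```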


    Log Level

    Logs are divided into four levels: BASIC INFO, INFO, WARN, and ERROR.

    BASIC INFO is the basic level.

    Click Details next to each log to view the specific description, as shown in the following figure.
