Modified the write method Append/Update/Delete Data Based on Identifier Field to Add/Modify/Delete Data Based on Identifier Field.
Added a Transaction Control function.
Added Table Emptying Method and Strategy for Primary Key Conflict settings when the write method was set to Write Data into Target Table After Emptying It.
Updated the option names of Strategy for Primary Key Conflict when the write method was set to Write Data into Target Table Directly.
This document introduces the configuration items on the Write Method tab page of the Data Synchronization node and the DB Table Output operator in the Data Transformation node.
Write MethodThe following figure shows the Write Method tab page.
The three write methods are described in the following table.
If the target table has no physical primary key and no logical primary key mapping has been configured, data will be written into the target table directly.
Effect Description of Different Write Methods
If the target table has a physical primary key or has been configured with logical primary key mapping, the system compares data rows in the source and target tables based on the primary key value.
1. For data rows with different primary key values, data will be appended to the target table.
2. For data rows with the same primary key value, three writing strategies are available:
Update Data: Overwrite existing rows if the same primary key values exist.
Record as Dirty Data: Record rows as dirty data and exclude them from writing if the same primary key values exist.
Skip Conflicting Data: Skip the writing of rows where duplicate values of the primary key are detected.
Write Data into Target Table after Emptying it
1. Two table emptying methods are available:
TRUNCATE: It empties all data in the table at high speed and generates no deletion audit records.
DELETE: It performs row-by-row data deletion at moderate speed and generates deletion audit records.
2. You can configure the strategy for primary key conflicts to prevent table locking.
Add/Modify/Delete Data Based on Identifier Field
1. If the target table has no physical primary key, you must set the logical primary key in Primary Key Mapping if Modify or Delete is selected in Write Method.
2. If the target table has a physical primary key or has been configured with logical primary key mapping, the identifier value of each data row will be recognized.
Add: Add data that only exists in the source table to the target table.
Update: Update the target table data with the modified source table data.
Delete: Delete data that only exists in the target table.
Transaction Control
When data is written to the database through a scheduled task, rollback in the event of failure is supported. That is, the target table remains unchanged if the task fails.
Pre-processing and post-processing are supported when data is written to the database through a scheduled task. This processing logic is encapsulated along with core write operations as transactions to support rollback in the event of failure.
Primary Key Mapping
Establish the correspondence between the rows of the source table and the target table, as shown in the following figure.
1. If the target table has a physical primary key or you have set the primary key for the target table to be created, the primary key mapping relationship will be automatically generated according to the physical primary key of the target table to ensure data uniqueness.
2. If the target table has no physical primary key:
If Target Table is set to Auto Created Table, you can manually set one or more fields as the physical primary key in Field Mapping.
If Target Table is set to Existing Table, you can select the field as the logical primary key in Primary Key Mapping for mapping to ensure data uniqueness. If no field is selected, data will be written into the target table directly.
Strategy for Primary Key Conflict
You need to configure Strategy for Primary Key Conflict if any of the following conditions are met:
The target table has a physical primary key.
The target table has no physical primary key but has been configured with logical primary key mapping.
Three strategies for the primary key conflict are available:
Versions Prior to V4.2.7.4
Data is written into the target table after the target table is emptied. This write method is available with or without a primary key.
V4.2.7.4 and Later Versions
Setting Item
Description
Two table emptying methods are available:
1. TRUNCATE:
Advantage: This method empties all data in the table at high speed.
Disadvantage: The Transaction Control function is unavailable. (For details, see the "Transaction Control" section of this document.) If another system performs read/write operations on this table simultaneously, the database will lock the table.
2. DELETE:
Disadvantage: This method deletes table data row by row, with generally slower performance.
Advantage: The Transaction Control function is available. For details, see the "Transaction Control" section of this document.
Application scenario: You want to minimize the time when the target table is empty and to prevent a scenario where the table emptying operation succeeds but the subsequent data writing fails.
If the target table has a physical primary key, Primary Key Mapping will be set to the mapping row of the physical primary key configured in Field Mapping automatically, which cannot be modified.
If the target table does not have a physical primary key, you must select a mapping row from Field Mapping to implement logical primary key mapping.
If the source table has no primary key but the target table does, primary key conflicts will still occur during data writing, so you must select a strategy for primary key conflicts.
Function Description
This write method is not supported when the target data source is a Hive database.
You can select one or more options among Add, Modify, and Delete.
Write Method:
When you select Delete in Write Method, you can choose between Physical Deletion and Logical Deletion:
Logical Deletion: It marks data as deleted without actually deleting it.
Physical Deletion: It actually deletes data.
For details about the specific effects of the two deletion methods, see Differences Between Physical Deletion and Logical Deletion.
Identifier Field and Identifier Value
1. You can filter the data you need to add/modify/delete based on the identifier field and its value.
2. If you have used the Data Comparison operator, Identifier Field defaults to fdl_comparison_type, and Identifier Value defaults to the fdl_comparison_type value.
3. If only one option among Add, Modify, and Delete is selected, you can leave Identifier Field empty, and the selected write method will take effect on all the read data.
For details, see the "Writing Data into the Target Table Directly" section of this document.
1. Primary Key Mapping is displayed when Modify or Delete is selected in Write Method.
2. When Modify or Delete is selected in Write Method, Primary Key Mapping must be configured.
Best Practice
1. The Data Comparison operator is usually used in combination with the DB Table Output operator. For details, see Data Comparison Example.
2. The application scenario of a single Data Synchronization node includes:
You want to add/modify/delete data based on the filtering result.
You want to add/modify/delete data based on the identifier field and its value.
For details, see Adding/Modifying/Deleting Data Based on the Identifier Field.
1. You want all data fetched from the current source to be loaded within one atomic transaction during data writing to the target database in a single execution of scheduled synchronization.
Success scenario: All data is successfully written. The transaction is committed, updating the target table.
Failure scenario: If any error occurs, the entire transaction rolls back, leaving the target table completely unchanged.
2. After the scheduled synchronization task completes, statistical metrics of that write operation can be written to a custom table for subsequent data validation purposes.
1. Starting from V4.2.7.4, the Transaction Control function is available when the target end is Oracle, Greenplum, MySQL, SQL Server, or PostgreSQL databases. Starting from V4.2.7.4, this function is also available when the source end is KingBaseES (SQL Server mode) or KingBaseES (MySQL mode) databases.
2. If you set Write Method to Write Data into Target Table After Emptying It:
TRUNCATE: Transaction Control is available for PostgreSQL and Greenplum databases only.
DELETE: Transaction Control is available for all compatible databases listed above.
Pre-processing and post-processing are supported when data is written to the database through a scheduled task. This processing logic is encapsulated along with core write operations as transactions to support rollback in the event of failure. The setting page is shown in the following figure.
Setting items of Transaction Control are shown as follows.
If Parallel Read is selected for the source end and Transaction Control is enabled for the target end, parallel data fetching will be performed automatically.
For an example of the Transaction Control function, see Transaction Control Example.
1. Application scenario: You can use it to delete data within a specific time range from the target table before performing the write operation.
2. You can use @temp_tbl_nm_full to reference the intermediate table to be used during loading. You can use this parameter to define SQL queries. When this table name parameter is referenced in other transactions, its value will be an empty string.
1. Application scenario: You can write post-execution SQL statements for post-loading validation to check for anomalies in final results, recording incremental timestamps, calculating the total or new record count, and logging ETL completion time.
If Transaction Control is enabled or the target end is a GBase 8a database, only the following strategies for primary key conflicts are available:
Throw an Error: Throw a node error and exclude conflicting rows from writing if the same primary key values exist.
The following table lists supported parameters.
Assume that null values exist in the D column of both the source table and the target table.
You have set Write Method to Write Data into Target Table Directly, selected fields A, B, C, and D as logical primary keys, and set Strategy for Primary Key Conflict to Overwrite Data in Target Table If Same Primary Key Value Exists, as shown in the following figure.
After execution, for primary key conflicts with not-null values, the source data overwrites the target data. (See the first record in the following figure.) For primary key conflicts with null values, the source data is appended to the target table without overwriting the existing data.
滑鼠選中內容,快速回饋問題
滑鼠選中存在疑惑的內容,即可快速回饋問題,我們將會跟進處理。
不再提示
10s後關閉
Submitted successfully
Network busy