Two data tables are stored in different databases. You may want to quickly combine data from the two tables to generate a new table.
The Data Association operator enables cross-database table joins.
Four join methods are supported, including:
1. Left Join: It returns all records from the left table and the matched records from the right table. If there is no match, the result is filled with NULL for columns from the right table.
2. Right Join: It returns all records from the right table and the matched records from the left table. If there is no match, the result is filled with NULL for columns from the left table.
3. Inner Join: It returns only the rows with matching values in both tables.
4. Full Outer Join: It returns rows when there is a match in one of the tables. If there is no match, the result is NULL on the side that does not have a match.
The Data Association operator enables cross-database table joins. You can prepare two or more tables for joins.
This document takes combining data from two tables as an example.
You can download example tables and upload them to the corresponding databases: Math_Score.xls English_Score.xls1. Data in the English_Score table is shown in the following figure.
2. Data in the Math_Score table is shown in the following figure.
Now you want to combine data from the English_Score table (as the left table) and the Math_Score table using a left join.
Procedure
1. Create a scheduled task, drag a Data Transformation node onto the page, and enter the Data Transformation editing page.
2. Drag a DB Table Input operator onto the page and configure it to read data from the English_Score table, as shown in the following figure.
3. Add remarks to this operator to indicate from which table the data is read, facilitating subsequent task O&M and viewing, as shown in the following figure.
4. Drag another DB Table Input operator onto the page, configure it to read data from the Math_Score table, and add remarks similarly, as shown in the following figure.
Notes
The DB Table Input operator extracts the first 5,000 rows by default for subsequent calculations and data preview. Joins are performed on these rows in the succeeding Data Association operator, which may lead to inaccurate preview results.
If you require accurate preview results in Data Association, you can choose DB Table Input > Sample Setting and set Sample Volume to Full Volume.
For details, see Description of Data Volume in Scheduled Tasks.
1. Drag a Data Association operator onto the page and select Left Join as Join Method and Name as Join Field, as shown in the following figure.
Note that:
For two-table joins, both left and right tables are configurable. For joins involving three or more tables, the result dataset of the previous join is fixed as the left table (uneditable) of the next join, while the right table remains configurable.
You can select multiple join fields, with an AND logical relationship between them.
Duplicated field names will be suffixed with an incrementing number (starting from one) automatically, such as Column, Column1, and Column2.
Ensure correct table order when using Left Join and Right Join as the join logic of Table A Left Join Table B is different from that of Table B Left Join Table A.
2. Click Data Preview, as shown in the following figure.
To remove the Name1 column, you can either unmap this column when configuring data output (seeing the "Outputting Data" section) or use a Field Setting operator afterward to delete this column (which can also be used to modify the field type).
1. Drag a DB Table Output operator onto the page, configure the data destination, and unmap the Name1 column, as shown in the following figure.
2. Set Write Method to Write Data into Target Table Directly.
1. Click Run. The Log area after successful task execution is shown in the following figure.
The table data in the database is shown in the following figure.
2. Click the Publish button to publish the task to Production Mode, as shown in the following figure.
滑鼠選中內容,快速回饋問題
滑鼠選中存在疑惑的內容,即可快速回饋問題,我們將會跟進處理。
不再提示
10s後關閉
Submitted successfully
Network busy