When process data, you often need to join data tables. This document describes various join methods available in FineDataLink.
LEFT JOIN and LEFT OUTER JOIN return all records from the left table, along with records from the right table where the values of join fields are equal.
Matching is performed based on conditions, using the left table as the reference. If a match is found, the corresponding data in the right table will be output normally. If no match is found, data of the right‑table fields will be replaced with Null. Additionally, if the right table contains duplicate matching values, multiple rows will be returned.
In FineDataLink, you can use the Data Association operator to implement this.
The example tables are as follows:
You can use the Data Association operator to perform a left join, as shown in the following figure.
The effect is shown in the following figure.
The configuration of the above operator is equivalent to the following statement in the Spark SQL operator:
select * from DB Table Input-1left join DB Table Input1-1on DB Table Input-1.`Name` = DB Table Input1-1.`Name`
or
select * from DB Table Input-1left outer join DB Table Input1-1on DB Table Input-1.`Name` = DB Table Input1-1.`Name`
LEFT SEMI JOIN:
Only columns from the left table are displayed. If rows in the right table match the join condition, the left table rows will be output; otherwise, they are filtered out. Even if the right table contains duplicate matching values, only one row is output for each left table row.
For example, for the two data tables mentioned above, if you use LEFT SEMI JOIN:
Only the results that match in both the left and right tables will be displayed, and duplicate values from the right table will be removed. The results are as follows:
It can be implemented following the same procedure as a left join via Data Association.
UNION removes completely duplicate data when performing vertical unions. If you do not need to remove duplicate rows, you are advised to use UNION ALL instead, as it offers higher processing efficiency.
Example:
select* from ( select 2 as id_no,99 as score union select 2 ,95 ) Student_Score
UNION ALL takes the union of the queried datasets without removing duplicate rows.
Taking the performance of UNION ALL on the following data tables as an example:
1. Drag in two DB Table Input operators and use a Spark SQL operator to obtain all data from tables a1 and a2 and perform a union all operation. The SQL statement is as follows:
select * from a1union allselect * from a2
The following figure shows the effect.
The result is shown in the following figure.
If UNION is used, the duplicate data in the tables will be removed, with only one record retained. The results are shown in the following figure.
滑鼠選中內容,快速回饋問題
滑鼠選中存在疑惑的內容,即可快速回饋問題,我們將會跟進處理。
不再提示
10s後關閉
Submitted successfully
Network busy