Basic functions of left and right merge- FineBI Help Document FineBI Document

Recent Updates: April 15, 2022

1. Overview

1.1 Version

FineBI version	JAR package version	function changes
5.1	-	-
5.1.5	2020-09-02	left and right merge choose left merge by default
5.1.13	-	direct: New support for left and right merge for data tables in "Hadoop Hive database" with Excel The system actively interrupts the update when there is too much data in the "union combined"

1.2 Application scenario

In the process of using data in practice, there are often situations where two tables need to be joined together to form a new table.

Suppose there are two such data tables.

① Table A：record the name and English grade of the students.

② Table B：record the name and math grade of the students.

If you want to see the student's name, math grade and English grade in one table, you can use the "Name" field as the "merge basis" for left and right merge.

The different merge methods are shown below.

Now you have two tables below,

image (1).png

The effect of "union combined".

image (2).png

The effect of "intersection".

image (3).png

The effect of "left merge".

image (4).png

The effect of "right merge".

image (5).png

1.3 About direct connection

Direct connection table and Excel left and right merge

In the Direct Connect version, there are four databases "Microsoft SQL Server 2016, Oracle, Pivotal Greenplum Database, Hadoop Hive" that support merging data tables with Excel, while the rest of the databases do not.

Before merging, you need to change the field value of SystemOptimizationConfig.excelExtractDataBase in FINE_CONF_ENTITY table to "true", please refer to FINE_CONF_ENTITY Visual Configuration for details.

In the directly connected version, the two tables for left and right merging cannot be from two data sources.

Otherwise, an error is reported after merging:[DIRECT-ETL] unsupported data source: databases on different hosts/ports

2. Example

For example, in the estate industry package, there are two data tables: the contract fact table (FACT_SALE_CONTRACT) and the contract dimension table (DIM_CONTRACT), which need to be joined together to form a new table based on the contract ID field and ensure that the data in the contract fact table will not be lost. This can be achieved by merging left and right.

2.1 Create a self-service dataset

1）Go to "Data Preparation > Package" and click "Add Table > Self-Service Dataset", as shown below.

image (6).png

2）Enter the self-service dataset configuration interface, select the fields to be added to the final table from "FACT_SALE_CONTRACT" of the "estate industry" package.

Note: Only data tables that have been added and updated with data can be selected here; data tables that have not been updated cannot be used to create self-service data sets.

Note: Users are recommended to reduce the length of table field names before performing left-right merging to avoid the problem of saving self-service datasets caused by too-long field names.

image (7).png

2.2 Select the data table to be merged

After adding the fields, click "+" in the left column and select "Left and right merge", as shown below.

image (8).png

Select the table to be merged as "DIM_CONTRACT", and select the fields to be added, click "OK", as shown in the following figure.

image (9).png

2.3 Add merge basis

The meaning of the merge basis as described in the "Name" field in Section 1.2.

1）When you enter the left and right merge settings screen, you need to select the "left merge" method to ensure the data integrity of the contract fact table.

At this time, a field common to both tables will automatically appear in the merge basis field as "merge basis", here is the "CONTRACTGUID" contract ID field. The following figure shows.

Note: The fields in the merged basis of the two tables added here must be of the same type, and field types that are inconsistent cannot be merged left or right and require field type conversion.

You can also manually change the merge basis field or add multiple merge basis. If you add multiple merge basis, you can

image (10).png

Description of the consolidation method:

merge method	description
left merge	Corresponds to the left join in the SQL statement
right merge	Corresponds to the right join in the SQL statement
intersection	Corresponds to the inner join in the SQL statement
union combined	Corresponds to the full join in the SQL statement

2）After adding the merge basis, a preview of the new merged table will appear in the data preview box below, as shown in the following figure.

Extract data version: The merge basis will be merged into one field and only the merge result "CONTRACTUID" will be kept

image (11).png

Directly connected data version: the merge basis fields of the left and right merge two tables are still two fields, as shown in the following figure.

image (12).png

3）If you want to modify the tables/fields to be merged that have been added previously, you can hover your mouse over the table column to be merged and click the Edit button to reselect the tables and fields. As shown in the figure below.

image (13).png

2.4 Effect view

After clicking the "Save/Save As" button in the upper right corner and updating the data, you see the processed self-service dataset in the package, as shown below.

image (14).png

3. Multiple merge basis

Implement "CONTRACTGUID" and "Signing Time" into one table, as shown below.

image (15).png image (16).png

image (17).png

It is not possible to use a single field as the basis for merging, so the fields "CONTRACTGUID, Signing Time" are required as the basis for merging. The meaning of multiple merge basis is to ensure the uniqueness of data when merging left and right, as shown in the following figure.

image (18).png

4. Cautions

4.1 Failure to update after merging using merge sets

Since the merge is equivalent to N:N, it generates a Cartesian product, which leads to data volume expansion. In order not to affect the stability of the system, the system actively interrupts the update of the self-service dataset that has undergone N:N operations when the amount of data after the merge satisfies the following conditions.

Performing union combined merger with data volumes exceeding 10 million.
Expansion index >= 5; Expansion index: MIN (and aggregate and result data volume/left table data, and aggregate and result data volume/right table data). For example, if 1000 thousand data and 5000 thousand data N:N results in 2000w data, then the N:N expansion factor is: 20000 thousand/5000 thousand = 4

If the Cartesian product appears, the user is prompted at the update message whether the update was successful.

Data set generation is successful: Cartesian product is detected when updating, data expansion, and update time increase, please check if there are duplicate values in the data of the merge basis column in the left and right merge steps.
Data set generation failed: The cartesian product was detected during the update, and the operation was aborted due to severe data inflation. Please check if there are duplicate values in the data of the merge base column in the left and right merge steps, and please make sure that there should not be too many duplicate values.

If the "Update Information" indicates that the generation has failed, but the user still wants to use N:N for analysis, you can reduce the amount of data by filtering and other means to meet the requirement that the system does not interrupt.

image (19).png

4.2 Data field requirements

Please make sure there are no spaces in the fields of the merged data table before doing the left and right merge. The following figure shows.

image (20).png image (21).png

4.3 Set the number of calculation data in the preview board

If the self-service dataset is created by the data analysis user or by the data processing user in the self-service dataset.

When the amount of data is large, the calculation can be made smoother by using the first 5000 rows in the editing process of the dataset, as shown in the following figure.

image (22).png

Already the First

Already the Last

Helpful
Not helpful
Only read

售前咨询电话

400-811-8890转1

在线技术支持

在线QQ：800049425

热线电话：400-811-8890转2

总裁办24H投诉

热线电话：173-1278-1526

文档反馈

鼠标选中内容，快速反馈问题

鼠标选中存在疑惑的内容，即可快速反馈问题，我们将会跟进处理。

不再提示

10s后关闭

Basic functions of left and right merge