反馈已提交

网络繁忙

Join

  • Last update:  2024-04-08
  • Overview

    Version

    FineBI Version
    Functional Change

    V6.0

    Optimized the logic of Join in Permission Inheritance.

    iconNote:

    This logic optimization only applies to extracted self-service datasets that have Permission Inheritance enabled and include Join steps.

    V6.0.5

    Added a prompt for you to check if the associated field has duplicate values.

    Added the Create Merge Calculation function.

    Application Scenario

    When using data, you may need to combine two tables into a new table.

    Assume that the following data tables are available.

    Table A: records student names and their English scores.

    Table B: records student names and their math scores.

    If you want to view the student names, math scores, and English scores in one table, you can perform Join with the Student Name field as Merge Basis.

    The following figure shows different merging modes.

    Function Description

    If you are familiar with SQL, the Join function here is similar to that in SQL.

    If you are more familiar with VLOOKUP and SUMIF in Excel, you are advised to browse the function of adding columns to other tables.

    iconNote:
    If you need to merge the data processed by multiple workflows, use the Data Lineage function in FineDataLink.

    Notes for Direct Connection Versions

    To perform Join on direct-connected data, the two data tables storing the data cannot come from two different data sources, but can in the extraction versions. 

    If you merge the two tables coming from two different data sources, a prompt "[DIRECT-ETL] unsupported data source: databases on different hosts/ports" will be displayed.

    For direct-connected data, only tables in some databases can be merged with Excel tables. For details, see Databases that Support Fusion Analysis with Excel.

    Example

    Merge the fields of BIdemo_Retail_Sales Details and BIdemo_Retail_Brand Dimension tables into one table for analysis.

    Creating an Analysis Subject and Adding Data

    Create an analysis subject and add data. Search sales to find BIdemo_Retail_Sales Details and brand to find BIdemo_Retail_Brand Dimension.

    iconNote:

    You are advised to shorten table field names before performing Join to prevent saving failure caused by excessively long field names.

    Setting Join

    1. You can perform Join through either of the two entries shown in the following figure.


    2. Select the table BIdemo_Retail_Brand Dimension to be joined and tick all fields in the table that will participate in the Join operation.

    3. Select Left Join as the merge mode and add Brand Code as the merge basis.

    The merge basis field must be of the same field type.

    If fields with the same name exist in both tables, the system will automatically add the fields as the merge basis. You can manually delete or add a merge basis.

    The following table describes the merge mode.

    Merge Mode
    Description

    Left Join

    Corresponds to that in SQL statements.

    Right Join

    Corresponds to that in SQL statements.

    Inner Join

    Corresponds to that in SQL statements.

    Full Join

    Corresponds to that in SQL statements.

    Effect Display

    The fields of two tables are merged into one table. Fields used as the same merge basis will be merged into one field. For example, two Brand Code fields in these two tables are merged into one.

    Multiple Merge Bases

    Merge Maths and Physics into one table, as shown in the following figure.

    In this case, a single field is not enough as the merge basis, so Student Number and Student Name need to be used together as the merge basis. Using multiple merge bases can ensure the uniqueness of data when you perform Join, as shown in the following figure.

    Creating a Merge Calculation Field

    Through Join, you can directly process the merge basis (fields used to match the two tables) before matching.

    Example One

    The following lists examples that fields used as the merge basis of the two tables that cannot be directly merged.

    Fields used as the merge basis of the two tables are of different types.

    Fields used as the merge basis of the two tables have different values, for example, Nanjing in the left table while Nanjing City in the right table.

    You can use Create Merge Calculation to create an auxiliary merge basis field.

    For details, see Handling Match Failure in Join.

    Example Two

    If multiple merge bases are required (as described in section "Multiple Merge Bases"), for example, Student Number and Student Name together as merge bases, you can use Create Merge Calculation to merge them into one merge basis.

    Notes

    Null Values Mismatch

    If null values exist in the merging basis of the two tables when you use Join, the null values will not match each other. If you need to match the null values, you can assign values to them, such as assigning 0 to them all before merging them again.

    Generating a Cartesian Product

    If fields used as the merge basis of the two tables contain duplicate items when you perform Join, a Cartesian product will be generated, causing data expansion. For example, if the Student Number field used as the merge basis is not unique in the left or right table, a Cartesian product will be generated after you perform Join.

    The system will automatically check whether duplicate fields exist in the merge basis. If so, a prompt shown in the following figure will be displayed.


    To ensure the system stability, the system will automatically interrupt the update of self-service datasets that have undergone N:N operations when the data volume meets the following conditions after you perform Full Join:

    Data volume > 10 million.

    Expansion coefficient ≥ 5. Expansion coefficient = MIN(Data volume after you perform Full Join/Data volume of the left table data, Data volume after you perform Full Join/Data volume of the right table). For example, if the data volume is 20 million after you perform the N:N operation on 1 million of data records and 5 million of data records, the N:expansion coefficient is 4 (20 million/ 5 million).

    If a Cartesian product appears, either of the following prompts will be displayed in Update Information:

    Successful Dataset Generation: Cartesian products are found during the update. The data is severely bloated, causing an update time increase. Check whether duplicate values exist in the data column that is the merge basis in the join steps.

    Dataset Generation Error: Cartesian products are found during the update. The data is severely bloated, causing operation suspension. Check whether duplicate values exist in the data column that is the merge basis in the join steps. If so, ensure that there are not too many duplicate values.

    If you still want to perform the N:N operation for analysis when the latter prompt is displayed in Update Information, you can reduce the data volume through methods such as filtering to prevent the system from being interrupted.


    Data Field Requirements

    If spaces exist in the fields of the data table, data may be abnormal after you perform Join. You need to ensure that no space exists in the fields of the data table before you perform Join.


    Setting the Volume of Data to Be Calculate on the Preview Page

    When the data volume is relatively large, you can use First 100K Rows for calculation in the dataset editing process to make the calculation smoother.


    附件列表


    主题: Adding and Editing Data
    Previous
    Next
    • Helpful
    • Not helpful
    • Only read

    feedback

    鼠标选中内容,快速反馈问题

    鼠标选中存在疑惑的内容,即可快速反馈问题,我们将会跟进处理。

    不再提示

    10s后关闭