Subject Model Merging Principle

  • Last update:June 26, 2024
  • Overview

    After learning this document, you will grasp the most important merging logic of subject models. Such logic is also the most essential principle that distinguishes subject model merging from joins, which improves efficiency.

    Besides, this document helps you better understand the merging logic of subject models by introducing the merging logic of joins for comparison.

    Join Merging Logic

    Joins are row-level merging that connects tables by sharing specific column values. Data merging in joins:

    1. One-to-one merging in joins is simple. You only need to join the left and right tables based on the field Region.

    图1.png

    2. N-to-one merging Due to the addition of the field Branch Store to the left table, data for Beijing and Guangzhou in the field Region has two rows of records, respectively.

    To join the right table with the left one based on the field Region, data for Beijing and Guangzhou also needs to be duplicated as two separate records. In the merged table, data in the field Regional Manager is expanded, with duplicated ones marked in yellow.

    图2.png

    Data in dimension fields, such as the field Regional Manager above, will not affect analysis when duplicated.

    However, data in indicator fields, if duplicated, will affect subsequent calculations. For example, data in the field Sales Target in the following table is duplicated for matching.

    In this case, if you sum up the value in the field Sales Target, the result will be incorrect.

    In conclusion, indicator information duplication can cause data inflation and calculation issues, but dimension information duplication does not.

    图3.png

    3. Data in indicator fields, if duplicated in N-to-N merging scenarios, will bring about complete chaos.

    If you join the right table and the left one based on the field Region, data for Beijing in the field Region will be expanded to four rows of records, and data for Sales Volume and Staff Number will also be duplicated. In this case, the summed value will be incorrect.

    图4.png

    In conclusion, the row-level joins (similar to the SQL JOIN statement) can cause data inflation after data is duplicated.

    Subject Model Merging Logic

    This section introduces the merging logic of subject models and why it cannot cause data inflation after data is duplicated.

    The merging logic of subject models can be concluded into the following two points:

    1. Only the fields involved in analysis and the associated fields can participate in subject model merging.

    2. Aggregation is performed before merging.

    The merging logic of subject models can be divided into three steps:

    1. The system judges what fields have been dragged into the analysis area in components, and only the dragged fields and the associated fields can participate in subject model merging.

    2. Data in data tables is aggregated based on the associated fields and the fields dragged into the analysis area in components.

    3. The aggregated tables are merged.


    For example, if you want to check whether the sales targets are met in each region, the fields Region, Sales Volume, and Sales Target will be used in components.

    1. Only the three fields above are allowed by the system to participate in subject model merging.

    2. Data in the two tables used to establish models is aggregated based on the field Region and the fields dragged into the analysis area in components.

    3. The two aggregated tables are merged into the result table based on the field Region.

    图片5.png

    Another example of the N-to-N scenario:

    If you want to view the average sales per person for each regional manager's team, the fields Regional Manager, Sales Volume, and Staff Number will be used in components.

    1. Only the three fields above and the associated field Region are allowed by the system to participate in subject model merging.

    2. Data in the two tables used to establish models is aggregated based on the field Regional Manager and the fields dragged into the analysis area in components.

    3. The two aggregated tables are merged into the result table based on the field Region.

    After obtaining the result, you can use the formula SUM_AGG(Sales Volume)/SUM_AGG(Staff Number) in the component to calculate the average sales per person.

    图6.png

    Scenario of Multi-Table Models

    If a model involves more than two tables, you can obtain the model results pairwise for each merging of two tables and then get the final model result table by merging the obtained model results.

    For example, if you want to use the fields in Fact Table 1 and Fact Table 2 for analysis, you can merge the two tables as follows.

    The system will first apply the merging logic of subject models to obtain model results pairwise for Fact Table 1 and Dimension Table and Fact Table 2 and Dimension Table. The obtained model results can be merged into the final table you want to use, facilitating analysis.

    图7.png

    Data Integrity Ensuring in Subject Model

    Left Join, Right Join, Full Join, and Inner Join in Join can be used to determine which parts of data to be retained.

    Although subject model merging has no such options for you, it can ensure data integrity. For details, see Data Integrity Insurance of the Subject Model.

    附件列表


    主题: Adding and Editing Data
    Previous
    Next
    • Helpful
    • Not helpful
    • Only read

    滑鼠選中內容,快速回饋問題

    滑鼠選中存在疑惑的內容,即可快速回饋問題,我們將會跟進處理。

    不再提示

    9s后關閉

    Get
    Help
    Online Support
    Professional technical support is provided to quickly help you solve problems.
    Online support is available from 9:00-12:00 and 13:30-17:30 on weekdays.
    Page Feedback
    You can provide suggestions and feedback for the current web page.
    Pre-Sales Consultation
    Business Consultation
    Business: international@fanruan.com
    Support: support@fanruan.com
    Page Feedback
    *Problem Type
    Cannot be empty
    Problem Description
    0/1000
    Cannot be empty

    Submitted successfully

    Network busy