Overview
After learning this document, you will grasp the most important merging logic of subject models. Such logic is also the most essential principle that distinguishes subject model merging from joins, which improves efficiency.
Besides, this document helps you better understand the merging logic of subject models by introducing the merging logic of joins for comparison.
Join Merging Logic
Joins are row-level merging that connects tables by sharing specific column values. Data merging in joins:
1. One-to-one merging in joins is simple. You only need to join the left and right tables based on the field Region.
2. N-to-one merging Due to the addition of the field Branch Store to the left table, data for Beijing and Guangzhou in the field Region has two rows of records, respectively.
To join the right table with the left one based on the field Region, data for Beijing and Guangzhou also needs to be duplicated as two separate records. In the merged table, data in the field Regional Manager is expanded, with duplicated ones marked in yellow.
Data in dimension fields, such as the field Regional Manager above, will not affect analysis when duplicated.
However, data in indicator fields, if duplicated, will affect subsequent calculations. For example, data in the field Sales Target in the following table is duplicated for matching.
In this case, if you sum up the value in the field Sales Target, the result will be incorrect.
In conclusion, indicator information duplication can cause data inflation and calculation issues, but dimension information duplication does not.
3. Data in indicator fields, if duplicated in N-to-N merging scenarios, will bring about complete chaos.
If you join the right table and the left one based on the field Region, data for Beijing in the field Region will be expanded to four rows of records, and data for Sales Volume and Staff Number will also be duplicated. In this case, the summed value will be incorrect.
In conclusion, the row-level joins (similar to the SQL JOIN statement) can cause data inflation after data is duplicated.
Subject Model Merging Logic
This section introduces the merging logic of subject models and why it cannot cause data inflation after data is duplicated.
The merging logic of subject models can be concluded into the following two points:
1. Only the fields involved in analysis and the associated fields can participate in subject model merging.
2. Aggregation is performed before merging.
The merging logic of subject models can be divided into three steps:
1. The system judges what fields have been dragged into the analysis area in components, and only the dragged fields and the associated fields can participate in subject model merging.
2. Data in data tables is aggregated based on the associated fields and the fields dragged into the analysis area in components.
3. The aggregated tables are merged.
For example, if you want to check whether the sales targets are met in each region, the fields Region, Sales Volume, and Sales Target will be used in components.
1. Only the three fields above are allowed by the system to participate in subject model merging.
2. Data in the two tables used to establish models is aggregated based on the field Region and the fields dragged into the analysis area in components.
3. The two aggregated tables are merged into the result table based on the field Region.
Another example of the N-to-N scenario:
If you want to view the average sales per person for each regional manager's team, the fields Regional Manager, Sales Volume, and Staff Number will be used in components.
1. Only the three fields above and the associated field Region are allowed by the system to participate in subject model merging.
2. Data in the two tables used to establish models is aggregated based on the field Regional Manager and the fields dragged into the analysis area in components.
3. The two aggregated tables are merged into the result table based on the field Region.
After obtaining the result, you can use the formula SUM_AGG(Sales Volume)/SUM_AGG(Staff Number) in the component to calculate the average sales per person.
Scenario of Multi-Table Models
If a model involves more than two tables, you can obtain the model results pairwise for each merging of two tables and then get the final model result table by merging the obtained model results.
For example, if you want to use the fields in Fact Table 1 and Fact Table 2 for analysis, you can merge the two tables as follows.
The system will first apply the merging logic of subject models to obtain model results pairwise for Fact Table 1 and Dimension Table and Fact Table 2 and Dimension Table. The obtained model results can be merged into the final table you want to use, facilitating analysis.
Data Integrity Ensuring in Subject Model
Left Join, Right Join, Full Join, and Inner Join in Join can be used to determine which parts of data to be retained.
Although subject model merging has no such options for you, it can ensure data integrity. For details, see Data Integrity Insurance of the Subject Model.