Version 6.0: /
Version 6.0.2: Unified some of the calculation logic for direct-connected data and extracted data. For details, see section "Calculation Logic of Extracted Data and Direct-Connected Data in Components."
This document introduces what extracted data and direct-connected data are, what the differences between these two kinds of data are, and when to use each.
Engine Overview
Extracted Data
If you use extracted data, data in the database is extracted to FineBI (similar to saving a copy of the data in FineBI). Therefore, data in the database and data in FineBI are not continuously synchronized. You need to regularly update the data in FineBI to keep it consistent with the data in the database. Because the data is extracted and saved to the FineBI engine, the Extracted Data mode requires sufficient local disk space.
By creating data caches/data copies, the Extracted Data mode supports online analytical processing (OLAP) of large volumes of data, which accelerates queries, ensures a smooth analysis experience, and minimizes the impact on business databases.
Direct-Connected Data
If you use a direct-connected dataset, data in your database is used directly in FineBI for calculation. Therefore, data in FineBI and data in the database are kept synchronized.
With the help of your big data platform/data warehouse, simple self-service analysis requirements can be met under high concurrency and large data volumes.
Usage Requirement
Extracted data:
For details about the server performance requirements for extraction, see Recommended Environment and Configuration for Project Deployment.
Direct-connected data:
Your own big data platform offers high database reliability and excellent performance.
Self-service analysis is performed according to the usage mode recommended for each database type.
Web clusters can be used to connect to databases to support high concurrency.
Certain database O&M capabilities are guaranteed.
Usage Scenario
Extracted data:
(1) Complex self-service lineage analysis of 10 million data records
    Data volume of 10 million records or fewer
    Complex dataset
    Ultra-fast analysis (what you see is what you get)
(2) Joint analysis of data from multiple databases
Direct-connected data:
(1) Simple self-service analysis of large volumes of data
    Simple query of over 100 million data records (self-service datasets are generally not recommended)
    SQL query in lieu of visual operation
(2) Relatively large number of users and high concurrency, requiring linear scalability
(3) Relatively high requirement on timeliness
(4) Relatively high requirement on data security
(5) Scenario where the data volume is not large and extraction is troublesome
Comparison Between Direct-Connected Data and Extracted Data
Due to the data volume limitation, extraction of over 100 million data records cannot be supported. The data needs to be fully extracted from the data warehouse and business database to the FineBI server. Some customers may not want to fully extract the data again.
For more than 100 million data records, the performance relies entirely on customers' databases and data warehouses. BI cannot perform deep optimization. (Extraction of data from external databases, which are often not BI-dedicated databases, for query will occupy database resources and cause poor performance.)
The timeliness of the data depends on that of the data in customers' warehouses.
The multiple levels of self-service datasets result in complex SQL query and longer calculation time.
In the Direct-Connected Data mode, joint analysis (Add Association, Join, Union All, Column from Other Tables, and Model View > Edit Association) cannot be performed on data tables across different data sources.
Customer Portrait
Extracted data: Generally applicable to small enterprises that have small data volumes/low budgets and require self-service analysis.
Direct-connected data: Generally applicable to large enterprises that have fully built big data platforms and attach great importance to data security (that is, they do not want to extract data again) and data timeliness.
If the result set contains a small volume of data (10 million or less), use extracted data.
If the result set contains a large volume of data (100 million or less), extracted data is the preferred choice.
If the result set contains 10 million or more data records and high timeliness (hour-level update) is required, you are advised to use direct-connected data.
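The three recommendations above can be read as a small decision rule. The following is a minimal Python sketch, not part of FineBI: the function name `recommend_mode` and its parameters are hypothetical, and "high timeliness" is reduced to a boolean flag. The fallback to direct-connected data above 100 million records is an assumption based on the extraction limit stated in the comparison section. Where the first and third recommendations overlap (exactly 10 million records with hour-level updates), the sketch assumes the timeliness rule wins.

```python
TEN_MILLION = 10_000_000
HUNDRED_MILLION = 100_000_000


def recommend_mode(result_rows: int, needs_hourly_updates: bool = False) -> str:
    """Return 'extracted' or 'direct-connected' for a result set (hypothetical helper)."""
    # 10 million or more records with hour-level updates: direct-connected data.
    if result_rows >= TEN_MILLION and needs_hourly_updates:
        return "direct-connected"
    # Up to 100 million records: extracted data is recommended or preferred.
    if result_rows <= HUNDRED_MILLION:
        return "extracted"
    # Assumption: above 100 million records, extraction is not supported,
    # so the sketch falls back to direct-connected data.
    return "direct-connected"


print(recommend_mode(5_000_000))
print(recommend_mode(50_000_000, needs_hourly_updates=True))
```

The thresholds are kept as named constants so that they can be adjusted if the recommended limits change between versions.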
1. If the direct-connected database is a high-performance OLAP database (such as the StarRocks database, Doris database, Hologres database, Vertica database, and GaussDB 200 database), simple self-service analysis is supported.
Simple self-service analysis scenarios refer to scenarios meeting the following points:
① The total number of complex calculation steps in the self-service dataset (namely the total number of Join, Column from Other Tables, Summary Column, Formula Column > DEF Function, Row to Column, and Column to Row steps) is less than or equal to 2.
② The maximum number of lineage levels of the direct-connected dataset is limited to 3.
③ If a subject model is used, no complex calculation steps can be used in the self-service dataset.
2. If the direct-connected database is of other types, the self-service dataset can have a maximum of 1 complex calculation step added and a maximum of 3 lineage levels.
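The limits above can be expressed as a simple check. This is an illustrative Python sketch only: `DatasetPlan` and `within_direct_connected_limits` are hypothetical names, not FineBI APIs. Counting the complex calculation steps and lineage levels is assumed to be done by the caller. The subject-model rule is applied only to high-performance OLAP databases, because the text does not state it for other database types.

```python
from dataclasses import dataclass

# Database names taken from the list of high-performance OLAP databases above.
HIGH_PERFORMANCE_OLAP = {"StarRocks", "Doris", "Hologres", "Vertica", "GaussDB 200"}
MAX_LINEAGE_LEVELS = 3


@dataclass
class DatasetPlan:
    """Hypothetical description of a direct-connected self-service dataset."""
    database: str
    complex_steps: int      # Join, Column from Other Tables, Summary Column, ...
    lineage_levels: int
    uses_subject_model: bool = False


def within_direct_connected_limits(plan: DatasetPlan) -> bool:
    """Check a plan against the limits described above."""
    if plan.lineage_levels > MAX_LINEAGE_LEVELS:
        return False
    if plan.database in HIGH_PERFORMANCE_OLAP:
        # With a subject model, no complex calculation steps are allowed.
        max_steps = 0 if plan.uses_subject_model else 2
    else:
        # Other database types: at most 1 complex calculation step.
        max_steps = 1
    return plan.complex_steps <= max_steps


print(within_direct_connected_limits(DatasetPlan("Doris", 2, 3)))
print(within_direct_connected_limits(DatasetPlan("MySQL", 2, 2)))
```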
Influence of the filtering and quick calculation on the summary value
No influence
Influence of one indicator (for which the filtering and quick calculation are performed) on other indicators (for which the quick calculation is performed)
Influence of one indicator (for which the filtering and quick calculation are performed) on other indicators summarized by the quick calculation
Indicator-based dimension filtering/sorting
Filtering/Sorting based on the results of summary rows
Filtering logic of the cross table
Normal filtering according to the filtering conditions
Filtering in Filter and header filtering at the same filtering level
Intersection of the filtering conditions in both
Different filtering logic for null and empty strings
Extracted data: Filtering out all null and empty strings (whether you select null or empty strings for filtering)
Direct-connected data: Filtering according to the database logic (If the database logic is to filter out null and empty strings separately, the result of direct-connected data will be different from that of extracted data.)
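The difference in null and empty-string handling can be illustrated with a small Python sketch. It is a simplified model, not FineBI code: a plain list stands in for a column, `None` stands for null, and both function names are hypothetical. The second function models one possible database behavior (a filter that removes only null values and keeps empty strings). Actual databases vary.

```python
from typing import List, Optional


def filter_blank_extracted(values: List[Optional[str]]) -> List[str]:
    """Extracted-data behavior as described above: null and empty strings are removed together."""
    return [v for v in values if v is not None and v != ""]


def filter_null_database_style(values: List[Optional[str]]) -> List[str]:
    """Assumed database behavior: only null is removed, empty strings are kept."""
    return [v for v in values if v is not None]


column = ["east", None, "", "west"]
print(filter_blank_extracted(column))      # ['east', 'west']
print(filter_null_database_style(column))  # ['east', '', 'west']
```

With such a database, the same filter returns two rows on extracted data and three rows on direct-connected data, which is the discrepancy noted above.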