Difference Between Direct-Connected Data and Extracted Data - FineBI Help Document

Last update: November 01, 2024

Overview

Version

FineBI Version	Function Adjustment
6.0	/
6.0.2	Unified some of the calculation logic of direct-connected data and extracted data. For details, see section "Calculation Logic of Extracted Data and Direct-Connected Data in Components."

Application Scenario

This document introduces what extracted data and direct-connected data are, what the differences between the two are, and when to use each.

Introduction to Extracted Data and Direct-Connected Data

Engine Overview

Extracted Data

If you use extracted data, data in the database is extracted to FineBI (similar to saving data to FineBI). Therefore, data in the database and data in FineBI are not continuously synchronized. You need to update the data in FineBI regularly to keep it consistent with the database. Since data is extracted and saved to the FineBI engine, sufficient local disk space is required in the Extracted Data mode.

By creating a data cache/data copy, FineBI supports online analytical processing (OLAP) of large volumes of data, which accelerates queries, ensures a smooth analysis experience, and minimizes the impact on business databases.

Direct-Connected Data

If you use a direct-connected dataset, data in your database is used directly in FineBI for calculation. Therefore, data in FineBI and data in the database are kept synchronized.

With the help of your big data platform/data warehouse, simple self-service analysis requirements can be met under high concurrency and large data volumes.
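
The difference in synchronization between the two modes can be illustrated with a minimal sketch. It does not use any FineBI API: an in-memory SQLite database stands in for the business database, and the table name `orders` and the helper functions `query_direct` and `extract` are hypothetical names chosen for illustration.

```python
import sqlite3

# Stand-in for the business database (hypothetical table and data).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
source.execute("INSERT INTO orders VALUES (1, 100.0)")


def query_direct(conn):
    """Direct-connected style: every query runs against the live database."""
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]


def extract(conn):
    """Extracted style: copy the rows once and analyze the local copy."""
    return conn.execute("SELECT id, amount FROM orders").fetchall()


local_copy = extract(source)

# A new record arrives in the business database after the extraction.
source.execute("INSERT INTO orders VALUES (2, 250.0)")

direct_count = query_direct(source)   # sees the new row immediately
extracted_count = len(local_copy)     # stale until the next update

local_copy = extract(source)          # a regular update re-syncs the copy
refreshed_count = len(local_copy)
```

In this sketch the local copy lags behind the source until `extract` runs again, which mirrors why extracted data must be updated regularly, while the direct query always reflects the current state of the database.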

Usage Requirement

	Usage Requirement
Extracted Data	For details about related server performance requirements for extraction, see Recommended Environment and Configuration for Project Deployment.
Direct-Connected Data	You have your own big data platform with high database reliability and excellent performance. Self-service analysis follows the usage mode recommended for each database type. Web clusters can be used to connect to databases to support high concurrency. A certain level of database O&M capability is available.

Usage Scenario

Extracted Data

(1) Complex self-service lineage analysis of 10 million data records

Data volume of 10 million records or less
Complex dataset
Ultra-fast analysis (what you see is what you get)

(2) Joint analysis of data from multiple databases

Direct-Connected Data

(1) Simple self-service analysis of large volumes of data

Simple query of over 100 million data records (self-service datasets are generally not recommended)
SQL query in lieu of visual operation

(2) Relatively large number of users and high concurrency, requiring linear scalability

(3) Relatively high requirement on timeliness

(4) Relatively high requirement on data security

(5) Scenario where the data volume is not large and extraction is troublesome

Comparison Between Direct-Connected Data and Extracted Data

Extracted Data

Due to the data volume limitation, extraction of over 100 million data records cannot be supported.
The data needs to be fully extracted from the data warehouse and business database to the FineBI server. Some customers may not want to fully extract the data again.

Direct-Connected Data

For more than 100 million data records, the performance relies entirely on customers' databases and data warehouses. BI cannot perform deep optimization. (Querying data from external databases, which are often not dedicated to BI, occupies database resources and can cause poor performance.)
The timeliness of the data depends on that of the data in customers' warehouses.
Multiple levels of self-service datasets result in complex SQL queries and longer calculation time.
In the Direct-Connected Data mode, joint analysis (Add Association, Join, Union All, Column from Other Tables, and Model View > Edit Association) cannot be performed on data tables across different data sources.

Customer Portrait

	Customer Portrait
Extracted Data	Generally, it is applicable to small enterprises that have small data volumes/low budgets and require self-service analysis.
Direct-Connected Data	Generally, it is applicable to large enterprises that have fully built big data platforms and attach great importance to data security (meaning that they do not want to extract data again) and data timeliness.

When to Use Extracted Data or Direct-Connected Data

Note:

The data volume described in this section refers to the data volume of the result set, namely that of the direct table (not the base table) used in the dashboard.

Extracted Data Recommended If There Are Not Many Tables with 100 Million Data Records

If the result set contains a small volume of data (10 million or less), use extracted data.

If the result set contains a large volume of data (100 million or less), extracted data is preferred.

Direct-Connected Data Recommended If There Are Many Tables with 100 Million Data Records

If the result set contains 10 million or more data records and high timeliness (hour-level update) is required, you are advised to use direct-connected data.
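
The thresholds above can be condensed into a small decision helper. This is only a sketch under stated assumptions, not an official FineBI rule engine: the function name `recommend_mode` and its parameters are hypothetical, the result set is measured in rows, the "over 100 million" branch is taken from the extraction limit stated in the comparison section, and the handling of a result set of exactly 10 million rows is a tie-break chosen here.

```python
TEN_MILLION = 10_000_000
HUNDRED_MILLION = 100_000_000


def recommend_mode(result_rows: int, needs_hourly_updates: bool) -> str:
    """Suggest a data mode from the result-set size and timeliness need.

    result_rows counts the rows of the table used in the dashboard,
    not the rows of the base table.
    """
    # Extraction of over 100 million records is not supported.
    if result_rows > HUNDRED_MILLION:
        return "direct-connected"
    # 10 million or more records plus hour-level updates: direct-connected.
    # (Assumption: the boundary value of exactly 10 million lands here.)
    if result_rows >= TEN_MILLION and needs_hourly_updates:
        return "direct-connected"
    # Everything else up to 100 million records: extracted data is preferred.
    return "extracted"
```

For example, `recommend_mode(50_000_000, False)` suggests extracted data, while the same volume with hour-level update requirements suggests direct-connected data.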

Notes for Direct-Connected Data

1. If the direct-connected database is a high-performance OLAP database (such as the StarRocks database, Doris database, Hologres database, Vertica database, and GaussDB 200 database), simple self-service analysis is supported.

Simple self-service analysis scenarios refer to scenarios meeting the following points:

① The total number of complex calculation steps in the self-service dataset (namely the total number of Join, Column from Other Tables, Summary Column, Formula Column > DEF Function, Row to Column, and Column to Row steps) is less than or equal to 2.

② The maximum number of lineage levels of the direct-connected dataset is limited to 3.

③ If a subject model is used, no complex calculation steps can be used in the self-service dataset.

2. If the direct-connected database is of other types, the self-service dataset can have a maximum of 1 complex calculation step added and a maximum of 3 lineage levels.
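
The limits in these notes can be checked mechanically. The sketch below is illustrative only: the class `SelfServiceDataset`, its fields, and the function `within_direct_limits` are hypothetical names, and applying the subject-model condition only to high-performance OLAP databases is an assumption based on where that condition appears in the notes.

```python
from dataclasses import dataclass


@dataclass
class SelfServiceDataset:
    # Total of Join, Column from Other Tables, Summary Column,
    # Formula Column > DEF Function, Row to Column, and Column to Row steps.
    complex_steps: int
    lineage_levels: int
    uses_subject_model: bool = False


def within_direct_limits(ds: SelfServiceDataset, high_performance_olap: bool) -> bool:
    """Return True if the dataset stays within the limits described above."""
    # Both database categories allow at most 3 lineage levels.
    if ds.lineage_levels > 3:
        return False
    if high_performance_olap:
        # With a subject model, no complex calculation steps are allowed.
        if ds.uses_subject_model:
            return ds.complex_steps == 0
        return ds.complex_steps <= 2
    # Other database types: at most 1 complex calculation step.
    return ds.complex_steps <= 1
```

A dataset with two Join steps and two lineage levels passes on a high-performance OLAP database but not on other database types.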

Calculation Logic of Extracted Data and Direct-Connected Data in Components

Calculation Logic in the Same Scenario

Calculation Logic	Extracted Data	Direct-Connected Data
Influence of the filtering and quick calculation on the summary value	No influence	No influence
Influence of one indicator (for which the filtering and quick calculation are performed) on other indicators (for which the quick calculation is performed)	No influence	No influence
Influence of one indicator (for which the filtering and quick calculation are performed) on other indicators summarized by the quick calculation	No influence	No influence
Indicator-based dimension filtering/sorting	Filtering/Sorting based on the results of summary rows	Filtering/Sorting based on the results of summary rows
Filtering logic of the cross table	Normal filtering according to the filtering conditions	Normal filtering according to the filtering conditions
Filtering in Filter and header filtering at the same filtering level	Intersection of the filtering conditions in both	Intersection of the filtering conditions in both
Different filtering logic for null and empty strings	Filtering out all null and empty strings (whether you select null or empty strings for filtering)	Filtering according to the database logic (If the database logic is to filter out null and empty strings separately, the result of direct-connected data will be different from that of extracted data.)
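
The last row of the table is the one place where the two modes can return different results. The sketch below illustrates one reading of it, namely that in extracted data a filter on null or on empty strings treats both kinds of value as the same category, while a database may treat them separately. The function names and sample values are hypothetical, and the actual behavior in the Direct-Connected Data mode depends on the connected database.

```python
values = ["north", None, "", "south"]


def extracted_style_filter(items):
    """Null and empty string are handled together, whichever one is selected."""
    return [v for v in items if v is None or v == ""]


def database_style_filter(items, select_null):
    """A database that distinguishes NULL from '' matches only the selected kind."""
    if select_null:
        return [v for v in items if v is None]
    return [v for v in items if v == ""]


extracted_result = extracted_style_filter(values)
direct_null_result = database_style_filter(values, select_null=True)
direct_empty_result = database_style_filter(values, select_null=False)
```

Under these assumptions the extracted-style filter affects two values, while each database-style filter affects only one, which is how the same dashboard filter can produce different counts in the two modes.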
