Difference Between Direct-Connected Data and Extracted Data

  Last update: November 01, 2024
  Overview

    Version

    • FineBI 6.0: No function adjustment.

    • FineBI 6.0.2: Unified some calculation logic of direct-connected data and extracted data. For details, see section "Calculation Logic of Extracted Data and Direct-Connected Data in Components."

    Application Scenario

    This document introduces extracted data and direct-connected data, explains the differences between them, and describes when to use each.

    Introduction to Extracted Data and Direct-Connected Data

    Engine Overview



    Extracted Data

    If you use extracted data, data in the database is extracted to FineBI (similar to saving a copy of the data in FineBI). As a result, data in the database and data in FineBI are not continuously synchronized, and you need to update the data in FineBI regularly to keep it consistent with the database. Because extracted data is saved to the FineBI engine, the Extracted Data mode requires sufficient local disk space.

    By caching data and creating data copies, this mode supports online analytical processing (OLAP) of large data volumes, accelerating queries, ensuring a good analysis experience, and minimizing the impact on business databases.

    Direct-Connected Data

    If you use a direct-connected dataset, FineBI performs calculations directly on the data in your database. Therefore, data in FineBI stays synchronized with the data in the database.

    Backed by your big data platform or data warehouse, this mode can meet simple self-service analysis requirements under high concurrency and large data volumes.

    Usage Requirement



    Extracted Data

    For details about related server performance requirements for extraction, see Recommended Environment and Configuration for Project Deployment.

    Direct-Connected Data

    • Your own big data platform provides highly reliable databases with excellent performance.

    • Self-service analysis follows the usage modes recommended for each database type.

    • Web clusters can be used to connect to the databases to support high concurrency.

    • You have a certain level of database O&M capability.

    Usage Scenario



    Extracted Data

    (1) Complex self-service lineage analysis of up to 10 million data records

    • 10 million data records or fewer

    • Complex datasets

    • Ultra-fast analysis (what you see is what you get)

    (2) Joint analysis of data from multiple databases

    Direct-Connected Data

    (1) Simple self-service analysis of large data volumes

    • Simple queries over more than 100 million data records (self-service datasets are generally not recommended)

    • SQL queries instead of visual operations

    (2) Relatively high user counts and concurrency that require linear scalability

    (3) Relatively high requirements on data timeliness

    (4) Relatively high requirements on data security

    (5) Small data volumes where extraction is inconvenient

    Comparison Between Direct-Connected Data and Extracted Data



    Extracted Data

    • Because of data volume limitations, extraction of more than 100 million data records is not supported.

    • Data must be fully extracted from the data warehouse and business databases to the FineBI server, and some customers may not want to extract their data again.

    Direct-Connected Data

    • For more than 100 million data records, performance depends entirely on the customer's databases and data warehouses, and BI cannot perform deep optimization. (External databases are often not dedicated to BI, so querying them occupies database resources and can result in poor performance.)

    • Data timeliness depends on the timeliness of the data in the customer's data warehouse.

    • Multi-level self-service datasets produce complex SQL queries and longer calculation times.

    • In the Direct-Connected Data mode, joint analysis (Add Association, Join, Union All, Column from Other Tables, and Model View > Edit Association) cannot be performed on data tables from different data sources.

    Customer Portrait



    Extracted Data

    Generally suitable for small enterprises with small data volumes or low budgets that need self-service analysis.

    Direct-Connected Data

    Generally suitable for large enterprises that have fully built big data platforms and attach great importance to data security (they do not want their data extracted again) and data timeliness.

    When to Use Extracted Data or Direct-Connected Data

    Note:
    The data volume described in this section refers to the data volume of the result set, namely that of the direct table used in the dashboard (not the base table).


    Extracted Data Recommended If There Are Not Many Tables with 100 Million Data Records

    If the result set contains a small volume of data (10 million records or fewer), use extracted data.

    If the result set contains a large volume of data (up to 100 million records), extracted data is still preferred.

    Direct-Connected Data Recommended If There Are Many Tables with 100 Million Data Records

    If the result set contains 10 million or more data records and high timeliness (hour-level update) is required, you are advised to use direct-connected data.
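    The recommendations above can be condensed into a rule of thumb. The following Python sketch is a hypothetical helper (the function `recommend_mode` and its parameters are illustrative, not part of FineBI) that uses only the thresholds stated in this document: extraction is not supported above 100 million records, direct connection is advised at 10 million records or more when hour-level updates are required, and extracted data is preferred otherwise. As the note says, the row count is that of the result set, not the base table. The sketch ignores the other factors discussed earlier, such as concurrency, data security, and cross-database joint analysis.

```python
# Hypothetical helper: a rule of thumb built from the thresholds in this document.
# The function name and parameters are illustrative; FineBI exposes no such API.

TEN_MILLION = 10_000_000
HUNDRED_MILLION = 100_000_000


def recommend_mode(result_set_rows: int, needs_hourly_update: bool) -> str:
    """Return "extracted" or "direct-connected" for one dashboard result set."""
    if result_set_rows < 0:
        raise ValueError("result_set_rows must be non-negative")
    # Extraction of more than 100 million records is not supported.
    if result_set_rows > HUNDRED_MILLION:
        return "direct-connected"
    # 10 million or more records plus hour-level timeliness: direct connection.
    if result_set_rows >= TEN_MILLION and needs_hourly_update:
        return "direct-connected"
    # Otherwise (up to 100 million records) extracted data is preferred.
    return "extracted"


if __name__ == "__main__":
    print(recommend_mode(5_000_000, needs_hourly_update=True))     # extracted
    print(recommend_mode(50_000_000, needs_hourly_update=False))   # extracted
    print(recommend_mode(50_000_000, needs_hourly_update=True))    # direct-connected
    print(recommend_mode(300_000_000, needs_hourly_update=False))  # direct-connected
```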

    Notes for Direct-Connected Data

    1. If the direct-connected database is a high-performance OLAP database (such as StarRocks, Doris, Hologres, Vertica, or GaussDB 200), simple self-service analysis is supported.

    A simple self-service analysis scenario meets the following conditions:

    • The self-service dataset contains at most 2 complex calculation steps in total (Join, Column from Other Tables, Summary Column, Formula Column > DEF Function, Row to Column, and Column to Row steps).

    • The direct-connected dataset has at most 3 lineage levels.

    • If a subject model is used, the self-service dataset cannot contain any complex calculation steps.

    2. If the direct-connected database is of any other type, the self-service dataset can contain at most 1 complex calculation step and at most 3 lineage levels.
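    To check a planned self-service dataset against these limits, the following Python sketch counts the complex calculation steps listed above and compares the count with the stated maxima. `DatasetPlan`, `count_complex_steps`, and `within_direct_connect_limits` are hypothetical names, and step names are matched as plain strings. The subject-model restriction is applied only to high-performance OLAP databases, because that is the only case in which this document states it.

```python
from dataclasses import dataclass

# Step types that this document counts as "complex calculation steps".
COMPLEX_STEPS = {
    "Join",
    "Column from Other Tables",
    "Summary Column",
    "Formula Column > DEF Function",
    "Row to Column",
    "Column to Row",
}

MAX_LINEAGE_LEVELS = 3  # stated for every database type


@dataclass
class DatasetPlan:
    """Hypothetical summary of a self-service dataset on direct-connected data."""
    steps: list[str]
    lineage_levels: int
    uses_subject_model: bool = False


def count_complex_steps(steps: list[str]) -> int:
    # Steps not in COMPLEX_STEPS (for example "Filter") are not counted.
    return sum(1 for step in steps if step in COMPLEX_STEPS)


def within_direct_connect_limits(plan: DatasetPlan, high_performance_olap: bool) -> bool:
    """Return True if the plan stays within the limits stated for direct-connected data."""
    if plan.lineage_levels > MAX_LINEAGE_LEVELS:
        return False
    complex_steps = count_complex_steps(plan.steps)
    if high_performance_olap:
        # StarRocks, Doris, Hologres, Vertica, GaussDB 200, and similar.
        if plan.uses_subject_model:
            return complex_steps == 0
        return complex_steps <= 2
    # Any other database type.
    return complex_steps <= 1


if __name__ == "__main__":
    plan = DatasetPlan(steps=["Filter", "Join", "Summary Column"], lineage_levels=2)
    print(within_direct_connect_limits(plan, high_performance_olap=True))   # True
    print(within_direct_connect_limits(plan, high_performance_olap=False))  # False
```

    The same plan passes on a high-performance OLAP database (2 complex steps) but fails on other database types, where only 1 complex step is allowed.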

    Calculation Logic of Extracted Data and Direct-Connected Data in Components

    Calculation Logic in the Same Scenario

    Calculation Logic: Extracted Data vs. Direct-Connected Data

    Influence of filtering and quick calculation on the summary value
    • Extracted data: No influence
    • Direct-connected data: No influence

    Influence of one indicator (on which filtering and quick calculation are performed) on other indicators (on which quick calculation is performed)
    • Extracted data: No influence
    • Direct-connected data: No influence

    Influence of one indicator (on which filtering and quick calculation are performed) on other indicators summarized by quick calculation
    • Extracted data: No influence
    • Direct-connected data: No influence

    Indicator-based dimension filtering/sorting
    • Extracted data: Filtering/sorting based on the results of summary rows
    • Direct-connected data: Filtering/sorting based on the results of summary rows

    Filtering logic of the cross table
    • Extracted data: Normal filtering according to the filtering conditions
    • Direct-connected data: Normal filtering according to the filtering conditions

    Filtering in Filter and header filtering at the same filtering level
    • Extracted data: Intersection of the filtering conditions in both
    • Direct-connected data: Intersection of the filtering conditions in both

    Filtering logic for null values and empty strings
    • Extracted data: All null values and empty strings are filtered out, whether you select null or empty string for filtering.
    • Direct-connected data: Filtering follows the database logic. If the database filters null values and empty strings separately, the result of direct-connected data differs from that of extracted data.
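    The difference in null and empty-string handling can be illustrated with a small Python sketch over an in-memory list, where `None` stands for a database NULL. Both functions are hypothetical and only model the behavior described above: `exclude_extracted` treats null and empty string as one group, while `exclude_strict_db` models a database that keeps NULL and empty strings distinct, which is the case where the two modes return different results.

```python
from typing import Optional

# One text dimension's values; None stands for a database NULL.
Values = list[Optional[str]]
BLANKS: set[Optional[str]] = {None, ""}


def exclude_extracted(values: Values, excluded: set[Optional[str]]) -> Values:
    """Extracted data: excluding either null or "" removes both."""
    if excluded & BLANKS:
        excluded = excluded | BLANKS  # new set; the caller's set is untouched
    return [v for v in values if v not in excluded]


def exclude_strict_db(values: Values, excluded: set[Optional[str]]) -> Values:
    """Direct-connected data on a database that keeps NULL and "" distinct."""
    return [v for v in values if v not in excluded]


if __name__ == "__main__":
    data = ["North", None, "", "South"]
    print(exclude_extracted(data, {None}))  # ['North', 'South']
    print(exclude_strict_db(data, {None}))  # ['North', '', 'South']
```

    Excluding only null keeps the empty string in the direct-connected result but not in the extracted result, which is the discrepancy the table warns about.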

     

