System Data Management

  • Last update: December 15, 2023
  • Data Connection Management

    1. Permission control: Only super administrators and sub-administrators can create data connections.

    2. Duplicate check: Before creating a data connection, check whether it already exists. Two data connections must not be completely identical in database address, database name, user, and so on.

    3. Naming standard: Establish a unified specification for naming data connections. The recommended format is "business system/data theme_database type", such as "BI platform operation_MySQL" and "xx ERP system transaction data_GP". Alternatively, use "business system/data theme_department_operator/responsible person", such as "BI platform usage record_operation department_peter". Super administrators should monitor the naming of data connections.

    4. Fixed data connection name: Avoid renaming a data connection, as renaming may cause the system to fail to find the database. Determine the connection name in advance. If a connection is renamed, the SQL statements that reference it must be modified so that the connection can be re-selected. If a connection name is likely to change, add basic tables as SQL datasets instead of adding database tables directly (a sketch follows below).
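    For illustration, a minimal sketch of such a SQL dataset (the schema, table, and column names are hypothetical):

    ```sql
    -- Basic table added as a SQL dataset instead of a directly bound
    -- database table; if the data connection is renamed, only the
    -- connection selection for this dataset needs to be updated.
    SELECT order_id,
           customer_id,
           order_date,
           amount
    FROM   sales.orders;
    ```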

    Data Strategy (Direct Connection or Extraction)

    FineBI supports two data retrieval modes: the direct connection mode, in which FineBI computes directly against the data in the database, and the extraction mode, in which FineBI extracts the data into its own engine and computes there.

    A system can use both modes to fetch data, but each dashboard can use data obtained in only one mode.

    1. The direct connection mode is preferred when the enterprise has a relatively professional big data platform with high data quality, has high requirements for data timeliness and cannot accept day-level data delays, or has high requirements for data security and does not want its data stored in the BI project.

    2. If the enterprise's largest tables exceed 100 million records and tables with large data volumes must be analyzed frequently, the direct connection mode is advised, as it shifts the calculation to the database and reduces the pressure on the BI service.

    3. However, the direct connection mode does not support operations involving data tables from different sources, such as establishing associations, joins, and union all. If cross-source association is required in actual business, FineDataLink can be used to synchronize data across databases.

    4. Calculation in the extraction mode uses the computing power and resources of the BI server (mainly memory), so its performance is strongly affected by server resource occupancy and FineBI's ability to use those resources. Calculation performance in direct connection mode is strongly related to the database engine: FineBI converts the front-end calculation logic into SQL statements, sends them to the database for calculation, and displays the returned results (see the illustrative SQL below). If the enterprise selects direct connection, confirm that the business database can support the highly concurrent SQL calculations issued by BI, and that the query speed and resource occupancy under high concurrency meet requirements.
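    For illustration only (table and field names are hypothetical), a front-end aggregation such as "total sales amount by region" is converted into a SQL statement of roughly this shape and executed in the database:

    ```sql
    -- Rough shape of the SQL that a direct-connection "sales amount by
    -- region" component might issue to the database (names hypothetical).
    SELECT region,
           SUM(sales_amount) AS total_sales
    FROM   dw.fact_sales
    GROUP  BY region;
    ```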

    Update of Extracted Data

    Executed correctly, data updates complete quickly with low resource consumption and do not affect normal data use or system performance.

    Update Type

    1. For dimension tables and data tables with a small data volume, Full-Volume Update is a good choice.

    2. For tables whose historical data will not change and that have a timestamp field that can be compared with the update time, consider Incremental Update first (not supported for server datasets and Excel datasets); a sketch of the incremental SQL follows this list.

        a. If Incremental Update is selected, the SQL for incremental update cannot be empty.

        b. Do not use select * when performing incremental deletion. Instead, select only the unique identifier, such as uuid.
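        For illustration, a minimal sketch of the two incremental-update statements, assuming a hypothetical orders table with an update_time timestamp, an is_deleted flag, and a uuid unique identifier; ${LAST_UPDATE_TIME} is a hypothetical placeholder standing in for the platform's last update time:

        ```sql
        -- Incremental add: fetch only rows changed since the last update.
        SELECT *
        FROM   orders
        WHERE  update_time > '${LAST_UPDATE_TIME}';

        -- Incremental delete: select only the unique identifier, never SELECT *.
        SELECT uuid
        FROM   orders
        WHERE  is_deleted = 1
          AND  update_time > '${LAST_UPDATE_TIME}';
        ```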

    Update Strategy of Scheduled Tasks

    The general update strategy is a combination of global scheduled updates, scheduled updates of special business packages, and scheduled updates of special single tables. That is, except for a few special demands that require real-time data updates or independent updates for specific business needs, dataset updates in extraction mode adopt Full-Volume Update, where the business package is updated along with the global update, the basic dataset is updated along with its business package, and the self-service dataset is updated along with its parent table.

    1. When making the update strategy, give priority to determining the time for the full-volume update (which can be flexibly adjusted within a given day according to the platform's last data extraction time). Configure scheduled updates of business packages and tables only if special circumstances occur. Self-service datasets require no configuration, as they update automatically with their parent tables.

    2. If the full-volume update is not configured, or many tables/business packages need to be updated separately besides the full-volume update, check the linked tables in advance. For tasks with a high degree of overlap in table lineage, update them at the same time: duplicates are removed automatically, avoiding repeated updates. Conversely, if tasks involving the same table are updated in different time segments, that table may be updated repeatedly, wasting resources. If multiple tasks have little overlap in table lineage, stagger their update times to prevent system blockage.

    3. Update the data in time. If the full-volume update conflicts with the update of warehouse or platform data, the data displayed on the platform may not be the latest. Pay attention to the data volume and the number of associated datasets before updating: if a dataset is too large or has more than 50 associations, update it when the system is idle.

    Big Data Management

    It is sometimes inevitable to introduce business tables with huge data volumes (more than 50 million rows) during analysis. In direct connection mode, the performance of analyzing large tables is determined by the database: if the database engine performs poorly, queries may run slowly, and frequent queries against massive data may cause database thread blocking or even crashes. In extraction mode, updating large amounts of data may occupy substantial memory and CPU resources, and the extracted data also occupies a large amount of disk space, leading to poor analysis performance.

    As data accumulates over time, the data volume of some tables keeps increasing. Since the BI platform cannot restrict use by business personnel, restricting and controlling the introduction of business tables with large data volumes is necessary to ensure continuously efficient data analysis. The continuous growth of data volume is inevitable, but the business's requirements for data analysis must still be met efficiently: with limited operations, ensure the smooth running of the entire system while meeting the needs of data analysis.

    Here are some practical ideas:

    1. Granularity limit: Is the granularity of the data to be analyzed by business personnel the same as the minimum granularity in the database? The coarser the granularity, the smaller the data volume after aggregation. Adjust the granularity of imported data according to feedback from business personnel.

    2. Time limit: Most tables with large data volumes have a time dimension. Check whether data from all time segments needs to be analyzed. If most analysis focuses on a specific period, import the detailed data of that period and aggregated data of the other periods (see the sketch after this list).

    3. Update cycle restriction: Adopt Incremental Update for extracted data to relieve the update pressure on the system. If that is not feasible, set separate update cycles for data with low timeliness requirements according to actual business needs, such as updating once a month when the system is idle.
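    For example, ideas 1 and 2 can be combined in one statement (all names are hypothetical; Greenplum/PostgreSQL-style syntax): keep row-level detail for the recent period and pre-aggregate older periods to a coarser granularity.

    ```sql
    -- Detail rows for the recent analysis window (here, since 2023-10-01).
    SELECT order_date, store_id, amount
    FROM   dw.fact_sales
    WHERE  order_date >= DATE '2023-10-01'

    UNION ALL

    -- Older periods pre-aggregated to month/store granularity to cut volume.
    SELECT CAST(DATE_TRUNC('month', order_date) AS DATE) AS order_date,
           store_id,
           SUM(amount) AS amount
    FROM   dw.fact_sales
    WHERE  order_date < DATE '2023-10-01'
    GROUP  BY 1, 2;
    ```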

    The above ideas are based on the actual usage of business users. We therefore suggest that, in daily data management, analysis users on all platforms pay attention to handling tables whose defining feature is a large data volume.

    Solutions Related to System Performance

    1. Control the global memory: If memory occupancy reaches the alert threshold due to business use, FineBI's memory protection mechanism interrupts export threads.

    2. Set restrictions for Excel export: You can set restrictions for Excel export in System Management > System Setting > General.

        a. Data Volume Limit on Excel Export: This parameter is not set by default. The recommended setting value ranges from 0 to 1,000,000,000.

        b. Concurrent Thread Quantity Limit Upon Detail Table Export: It defaults to 3, and the recommended setting value ranges from 1 to 5 (the default value is advised).

    3. If Nginx is used, you can enable HTTP2 to increase the number of concurrent browser requests. (HTTP2 also increases the number of concurrent database requests; evaluate database performance before enabling it.)

    4. Use the fine_conf_entity Visual Configuration Plugin to configure the SystemOptimizationConfig.queryConditionCountRestriction parameter: it interrupts queries whose filter conditions exceed the limit and reports the error "condition count out of restriction: xx".

        Suggested value: 30 (it can be smaller, depending on actual conditions).

    5. Restrict BI linkage numbers: If multiple components in a dashboard are linked to each other, linkages of three or more layers are removed and only the two layers before the final linked component are retained, avoiding slow dashboard loading and ensuring performance.

    6. Limit component query duration: Go to Management System > System Management > BI Parameter, and set the two query timeout parameters to limit the duration of component queries.

        a. Set the Direct Connect/Extracted Query Timeout parameters if a dashboard contains too many components and component queries take too long, or if a single component's query takes so long that subsequent requests are blocked and BI is mistaken for being down. All timed-out queries of real-time/extracted data are aborted to avoid blocking other normal queries.

        b. Suggested setting range: 10 to 300; set the value based on actual business needs and acceptable waiting time.

    Note:
    These parameters are valid for the Polars engine. For the Spider engine, you can set the DistributedOptimizationConfig.spiderConfig.spider_query_timeout_open and DistributedOptimizationConfig.spiderConfig.spider_query_timeout_limit parameters, which interrupt requests in cases of high concurrency to ensure system operation.


     

