Overview
This document lists several common slow update phenomena and describes how to optimize them.
Long Single Table Update Time
Description
Only one table is updated, but the update takes a very long time.
Cause
Updating a single small table also triggers updates to all of its associated tables, analysis tables, and associations. A simple update task may therefore trigger updates on hundreds of tables behind the scenes.
As a result, if you have configured many single-table updates with scattered schedules, duplicate updates can easily occur.
For example, Table A has 100 related tables and Table B also has 100 related tables, 99 of which are the same as Table A's. If you perform a single-table update on Table A at 6:00 and on Table B at 6:10, those 99 shared tables are updated at 6:00 and then updated again at 6:10.
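The counting in this example can be sketched in a few lines of Python. The table names (`A_only`, `B_only`, `T0`…`T98`) are hypothetical placeholders, not real FineBI objects:

```python
# Hypothetical sketch: why scattered single-table updates duplicate work.
# "related" maps each trigger table to the downstream tables it also updates.
shared = {f"T{i}" for i in range(99)}                 # 99 tables shared by A and B
related = {"A": shared | {"A_only"}, "B": shared | {"B_only"}}

# Scattered schedule: A at 6:00, B at 6:10 -> each trigger runs in full.
scattered_updates = len(related["A"]) + len(related["B"])

# Same schedule: the system deduplicates the combined trigger set.
batched_updates = len(related["A"] | related["B"])

print(scattered_updates, batched_updates)             # 200 vs. 101 table updates
```

Scheduling the two tables together cuts the work nearly in half, because the 99 shared tables are updated once instead of twice.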
Solution
As long as the tables to be updated are updated together, no duplicate updates occur, because the system automatically deduplicates them.
1. Prefer a global update that refreshes all data once every night.
2. If you must perform single-table updates, schedule these tables to update together.
In versions released before 2019-08-15: you need to place the tables in one business package and set a single-table update task for that business package.
In versions released on or after 2019-08-15: the tables only need to share the same update time and do not need to be in the same business package. The system triggers their updates together.
Slow SQL Execution
Description
After you use features that require manually configured SQL, such as SQL Dataset, Incremental Update, and Incremental Deletion, updates become slow.
Cause
The SQL itself executes slowly: even previewing the SQL statement directly in the database takes noticeable time.
During a FineBI update, these SQL statements are executed again, so the update takes a long time.
Solution
Optimize the SQL statements, for example by selecting only the required columns, filtering rows in the WHERE clause instead of after extraction, and ensuring that filtered and joined columns are indexed in the database.
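A minimal sketch of one such optimization, using an in-memory SQLite table with an assumed `sales` schema (not a real FineBI table). Pushing the filter and column selection into SQL moves less data than fetching everything and filtering afterward:

```python
# Minimal sketch (assumed schema): let the database do the filtering and
# column pruning instead of pulling every row and column first.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(i, "east" if i % 2 else "west", i * 1.0) for i in range(1000)],
)

# Slow pattern: fetch everything, then filter in the application layer.
slow_rows = [r for r in conn.execute("SELECT * FROM sales") if r[1] == "east"]

# Faster pattern: select only needed columns and filter in the WHERE clause.
fast_rows = conn.execute(
    "SELECT id, amount FROM sales WHERE region = 'east'"
).fetchall()

print(len(slow_rows), len(fast_rows))   # same logical result, less data moved
```

The two queries return the same logical rows; the second simply ships fewer columns and lets the database skip non-matching rows.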
Slow Update Caused by Join
Description
After you perform Join on the tables, the update becomes slower.
Cause
The left/right merge function works like a JOIN statement in the database. It is powerful, has no built-in limits, and can process any amount of data.
However, the merge can increase the amount of data. In an N:N merge in particular, a Cartesian product is produced, and a table at the tens-of-millions level may expand to the hundreds-of-millions level.
Because the data volume increases, the update becomes slower.
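The N:N expansion can be demonstrated with plain Python lists standing in for two tables (the key `"k"` and the row values are illustrative only): every left row pairs with every matching right row, so 1,000 × 1,000 rows on one key become a million result rows.

```python
# Sketch of why an N:N merge expands the data: rows sharing a join key
# combine as a Cartesian product (m rows x n rows per key).
left  = [("k", i) for i in range(1000)]   # 1,000 rows, all with key "k"
right = [("k", j) for j in range(1000)]   # 1,000 rows, all with key "k"

joined = [(lv, rv) for lk, lv in left for rk, rv in right if lk == rk]
print(len(left), len(right), len(joined))   # 1000 1000 1000000
```

The same multiplication is why a tens-of-millions-row table with many-to-many keys can balloon to hundreds of millions of rows after the merge.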
Solution
There are four types of Join, and their performance differs:
Inner Join (1:1) > Left Join (N:1) = Right Join (1:N) >> Full Join (N:N)
Tables with small data volumes can be joined in any way, but when you perform a Join on tables with large data volumes:
Unless there is a special need, choose Inner Join whenever possible and avoid Full Join.
If you must perform a Full Join, split the large wide table into tables with fewer columns to reduce the data expansion after the Full Join.
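A back-of-the-envelope sketch of why fewer columns help. The row counts and column widths below are made-up illustrative numbers: after an N:N expansion, every result row carries all columns from both sides, so full-joining only a narrow slice of a wide table materializes far fewer cells.

```python
# Sketch: cells materialized after an N:N full join grow with both the row
# expansion and the width of the joined tables (illustrative numbers only).
rows, matches_per_key = 10_000, 10_000    # N:N worst case: every row matches
wide_cols, slice_cols = 50, 5             # full wide table vs. a narrow slice

expanded_wide  = rows * matches_per_key * (wide_cols + wide_cols)
expanded_slice = rows * matches_per_key * (slice_cols + slice_cols)
print(expanded_wide // expanded_slice)    # 10x fewer cells with narrow inputs
```

The row expansion is the same either way; what the narrower tables save is the number of columns dragged through that expansion.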
Optimization on the Self-Service Dataset Operation
Description
Self-service datasets whose steps are ordered differently but produce the same result can have very different update times.
Cause
Some steps in a self-service dataset, such as Group Summary, Join, and complex New Column, involve relatively complex calculation logic.
If you filter first, the complex calculations run on a smaller data volume and the data expanded by a merge is reduced, which ultimately shortens the update time.
Solution
You are advised to complete simple operations such as Filter, Sort, and Field Setting before performing complex operations such as Group Summary, Merge, and complex New Column.
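The effect of step order can be sketched with a toy group summary in plain Python (the `region`/`amount` fields are hypothetical). Both orders produce the same result, but filtering first halves the rows the expensive step must scan:

```python
# Sketch: filtering before a group summary scans fewer rows, same result.
from collections import defaultdict

rows = [{"region": "east" if i % 2 else "west", "amount": i}
        for i in range(10_000)]

def group_sum(data):                       # stands in for the "complex" step
    totals = defaultdict(int)
    for r in data:
        totals[r["region"]] += r["amount"]
    return dict(totals)

# Group first, filter later: the summary scans all 10,000 rows.
late = {k: v for k, v in group_sum(rows).items() if k == "east"}

# Filter first: the summary scans only 5,000 rows, same result.
early = group_sum([r for r in rows if r["region"] == "east"])

print(late == early, late)
```

In a real dataset the saving compounds: every complex step downstream of the filter processes the reduced row count.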