Optimize Report Access

  • Last update: December 20, 2021
  • I. Overview

    1. Access principle

    The designer generates the final SQL, sends the SQL statement to the database, the database executes it, and returns the data to the designer.

    Since the calculation process first needs to fetch data from the database through SQL statements, we can improve report performance by controlling the amount of data fetched and by pre-processing the data in advance.


    2. Access optimization

    The core of the report is data. Whether the data set is reasonable determines the quality of the report.     

    1) Each report should have a main data set. To reduce maintenance work, try to put all fields in the main data set, unless avoiding multi-source data sets would make the main data set extremely complicated.

    2) Before building a report, consider all the data fields that need to be displayed. Write the SQL statements carefully in the database software; for large data volumes, optimize the SQL as much as possible and add indexes.

    This document will detail how to optimize report access.

    II. Optimize SQL

    The data set of a FineReport report adopts the table model: through SQL, a relational table is obtained from the database by a simple query or by various combinations of join queries. This type of SQL query benefits from long-standing database optimizations (such as indexes) and is already very mature. The data set then generally needs to be processed by the FineReport report model to generate the final result. Therefore, the less data the SQL query retrieves from the database, the less complex processing and calculation the FineReport report model needs to do, and the less time and memory it spends, which improves performance.

    1. Take specific fields in SQL statements

    We often use select * from to fetch all the fields of a database table, even though some of them are not needed in the report. For example, the report may only need three fields while the table actually has ten. Some beginners habitually write select * from table1, which fetches the data of all ten fields to the report server, increases the memory usage of the report server, and slows down calculation. Try not to use "*" in the SQL statement; write out the specific fields instead. This reduces the memory usage of the report server and speeds up report calculation.
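    For example, a minimal sketch (the field names are illustrative):

    -- Fetches all ten fields to the report server:
    SELECT * FROM table1
    -- Fetches only the three fields the report actually uses:
    SELECT field1, field2, field3 FROM table1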


    2. Direct grouping in SQL instead of grouping in reports

    Some summary reports, such as a table of order totals, may take a large number of records from the order list and then aggregate the data, that is, perform grouping and summarizing operations. We can do this grouping and summarizing in SQL in advance, which greatly reduces the number of records fetched by the report server and speeds up both data fetching and report calculation.

    SQL statement: SELECT Cost_price, CategoryID FROM Product

    Select the above two fields from the database, and then summarize Cost_price by CategoryID. In this case, the database returns 77 records for the report to process, as follows:

    (Screenshot: 优化报表3.png)

    Optimized SQL statement: SELECT SUM(Cost_price), CategoryID FROM Product GROUP BY CategoryID

    After SQL optimization, only 8 records need to be processed by the report, as follows:

    (Screenshot: 优化报表4.png)

    Optimization Analysis:

    • With the first method, not only does the report server fetch a large number of records, which makes fetching slow, but the report model also needs to group the data columns, which increases the running time of the report;

    • With the second method, although the database has to perform the grouping operation, it has indexes and operates quickly, and the number of records fetched to the report server is greatly reduced, so fetching is much faster. When the report model then performs its grouping operations, only a few records are involved, and the calculation speed of the report is greatly accelerated.

    Experimental results and analysis show that the performance of the second approach is far better than the first. Therefore, grouping should be done in SQL as much as possible.


    3. Direct sorting in SQL instead of sorting in reports

    During report calculation, data often needs to be sorted. Although sorting can be performed on the report side, we still recommend sorting the data in SQL in advance. The sorting and index functions in the database are usually implemented in C/C++ (often more efficient than Java), which makes the sorting operation faster.
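    For example, a minimal sketch using the Product table from the grouping example above (the sort column is chosen for illustration):

    SELECT CategoryID, Cost_price FROM Product ORDER BY Cost_price DESC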


    4. Direct filtering in SQL instead of filtering in reports

    During report calculation, it is often unnecessary to operate on all the records in a table; only the records that meet certain conditions are needed. Although the data can be filtered in the report designer, we recommend filtering the data in SQL in advance. This reduces the amount of data returned by the database, which speeds up both data fetching and report calculation.
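    For example, a minimal sketch, again using the Product table from the grouping example (the filter value is illustrative):

    -- Only rows that meet the condition are returned to the report:
    SELECT CategoryID, Cost_price FROM Product WHERE Cost_price > 100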

    III. Use views and stored procedures

    A view is a virtual table in the database whose content is defined by a query. Like a real table, a view contains a series of named columns and rows of data. However, the view does not exist in the database as a stored set of data values; its rows and columns come from the tables referenced by the query that defines the view and are generated dynamically when the view is referenced.

    Stored procedures can perform powerful operations and processing on data through flow control and SQL statements. For more complex business applications, the original data often needs to be processed by a stored procedure before it can be used in a report. In addition, when a stored procedure is created, the database analyzes and optimizes its syntax, so the compiled stored procedure greatly improves the performance of the SQL statements. On the report side, you only need to write a short call statement to obtain the results, which also reduces network traffic.

    Therefore, table-to-table joins and complex SQL should be implemented directly in the database using views or stored procedures, so that the complex SQL statements are stored on the database server (where the database itself performs syntax analysis and optimization). The report designer then does not need to write large SQL statements but simply calls the view or stored procedure. On the one hand, this reduces network traffic and the pressure on the database; on the other hand, it speeds up report calculation.
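    As a minimal sketch, the grouping query from the previous chapter could be saved as a view in the database (the view name is illustrative), so the report data set only needs a short call:

    CREATE VIEW v_category_cost AS
    SELECT CategoryID, SUM(Cost_price) AS total_cost
    FROM Product
    GROUP BY CategoryID;

    -- The report data set then becomes:
    SELECT CategoryID, total_cost FROM v_category_cost;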

    IV. Optimization examples

    1. Avoid using "*" in the SELECT clause  

    When you want to list all columns in the SELECT clause, using the dynamic column reference "*" is convenient. Unfortunately, it is very inefficient. During parsing, Oracle converts "*" into the full list of column names by querying the data dictionary, which consumes more time.


    2. Delete duplicate records!

    The most efficient way to delete duplicate records is to use ROWID:

    DELETE FROM EMP E WHERE E.ROWID > (SELECT MIN(X.ROWID) FROM EMP X WHERE X.EMP_NO = E.EMP_NO)


    3. Use TRUNCATE instead of DELETE

    When deleting records from a table, rollback segments are normally used to store information that can be used to restore the data: if the transaction has not been committed, Oracle can restore the data to the state before the delete (to be precise, the state before the DELETE command was executed). With TRUNCATE, the rollback segment stores no restore information; once the command runs, the data cannot be restored. Therefore, very few resources are used and the execution time is short.
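    A minimal sketch, assuming a staging table whose contents can be cleared completely (the table name is illustrative):

    -- Writes undo information; can be rolled back before COMMIT:
    DELETE FROM STAGING_ORDERS;
    -- Writes no undo information; cannot be rolled back, but uses far fewer resources:
    TRUNCATE TABLE STAGING_ORDERS;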


    4. Count the number of records!

    Contrary to common belief, count(*) is slightly faster than count(1). Of course, if the rows can be retrieved through an index, counting an indexed column is still the fastest, for example COUNT(EMPNO).


    5. Replace IN with EXISTS

    In many queries based on the underlying table, in order to meet a condition, it is often necessary to join another table. In this case, using EXISTS (or NOT EXISTS) will usually improve the efficiency of the query.

    • Inefficient    

           SELECT * FROM EMP WHERE EMPNO > 0 AND DEPTNO IN (SELECT DEPTNO FROM DEPT WHERE LOC = 'MELB')

    • Efficient

           SELECT * FROM EMP WHERE EMPNO > 0 AND EXISTS (SELECT 'X' FROM DEPT WHERE DEPT.DEPTNO = EMP.DEPTNO AND LOC = 'MELB')


    6. Replace DISTINCT with EXISTS

    When submitting a query that joins one-to-many table information (such as a department table and an employee table), avoid using DISTINCT in the SELECT clause. You can generally replace it with EXISTS. For example:

    • Inefficient       

           SELECT DISTINCT DEPT_NO,DEPT_NAME  FROM DEPT D,EMP E WHERE D.DEPT_NO = E.DEPT_NO 

    • Efficient  

           SELECT DEPT_NO, DEPT_NAME FROM DEPT D WHERE EXISTS (SELECT 'X' FROM EMP E WHERE E.DEPT_NO = D.DEPT_NO)

    • EXISTS makes queries faster


    7. Replace > with >=

    --If there is an index on DEPTNO

    • Efficient:    

       SELECT *  FROM EMP  WHERE DEPTNO >=4    

    • Inefficient:    

       SELECT *  FROM EMP   WHERE DEPTNO >3    

       The difference between the two is that with the former, the DBMS jumps directly to the first record with DEPTNO equal to 4, while with the latter it first locates the record with DEPTNO=3 and then scans forward to the first record with DEPTNO greater than 3.


    8. Try to avoid null value judgments on fields in the where clause!

    A null value judgment will cause the engine to give up using the index and perform a full table scan. For example:

    select id from t where num is null

    You can set a default value of 0 on num to ensure that the num column in the table has no null values, and then query like this:

    select id from t where num=0


    9. Avoid using the != or <> operator in where

    Using != or <> causes the engine to give up using the index and perform a full table scan: the optimizer cannot determine the number of rows that will be hit through the index, so it has to scan all rows of the table.
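    If the value range of the column is known, one possible rewrite (illustrative, using the table t from the previous example) splits the condition into two indexable ranges:

    -- Typically cannot use the index on num:
    select id from t where num <> 10
    -- Each range can use the index on num:
    select id from t where num < 10 union all select id from t where num > 10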


    10. Avoid using OR to connect conditions in the where clause!

    Otherwise, the engine will give up using the index and perform a full table scan, for example:

    select id from t where num=10 or num=20

    You can query like this:

    select id from t where num=10 union all select id from t where num=20


    11. IN and NOT IN should also be used with caution!

    Because IN makes the system unable to use the index and it can only search the table data directly, for example:

    select id from t where num in(1,2,3)

    For continuous values, do not use in if you can use between:

    select id from t where num between 1 and 3


    12. Avoid expression operations in where!

    This will cause the engine to give up using the index and perform a full table scan. Such as:

    SELECT * FROM T1 WHERE F1/2=100

    Should be changed to:

    SELECT * FROM T1 WHERE F1=100*2

    SELECT * FROM RECORD WHERE SUBSTRING(CARD_NO,1,4)='5378'

    Should be changed to:

    SELECT * FROM RECORD WHERE CARD_NO LIKE '5378%'

    SELECT member_number, first_name, last_name FROM members WHERE DATEDIFF(yy,dateofbirth,GETDATE()) > 21

    Should be changed to:

    SELECT member_number, first_name, last_name FROM members WHERE dateofbirth < DATEADD(yy,-21,GETDATE())

    That is, any operation on a column will result in a table scan, including database functions and calculation expressions. When querying, move the operation to the right side of the equal sign as much as possible.


    13. Avoid function operations in the where clause!

    This will cause the engine to give up using the index and perform a full table scan. Such as:

    select id from t where substring(name,1,3)='abc'

    --ids whose name starts with 'abc'

    select id from t where datediff(day,createdate,'2005-11-30')=0

    --ids created on '2005-11-30'

    Should be changed to:

    select id from t where name like 'abc%'

    select id from t where createdate >= '2005-11-30' and createdate < '2005-12-1'


    14. Do not operate on the left side of the "=" in where!

    If functions, arithmetic operations, or other expressions are performed on the left side of the "=", the system may not be able to use indexes correctly.


    15. Try to avoid returning large amounts of data to the client!

    If the amount of data is too large, you should consider whether the corresponding demand is reasonable.


    16. Avoid using incompatible data types!

    For example, float and int, char and varchar, and binary and varbinary are incompatible. Incompatible data types may prevent the optimizer from performing optimizations it could otherwise perform. For example:

    SELECT name FROM employee WHERE salary > 60000

    In this statement, if the salary field is of the money type, it is difficult for the optimizer to optimize it, because 60000 is an integer. We should convert the integer to the money type in the statement itself, instead of waiting for a runtime conversion.
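    A minimal sketch, assuming salary is a SQL Server money column; converting the constant in the statement avoids the implicit runtime conversion:

    SELECT name FROM employee WHERE salary > CAST(60000 AS money)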


    17. Make full use of the join conditions!

    In some cases, there may be more than one join condition between two tables. In that case, writing all the join conditions in the WHERE clause may greatly improve the query speed. For example:

    SELECT SUM(A.AMOUNT) FROM ACCOUNT A,CARD B WHERE A.CARD_NO = B.CARD_NO

    SELECT SUM(A.AMOUNT) FROM ACCOUNT A,CARD B WHERE A.CARD_NO = B.CARD_NO AND A.ACCOUNT_NO=B.ACCOUNT_NO

    The second statement will execute much faster than the first.


    18. Where GROUP BY can be used, DISTINCT is not needed

    SELECT DISTINCT OrderID FROM Details WHERE UnitPrice > 10

    Can be changed to:

    SELECT OrderID FROM Details WHERE UnitPrice > 10 GROUP BY OrderID


    19. Don't use UNION if you can use UNION ALL!

    UNION ALL does not perform the deduplication step of SELECT DISTINCT, which avoids a lot of unnecessary resource consumption.
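    A minimal sketch (the table names are illustrative); use UNION ALL only when duplicate rows are impossible or acceptable:

    SELECT id FROM orders_2020
    UNION ALL
    SELECT id FROM orders_2021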


    20. Try not to use the SELECT INTO statement!

    The SELECT INTO statement causes the table to be locked, preventing other users from accessing the table.

    Theme: Performance Optimization