Downtime Handling

I. Overview

1. Version

Designer | JAR Package | Function Changes
10.0.15 | - | -
10.0.17 | - | Section II.2: added the check item "Operating System" and optimized the check item "JDK" in Running check. Section II.3 5): the port configuration of the automatic downtime handling tool is validated, and an error message is displayed for an invalid port. Section II.4: only the downtime handling records of the last month are read.
10.0.19 | 2021-10-30 | Section II.1: added the Downtime self-help wizard function.


2. Application scenarios

After a report project is deployed on a server, the system may break down for various reasons. If the system is restarted after a breakdown without capturing DUMP files, it is difficult to locate the real cause. As a result, the problem cannot be resolved quickly and no measures can be taken to prevent recurrence, which wastes time and effort and makes server operation and maintenance harder.

The downtime handling function is available in report projects of version 10.0.15 and later. With this function, the report project can automatically generate DUMP files and restart the system.

After a breakdown occurs, the generated DUMP file can be analyzed to quickly locate the cause, handle the problem in time, and take effective preventive measures, giving users an additional safeguard.

3. Function introduction

The "Downtime Handling" function is essentially an automatic downtime handling tool with two entries: the platform and the client.

1) Platform: The part built into the platform mainly monitors the running environment of the tool and provides some settings. It consists of Downtime self-help wizard, Running check, Downtime handling, Memory stack export record, and Server restart record.

2) Client: The client mainly monitors the running status of the project.

Note 1: Downtime handling is not supported for non-Tomcat container deployments and projects built into the designer.

Note 2: Users who integrate reports and BI should use the "Downtime Handling" function of the BI project and disable the "Downtime Handling" function of the report project, so that the operation of the BI project is not affected.

II. Platform

The super administrator logs in to the decision-making platform and clicks "Manage > Intelligent Operations > Downtime Handling" to see the corresponding function modules on the platform. As shown in the figure below:

The module is divided into five parts: Downtime self-help wizard, Running check, Downtime Handling, Memory stack export record, and Server restart record.

Note: The "Downtime Handling" page takes a long time to load when it is opened on the platform. Please wait patiently.

1.png

1. Downtime self-help wizard

This module records each downtime of the report project and its reason, and provides the corresponding recommended solutions. As shown below:

2.png

Downtime Reason | Recommended Solution
Memory overflow (engine suspension; templates whose calculation takes up a large amount of memory; templates with a long calculation duration; heap memory allocation too small; number of rows in a single dataset exceeding the recommended limit; number of cells in a single template exceeding the recommended limit) | 1) Check the templates, and enable and reasonably configure template restrictions. 2) Use System Check to detect and reasonably configure heap memory. For the system check function, please refer to: System Check
JDK with bugs | Replace it with JDK 8 version 1.8.0_181 or above
Operating system memory configuration is unreasonable | Use System Check to detect and reasonably configure system memory. For the system check function, please refer to: System Check
Insufficient disk space | Check the disk space and clean up unnecessary files
The number of memory-mapped files is set too low | Use System Check to detect the configured number of memory-mapped files and change it to the recommended value. For the system check function, please refer to: System Check
Bug in the third-party engine used by the old version of the chart | Update the system to the latest version. For project upgrades, please refer to: Upgrade Guide
There is a crash file but the cause of the crash cannot be determined | Upload the cloud operation and maintenance logs for feedback. For cloud operation and maintenance functions, please refer to: Cloud operation and maintenance steps
ssh exit | Replace the startup command or use the SecureCRT command line tool. For the project startup method, please refer to: Web application server startup automatically
Thread blocking caused by data fetching | Use the lottery cache function to speed up data fetching, or use the template assistant function to detect the problem. For the lottery cache function, please refer to: Introduction to lottery cache
Log output blocked | Raise the log level or check whether the remaining disk space is sufficient. For log level adjustment, please refer to: Log Introduction
System memory release time is too long | 1) Adjust the heap memory to a value less than 64GB. For heap memory adjustment methods, please refer to: System Check. 2) It is recommended to replace the CPU with a higher-performance one. For hardware configuration recommendations, please refer to: Hardware Configuration
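
The "JDK with bugs" recommendation can be checked with a short script. The following is only a minimal sketch: it assumes the version string has the usual JDK 8 form 1.8.0_NNN (for example as printed by java -version), and the function name is_recommended_jdk8 is hypothetical, not part of the product. Version strings of other JDK generations are simply reported as not matching.

```python
import re


def is_recommended_jdk8(version: str, min_update: int = 181) -> bool:
    """Return True if `version` is a JDK 8 build at update `min_update` or later.

    Assumption: the string looks like "1.8.0_181" or "1.8.0_291-b10".
    Anything else (for example "11.0.2") is reported as False.
    """
    match = re.fullmatch(r"1\.8\.0_(\d+)(?:-\S+)?", version.strip())
    if match is None:
        return False  # not a JDK 8 version string in the 1.8.0_NNN format
    return int(match.group(1)) >= min_update


if __name__ == "__main__":
    for candidate in ("1.8.0_151", "1.8.0_181", "1.8.0_291-b10"):
        print(candidate, is_recommended_jdk8(candidate))
```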


2. Running check

Before using the downtime handling function described in Section II.3, make sure that every item in "Running check" meets the requirements.

The system must meet certain conditions for the automatic handling tool to run properly. Therefore, after the project is started, the port, JDK, off-heap memory, and deployment method are checked first.

If a problem is found, the system reminds the user to fix the problem or disable the downtime handling function. If no problem is found, the system continues to run stably. As shown in the figure below:

2.png

The check contents of each item of "Running check" are as follows:

Note 1: If the detection result is optimal, the system displays "This configuration is good, no need to adjust."

Note 2: The check item "Operating System" was added in 10.0.17 and later versions.

Note 3: For the check item "JDK", JDK must be configured in versions before 10.0.17.

No. | Check Item | Problem Criteria | Modification Suggestions
1 | Operating System | The current operating system is not Windows or Linux. | You are advised to use a Linux OS to ensure the stable running of the automatic handling tool. In this case, the following four checks will not be performed.
2 | Port | The status of port 12100 is abnormal (the port is not enabled or is occupied). | You are advised to open port 12100 or set another port to ensure normal system operation. For port setting, refer to Section II.3 5) of this article.
3 | JDK | 1) Versions before 10.0.17: JDK is not configured in the project (JDK is incorrectly configured in the system). 2) Version 10.0.17 and later: a non-Oracle JRE tools.jar exists in the project and JDK is not configured in the project (the JDK configured in the system is faulty); or tools.jar is not available in the project and JDK is not configured in the project (JDK is incorrectly configured in the system). | You are advised to configure the system JDK/tools.jar to ensure normal system running. For details about how to set the JDK and tools.jar, see: Tomcat server deployed independently in Linux.
4 | Off-heap memory | Off-heap memory is insufficient. | It is strongly recommended that the total memory of the dedicated server minus the memory used by the container where the system resides be at least 10GB.
5 | Deployment method | Non-Tomcat container deployment | Tomcat container deployment is recommended. For details about Tomcat deployment, see: Tomcat Server Deployment
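
The off-heap memory recommendation in the table above is a simple subtraction. The sketch below illustrates it under stated assumptions: "10GB" is interpreted as 10 GiB, and the function name off_heap_headroom_ok and its parameters are hypothetical. It is not how the product performs the check.

```python
GIB = 1024 ** 3
# Assumption: the recommended "10GB" is interpreted as 10 GiB.
MIN_OFF_HEAP_BYTES = 10 * GIB


def off_heap_headroom_ok(total_memory_bytes: int, container_memory_bytes: int) -> bool:
    """Return True if server memory minus container memory is at least 10 GiB."""
    return total_memory_bytes - container_memory_bytes >= MIN_OFF_HEAP_BYTES


if __name__ == "__main__":
    # A 32 GiB server whose container uses 16 GiB leaves 16 GiB of headroom.
    print(off_heap_headroom_ok(32 * GIB, 16 * GIB))
    # A 16 GiB server whose container uses 8 GiB leaves only 8 GiB.
    print(off_heap_headroom_ok(16 * GIB, 8 * GIB))
```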

When the operating system, port, JDK, or off-heap memory affects the use of the function, the system alerts users through a platform message and a pop-up message in the lower right corner of the platform.

Click "Handle" to jump to the "Downtime Handling" configuration page on the platform and handle the problem accordingly.

The content of the message reminder is: The automatic downtime handling tool is currently unavailable. To ensure the normal operation of the function, it is recommended that you click Handle to view the details.

3.png


3. Downtime handling

Before using the downtime handling function, make sure that every item in "Running check" (Section II.2) meets the requirements.

You can configure the following items in the "Downtime Handling" area: Automatic handling of downtime, Automatically export memory stack, Auto restart, Downtime notification, and Port Setting. As shown in the figure below:

All the following settings will take effect only after you click the Save button.

4.png

1) Automatic handling of downtime

When the system configuration meets the running conditions in Running check, the tool automatically starts. This function is enabled by default.

If the system configuration does not meet the operating conditions, the switch icon becomes gray and cannot be modified.

Note: For projects of 10.0.17 and later versions:

During working hours (6:00-23:00), when the main process of the report project has been shut down for 5 minutes, the automatic downtime handling tool also shuts down.

During non-working hours (0:00-6:00 and 23:00-24:00), the automatic downtime handling tool does not shut down after the main process of the report project is shut down.
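
The working-hours rule in the note above can be sketched as follows. This is a minimal illustration, not the tool's implementation: the function name tool_should_shut_down is hypothetical, and it assumes that 6:00 counts as working time, that 23:00 counts as non-working time, and that the rule is evaluated against the current server time.

```python
from datetime import datetime, time, timedelta

WORK_START = time(6, 0)              # start of working hours (assumed inclusive)
WORK_END = time(23, 0)               # end of working hours (assumed exclusive)
GRACE_PERIOD = timedelta(minutes=5)  # main process must be down this long


def tool_should_shut_down(now: datetime, main_process_down_since: datetime) -> bool:
    """Return True if the handling tool should shut down along with the main process."""
    in_working_hours = WORK_START <= now.time() < WORK_END
    down_long_enough = now - main_process_down_since >= GRACE_PERIOD
    return in_working_hours and down_long_enough


if __name__ == "__main__":
    down_at = datetime(2021, 10, 30, 10, 0)
    print(tool_should_shut_down(datetime(2021, 10, 30, 10, 3), down_at))
    print(tool_should_shut_down(datetime(2021, 10, 30, 10, 6), down_at))
```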

2) Automatically export memory stack

  • If Automatic handling of downtime is not enabled, this parameter is dimmed and cannot be modified.

  • If Automatic handling of downtime is enabled, this function is disabled by default. When this function is enabled and downtime occurs, downtime logs are automatically exported to %Tomcat%\logs\FineLog\Date.

When you turn on this switch, the system checks the tool status and the current system status, including whether the system is integrated with BI. If BI is integrated, a message is displayed indicating that this function does not support the BI system. As shown in the figure below:

Click the "Complete" or "Close" button to close the popover. The switch will not be turned on.

5.png

3) Auto restart

  • If Automatic handling of downtime is not enabled, this parameter is dimmed and cannot be modified.

  • If Automatic handling of downtime is enabled, this function is disabled by default. If Auto restart is enabled, the system automatically restarts the project when downtime occurs.

When this switch is enabled, the system checks the tool status and the current system status.

1) Check whether the system integrates BI. If BI is integrated, a pop-up prompt will be displayed after clicking: This function does not support BI system. As shown in the figure below:

After clicking "Complete" or "Close" button, the popover window will close and the switch will not be turned on.

6.png

2) Check whether the operating system is Windows and whether the service mode is used.

If the operating system is Windows and the service mode is used, a message is displayed indicating that: The function is not supported current system. As shown in the figure below:

After you click the "Complete" or "Close" button, the popover window closes and the switch is not turned on.

7.png

If the operating system is Windows and the service mode is not used, a message is displayed indicating that: The current system may fail to restart. As shown in the figure below:

Click the "Complete" or "Close" button to close the popover. The switch is turned on.

8.png

If several of the preceding problems exist at the same time, only the message with the highest importance is displayed. The order of importance is: The function is not supported current system > This function does not support BI system > The current system may fail to restart.
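
The order of importance described above amounts to picking the first matching entry from a fixed priority list. A minimal sketch, in which the function name message_to_display is hypothetical and the message strings are copied from this section:

```python
# Messages in descending order of importance, as listed in this section.
MESSAGE_PRIORITY = [
    "The function is not supported current system",
    "This function does not support BI system",
    "The current system may fail to restart",
]


def message_to_display(triggered):
    """Return the most important triggered message, or None if none applies."""
    for message in MESSAGE_PRIORITY:
        if message in triggered:
            return message
    return None


if __name__ == "__main__":
    triggered = {
        "The current system may fail to restart",
        "This function does not support BI system",
    }
    print(message_to_display(triggered))
```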

4) Downtime notification

  • If Automatic handling of downtime is not enabled, this option is dimmed and cannot be modified.

  • If Automatic handling of downtime is enabled, this function is disabled by default. After it is enabled, you can configure SMS notification, platform message notification, and email notification. When downtime occurs, users are notified in the configured notification mode.

5) Port setting

The default port of the downtime handling tool is 12100. If port 12100 is abnormal, you can configure another port number here.

The port number must be in the range 1024 to 65535; 12100 is recommended. If the port number is outside this range, the downtime handling tool cannot start and the downtime handling page cannot be opened.

For projects of 10.0.17 and later versions, the error message "Please enter a number between 1024 and 65535, 12100 is recommended." is displayed when an invalid port number is entered.

9.png
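
The port range rule can be illustrated with a small validation function. This is a sketch only: the function name validate_port is hypothetical, and it covers just the numeric range check with the error message quoted above, not the availability test performed by the platform.

```python
PORT_RANGE_ERROR = "Please enter a number between 1024 and 65535, 12100 is recommended."


def validate_port(text: str) -> int:
    """Return the port as an int, or raise ValueError with the documented message."""
    text = text.strip()
    # Accept plain ASCII digits only, then enforce the documented range.
    if not (text.isascii() and text.isdigit()) or not 1024 <= int(text) <= 65535:
        raise ValueError(PORT_RANGE_ERROR)
    return int(text)


if __name__ == "__main__":
    print(validate_port("12100"))
    try:
        validate_port("80")
    except ValueError as error:
        print(error)
```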

Enter the new port number and click Test. If the new port is abnormal, the test fails.

If the port is normal, a message is displayed indicating that: The port is available, automatic downtime handling tool will restart on the new port after saving. As shown in the figure below:

10.png



4. Handling record

Memory stack export record and Server restart record list the automatically generated DUMP files and the server restarts, respectively. Each entry includes the Export (Restart) start time, Export (Restart) duration, Is it successful, and Reason of failure. As shown in the figure below:

Note: For 10.0.17 and later versions of the project, only the downtime handling records of the last month are read.

11.png
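
The one-month limit on handling records can be pictured as a simple date filter. The sketch below is an assumption-heavy illustration: "the last month" is interpreted as the last 30 days, and the record layout (a dict with a start_time key) and the function name recent_records are hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: "the last month" is treated as the last 30 days.
RETENTION = timedelta(days=30)


def recent_records(records, now):
    """Keep only the records whose start time falls within the retention window."""
    cutoff = now - RETENTION
    return [record for record in records if record["start_time"] >= cutoff]


if __name__ == "__main__":
    now = datetime(2021, 10, 30, 12, 0)
    records = [
        {"start_time": datetime(2021, 10, 29, 8, 0), "successful": True},
        {"start_time": datetime(2021, 8, 1, 8, 0), "successful": False},
    ]
    print(recent_records(records, now))
```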

III. Client

After the server project is started as an administrator, the administrator enters http://IP:port/operation/tool in the browser to access the client interface of the downtime tool. The port is the port of the automatic downtime handling tool: 12100 by default, or the new port if it has been changed. As shown in the following figure:

12.png
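
The client address described above can be assembled as follows. This is a minimal sketch: the function name client_url is hypothetical, and the host 192.168.1.10 is a placeholder for the IP of your own server.

```python
DEFAULT_TOOL_PORT = 12100  # default port of the automatic downtime handling tool


def client_url(host: str, port: int = DEFAULT_TOOL_PORT) -> str:
    """Build the address of the downtime tool client for the given server."""
    return f"http://{host}:{port}/operation/tool"


if __name__ == "__main__":
    # 192.168.1.10 is a placeholder; replace it with the server IP.
    print(client_url("192.168.1.10"))
    # If the port has been changed on the platform, pass the new port.
    print(client_url("192.168.1.10", 12200))
```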

The user logs in with the account and password to access the user interface. After five failed login attempts, the user cannot log in to the client for 60 minutes. A login is valid for 15 minutes, after which the user is logged out automatically.

Note: If the user is locked during the downtime, the user needs to manually kill the process.
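
The login limits above (five failed attempts, a 60-minute lock, a 15-minute login validity period) can be modeled roughly as shown below. This sketch rests on several assumptions that the document does not confirm: failures are counted consecutively, the lock starts at the fifth failure, and the validity period is measured from the login time. The class and function names are hypothetical.

```python
from datetime import datetime, timedelta

MAX_FAILED_ATTEMPTS = 5
LOCK_DURATION = timedelta(minutes=60)
LOGIN_VALIDITY = timedelta(minutes=15)


class LoginGuard:
    """Tracks failed logins for one user and locks the user out when needed."""

    def __init__(self):
        self.failed_attempts = 0
        self.locked_until = None

    def is_locked(self, now: datetime) -> bool:
        return self.locked_until is not None and now < self.locked_until

    def record_failure(self, now: datetime) -> None:
        self.failed_attempts += 1
        if self.failed_attempts >= MAX_FAILED_ATTEMPTS:
            # Assumption: the 60-minute lock starts at the fifth failure.
            self.locked_until = now + LOCK_DURATION
            self.failed_attempts = 0

    def record_success(self) -> None:
        self.failed_attempts = 0


def login_expired(login_time: datetime, now: datetime) -> bool:
    """Return True once the 15-minute login validity period has passed."""
    return now - login_time >= LOGIN_VALIDITY
```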

The user interface mainly has two functions: Running check and Output stack. In the upper right corner, there is a "Log out" button, which can be clicked to log out. As shown in the figure below:

13.png


1. Running check

Running check monitors the running status of the project. If the server is running normally, the screen shows: Normal running.... As shown in the figure below:

14.png

If a breakdown occurs, the screen shows: Has been down, handling.... As shown in the figure below:

14.png

While the tool is automatically handling the downtime, you can manually stop it. After the tool stops handling the downtime, a message is displayed indicating that: The downtime automatic handling has been stopped, please manually restart the system in time. As shown in the figure below:

15.png

If the automatic restart fails, the page displays: Failed to restart automatically, please manually restart the system in time. As shown in the figure below:

16.png


2. Output stack

In "Output stack", the user can export the thread stack file and the memory stack file by clicking the corresponding button. As shown in the figure below:

18.png

After the stack file is exported successfully, a message is displayed indicating that the output is successful. Otherwise, a message is displayed indicating that the output fails. The stack file location is %Tomcat%\logs\FineLog\Date.
