V2.12.0
Most enterprises have clear requirements for the stability of their business systems. O&M teams leverage system availability indicators to assess the quality of services provided by these systems.
System availability directly impacts user experience. If system availability falls short of expectations during specific periods or if frequent downtime occurs, it not only affects end-user satisfaction but may also lead to business interruptions and customer loss.
So, how can O&M teams effectively monitor system availability to ensure both the stability of business systems and user satisfaction?
FineOps provides the Availability Statistics function to address this challenge. With it, O&M teams can:
1. Define and calculate system availability: View availability rates across different time periods to gain comprehensive insight into system operation.
2. Configure shutdown plans: Create shutdown plans to distinguish scheduled shutdowns from crashes, so that downtime statistics are accurate and system stability can be assessed more completely.
3. Obtain detailed downtime records: FineOps displays detailed downtime records, assisting O&M teams in identifying potential issues and vulnerabilities for timely adjustments and maintenance.
With these functions, O&M teams can effectively monitor the system or device operation, promptly identify and resolve issues, and ultimately improve system availability and user satisfaction.
System availability is determined from O&M project monitoring data collected by FineOps.
Therefore, to obtain accurate system availability indicators, you must ensure that FineOps' O&M project monitoring is functioning properly.
For details, see Prerequisites of Using Monitoring Dashboards.
Definition
System availability indicator = ∑ Available time slices of the system / (∑ Available time slices of the system + ∑ Unplanned unavailable time slices of the system)
Available time slices of the system: periods during which FineOps' monitoring agent is alive and successfully collects monitoring data from the O&M project
Unplanned unavailable time slices of the system: Two calculation methods are available.
Definition: Unavailable period – (Scheduled downtime ∩ Unavailable period)
Example:
Assume that scheduled downtime is from 3:00 to 4:00, and the actual unavailable periods are from 1:00 to 2:00 and from 2:30 to 3:30.
In this case, the periods from 1:00 to 2:00 and from 2:30 to 3:00 are counted as unplanned downtime.
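The example above can be reproduced with a short interval subtraction. This is a minimal sketch, not FineOps' own implementation: the function name `unplanned_downtime`, the helper `at`, the half-open interval convention, and the calendar date are all assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Interval = Tuple[datetime, datetime]  # (start, end), treated as half-open


def unplanned_downtime(unavailable: List[Interval],
                       scheduled: List[Interval]) -> List[Interval]:
    """Return the parts of each unavailable period outside all scheduled windows."""
    result: List[Interval] = []
    for period in unavailable:
        pieces = [period]
        for s_start, s_end in scheduled:
            remaining: List[Interval] = []
            for p_start, p_end in pieces:
                # No overlap with this scheduled window: keep the piece as is.
                if s_end <= p_start or s_start >= p_end:
                    remaining.append((p_start, p_end))
                    continue
                # Keep whatever lies before and after the scheduled window.
                if p_start < s_start:
                    remaining.append((p_start, s_start))
                if s_end < p_end:
                    remaining.append((s_end, p_end))
            pieces = remaining
        result.extend(pieces)
    return result


def at(hour: int, minute: int = 0) -> datetime:
    """Hypothetical helper: a time of day on an arbitrary date."""
    return datetime(2024, 1, 1, hour, minute)


unavailable = [(at(1), at(2)), (at(2, 30), at(3, 30))]
scheduled = [(at(3), at(4))]

unplanned = unplanned_downtime(unavailable, scheduled)
total = sum((end - start for start, end in unplanned), timedelta())
print(total)  # 1:30:00
```

The result contains the periods 1:00 to 2:00 and 2:30 to 3:00, matching the example: only the half hour that overlaps the scheduled window is excluded.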
Unavailable period: A period is identified as unavailable if any of the following occurs during it:
The FineOps monitoring agent is alive, but fails to collect monitoring data from the O&M project for more than 3 minutes.
The load score of components (FineReport, FineDataLink, FineBI - Application Node, Engine - Calculation Node, or Engine - Metadata Node) exceeds 100 after five consecutive Full GC events.
The FineReport, FineDataLink, FineBI - Application Node, Engine - Calculation Node, or Engine - Metadata Node component disappears.
Scheduled downtime ∩ Unavailable period:
It refers to the intersection of the scheduled downtime window and unavailable periods.
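Once the unplanned unavailable time is known, the availability indicator defined earlier is a simple ratio of summed durations. The sketch below is illustrative only: the function name `availability_indicator`, the sample durations, and the choice to raise an error when no time was monitored are assumptions, not documented FineOps behavior.

```python
from datetime import timedelta


def availability_indicator(available: timedelta,
                           unplanned_unavailable: timedelta) -> float:
    """Available time / (available time + unplanned unavailable time)."""
    total = available + unplanned_unavailable
    if total == timedelta(0):
        # Assumption: treat a period with no monitored time as an error.
        raise ValueError("no monitored time in the selected period")
    return available / total


# Hypothetical day: 22 hours available, 1.5 hours of unplanned downtime.
# The remaining 0.5 hours fell inside a scheduled window and is left out.
indicator = availability_indicator(timedelta(hours=22),
                                   timedelta(hours=1, minutes=30))
print(f"{indicator:.2%}")  # 93.62%
```

Because scheduled downtime appears in neither the numerator nor the denominator, a shutdown plan that covers an outage raises the indicator rather than merely leaving it unchanged.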
Definition: unavailable periods caused by a system crash
A node will be identified as experiencing a crash if any of the following occurs:
Setting Method
1. Log in to FineOps as the admin, click an O&M project, and choose Availability Statistics > Availability Indicator.
2. Click the icon. You can customize the calculation logic of Unplanned Unavailable Time Slices of the System, which defaults to All Unplanned Stops.
3. Click OK to apply the change.
2. For multi-node O&M projects, you can switch the application node in the top-right corner to view its availability indicators. By default, the overall indicators of the entire project are displayed.
3. Availability indicators of four periods are displayed, including Availability Indicator of Yesterday, Availability Indicator of Last 7 Days, Availability Indicator of Last 30 Days, and Availability Indicator of Last Year.
4. Clicking the availability indicator of any period will update data in the availability column chart below accordingly, allowing you to view availability indicator details in the selected period.
To view downtime records of an O&M project, you must ensure the project is available.
Otherwise, the following message is displayed: "Abnormal project status. Use this project after restoring it or view other projects."
The Downtime Record list displays detailed downtime entries within the selected time period.
2. You can select the query period in the upper right corner of the Downtime Record module. By default, records from the past week are displayed.
3. For multi-node O&M projects, you can switch the application node to view its downtime records. By default, the downtime records of the entire project are displayed.
4. All downtime records within the selected period are displayed in a detail table format.
Stop Time
Downtime occurs in the following three scenarios:
Planned Shutdown: It refers to shutdowns caused by plans created in Shutdown Plan. Hover over it to view the plan details.
System Crash: FineOps has identified a project crash. Common causes and recommended solutions are listed in the following table.
Unplanned Stop: FineOps has detected abnormal monitoring data collection for the project, but has not identified a crash. Hover over it, and you can create a shutdown plan.
It indicates whether the automatic-restart strategy set in Crash Handling Strategy was triggered when the crash occurred.
You can click No (if any) to view specific reasons.
The following table describes the common crash causes and recommended solutions.
Overflow errors caused by insufficient memory
You can view the specific templates that caused the issue, including:
Aborted templates
Templates that consume excessively high memory for calculation
Templates with excessively long calculations
Templates with excessively long SQL statement execution
Templates with dataset row counts exceeding the recommended limit
Templates with cell counts exceeding the recommended limit
1. Troubleshoot the template performance and optimize the content.
For details about template performance, see Template Performance.
2. Enable template limits and configure reasonable limit ranges.
For details, see Template Limit.
3. You are advised to use the System Inspection function to diagnose the FanRuan application. If the on-heap memory size of the current system is less than the recommended value, set the on-heap memory size to the recommended value.
For details about system inspection, see System Inspection.
You are advised to use the System Inspection function to diagnose the FanRuan application and set the memory size to the recommended value.
Check server disk space and use the Resource Cleanup function to remove unnecessary files.
For details about resource cleanup, see Resource Cleanup.
Use the System Inspection function to diagnose the FanRuan application and set the vm.max_map_count configuration to the recommended value.
You are advised to upgrade the FanRuan application to the latest minor version.
For upgrades of FineOps-deployed projects, see Extranet-Based O&M Project Upgrade.
For FineBI updates and upgrades, see FineBI Version Upgrade Overview.
For FineReport updates and upgrades, see FineReport Upgrade/Update Instruction.
For FineDataLink updates and upgrades, see Minor Version Upgrade Instruction for V4.2.x.
If an application is started in an SSH terminal session, closing that terminal will also terminate the application.
You are advised to use alternative command-line remote tools, such as SecureCRT, or configure the FineDataLink server to start automatically upon system boot.
For details, see Automatic Tomcat Startup in Windows Upon System Boot.
You are advised to improve data-fetching performance by using functions such as Extracted Data Cache, SQL Optimization, and Data Preprocessing.
You are advised to adjust the log output level to reduce log volume or check if the disk is running out of space.
For details about log levels, see Log Introduction.
1. You are advised to use the System Inspection function to diagnose the FanRuan application. If the on-heap memory size of the current system is unreasonable, set the on-heap memory size to the recommended value.
2. You are advised to use a CPU with higher performance.