Overview
This document introduces:
(1) FineBI 6.1 architecture
(2) FineBI 6.1 advantages
FineBI 6.1 Project Architecture
Architecture Overview
Component Overview
Component | Overview |
---|---|
Load balancing gateway | Function: The load balancing gateway sits between the client and the bi-web component. It receives requests from the client and distributes them to the bi-web component, so the gateway must be able to reach every BI node. Advantages (Why use load balancing instead of connecting the client directly to BI?) Performance optimization: Requests are distributed across BI nodes to prevent server overload and improve efficiency. Fault isolation: Faulty BI nodes (if any) are automatically detected and excluded so that the overall service is not affected. Security enhancement: As the first line of defense, the load balancing gateway filters and monitors traffic to block malicious attacks. Session persistence: You can configure session persistence policies on the gateway so that a user's consecutive requests stay within the same session. Scalability: The client only needs the load balancing gateway address. Even when more BI nodes are added as the business grows, the user access address does not change. Type: This FanRuan intra-project gateway is customized for FanRuan business to balance request distribution and improve performance, so it cannot be replaced by a self-provided gateway. If you need to use another load balancing gateway such as F5, SLB, or ELB, you can configure forwarding yourself so that client requests reach your self-provided gateway first, then the FanRuan intra-project gateway, and finally the BI nodes. |
BI node | A BI node works at two layers: the network forwarding layer and the business processing layer. Network forwarding layer (How can one user's business be continuously processed by the same node?) To ensure that one user's requests are always processed by the same BI node when multiple BI nodes exist, a unique session ID is generated when the client first connects to a BI node. After that, regardless of which BI node receives a subsequent request from this client, the request is re-forwarded, based on the session ID, to the BI node that received the first request, so that the session stays alive and the service stays consistent. Business processing layer: At this layer, the BI node processes client requests, including platform function requests and frontend page rendering requests. During request processing, tasks that involve data update or calculation are assigned to the backend engine workers. |
Engine master | The engine master has two functions: storing and providing worker information, and storing metadata. Storing/Providing worker information (Q: Are tasks from BI nodes assigned directly to workers? A: Yes, but not entirely.) When data needs to be updated or calculated for a BI node, the tasks are assigned to an engine worker for execution. However, the engine worker is a stateless service: it does not store any persistent data and has no fixed identity (as if it had no fixed ID card). Therefore, the BI node cannot identify or contact an engine worker on its own. Instead, the BI node first communicates with the engine master, which dynamically provides the address of an available worker (randomly assigned and used only for the current task). Once it has obtained the worker address from the engine master, the BI node can distribute the task to that worker for data processing and calculation. Storing metadata (How does a worker retrieve data from the data storage component?) Metadata is the information about each BI data table (not the data in the table), such as the table name, field names, and the table's storage location in the data storage component. The metadata is stored in the directory mounted by the master component, namely, on the server's disk. Before retrieving data from the data storage component for calculation or update, a worker first obtains the metadata of the relevant tables from the master, then locates the required tables, and finally completes the work. |
Engine worker | This component consists of two parts: the engine worker and the monitor used for health checks. Engine worker: The engine worker executes the data update and calculation tasks distributed by BI nodes. By default, each engine worker can execute both data update tasks and data query/calculation tasks. What is read-write separation, and when should it be configured? In a production business system, business data is generally queried/calculated during working hours in the daytime and updated during off-hours at night. In the daytime, data updates (if any) must not take resources away from queries. At night, data calculation (if any) must not take resources away from updates. In this case, you can configure attributes for the engine workers so that certain nodes focus on a specific type of task in a specified time period (namely, read-write separation). Monitor (Why is there a requirement on the restart order of the master and workers?) An engine worker sends a heartbeat signal (indicating that it is still alive) to the engine master every 3 seconds. If the heartbeat is abnormal, the master considers the worker faulty and in need of a restart. The master sends a task to the monitor, which executes the kill command to forcibly close the worker and then closes the monitor itself. This triggers the worker's own mechanism to restart automatically. Therefore, after the master is restarted, all engine workers must also be restarted. Otherwise, the master cannot check whether each worker is alive. |
Data storage | This component stores the data extracted from base tables and self-service datasets in FineBI. Note that the following data is not stored in the data storage component: Excel datasets: The related files are stored in the /WEB-INF/assets/temp_attach path on the file server. Direct-connected data: Cached data related to direct connection is stored jointly in the server memory and on the server disk. |
Configuration database | The configuration database, also known as FineDB in earlier versions, stores the project's configuration information (for example, which users and directories are in the project, what permissions users have, and when data update tasks are executed). The configuration information is stored separately in a database that stays connected to the project to ensure long-term stable running. This is why each project must have its own configuration database rather than share one with other projects: sharing may cause configuration confusion. Example: For a cluster with multiple BI business nodes, the same platform style and the same directory are displayed whether you access the project through the load balancing entry or through a specific node. This is because every node reads this information from the configuration database. |
State service | The state server monitors the running state of each component and each node in the BI project, records logs and errors, and coordinates inter-node communication and task allocation. Example: Assume that a user logs in at node A and processes business normally. Node A then crashes, and the business is transferred to node B for processing. How does node B know that this user is already logged in on the current computer? It relies on the state service to determine the login state. This information is stored in Redis. |
File service | The file server stores and shares the files and data resources required in the cluster so that every BI node can access and use them. The following files illustrate the role of the file server: FineReport template files; FineReport template backup files; original Excel file information of FineBI; drivers uploaded through Driver Management; project resource files (maps/images); snapshot files generated by Task Schedule; data packages generated by Cloud O&M; historical project backup files. |
Log service | The log service records every operation of each user in the system, including login, data access, and modification. In FineBI 6.1, log services are provided by the Elasticsearch component, which replaces the original Swift engine. Why logs are required: Traceability: Logs make system operations transparent and traceable, which is important for audits and compliance checks. Performance analysis: The system's running and performance data is recorded to help analyze and locate performance bottlenecks. Behavior analysis: User habits and usage patterns are recorded to help improve system functionality and user experience. |
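The interaction between the engine master and the engine workers described in the table above (workers send heartbeats, the master hands out the address of a live worker for each task) can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not FineBI code: the class and method names (`EngineMaster`, `heartbeat`, `assign_worker`, `FakeClock`) and the worker addresses are hypothetical, and the rule that a worker is considered faulty after two missed 3-second heartbeats is an assumption made for the example. The kill-and-restart step performed through the monitor is not modeled.

```python
import random
import time


class EngineMaster:
    """Toy model of the master's worker registry (illustrative only)."""

    def __init__(self, heartbeat_interval=3.0, missed_beats_allowed=2,
                 clock=time.monotonic):
        # heartbeat_interval mirrors the 3-second interval from the table.
        # missed_beats_allowed is an assumption made for this sketch.
        self.timeout = heartbeat_interval * missed_beats_allowed
        self.clock = clock
        self.last_seen = {}  # worker address -> time of last heartbeat

    def heartbeat(self, address):
        """Called by a worker to report that it is still alive."""
        self.last_seen[address] = self.clock()

    def alive_workers(self):
        """Workers whose last heartbeat is within the timeout."""
        now = self.clock()
        return sorted(a for a, t in self.last_seen.items()
                      if now - t <= self.timeout)

    def faulty_workers(self):
        """Workers whose heartbeat is overdue and that would need a restart."""
        now = self.clock()
        return sorted(a for a, t in self.last_seen.items()
                      if now - t > self.timeout)

    def assign_worker(self):
        """Return a randomly chosen live worker address for one task."""
        alive = self.alive_workers()
        if not alive:
            raise RuntimeError("no engine worker is currently available")
        return random.choice(alive)


class FakeClock:
    """Manually advanced clock so the example runs without sleeping."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


if __name__ == "__main__":
    clock = FakeClock()
    master = EngineMaster(clock=clock)
    master.heartbeat("worker-a:9000")
    master.heartbeat("worker-b:9000")
    clock.advance(3)
    master.heartbeat("worker-a:9000")  # worker-b misses this heartbeat
    clock.advance(4)
    print("alive:", master.alive_workers())
    print("faulty:", master.faulty_workers())
    print("assigned:", master.assign_worker())
```

Because the registry lives only in the master's memory in this sketch, a freshly started master knows no workers until they report again. This loosely mirrors why the table asks for workers to be restarted after the master.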
Advantage Comparison Between FineBI 6.1 and Earlier Versions
Item | 6.0/5.x | Storage and Calculation Separation in 6.1 |
---|---|---|
Higher stability | The calculation engine and BI business run in the same process. User access and query/update tasks affect each other. If a crash occurs, the entire service crashes, and users perceive it. | The engine service is separated from the BI business service and runs as an independent process. Engine crashes (if any) are not perceived by users, and the engine can restart automatically within minutes. With multiple engine nodes, this meets high availability requirements. |
Higher scalability | 5.x: For extracted data, primary and backup clusters can be achieved through plugins, but this cannot ensure high availability. Data can be stored only on local disks, which cannot meet users' personalized storage requirements. 6.0.x: Up to five business nodes are supported, which restricts cluster expansion. Data can be stored only on local disks, which cannot meet users' personalized storage requirements. | A unified data access layer is added to separate the calculation engine from the data storage. This removes the historical bottleneck of persistently synchronizing locally extracted data across the nodes of a multi-node cluster. No longer subject to the node quantity limit, the calculation engine can be scaled horizontally, which improves data query concurrency linearly and meets the requirements of users with highly concurrent queries. Extracted data can be stored in OSS, achieving data persistence without local storage and meeting enterprise data security requirements. |
Higher performance | Complex daily O&M. Tight disk usage. | Simplified daily O&M: Basic FineBI O&M operations, such as project start/stop, log download, and stack dump generation, can be completed through the visual frontend of the O&M platform, which reduces part of the O&M cost. Disk space release: Extracted data is no longer synchronized and stored among multiple nodes. After data is persisted once in the data storage component, multiple engine nodes can read it. |
Reason for Containerized Deployment of FineBI 6.1
Advantage | Description |
---|---|
Consistency | |
Isolation and security | |
Simplified O&M | |
Fast fault recovery | |
Attachment: Containerization Technology Overview
If Linux is compared to a kitchen, the Docker technology is like pre-cooked meals.
The kitchen can store vacuum-packed food (image) ordered from the central kitchen (cloud image repository) in its own freezer (local image repository).
Chefs (O&M personnel) can transform the pre-cooked vacuum-packed food (image) into various dishes (container) through simple operations, for customers to enjoy (service).
Three core concepts of Docker
(1) Image: Similar to an installation package, an image is used to create containers.
(2) Container: A container runs one application or a group of applications in isolation. Multiple containers can be created from the same image. A container can be regarded as a simplified Linux system.
(3) Repository: A repository is a place to store images. Repositories are either public or private. Examples include Docker Hub, Alibaba Cloud, and Harbor.
Through Docker, enterprises can manage their systems more efficiently and flexibly, improving service stability and maintainability.