A general-purpose large language model (LLM) is used to support the following FineChatBI functions: data query (semantic understanding and translation in intelligent mode), asking for analysis ideas, and one-click synonym configuration.
The general LLM can be a local LLM or a cloud-based LLM.
Disclaimer: FanRuan only provides recommended LLMs and connection methods for reference and accepts no liability for any issues with the LLMs themselves.
Local LLM deployment: You are advised to use vLLM rather than Ollama, which may lose the system prompt and reduce the LLM's response accuracy.
Local LLM
Qwen3-235B-A22B-Instruct-2507-FP8 (Download Link): 4 x H100 GPUs (the GPUs must support FP8)
Qwen/Qwen3-235B-A22B-GPTQ-Int4 (Download Link): 2 x H100 GPUs
Qwen3-Next-80B-A3B-Instruct-FP8 (Download Link)
DeepSeek-V3 (4-bit quantized version): 8 x H100 GPUs or higher-performance GPUs
Cloud-based LLM
Qwen-Max (32K)
Token cost = (N x X + M x Y) x U x V
N: Average number of input tokens consumed per user query; Empirical value: 20000 tokens (Q&A) and 2000 tokens (report generation)
M: Average number of output tokens consumed per user query; Empirical value: 1000 tokens (Q&A) and 1500 tokens (report generation)
X: LLM cloud service input token pricing, subject to actual rates; Alibaba Cloud price: $1.6/Million tokens (Pricing details)
Y: LLM cloud service output token pricing, subject to actual rates; Alibaba Cloud price: $6.4/Million tokens (Pricing details)
U: Monthly number of active users
V: Average number of common queries per user each month; Empirical value: 20 queries
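For example, assuming 100 monthly active users (U = 100, an illustrative figure only), each issuing the empirical 20 Q&A queries per month (V = 20), and using the empirical token counts and Alibaba Cloud prices above (N = 20000, M = 1000, X = $1.6 per million input tokens, Y = $6.4 per million output tokens), the estimated monthly cost would be (20000 x 1.6/1,000,000 + 1000 x 6.4/1,000,000) x 100 x 20 = ($0.032 + $0.0064) x 2,000 = $76.80. Substitute your own user numbers and the current pricing when estimating.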
DeepSeek-V3/DeepSeek-V3-0324
X: LLM cloud service input token pricing, subject to actual rates; Tencent Cloud price: $0.00465/K tokens (Pricing details)
Y: LLM cloud service output token pricing, subject to actual rates; Tencent Cloud price: $0.00687/K tokens (Pricing details)
V: Average number of deep inference queries per user each month; Empirical value: 20 queries
GPT-4o
X: LLM cloud service input token pricing, subject to actual rates
Y: LLM cloud service output token pricing, subject to actual rates
FineChatBI requires LLMs to strictly output results in the specified format. Otherwise, FineChatBI cannot be used properly.
Before connecting to a local LLM, you need to check whether the LLM meets FineChatBI requirements. For details, see Checking Whether LLMs Meet FineChatBI Capability Requirements.
You need to check whether your local LLM is compatible with the OpenAI API:
Situation One: Compatible with the OpenAI API
If the LLM is compatible with the OpenAI API, no modification is required. The LLM can be directly connected to the FineAI service.
Situation Two: Incompatible with the OpenAI API
If the LLM is incompatible with the OpenAI API, you are advised to use the vLLM framework to redeploy the local LLM as a service compatible with the OpenAI API. For details, see the official vLLM documentation.
Situation Three: Incompatible with the OpenAI API and Unable to Redeploy the LLM
The API needs to be adapted: a resident API forwarding service must act as a bridge between the FineAI service and the LLM service.
The forwarding service converts FineAI's request body into the format supported by the LLM service, sends the converted request to the LLM, parses the LLM's response body, and converts it back into the format supported by FineAI. A minimal sketch of such a service is given after the request and response examples below.
Both FineAI's request body and response body must conform to the OpenAI Chat API specification. For details, see the following:
The request URL of the LLM needs to end with /chat/completions.
The request body needs to contain the following parameters:
model: Deployment name on the BI Q&A configuration page.
messages: Conversation history, including roles and message contents. The type is List[dict(str, str)].
temperature: Randomness of the LLM output. The larger the value, the more random the output.
max_tokens: Maximum number of output tokens to generate.
stream: Whether to stream the output (bool type). The value of stream sent by FineAI must be passed through to the LLM, and whether the response body uses the streaming format must be determined by this value.
Request body example:
{ "model": "gpt-3.5-turbo", "messages": [ { "role": "system", "content": "You are an arithmetic expert." }, { "role": "user", "content": "How to calculate pi?" } ], "temperature": 0.95, "max_tokens": 8192, "stream": false }
Note: When stream is set to false in the FineAI request body, a non-streaming response body is returned.
Example of a non-streaming response body:
{ "choices": [ { "message": { "role": "assistant", "content": "The following ways to calculate pi are available..." }, "finish_reason": "stop" } ] }
Note: When stream is set to true in the FineAI request body, a streaming response body is returned.
During generation, each token is returned in the content field of delta, and the value of finish_reason is null.
After the streaming response ends (that is, after the last token has been returned), an additional response body is required, in which content is null and finish_reason is stop.
Example of the JSON part of a streaming response body (one in-progress chunk):
{
    "choices": [
        {
            "finish_reason": null,
            "delta": {
                "content": "Okay"
            }
        }
    ]
}
To ensure that FineBI/FineAI can access the LLM, the LLM's address must be added to the whitelist of the FineBI/FineAI server.
Choose Intelligent Q&A Configuration > Other Configurations > LLM Configuration, configure the local service information, and click Save, as shown in the following figure.
ApiKey: UUID used for identity authentication, usually generated by the service provider. If the API requires no authentication, leave this item empty.
endPoint: Address of the LLM service, through which FineAI interacts with the LLM. Enter the base URL, that is, the address without the /chat/completions suffix.
Model to Deploy: Name of the model to be connected.
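Before saving, you can run a quick manual check from the FineBI/FineAI server to confirm that the configured endPoint, ApiKey, and model name respond to an OpenAI-style request. This is only a sketch, not part of FineChatBI; the address and model name below are placeholders, so substitute your own values.
# Manual connectivity check for the local LLM service (sketch; values are placeholders).
import requests

END_POINT = "http://192.168.1.10:8000/v1"    # base URL, without the /chat/completions suffix
API_KEY = ""                                 # leave empty if the API requires no authentication
MODEL = "Qwen3-235B-A22B-Instruct-2507-FP8"  # must match the Model to Deploy value

headers = {"Content-Type": "application/json"}
if API_KEY:
    headers["Authorization"] = f"Bearer {API_KEY}"

body = {
    "model": MODEL,
    "messages": [{"role": "user", "content": "Say hello."}],
    "temperature": 0.1,
    "max_tokens": 32,
    "stream": False,
}
resp = requests.post(f"{END_POINT}/chat/completions", json=body, headers=headers, timeout=60)
print(resp.json()["choices"][0]["message"]["content"])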
(1) Enable external access permissions for FineAI and FineBI servers, and add the LLM service address to the whitelist of both FineBI and FineAI servers.
(2) Choose Intelligent Q&A Configuration > Other Configurations, toggle on LLM, and enter the LLM service information.
Service Provider Name: LLM service provider, which can be set to Azure, OpenAI, or DeepSeek.
Taking DeepSeek as an example, you need to enter the following content when connecting to the official DeepSeek API:
(1) Set Service Provider Name to deepseek.
(2) Enter your own API key.
(3) Set endPoint to https://api.deepseek.com.
(4) Set Model to Deploy to deepseek-chat (recommended) or deepseek-reasoner.
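If you want to confirm the API key and endPoint outside FineChatBI first, a minimal sketch using the openai Python package (the DeepSeek API is OpenAI-compatible) might look as follows; the API key is a placeholder.
# Quick check of the DeepSeek credentials with the openai package (sketch only).
from openai import OpenAI

client = OpenAI(api_key="sk-your-own-key", base_url="https://api.deepseek.com")
reply = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Reply with OK if you can read this."}],
)
print(reply.choices[0].message.content)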