A general-purpose large language model (LLM) is used to support the following FineChatBI functions: data query (semantic understanding and translation in intelligent mode), asking for analysis ideas, and one-click synonym configuration.
The general LLM can be a local LLM or a cloud-based LLM.
Disclaimer: FanRuan only provides recommended LLMs and connection methods for reference and accepts no liability for any issues with the LLMs themselves.
Local LLM deployment: You are advised to use vLLM rather than Ollama, which may lose the system prompt and reduce the LLM's response accuracy.
Local LLM
Qwen3-235B-A22B-Instruct-2507-FP8 (Download Link): 4 x H100 GPUs (the GPUs must support FP8)
Qwen/Qwen3-235B-A22B-GPTQ-Int4 (Download Link): 2 x H100 GPUs
Qwen3-Next-80B-A3B-Instruct-FP8 (Download Link)
DeepSeek-V3 (4-bit quantized version): 8 x H100 GPUs or higher-performance GPUs
Cloud-based LLM
Qwen-Max (32K)
Token cost = (N x X + M x Y) x U x V
N: Average number of input tokens consumed per user query; Empirical value: 20000 tokens (Q&A) and 2000 tokens (report generation)
M: Average number of output tokens consumed per user query; Empirical value: 1000 tokens (Q&A) and 1500 tokens (report generation)
X: LLM cloud service input token pricing, subject to actual rates; Alibaba Cloud price: $1.6/Million tokens (Pricing details)
Y: LLM cloud service output token pricing, subject to actual rates; Alibaba Cloud price: $6.4/Million tokens (Pricing details)
U: Monthly number of active users
V: Average number of common queries per user each month; Empirical value: 20 queries
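For example, assuming 100 monthly active users (U = 100, an illustrative figure only), each issuing the empirical 20 Q&A queries per month (V = 20), and using the empirical token counts and Alibaba Cloud prices above (N = 20000, M = 1000, X = $1.6 per million input tokens, Y = $6.4 per million output tokens), the estimated monthly cost would be (20000 x 1.6/1,000,000 + 1000 x 6.4/1,000,000) x 100 x 20 = ($0.032 + $0.0064) x 2,000 = $76.80. Substitute your own user numbers and the current pricing when estimating.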
DeepSeek-V3/DeepSeek-V3-0324
X: LLM cloud service input token pricing, subject to actual rates; Tencent Cloud price: $0.00465/K tokens (Pricing details)
Y: LLM cloud service output token pricing, subject to actual rates; Tencent Cloud price: $0.00687/K tokens (Pricing details)
V: Average number of deep inference queries per user each month; Empirical value: 20 queries
GPT-4o
X: LLM cloud service input token pricing, subject to actual rates
Y: LLM cloud service output token pricing, subject to actual rates
FineChatBI requires LLMs to strictly output results in the specified format. Otherwise, FineChatBI cannot be used properly.
Before connecting to a local LLM, you need to check whether the LLM meets FineChatBI requirements. For details, see Checking Whether LLMs Meet FineChatBI Capability Requirements.
You need to check whether your local LLM is compatible with the OpenAI API:
Situation One: Compatible with the OpenAI API
If the LLM is compatible with the OpenAI API, no modification is required. The LLM can be directly connected to the FineAI service.
Situation Two: Incompatible with the OpenAI API
If the LLM is incompatible with the OpenAI API, you are advised to use the vLLM framework to redeploy the local LLM as a service compatible with the OpenAI API. For details, see the official vLLM documentation.
Situation Three: Incompatible with the OpenAI API and Unable to Redeploy the LLM
The API needs to be adapted: a resident API forwarding service must act as a bridge between the FineAI service and the LLM service.
The forwarding service converts FineAI's request body into the format supported by the LLM service, sends the converted request to the LLM, parses the LLM's response body, and converts it back into the format supported by FineAI. A minimal sketch of such a service is given after the request and response examples below.
Both FineAI's request body and response body must conform to the OpenAI Chat API specification. For details, see the following:
The request URL of the LLM needs to end with /chat/completions.
The request body needs to contain the following parameters:
model: Deployment name on the BI Q&A configuration page.
messages: Conversation history, including roles and message contents. The type is List[dict(str, str)].
temperature: Randomness of the LLM output. The larger the value, the more random the output.
max_tokens: Maximum number of output tokens to generate.
stream: Whether to stream the output (bool type). The value of stream sent by FineAI must be passed through to the LLM, and whether the response body uses the streaming format must be determined by this value.
Request body example:
{ "model": "gpt-3.5-turbo", "messages": [ { "role": "system", "content": "You are an arithmetic expert." }, { "role": "user", "content": "How to calculate pi?" } ], "temperature": 0.95, "max_tokens": 8192, "stream": false }
Note: When stream is set to false in the FineAI request body, a non-streaming response body is returned.
Example of a non-streaming response body:
{ "choices": [ { "message": { "role": "assistant", "content": "The following ways to calculate pi are available..." }, "finish_reason": "stop" } ] }
Note: When stream is set to true in the FineAI request body, a streaming response body is returned.
During generation, each token is returned in the content field of delta, and the value of finish_reason is null.
After the streaming response ends (that is, after the last token has been returned), an additional response body is required, in which content is null and finish_reason is stop.
Example of the JSON part of a streaming response body (one in-progress chunk):
{
    "choices": [
        {
            "finish_reason": null,
            "delta": {
                "content": "Okay"
            }
        }
    ]
}
To ensure that FineBI/FineAI can access the LLM, the LLM's address must be added to the whitelist of the FineBI/FineAI server.
Choose Intelligent Q&A Configuration > Other Configurations > LLM Configuration, configure the local service information, and click Save, as shown in the following figure.
ApiKey: UUID used for identity authentication, usually generated by the service provider. If the API requires no authentication, leave this item empty.
endPoint: Address of the LLM service, through which FineAI interacts with the LLM. Enter the base URL, that is, the address without the /chat/completions suffix.
Model to Deploy: Name of the model to be connected.
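Before saving, you can run a quick manual check from the FineBI/FineAI server to confirm that the configured endPoint, ApiKey, and model name respond to an OpenAI-style request. This is only a sketch, not part of FineChatBI; the address and model name below are placeholders, so substitute your own values.
# Manual connectivity check for the local LLM service (sketch; values are placeholders).
import requests

END_POINT = "http://192.168.1.10:8000/v1"    # base URL, without the /chat/completions suffix
API_KEY = ""                                 # leave empty if the API requires no authentication
MODEL = "Qwen3-235B-A22B-Instruct-2507-FP8"  # must match the Model to Deploy value

headers = {"Content-Type": "application/json"}
if API_KEY:
    headers["Authorization"] = f"Bearer {API_KEY}"

body = {
    "model": MODEL,
    "messages": [{"role": "user", "content": "Say hello."}],
    "temperature": 0.1,
    "max_tokens": 32,
    "stream": False,
}
resp = requests.post(f"{END_POINT}/chat/completions", json=body, headers=headers, timeout=60)
print(resp.json()["choices"][0]["message"]["content"])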
(1) Enable external access permissions for FineAI and FineBI servers, and add the LLM service address to the whitelist of both FineBI and FineAI servers.
(2) Choose Intelligent Q&A Configuration > Other Configurations, toggle on LLM, and enter the LLM service information.
Service Provider Name: LLM service provider, which can be set to Azure, OpenAI, or DeepSeek.
Taking DeepSeek as an example, you need to enter the following content when connecting to the official DeepSeek API:
(1) Set Service Provider Name to deepseek.
(2) Enter your own API key.
(3) Set endPoint to https://api.deepseek.com.
(4) Set Model to Deploy to deepseek-chat (recommended) or deepseek-reasoner.
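If you want to confirm the API key and endPoint outside FineChatBI first, a minimal sketch using the openai Python package (the DeepSeek API is OpenAI-compatible) might look as follows; the API key is a placeholder.
# Quick check of the DeepSeek credentials with the openai package (sketch only).
from openai import OpenAI

client = OpenAI(api_key="sk-your-own-key", base_url="https://api.deepseek.com")
reply = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Reply with OK if you can read this."}],
)
print(reply.choices[0].message.content)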