A reasoning large language model (LLM) is used to support the following FineChatBI functions: data interpretation and attribution analysis.
If resources are limited, a general LLM can be used directly instead, though with compromised results.
The reasoning LLM can be a local LLM or a cloud-based LLM.
Disclaimer: FanRuan only provides recommended LLMs and connection methods for reference and takes no liability for any issues with the LLMs themselves.
Local LLM deployment: You are advised to use vLLM rather than Ollama, which may lose the system prompt and reduce the LLM's response accuracy.
Local LLM
GPU requirement: for reference only; consulting the LLM vendor is recommended.
Cloud-based LLM
Recommended model: DeepSeek-R1/DeepSeek-R1-0528
Token cost = (N / 1,000 × X + M / 1,000 × Y) × U × V (N and M are divided by 1,000 because X and Y are priced per 1,000 tokens)
N: Average number of input tokens consumed per user query. Empirical value: 20,000 tokens (Q&A) and 2,000 tokens (report generation).
M: Average number of output tokens consumed per user query. Empirical value: 2,200 tokens (Q&A) and 2,000 tokens (report generation).
X: Input token price of the LLM cloud service, subject to actual rates. Tencent Cloud price: $0.01051/K tokens (Pricing details).
Y: Output token price of the LLM cloud service, subject to actual rates. Tencent Cloud price: $0.03032/K tokens (Pricing details).
U: Number of monthly active users.
V: Average number of deep inference (data interpretation + attribution analysis) queries per user each month. Empirical value: 5 queries.
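For illustration, a rough calculation with the empirical Q&A values above (the 100-user figure is an assumption for illustration only; the division by 1,000 matches the per-1K-token pricing):

# Rough monthly cost estimate for deep inference queries (Q&A empirical values).
N, M = 20_000, 2_200        # average input/output tokens per query
X, Y = 0.01051, 0.03032     # USD per 1,000 input/output tokens (Tencent Cloud example rates)
U, V = 100, 5               # assumed 100 monthly active users, 5 deep inference queries each

cost_per_query = (N / 1000) * X + (M / 1000) * Y   # about $0.277 per query
monthly_cost = cost_per_query * U * V              # about $138.45 per month
print(f"{cost_per_query:.3f} USD/query, {monthly_cost:.2f} USD/month")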
FineChatBI requires the LLM to output results strictly in the specified format; otherwise, FineChatBI cannot work properly.
Before connecting to a local LLM, you need to check whether the LLM meets FineChatBI requirements. For details, see Checking Whether LLMs Meet FineChatBI Capability Requirements.
You need to check whether your local LLM is compatible with the OpenAI API:
Situation One: Compatible with the OpenAI API
If the LLM is compatible with the OpenAI API, no modification is required. The LLM can be directly connected to the FineAI service.
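A quick way to confirm compatibility is to send a single OpenAI-format request to the service and check that the response contains choices[0].message.content. The following is only a sketch; the address, model name, and API key are placeholders.

import requests

# Placeholder values; replace with your actual service address, model name, and key.
BASE_URL = "http://127.0.0.1:8000/v1"
API_KEY = "YOUR_API_KEY"

resp = requests.post(
    f"{BASE_URL}/chat/completions",                  # OpenAI-compatible path
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "your-model-name",
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 16,
        "stream": False,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])  # fails if the format is not OpenAI-compatible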
Situation Two: Incompatible with the OpenAI API
If the LLM is incompatible with the OpenAI API, you are advised to use the vLLM framework to redeploy the local LLM as a service compatible with the OpenAI API. For details, see the official vLLM documentation.
Situation Three: Incompatible with the OpenAI API and Unable to Redeploy the LLM
The API needs to be adapted. A resident API forwarding service is required as a bridge between the FineAI service and the LLM service.
The forwarding service must convert the FineAI request body into the format supported by the LLM service, send the converted request to the LLM, parse the LLM's response body, and convert it back into the format supported by FineAI.
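The following is a minimal sketch of such a forwarding service for the non-streaming case (streaming is covered in the response body specification below). It assumes a hypothetical backend LLM endpoint /generate that accepts a prompt and returns {"text": ...}; all names here are illustrative and not part of FineAI or any real LLM API.

from flask import Flask, jsonify, request
import requests

app = Flask(__name__)
LLM_URL = "http://127.0.0.1:9000/generate"   # hypothetical backend LLM service

@app.route("/v1/chat/completions", methods=["POST"])   # URL must end with /chat/completions
def chat_completions():
    body = request.get_json()                # OpenAI-format request sent by FineAI
    # Convert the OpenAI-format messages into the backend's own request format (assumed here).
    prompt = "\n".join(m["content"] for m in body["messages"])
    llm_resp = requests.post(LLM_URL, json={
        "prompt": prompt,
        "max_new_tokens": body.get("max_tokens", 1024),
        "temperature": body.get("temperature", 0.7),
    }, timeout=300).json()
    # Convert the backend response back into the OpenAI Chat API format expected by FineAI.
    return jsonify({
        "choices": [{
            "message": {"role": "assistant", "content": llm_resp["text"]},
            "finish_reason": "stop",
        }]
    })

if __name__ == "__main__":
    app.run(port=8000)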
Both FineAI's request body and response body must conform to the OpenAI Chat API specification. For details, see the following:
Request URL Specification
The request URL of the LLM needs to end with /chat/completions.
Request Body Specification
The request body needs to contain the following parameters:
model
Deployment name on the BI Q&A configuration page.
messages
Conversation history, including roles and message content. The type is List[dict(str, str)].
temperature
Randomness of the LLM output results. The larger the value, the stronger the randomness.
max_tokens
Maximum number of output tokens to generate.
stream
Whether to stream the output (bool type).
The stream value sent by FineAI must be forwarded to the LLM as-is.
Whether the response body uses the streaming format must be determined based on the value of stream.
Request body example:
{ "model": "gpt-3.5-turbo", "messages": [ { "role": "system", "content": "You are an arithmetic expert." }, { "role": "user", "content": "How to calculate pi?" } ], "temperature": 0.95, "max_tokens": 8192, "stream": false }
Response Body Specification
When stream is set to false in the FineAI request body, a non-streaming response body must be returned.
Example of a non-streaming response body:
{ "choices": [ { "message": { "role": "assistant", "content": "The following ways to calculate pi are available..." }, "finish_reason": "stop" } ] }
When stream is set to true in the FineAI request body, a streaming response body must be returned.
The streaming response body must follow the SSE standard.
In addition to meeting the JSON structure requirements below, each response chunk must start with data:, end with a blank line (two newline characters), and be sent with the header Content-Type: text/event-stream. For details, see the SSE standard.
Example of the JSON part in a streaming response body:
During generation, each token is returned in the content field, and the value of finish_reason is null.
After the streaming response ends (that is, after the last token is returned), an additional response body must be sent in which the value of content is null and the value of finish_reason is stop.
{ "choices": [ { "finish_reason": null, "delta": { "content": "Okay" } } ] }
To ensure that FineBI/FineAI can access the LLM, the LLM's address must be added to the whitelist of the FineBI/FineAI server.
Choose Intelligent Q&A Configuration > Other Configurations, and toggle-on LLM, as shown in the following figure.
Configure the information related to the local reasoning LLM, and click Save, as shown in the following figure.
ApiKey
Key used for identity authentication (often a UUID-style string), usually generated by the service provider.
(If the API requires no authentication, leave this field empty.)
endPoint
The LLM service address through which FineAI interacts with the LLM.
Enter the base URL, that is, the address without the /chat/completions suffix (for example, a hypothetical local service address such as http://127.0.0.1:8000/v1).
Model to Deploy
Name of the model to be connected.
Choose Intelligent Q&A Configuration > Other Configurations, and toggle-on LLM. Enter the information related to the cloud-based reasoning LLM, as shown in the following figure.
Service Provider Name
deepseek (LLM compliant with OpenAI API specifications)