Reasoning LLM

  • Last update: December 03, 2025
  • Overview

    A reasoning large language model (LLM) is used to support the following functions of FineChatBI: data interpretation and attribution analysis.

    If resources are limited, a general LLM can be used instead, though with compromised results.

    Selectable Reasoning LLM

    The reasoning LLM can be a local LLM or a cloud-based LLM.

    [Image: 通用大模型 图1.png]

    Recommended LLM

    Note:

    Disclaimer: FanRuan only provides recommended LLMs and connection methods for reference and takes no liability for any issues with the LLMs themselves.

    Local LLM deployment: You are advised to use vLLM rather than Ollama, which may lose system prompts and reduce the LLM's response accuracy.

    • Local LLM


    Type           Recommended LLM                                      GPU Requirement (for reference only; consult the LLM vendor)

    Local LLM      DeepSeek-R1 (full version)                           16 x H100 GPUs
                   DeepSeek-R1-0528                                     16 x H100 GPUs
                   Qwen3-235B-A22B-Thinking-2507-FP8 (Download Link)    4 x H100 GPUs
    • Cloud-based LLM

    Type               Recommended LLM                  Cost Evaluation

    Cloud-based LLM    DeepSeek-R1/DeepSeek-R1-0528     Token cost = (N x X + M x Y) x U x V

    Where:

    N: average number of input tokens consumed per user query; empirical value: 20,000 tokens (Q&A) and 2,000 tokens (report generation)

    M: average number of output tokens consumed per user query; empirical value: 2,200 tokens (Q&A) and 2,000 tokens (report generation)

    X: input token price of the LLM cloud service, subject to actual rates; Tencent Cloud price: $0.01051/K tokens (Pricing details)

    Y: output token price of the LLM cloud service, subject to actual rates; Tencent Cloud price: $0.03032/K tokens (Pricing details)

    U: number of monthly active users

    V: average number of deep inference queries (data interpretation + attribution analysis) per user each month; empirical value: 5 queries
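
    For example, for Q&A usage with the empirical values above and an assumed (hypothetical) 100 monthly active users, with token counts expressed in K tokens:

    Token cost = (20 x $0.01051 + 2.2 x $0.03032) x 100 x 5
               = ($0.2102 + $0.0667) x 500
               ≈ $138.45 per month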

    LLM Capability Check

    FineChatBI requires the LLM to output results strictly in the specified format; otherwise, FineChatBI cannot work properly.

    Before connecting to a local LLM, you need to check whether the LLM meets FineChatBI requirements. For details, see Checking Whether LLMs Meet FineChatBI Capability Requirements.

    LLM API Compatibility Check

    Note:
    Check API compatibility for local LLMs; skip this step for cloud-based LLMs.

    You need to check whether your local LLM is compatible with the OpenAI API:

    Situation One: Compatible with the OpenAI API

    If the LLM is compatible with the OpenAI API, no modification is required. The LLM can be directly connected to the FineAI service.

    Situation Two: Incompatible with the OpenAI API

    If the LLM is incompatible with the OpenAI API, you are advised to use the vLLM framework to redeploy the local LLM as a service compatible with the OpenAI API. For details, see the official vLLM documentation.

    Situation Three: Incompatible with the OpenAI API and Unable to Redeploy the LLM

    The API needs to be adapted: a resident API forwarding service is required as a bridge for communication between the FineAI service and the LLM service.

    The forwarding service must convert the FineAI request body into the format supported by the LLM service, send it to the LLM, parse the LLM's response body, and convert it back into the format supported by FineAI.

    Both FineAI's request body and response body conform to the OpenAI Chat API specification. For details, see the following specifications (a minimal forwarding-service sketch is provided after them):

    • Request URL Specification

    The request URL of the LLM needs to end with /chat/completions.

    • Request Body Specification

    The request body needs to contain the following parameters:

    Parameter      Description

    model          Deployment name on the BI Q&A configuration page.

    messages       Conversation history, including roles and message content. Type: List[dict(str, str)].

    temperature    Randomness of the LLM output. A larger value produces more random output.

    max_tokens     Maximum number of output tokens to generate.

    stream         Whether to stream the output (bool type). The stream value sent by FineAI must be
                   passed through to the LLM, and whether the response body uses the streaming format
                   must be determined by the value of stream.

    Request body example:

    {
        "model": "gpt-3.5-turbo",
        "messages": [
            {
                "role": "system",
                "content": "You are an arithmetic expert."
            },
            {
                "role": "user",
                "content": "How to calculate pi?"
            }
        ],
        "temperature": 0.95,
        "max_tokens": 8192,
        "stream": false
    }
    • Response Body Specification

    When stream is set to false in the FineAI request body, a non-streaming response body must be returned.

    Example of a non-streaming response body:

    {
        "choices": [
            {
                "message": {
                    "role": "assistant",
                    "content": "The following ways to calculate pi are available..."
                },
                "finish_reason": "stop"
            }
        ]
    }

    When stream is set to true in the FineAI request body, a streaming response body must be returned.

    Note:

    The streaming response body must follow the SSE standard.

    In addition to meeting the JSON structure requirements below, each response chunk must start with data: and end with a blank line (two newline characters), and the response header must be set to Content-Type: text/event-stream. For details, see the SSE standard.

    Example of the JSON part in a streaming response body:

    • During generation, each token is returned in the content field, and the value of finish_reason is null.

    • After the streaming response ends (that is, after the last token is returned), an additional response body is required, in which the value of content is null and the value of finish_reason is stop (an illustrative final chunk is shown after the example below).

    {
        "choices": [
            {
                "finish_reason": null,
                "delta": {
                    "content": "Okay"
                }
            }
        ]
    }
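
    For illustration, the final event of the stream could look as follows (the data: prefix and the trailing blank line are required by the SSE format; fields other than those shown above are omitted):

    data: {"choices": [{"finish_reason": "stop", "delta": {"content": null}}]}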
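
    The following is a minimal, non-authoritative sketch of such a forwarding service, written in Python with Flask. It assumes a hypothetical local LLM endpoint at LLM_SERVICE_URL that accepts a {"prompt": ..., "max_new_tokens": ...} body and returns {"text": ...}; the URL and the two conversion functions are placeholders to be adapted to your actual LLM API.

# Minimal API forwarding sketch (illustrative only).
# Assumptions: Flask and requests are installed; the local LLM exposes a
# hypothetical non-OpenAI endpoint at LLM_SERVICE_URL.
import json
import requests
from flask import Flask, Response, request, jsonify

app = Flask(__name__)
LLM_SERVICE_URL = "http://localhost:9000/generate"  # hypothetical LLM endpoint


def to_llm_request(openai_body):
    """Convert the OpenAI-style request body sent by FineAI into the LLM's own format."""
    prompt = "\n".join(f"{m['role']}: {m['content']}" for m in openai_body["messages"])
    return {"prompt": prompt, "max_new_tokens": openai_body.get("max_tokens", 1024)}


def to_openai_response(llm_body):
    """Convert the LLM's response body into the OpenAI-style format expected by FineAI."""
    return {
        "choices": [
            {
                "message": {"role": "assistant", "content": llm_body["text"]},
                "finish_reason": "stop",
            }
        ]
    }


# The request URL must end with /chat/completions, per the Request URL Specification.
@app.route("/v1/chat/completions", methods=["POST"])
def chat_completions():
    body = request.get_json()
    llm_resp = requests.post(LLM_SERVICE_URL, json=to_llm_request(body)).json()
    openai_resp = to_openai_response(llm_resp)

    if not body.get("stream", False):
        return jsonify(openai_resp)

    # Streaming: wrap the answer in SSE events (data: ... plus a blank line),
    # then send the terminating chunk with finish_reason set to "stop".
    def sse():
        chunk = {"choices": [{"finish_reason": None,
                              "delta": {"content": openai_resp["choices"][0]["message"]["content"]}}]}
        yield f"data: {json.dumps(chunk)}\n\n"
        final = {"choices": [{"finish_reason": "stop", "delta": {"content": None}}]}
        yield f"data: {json.dumps(final)}\n\n"

    return Response(sse(), mimetype="text/event-stream")


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)

    In this sketch, FineAI would be configured with the forwarding service's base URL (here, http://<host>:8000/v1, without the /chat/completions suffix), and the service relays requests between FineAI and the LLM.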



    Whitelist Configuration

    To ensure that FineBI/FineAI can access the LLM, the LLM's address must be added to the whitelist of the FineBI/FineAI server.

    Configuring the Reasoning LLM for FineChatBI

    Connecting to the Local LLM

    Choose Intelligent Q&A Configuration > Other Configurations, and toggle on LLM, as shown in the following figure.

    [Image: 推理大模型 图2.png]

    Configure the information related to the local reasoning LLM, and click Save, as shown in the following figure.

    [Image: 推理大模型 图3.png]

    Item               Description

    ApiKey             UUID used for identity authentication, usually generated by the service provider.
                       (If the API requires no authentication, leave this item empty.)

    endPoint           LLM service address through which FineAI interacts with the LLM.
                       Enter the base URL, that is, the URL without the /chat/completions suffix.

    Model to Deploy    Name of the model to be connected.
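
    Before saving the configuration, you can optionally verify that the endpoint and model name respond as expected. The following is a minimal sketch, assuming the openai Python package is installed and using placeholder values for the base URL, API key, and model name:

# Quick connectivity check for an OpenAI-compatible endpoint (illustrative only).
# Replace the placeholder base URL, API key, and model name with your own values.
from openai import OpenAI

client = OpenAI(
    base_url="http://your-llm-host:8000/v1",  # base URL, without /chat/completions
    api_key="YOUR_API_KEY",                   # any non-empty string if no auth is required
)

resp = client.chat.completions.create(
    model="your-deployed-model-name",
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=32,
    temperature=0.7,
)
print(resp.choices[0].message.content)

    If this prints a reply, the same base URL, ApiKey, and model name can be entered on the configuration page.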

    Connecting to the Cloud-based LLM

    Choose Intelligent Q&A Configuration > Other Configurations, and toggle on LLM. Enter the information related to the cloud-based reasoning LLM, as shown in the following figure.

    [Image: 推理大模型 图4.png]

    Configuration Item       Description

    Service Provider Name    deepseek (an LLM compliant with the OpenAI API specification)

    ApiKey                   UUID used for identity authentication, usually generated by the service provider.
                             (If the API requires no authentication, leave this item empty.)

    endPoint                 LLM service address through which FineAI interacts with the LLM.

    Model to Deploy          Name of the model to be connected.

     

