Reasoning LLM

  • Last update: December 03, 2025
  • Overview

    A reasoning large language model (LLM) is used to support the following functions of FineChatBI: data interpretation and attribution analysis.

    If resources are limited, a general LLM can be used instead, though with compromised results.

    Selectable Reasoning LLM

    The reasoning LLM can be a local LLM or a cloud-based LLM.

    [Image: 通用大模型 图1.png]

    Recommended LLM

    Note:

    Disclaimer: FanRuan only provides recommended LLMs and connection methods for reference and takes no liability for any issues with the LLMs themselves.

    Local LLM deployment: You are advised to use vLLM rather than Ollama, which may lose system prompts and reduce the LLM's response accuracy.

    • Local LLM


    Type           Recommended LLM                                      GPU Requirement (for reference only; consult the LLM vendor)

    Local LLM      DeepSeek-R1 (full version)                           16 x H100 GPUs
                   DeepSeek-R1-0528                                     16 x H100 GPUs
                   Qwen3-235B-A22B-Thinking-2507-FP8 (Download Link)    4 x H100 GPUs
    • Cloud-based LLM

    Type               Recommended LLM                  Cost Evaluation

    Cloud-based LLM    DeepSeek-R1/DeepSeek-R1-0528     Token cost = (N x X + M x Y) x U x V

    Where:

    N: average number of input tokens consumed per user query; empirical value: 20,000 tokens (Q&A) and 2,000 tokens (report generation)

    M: average number of output tokens consumed per user query; empirical value: 2,200 tokens (Q&A) and 2,000 tokens (report generation)

    X: input token price of the LLM cloud service, subject to actual rates; Tencent Cloud price: $0.01051/K tokens (Pricing details)

    Y: output token price of the LLM cloud service, subject to actual rates; Tencent Cloud price: $0.03032/K tokens (Pricing details)

    U: number of monthly active users

    V: average number of deep inference queries (data interpretation + attribution analysis) per user each month; empirical value: 5 queries
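
    For example, for Q&A usage with the empirical values above and an assumed (hypothetical) 100 monthly active users, with token counts expressed in K tokens:

    Token cost = (20 x $0.01051 + 2.2 x $0.03032) x 100 x 5
               = ($0.2102 + $0.0667) x 500
               ≈ $138.45 per month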

    LLM Capability Check

    FineChatBI requires the LLM to output results strictly in the specified format; otherwise, FineChatBI cannot work properly.

    Before connecting to a local LLM, you need to check whether the LLM meets FineChatBI requirements. For details, see Checking Whether LLMs Meet FineChatBI Capability Requirements.

    LLM API Compatibility Check

    Note:
    Check API compatibility for local LLMs; skip this step for cloud-based LLMs.

    You need to check whether your local LLM is compatible with the OpenAI API:

    Situation One: Compatible with the OpenAI API

    If the LLM is compatible with the OpenAI API, no modification is required. The LLM can be directly connected to the FineAI service.

    Situation Two: Incompatible with the OpenAI API

    If the LLM is incompatible with the OpenAI API, you are advised to use the vLLM framework to redeploy the local LLM as a service compatible with the OpenAI API. For details, see the official vLLM documentation.

    Situation Three: Incompatible with the OpenAI API and Unable to Redeploy the LLM

    The API needs to be adapted: a resident API forwarding service is required as a bridge for communication between the FineAI service and the LLM service.

    The forwarding service must convert the FineAI request body into the format supported by the LLM service, send it to the LLM, parse the LLM's response body, and convert it back into the format supported by FineAI.

    Both FineAI's request body and response body conform to the OpenAI Chat API specification. For details, see the following specifications (a minimal forwarding-service sketch is provided after them):

    • Request URL Specification

    The request URL of the LLM needs to end with /chat/completions.

    • Request Body Specification

    The request body needs to contain the following parameters:

    Parameter      Description

    model          Deployment name on the BI Q&A configuration page.

    messages       Conversation history, including roles and message content. Type: List[dict(str, str)].

    temperature    Randomness of the LLM output. A larger value produces more random output.

    max_tokens     Maximum number of output tokens to generate.

    stream         Whether to stream the output (bool type). The stream value sent by FineAI must be
                   passed through to the LLM, and whether the response body uses the streaming format
                   must be determined by the value of stream.

    Request body example:

    {
        "model": "gpt-3.5-turbo",
        "messages": [
            {
                "role": "system",
                "content": "You are an arithmetic expert."
            },
            {
                "role": "user",
                "content": "How to calculate pi?"
            }
        ],
        "temperature": 0.95,
        "max_tokens": 8192,
        "stream": false
    }
    • Response Body Specification

    When stream is set to false in the FineAI request body, a non-streaming response body must be returned.

    Example of a non-streaming response body:

    {
        "choices": [
            {
                "message": {
                    "role": "assistant",
                    "content": "The following ways to calculate pi are available..."
                },
                "finish_reason": "stop"
            }
        ]
    }

    When stream is set to true in the FineAI request body, a streaming response body must be returned.

    Note:

    The streaming response body must follow the SSE standard.

    In addition to meeting the JSON structure requirements below, each response chunk must start with data: and end with a blank line (two newline characters), and the response header must be set to Content-Type: text/event-stream. For details, see the SSE standard.

    Example of the JSON part in a streaming response body:

    • During generation, each token is returned in the content field, and the value of finish_reason is null.

    • After the streaming response ends (that is, after the last token is returned), an additional response body is required, in which the value of content is null and the value of finish_reason is stop (an illustrative final chunk is shown after the example below).

    {
        "choices": [
            {
                "finish_reason": null,
                "delta": {
                    "content": "Okay"
                }
            }
        ]
    }
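
    For illustration, the final event of the stream could look as follows (the data: prefix and the trailing blank line are required by the SSE format; fields other than those shown above are omitted):

    data: {"choices": [{"finish_reason": "stop", "delta": {"content": null}}]}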
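
    The following is a minimal, non-authoritative sketch of such a forwarding service, written in Python with Flask. It assumes a hypothetical local LLM endpoint at LLM_SERVICE_URL that accepts a {"prompt": ..., "max_new_tokens": ...} body and returns {"text": ...}; the URL and the two conversion functions are placeholders to be adapted to your actual LLM API.

# Minimal API forwarding sketch (illustrative only).
# Assumptions: Flask and requests are installed; the local LLM exposes a
# hypothetical non-OpenAI endpoint at LLM_SERVICE_URL.
import json
import requests
from flask import Flask, Response, request, jsonify

app = Flask(__name__)
LLM_SERVICE_URL = "http://localhost:9000/generate"  # hypothetical LLM endpoint


def to_llm_request(openai_body):
    """Convert the OpenAI-style request body sent by FineAI into the LLM's own format."""
    prompt = "\n".join(f"{m['role']}: {m['content']}" for m in openai_body["messages"])
    return {"prompt": prompt, "max_new_tokens": openai_body.get("max_tokens", 1024)}


def to_openai_response(llm_body):
    """Convert the LLM's response body into the OpenAI-style format expected by FineAI."""
    return {
        "choices": [
            {
                "message": {"role": "assistant", "content": llm_body["text"]},
                "finish_reason": "stop",
            }
        ]
    }


# The request URL must end with /chat/completions, per the Request URL Specification.
@app.route("/v1/chat/completions", methods=["POST"])
def chat_completions():
    body = request.get_json()
    llm_resp = requests.post(LLM_SERVICE_URL, json=to_llm_request(body)).json()
    openai_resp = to_openai_response(llm_resp)

    if not body.get("stream", False):
        return jsonify(openai_resp)

    # Streaming: wrap the answer in SSE events (data: ... plus a blank line),
    # then send the terminating chunk with finish_reason set to "stop".
    def sse():
        chunk = {"choices": [{"finish_reason": None,
                              "delta": {"content": openai_resp["choices"][0]["message"]["content"]}}]}
        yield f"data: {json.dumps(chunk)}\n\n"
        final = {"choices": [{"finish_reason": "stop", "delta": {"content": None}}]}
        yield f"data: {json.dumps(final)}\n\n"

    return Response(sse(), mimetype="text/event-stream")


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)

    In this sketch, FineAI would be configured with the forwarding service's base URL (here, http://<host>:8000/v1, without the /chat/completions suffix), and the service relays requests between FineAI and the LLM.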



    Whitelist Configuration

    To ensure that FineBI/FineAI can access the LLM, the LLM's address must be added to the whitelist of the FineBI/FineAI server.

    Configuring the Reasoning LLM for FineChatBI

    Connecting to the Local LLM

    Choose Intelligent Q&A Configuration > Other Configurations, and toggle on LLM, as shown in the following figure.

    [Image: 推理大模型 图2.png]

    Configure the information related to the local reasoning LLM, and click Save, as shown in the following figure.

    [Image: 推理大模型 图3.png]

    Item               Description

    ApiKey             UUID used for identity authentication, usually generated by the service provider.
                       (If the API requires no authentication, leave this item empty.)

    endPoint           LLM service address through which FineAI interacts with the LLM.
                       Enter the base URL, that is, the URL without the /chat/completions suffix.

    Model to Deploy    Name of the model to be connected.
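
    Before saving the configuration, you can optionally verify that the endpoint and model name respond as expected. The following is a minimal sketch, assuming the openai Python package is installed and using placeholder values for the base URL, API key, and model name:

# Quick connectivity check for an OpenAI-compatible endpoint (illustrative only).
# Replace the placeholder base URL, API key, and model name with your own values.
from openai import OpenAI

client = OpenAI(
    base_url="http://your-llm-host:8000/v1",  # base URL, without /chat/completions
    api_key="YOUR_API_KEY",                   # any non-empty string if no auth is required
)

resp = client.chat.completions.create(
    model="your-deployed-model-name",
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=32,
    temperature=0.7,
)
print(resp.choices[0].message.content)

    If this prints a reply, the same base URL, ApiKey, and model name can be entered on the configuration page.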

    Connecting to the Cloud-based LLM

    Choose Intelligent Q&A Configuration > Other Configurations, and toggle on LLM. Enter the information related to the cloud-based reasoning LLM, as shown in the following figure.

    [Image: 推理大模型 图4.png]

    Configuration Item       Description

    Service Provider Name    deepseek (an LLM compliant with the OpenAI API specification)

    ApiKey                   UUID used for identity authentication, usually generated by the service provider.
                             (If the API requires no authentication, leave this item empty.)

    endPoint                 LLM service address through which FineAI interacts with the LLM.

    Model to Deploy          Name of the model to be connected.

     

