FDL Deployment Environment Preparation

  • Last update: August 19, 2024
  • Software Environment Requirement

    Category | Model
    Linux | Centos6: Centos 6.5, Centos 6.6, Centos 6.7, Centos 6.8, and Centos 6.9
    Linux | Centos7: Centos 7.x
    Linux | RedHat6: RedHat 6.5, RedHat 6.6, RedHat 6.7, RedHat 6.8, and RedHat 6.9
    Linux | RedHat7: RedHat 7.0, RedHat 7.1, RedHat 7.2, RedHat 7.3, and RedHat 7.4
    Windows | Windows Server 2008 and above, and Windows 11. The Data Pipeline function is usable only on Linux systems.
    Configuration database | RDS MySQL, MySQL, SQL Server, Oracle, Db2, and PostgreSQL. Configure an external database for formal projects. For details, see External Database Configuration.
    Browser | Google Chrome. The latest version is recommended.

    Note: The operating systems listed above are recommended. If you encounter problems deploying FineDataLink on other Linux systems, contact FanRuan technical support.

    Network Requirement 

    The project can be deployed in both wide area networks and local area networks. The requirements for the network environment are shown in the following table.

    Network | Data Volume | Bandwidth
    Wide area network | Less than ten million rows | 50 MB
    Wide area network | More than or equal to ten million rows | 100 MB
    Local area network | / | Unlimited

    Bandwidth limits the amount of data that can be transmitted over a network connection in a given amount of time. The usable traffic can be calculated using the following formula.

    Bandwidth (Mb/s) * 80% / 8 (bits per byte) = Traffic (MB/s)

    Multiplying the bandwidth by 80% reserves a margin to avoid network congestion during actual transmission. Dividing by 8 converts megabits to megabytes.

    You can refer to the following table.

    Bandwidth | Traffic Threshold | Recommendation
    5 Mb/s | 512 KB/s | ⭐️
    50 Mb/s | 5 MB/s | ⭐️⭐️
    100 Mb/s | 10 MB/s | ⭐️⭐️⭐️
    10000 Mb/s | 1000 MB/s | ⭐️
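The bandwidth formula can be sketched in a few lines of Python. This is only an illustration: the function name and the `margin` parameter are hypothetical, and the result is expressed in megabytes per second because the formula divides by 8 bits per byte.

```python
def traffic_threshold_mb_per_s(bandwidth_mbit_per_s, margin=0.8):
    """Return the usable traffic in megabytes per second.

    bandwidth_mbit_per_s: link bandwidth in megabits per second.
    margin: share of the bandwidth actually used (80% in this article).
    """
    if bandwidth_mbit_per_s < 0:
        raise ValueError("bandwidth must not be negative")
    # Keep a 20% safety margin, then convert bits to bytes.
    return bandwidth_mbit_per_s * margin / 8


if __name__ == "__main__":
    for bandwidth in (5, 50, 100):
        print(bandwidth, "Mb/s ->", traffic_threshold_mb_per_s(bandwidth), "MB/s")
```

For a 5 Mb/s link the function returns 0.5 MB/s, which corresponds to the 512 K figure in the table.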

    After conducting numerous internal tests using tens of millions of data records, it has been found that the transmission traffic is close to a value corresponding to the bandwidth of 50 Mb/s (not exceeding 100 Mb/s). Therefore, bandwidth between 50 Mb/s and 100 Mb/s is recommended.

    Machine Requirement

    Resource Control Memory 

    Note: For details of resource control settings, see Load Distribution.

    Unknown Number of Scheduled Tasks and Pipeline Tasks (Applicable to Newly Deployed Projects)

    Memory | Recommendation
    8 GB | ⭐️⭐️
    16 GB | ⭐️⭐️⭐️
    32 GB | ⭐️
    More than 32 GB | Allocate the memory according to actual business needs. An excessively large memory may cause long Full GC pauses.

    Known Number of Scheduled Tasks and Pipeline Tasks (Applicable to Project Migration and Upgrade)

    Minimum memory size = MAX(minimum memory size for running scheduled tasks, minimum memory size for running pipeline tasks)

    Accuracy | Memory
    Accurate | The minimum memory size is the greater of the minimum memory size for running scheduled tasks and the minimum memory size for running pipeline tasks. For details, see the tables below.
    Rough | The minimum memory size is the greater of the minimum memory size for running scheduled tasks and the minimum memory size for running pipeline tasks. Take 1 GB as the memory value for each scheduled task and each pipeline task.

    1. For 4.1.55 and later releases:

    Type

    Node

    Memory

    Scheduled task

    Single input node

    Calculation formula: buffer + outputSize*2*channel

    The formula is described below. 

    • buffer:

    For non-relational DB table input (such as Jodoo and Mongo) and other inputs (such as API input and file input), a Reader takes up 64 MB of memory.

    For relational DB table input, the size depends on the table structure. Allocate 1 MB of memory for each column of the input table, or 2 MB if the precision of the column exceeds 1024. The resulting size shall be a multiple of 8 MB and shall not exceed 64 MB.

    For example, a table with the following structure has seven columns at 1 MB each and one column (varbinary, precision 2000) at 2 MB, so it takes 9 MB of memory and shall be allocated 16 MB of memory.

    Type | Precision
    small int | 5
    big int | 19
    decimal | 18
    varchar | 100
    varbinary | 2000
    time | 0
    float | 15
    bit | 0
    • channel:

    The calculation of channel memory is relatively complex. Generally, it takes 8 MB or 16 MB of memory, not exceeding 64 MB.

    • outputSize:

    It is the number of succeeding nodes connected with the input node.

    Process node

    64 + outputSize*2*64

    • outputSize (different from the one mentioned above):

    It is the total number of output nodes and Python nodes that are directly connected with the process node.

    Process nodes that follow the process node are not counted.

    Single output node

    32 MB

    An output node usually takes 32 MB of memory. Specifically, if the data is output to Doris or StarRocks, a single output node takes 90 MB of memory.

    Pipeline task 

    /

    The memory calculation method of input and output nodes is the same as that of the scheduled task.

    Taking the following task as an example, the required memory is calculated as follows.

    Input nodes (three): (8 + 1 * 2 * 24) * 3 = 168 MB

    Process node (one): 64 + 2 * 2 * 64 = 320 MB

    Output nodes (two): 32 + 32 = 64 MB

    Total: 552 MB

    You can find the description in the corresponding log.
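The per-node formulas for 4.1.55 and later releases can be sketched in Python as follows. This is a rough sketch, not FineDataLink code: all function names are hypothetical, the channel size is passed in by the caller because the article does not give its exact calculation, and rounding the buffer up to the next multiple of 8 MB is an assumption inferred from the 9 MB to 16 MB example.

```python
import math

BUFFER_CAP_MB = 64  # the buffer shall not exceed 64 MB


def relational_buffer_mb(column_precisions):
    """Buffer size in MB for a relational DB table input."""
    # 1 MB per column, 2 MB when the column precision exceeds 1024.
    raw = sum(2 if precision > 1024 else 1 for precision in column_precisions)
    # Assumption: round up to the next multiple of 8 MB, then apply the cap.
    rounded = math.ceil(raw / 8) * 8
    return min(rounded, BUFFER_CAP_MB)


def input_node_mb(buffer_mb, output_size, channel_mb):
    """buffer + outputSize * 2 * channel, in MB."""
    return buffer_mb + output_size * 2 * channel_mb


def process_node_mb(output_size):
    """64 + outputSize * 2 * 64, in MB."""
    return 64 + output_size * 2 * 64


def output_node_mb(to_doris_or_starrocks=False):
    """32 MB per output node, 90 MB when writing to Doris or StarRocks."""
    return 90 if to_doris_or_starrocks else 32


if __name__ == "__main__":
    # Example table from this article: eight columns, one with precision 2000.
    print(relational_buffer_mb([5, 19, 18, 100, 2000, 0, 15, 0]))
    # Example task: three input nodes, one process node, two output nodes.
    total = 3 * input_node_mb(8, 1, 24) + process_node_mb(2) + 2 * output_node_mb()
    print(total)
```

With the values of the example task (buffer 8 MB, channel 24 MB, one succeeding node per input), the sketch reproduces the 552 MB total.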

    2. For releases before 4.1.55:

    a. Estimated memory required for running scheduled/pipeline tasks (applicable to multi-task scenarios where accurate calculation is impossible)

    Type | Node | JVM Memory
    Scheduled task | / | 1024 MB per task
    Pipeline task | / | 1024 MB per task

    b. Accurate memory required for running scheduled/pipeline tasks

    Type | Node | JVM Memory
    Scheduled task | Single input node | 64 MB + 128 MB * Number of output channels
    Scheduled task | All process nodes | 64 MB + 128 MB * Number of connected output nodes
    Scheduled task | Single output node | 32 MB
    Pipeline task | / | 1024 MB per task

    The following is an example of calculating the memory of a scheduled task.

    Input (per input node): 64 MB + Output node quantity * (64 MB + 64 MB)

    Process: 64 MB + Ultimate output node quantity * (64 MB + 64 MB)

    Output: Output node quantity * 32 MB

    Calculate the memory used by the task:

    Input: 2 * (64 MB + 1 * (64 MB + 64 MB)) = 384 MB

    Process: 64 MB + 3 * (64 MB + 64 MB) = 448 MB

    Output: 3 * 32 MB = 96 MB

    Total: 928 MB
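The calculation for releases before 4.1.55 can be sketched in Python as follows. The function names are hypothetical, and the node counts in the usage example (two input nodes with one output channel each, three output nodes connected to the process nodes, three output nodes in total) are read off the worked figures in this section.

```python
def legacy_input_node_mb(output_channels):
    """64 MB + 128 MB per output channel, for a single input node."""
    return 64 + 128 * output_channels


def legacy_process_nodes_mb(connected_output_nodes):
    """64 MB + 128 MB per connected output node, for all process nodes."""
    return 64 + 128 * connected_output_nodes


def legacy_output_nodes_mb(output_nodes):
    """32 MB per output node."""
    return 32 * output_nodes


def legacy_scheduled_task_mb(channels_per_input, connected_output_nodes, output_nodes):
    """Total JVM memory in MB for one scheduled task.

    channels_per_input: one entry per input node, giving its output channels.
    """
    inputs = sum(legacy_input_node_mb(c) for c in channels_per_input)
    return (inputs
            + legacy_process_nodes_mb(connected_output_nodes)
            + legacy_output_nodes_mb(output_nodes))


if __name__ == "__main__":
    print(legacy_scheduled_task_mb([1, 1], 3, 3))
```

For the worked figures in this section, the sketch yields 384 + 448 + 96 = 928 MB.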

    Web Container Memory 

    The Web container memory should be equal to or larger than the resource control memory. For example, if an initialized project has a resource control memory of 16 GB, set the Web container memory to 16 GB or higher, but preferably to no more than 80% of the system memory.

    Note 1: For details of modifying the container memory, see Tomcat Memory Modification.

    Note 2: For details of the resource control memory, see the Resource Control Memory section of this article.


    System Memory 

    The system memory should be equal to or greater than the Web container memory. It is recommended that the Web container memory stay below 80% of the system memory.

    Note: The larger the system memory, the stronger the system scalability during use. For example, to increase the number of concurrent computations from four to eight, you just need to increase the resource control memory and the Web container memory.
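The relationship between the three memory settings can be checked with a small Python sketch. The function name and the message texts are hypothetical, and the 80% limit is treated here as a recommendation to be reported, not as a hard rule enforced by the product.

```python
def check_memory_plan(resource_control_gb, web_container_gb, system_gb):
    """Return a list of findings for a planned memory layout (empty if none)."""
    findings = []
    if web_container_gb < resource_control_gb:
        findings.append("Web container memory is smaller than resource control memory")
    if system_gb < web_container_gb:
        findings.append("System memory is smaller than Web container memory")
    elif web_container_gb > 0.8 * system_gb:
        findings.append("Web container memory exceeds 80% of system memory")
    return findings


if __name__ == "__main__":
    # 16 GB resource control memory, 20 GB Web container memory, 32 GB system memory.
    print(check_memory_plan(16, 20, 32))
    # 30 GB of 32 GB is above the recommended 80% share.
    print(check_memory_plan(16, 30, 32))
```

An empty list means the plan is consistent with the guidance in this section.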

    CPU Configuration 

    The number of threads should be at least twice the number of concurrent tasks.

    To ensure the high performance of concurrent transmission, the number of CPU threads can be slightly greater than twice the number of concurrent tasks.

    Number of Concurrent Tasks | Recommended Number of CPU Threads
    4 | 8
    10 | 20
    N | N*2

    The CPU mainly limits the number of scheduled tasks and pipeline tasks that run concurrently.

    The number of CPU threads affects full-volume synchronization of the data pipeline task, whereas incremental synchronization remains unaffected.
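The CPU sizing rule can be sketched in Python. The function names are hypothetical, and `os.cpu_count()` is used only to read the number of logical CPUs on the machine where the script runs.

```python
import os


def recommended_cpu_threads(concurrent_tasks):
    """At least twice the number of concurrent tasks."""
    return 2 * concurrent_tasks


def max_concurrent_tasks(cpu_threads):
    """Largest number of concurrent tasks that still satisfies the rule."""
    return cpu_threads // 2


if __name__ == "__main__":
    print(recommended_cpu_threads(10))
    threads = os.cpu_count() or 1  # cpu_count() may return None
    print("This machine:", threads, "threads,",
          max_concurrent_tasks(threads), "concurrent tasks at most")
```

Since the article suggests a thread count slightly greater than twice the number of concurrent tasks for best performance, the value returned by `recommended_cpu_threads` is a lower bound.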

     

    Disk Space

    The disk space shall be more than 50 GB.

    Disk space is mainly occupied by files (installation files, task files, log files, and backup files) and by task read and write throughput. Reading and writing data tables mainly uses memory, so it requires little disk space if the memory is sufficient.


    Category | Project | Disk Usage | Description
    File space | Initial installation | 4 GB | /
    File space | Task file | 20 MB per hundred tasks | It is an estimated value based on the task quantity in the internal testing environment.
    File space | Running log file | 1 MB per hundred records | It is an estimated value based on the log record quantity in the internal testing environment.
    File space | Application log | Less than 10 GB | You can clean the log.
    File space | Backup file | Less than 20 GB | You can clean the backup file.
    File space | Local directory of the server | X | Increase it according to actual usage.
    Data throughput | / | 10 GB | /
    Data throughput | GPLOAD | 1 GB per task by default | Increase it according to actual usage.

    In summary, reserve at least 50 GB of disk space for server deployment. You can increase the disk space as needed if you need to store Excel and CSV files in the server's local directory.
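The disk usage figures can be added up with a short Python sketch. This is an assumption about how the items combine, not an official sizing tool: the function and parameter names are hypothetical, the "less than" values for the application log and backup files are taken at their upper bounds, and the local directory size (X in the table) must be supplied by the caller.

```python
def estimate_disk_gb(tasks, log_records, gpload_tasks=0, local_dir_gb=0.0):
    """Rough disk estimate in GB, summing the items of the disk usage table."""
    installation_gb = 4
    application_log_gb = 10      # upper bound of "less than 10 GB"
    backup_gb = 20               # upper bound of "less than 20 GB"
    throughput_gb = 10
    task_files_gb = tasks / 100 * 20 / 1024        # 20 MB per hundred tasks
    running_log_gb = log_records / 100 * 1 / 1024  # 1 MB per hundred records
    gpload_gb = gpload_tasks * 1                   # 1 GB per task by default
    return (installation_gb + application_log_gb + backup_gb + throughput_gb
            + task_files_gb + running_log_gb + gpload_gb + local_dir_gb)


if __name__ == "__main__":
    print(estimate_disk_gb(tasks=1000, log_records=100000))
```

With no tasks and no logs the fixed items alone add up to 44 GB, which is in line with the 50 GB minimum stated in this section.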

    Deployment Package Preparation

    Contact the technical support personnel for the installation package.

    Port Preparation

    Type | Content | Port Number | Remark
    Web container | Tomcat | 8080 (for projects deployed in FDL 4.0.6 and later releases, the default port number is 8068) | It is an external port that can be closed to the public and can be modified (in server.xml). You can configure its SSL and short address at the load balancer.
    Notification | WebSocket port | 58888 and 59888 by default for FDL 4.0.6 and later releases; 38888 and 39888 for releases before 4.0.6 | For details, see WebSocket Port Configuration for Standalone Deployment.
    Notification | WebSocket forwarding port | 58889 by default for FDL 4.0.6 and later releases; 38889 for releases before 4.0.6 | For details, see WebSocket Port Configuration for Standalone Deployment.


    1. For details about port occupation, see Viewing Port Usage.

    2. If the default port number conflicts with that of other projects, modify the port number and then open the corresponding port.

    3. To deploy multiple Tomcat projects on a server, modify the Tomcat port number to prevent port conflict. For details, see Modifying Tomcat Port Number.

    4. If the firewall is enabled, you need to open the relevant port. For a Windows system, see Setting Inbound and Outbound Rules for Windows Server. For a Linux system, see Using and Configuring Linux Firewall.

    5. For environments with strict port restrictions between Docker containers or servers, open ports between the node servers for inter-node communication.

    • If you use the TCP protocol, open the following ports: 7800, 7810, 7820, 7830, 7840, 7850, 7860, and 7870.

    • If you use the UDP protocol, the port number for inter-node communication is a random one between 45588 and 65536.
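Before deployment, a quick local check of whether the default TCP ports are already in use can be sketched in Python with the standard `socket` module. The function names are hypothetical, the port list is the set of defaults for FDL 4.0.6 and later releases plus the TCP inter-node ports listed above, and the check only tells whether a port can be bound on the local machine. It says nothing about firewall rules.

```python
import socket

# Defaults for FDL 4.0.6 and later releases, plus the TCP inter-node ports.
DEFAULT_PORTS = [8068, 58888, 59888, 58889,
                 7800, 7810, 7820, 7830, 7840, 7850, 7860, 7870]


def is_tcp_port_free(port, host="127.0.0.1"):
    """Return True if the TCP port can be bound on the given local address."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use" or missing permission.
            return False
        return True


def check_ports(ports=DEFAULT_PORTS):
    """Map each port to True (free) or False (occupied or not bindable)."""
    return {port: is_tcp_port_free(port) for port in ports}


if __name__ == "__main__":
    for port, free in check_ports().items():
        print(port, "free" if free else "in use")
```

If a port is reported as in use, change the corresponding default as described in the notes above before starting the project.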
