Allowed viewing the return result of Python code for debugging during data development. For details, see the "Configuring the Python Operator" section of this document.
Allowed a Python operator to be preceded by multiple input operators and one process operator.
Fixed the issue where the Python module in the image package of FineDataLink deployed through FineOps was overwritten because its directory was mounted to the host machine.
Optimized the memory limit logic for the Python operator. For details, see the "Memory Usage Limit of the Python Operator" section of this document.
Complex data processing that is difficult to implement using visual operators or Spark SQL can be done with Python scripts in the Data Transformation node during data development.
During data development in FineDataLink, you may want to read data from files that are not supported by the File Input operator. In such scenarios, you can load file data by running Python scripts.
The Python operator in the Data Transformation node enables you to call Python scripts for complex data processing. The following figure illustrates how it works.
1. In FineDataLink of versions before V4.2.5.3, the Python operator could only be preceded by one input operator.
Starting from FineDataLink V4.2.5.3, the Python operator can be preceded by multiple input operators and one process operator.
2. The Python operator cannot be placed between two process operators. If all inputs of the Python operator are input operators, the Python operator can be followed by process operators of the connection, transformation, and laboratory types, excluding the Field Setting operator. If the input of the Python operator contains process operators, the Python operator can only be followed by output operators.
3. The Python editor provides auto-completion only for basic Python syntax, not for methods from modules imported with import statements. It does not provide syntax highlighting or error checking.
4. The Python operator supports loading files from both absolute and relative paths.
Note: In FineDataLink of versions before V4.1.6.2, the default path was FineDataLink installation directory\webapps\webroot\WEB-INF\assist\python. Starting from FineDataLink V4.1.6.2, the default path is FineDataLink installation directory\webapps\webroot\WEB-INF\plugin\fdl_python.
This path can be customized. For details, see the "Adding the python.properties File" section of this document. Relative paths are resolved against this path.
5. You can import custom functions using the Python operator.
You can import third-party modules installed in the Python runtime environment.
You can import custom modules from webroot/WEB-INF/assist/python/resources.
6. In Python code, each input of the Python operator is a pandas DataFrame. Use DataFrame methods to process the data from the input source.
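7. NumPy no longer provides the np.float alias: it was deprecated in NumPy 1.20 and removed in NumPy 1.24, and the related np.float_ alias was removed in NumPy 2.0 (released on June 16, 2024). Type conversions using np.float will fail; use the built-in float or np.float64 instead.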
8. FineDataLink V4.1.13.1 and later versions introduce a new parameter, PythonConfig.metaFromMock, in FINE_CONF_ENTITY, which controls how the Python operator is executed.
false (default value): When a Python operator starts execution, the system will execute code in the Python operator once using preview data from upstream operators to obtain output metadata and then perform a second execution for actual data transformation.
true (You need to restart the FineDataLink project after modifying the value to true.): When a Python operator starts execution, the system will generate empty mock data by mimicking upstream operators' metadata, use it to execute code in the Python operator once to obtain output metadata, and then perform a second execution for actual data transformation.
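Notes 6 to 8 can be illustrated with a short script written in the style of a Python operator. This is a minimal sketch that runs standalone: the hypothetical make_sample_input() function stands in for an input source that you would insert by clicking in the editor, and the columns title and price are made up for illustration. It casts with np.float64 instead of the removed np.float alias, and it relies only on vectorized column operations (no row lookups such as iloc[0]), so it also succeeds on the empty mock data used when PythonConfig.metaFromMock is true.

```python
import numpy as np
import pandas as pd


def make_sample_input() -> pd.DataFrame:
    # Hypothetical stand-in for the DataFrame an upstream input operator provides.
    return pd.DataFrame({"title": ["A", "B"], "price": ["12.5", "8"]})


def transform(df: pd.DataFrame) -> pd.DataFrame:
    # Cast with np.float64 (or the built-in float); np.float no longer exists in NumPy.
    out = df.assign(price=df["price"].astype(np.float64))
    # Vectorized arithmetic behaves the same on an empty DataFrame,
    # so output metadata can still be derived from empty mock data.
    out["price_with_tax"] = out["price"] * 1.1
    return out


input_df = make_sample_input()
output = transform(input_df)

# The same code also succeeds on an empty DataFrame with the same columns.
empty_output = transform(input_df.iloc[0:0])
print(output)
print(empty_output.dtypes)
```

Code that indexes specific rows (for example, input_df["price"].iloc[0]) would raise an error on empty mock data, which is why the sketch avoids it.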
To use the Python operator, you must prepare a Python environment.
Use Python 3.x.
In Linux and Windows environments:
1. Install pandas.
pip3 install pandas
2. Install datetime.
pip3 install datetime
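To confirm that the environment meets these requirements, you can run a short check with the interpreter that FineDataLink will use. This is a minimal sketch; the function name check_environment is hypothetical and not part of FineDataLink.

```python
import importlib.util
import sys


def check_environment() -> dict:
    # Report the interpreter version and whether pandas can be imported.
    return {
        "python_version": sys.version.split()[0],
        "is_python3": sys.version_info.major == 3,
        "pandas_installed": importlib.util.find_spec("pandas") is not None,
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

Run the script with the same command that FineDataLink uses (python3 on Linux, python on Windows by default) so that the check reflects that interpreter.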
The following table describes customizable contents in the python.properties file.
Working directory: Specifies the working directory.
For FineDataLink of versions before V4.1.6.2, the default path is FineDataLink installation directory\webapps\webroot\WEB-INF\assist\python.
For FineDataLink of V4.1.6.2 and later versions, the default path is FineDataLink installation directory\webapps\webroot\WEB-INF\plugin\fdl_python.
Python environment path (python.cmd): Specifies the Python command used to run scripts. The default command is python for Windows systems and python3 for Linux systems.
By default, the system uses the Python environment available in the PATH environment variable, so no manual path configuration is required:
For Linux systems, FineDataLink detects the python3 command if it can be run from the command line.
For Windows systems, FineDataLink detects the python command if it can be run from the command line.
Examples of custom values:
1. Linux system:
python.cmd=/home/python/bin/python3
2. Windows system:
python.cmd=E:\\Python3x\\python.exe
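Before restarting the project, you may want to confirm that the path set in python.cmd actually launches Python 3. The sketch below is an illustration, not a FineDataLink tool: read_properties and python_version_of are hypothetical helper names, and the parser handles only simple key=value lines and doubled backslashes (as in the Windows example), not other properties-file features such as line continuations.

```python
import subprocess
import sys


def read_properties(text: str) -> dict:
    # Minimal key=value parser: skips blank lines and # or ! comments,
    # and unescapes doubled backslashes (E:\\Python3x becomes E:\Python3x).
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip().replace("\\\\", "\\")
    return props


def python_version_of(cmd: str) -> str:
    # Run the interpreter with --version and return what it prints.
    result = subprocess.run([cmd, "--version"], capture_output=True, text=True, check=True)
    # Python 3.4+ prints the version to stdout; older versions used stderr.
    return (result.stdout or result.stderr).strip()


if __name__ == "__main__":
    # Replace this sample with the contents of your python.properties file.
    sample = "python.cmd=" + sys.executable.replace("\\", "\\\\")
    cmd = read_properties(sample)["python.cmd"]
    print(cmd, "->", python_version_of(cmd))
```

If the command fails or reports a version other than Python 3, correct the python.cmd value before restarting the project.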
Follow these steps to customize these items.
1. Create a folder.
For FineDataLink of versions before V4.1.6.2:
Create a python\config path in tomcat\webapps\webroot\WEB-INF\assist.
For FineDataLink of V4.1.6.2 and later versions:
Create a config folder in tomcat\webapps\webroot\WEB-INF\plugin\fdl_python.
2. Place the python.properties file in the created folder. (After modifying the python.properties file, you must restart the project. You can do this after completing the operations in the "Modifying the fine_conf_entity File" section of this document.)
You can download the example python.properties file from the python.properties.zip attachment and modify the values based on actual requirements.
Find the fine_conf_entity table in the FineDB database and add a setting item PythonConfig.enable with its value set to true. Restart the project after adding the setting item.
This example illustrates how to use a Python script to generate the code for each book in the book table.
1. Create a scheduled task, drag a Data Transformation node onto the page, and enter the Data Transformation editing page.
2. Drag in a DB Table Input operator and fetch data from the book table, as shown in the following figure.
1. Drag in a Python operator and write a script that generates the code for each book.
1. In the following script, DB Table Input is the input source of the Python operator. Insert it by clicking the input source rather than typing it.
2. On Windows systems, FineDataLink versions before V4.0.30 do not support double quotes ("") in the code. FineDataLink V4.0.30 and later versions support them.
3. Reference parameters (if any) in the format of ${Parameter name}.
import pandas as pd  # You must use pandas.

# If there is a connected data source, you can click the data source above to use it.
# Data from the input source exists in a pandas DataFrame and can be processed with DataFrame methods.
input = DB Table Input

# Assign the data to be output to the downstream operator to the output variable.
# DataFrame data is output as a two-dimensional table; data of other types is output as a string.
output = input.assign(book_code=range(1, len(input) + 1))
2. Click Data Preview. A book_code column is generated, as shown in the following figure.
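The script's logic can be reproduced outside FineDataLink with an ordinary DataFrame. In this sketch, book_table is a hypothetical stand-in for the data supplied by the DB Table Input operator (the titles are made up), and it is assigned to input_df rather than input to avoid shadowing Python's built-in input function.

```python
import pandas as pd

# Hypothetical sample rows standing in for the book table.
book_table = pd.DataFrame({"title": ["Book A", "Book B", "Book C"]})

input_df = book_table
# Number the rows 1..n, as the Python operator script does.
output = input_df.assign(book_code=range(1, len(input_df) + 1))
print(output)
```

Because book_code is derived from the row count, each run numbers the rows in the order in which the input operator returns them.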
Starting from V4.2.5.3, FineDataLink supports Python code debugging. You can add print statements to the Python operator and click Code Execution Result to view the output, which lets you adjust the code iteratively, as shown in the following figure.
The code execution result also includes the logs during execution, as shown in the following figure.
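As an example of this workflow, print statements can report the shape and content of the data before and after a transformation. The sketch below runs standalone, so the hypothetical input_df replaces the input source you would insert by clicking in the editor; inside the operator, the printed text is what you would check under Code Execution Result.

```python
import pandas as pd

# Hypothetical stand-in for the input source DataFrame.
input_df = pd.DataFrame({"title": ["Book A", "Book B", None]})

# Inspect the input before transforming it.
print("input shape:", input_df.shape)
print("null titles:", int(input_df["title"].isna().sum()))

# Drop rows without a title, then number the remaining rows.
cleaned = input_df.dropna(subset=["title"])
output = cleaned.assign(book_code=range(1, len(cleaned) + 1))

# Inspect the result that will be passed downstream.
print("output shape:", output.shape)
print(output.head())
```

Once the output looks correct, you can remove or comment out the print statements before running the task.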
1. Drag a DB Table Output operator onto the page and configure the operator, as shown in the following figure.
2. Click the Run button in the upper right corner.
After successful execution, the data in the generated table is shown in the following figure.
You can set the output log level for individual scheduled tasks under Task Control > Task Attribute > Log Level Setting to meet different needs of log viewing, debugging, and troubleshooting.
To get a detailed log display, select INFO from the drop-down list of Log Level Setting. For details, see Log Level Setting.
Detailed logs will be displayed in Log after you run the task.
The system calculates a per-thread memory threshold for Python preview threads as 30% of free system memory divided by the number of preview threads, and then checks the data volume of the preceding operator against this threshold. If the data volume exceeds the per-thread limit, the preview fails.
For example, if there is 6 GB of free system memory, the total available memory for Python preview threads will be 1.8 GB. If there are 3 preview threads, the available memory for each thread will be 0.6 GB. If the data volume in a preceding operator exceeds 0.6 GB, a preview error will occur.
The system calculates a per-thread memory threshold for Python execution threads as 50% of free system memory divided by the number of execution threads, and then checks the data volume of the preceding operator against this threshold. If the data volume exceeds the per-thread limit, the execution fails.
For example, if there is 6 GB of free system memory, the total available memory for Python execution threads will be 3 GB. If there are 3 execution threads, the available memory for each thread will be 1 GB. If the data volume in a preceding operator exceeds 1 GB, an execution error will occur.
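The threshold arithmetic above can be written as a small function, together with a rough way to estimate how large a DataFrame is in memory. This is only a sketch of the calculation as described here: the function names are hypothetical, and using pandas memory_usage(deep=True) to approximate the "data volume" is an assumption, since FineDataLink may measure the data volume differently.

```python
import pandas as pd

PREVIEW_RATIO = 0.3    # preview threads: 30% of free system memory
EXECUTION_RATIO = 0.5  # execution threads: 50% of free system memory


def per_thread_limit_gb(free_memory_gb: float, ratio: float, thread_count: int) -> float:
    # (ratio x free system memory) split evenly across the threads.
    if thread_count <= 0:
        raise ValueError("thread_count must be positive")
    return free_memory_gb * ratio / thread_count


def dataframe_size_gb(df: pd.DataFrame) -> float:
    # Approximate in-memory size, including the contents of string columns.
    return df.memory_usage(deep=True).sum() / 1024**3


def fits_limit(df: pd.DataFrame, free_memory_gb: float, ratio: float, thread_count: int) -> bool:
    # True if the estimated DataFrame size is within the per-thread limit.
    return dataframe_size_gb(df) <= per_thread_limit_gb(free_memory_gb, ratio, thread_count)


# Worked example from the text: 6 GB of free memory and 3 threads.
print(f"preview limit per thread: {per_thread_limit_gb(6, PREVIEW_RATIO, 3):.2f} GB")      # 0.60 GB
print(f"execution limit per thread: {per_thread_limit_gb(6, EXECUTION_RATIO, 3):.2f} GB")  # 1.00 GB
```

For example, fits_limit(df, 6, EXECUTION_RATIO, 3) compares an upstream DataFrame against the 1 GB execution limit from the worked example.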
The system does not validate the corresponding memory usage for FineDataLink projects deployed in containers.