When fetching a large amount of data all at once using SAP RFC, the driver and FineDataLink may crash. Some RFC functions offer ROWCOUNT (controlling the number of rows per fetch), ROWSKIPS (setting the start row to fetch data), and date (filtering data by date) parameters. These parameters function similarly to the offset parameter used in API-based data fetching.
Unlike the API Input operator, the SAP ERP Input operator does not offer Advanced Configuration settings for automatic offset handling.
This document takes fetching data from a MySQL database table as an example to simulate fetching data using SAP RFC.
1. Add times and size parameters in Parameter List.
times: It indicates which fetch (of data of the same date) it is. Set the initial value to 0. If there are 12 records of the same date and 5 records are fetched each time, it will take 3 fetches to obtain all the data.
size: It controls the number of data records per fetch. In this example, five records are fetched each time.
2. Output the earliest date and the latest date in the source table as parameters s_date and e_date, respectively.
3. Set Execution Condition of Loop Container to ${s_date} <= ${e_date}. The Loop Container node reads at most five records each time and judges whether all data of the current date has been read after each fetch.
If there is still unread data of the current date, the s_date value should remain unchanged, and the times value should be incremented by one to continue data fetching.
If all data of the current date has been fetched, the times value should be reset to 0, and the s_date value should be incremented by 1.
The table data to be fetched is shown in the following figure.
Preparing Data-Fetching Parameters
1. Create a scheduled task, click Parameter List, and add the parameters times and size.
Set the size value according to the actual condition. The following table describes the parameters.
It indicates which fetch (of data of the same date) it is. Set the initial value to 0.
If there are 12 records of the same date and 5 records are fetched each time, it will take 3 fetches to obtain all the data.
Preparing Parameters for Conditional Execution of Loop Container
You need to output the date as a parameter, pass its value to Loop Container, and set the execution condition of Loop Container to ${s_date} <= ${e_date} to fetch data based on the value of the size and times parameters until all data of that date has been fetched.
1. Add a Data Transformation node and enter the Data Transformation editing page.
2. Add a Spark SQL operator and define the earliest date and the latest date of the table data to be fetched as the s_date and e_date fields, respectively.
The SQL statement used in this example is as follows.
SELECT '2024-09-01' AS s_date, '2024-09-03' AS e_date
The SQL statement to fetch dates of the first day of the current month and the current day is as follows.
SELECT CONCAT(LEFT(current_date,7),"-01") as s_date,LEFT(current_date,10) as e_date
3. Add a Parameter Output operator and output the s_date and e_date fields as s_date and e_date parameters, respectively.
Add a Loop Container node, set Loop Type to Do-While Loop, and set Execution Condition to ${s_date} <= ${e_date}.
Outputting the Number of Fetched Records of the Date as a Parameter
This section is to enable subsequent nodes to use this parameter to calculate the number of unread records of the current date, which determines whether to update the value of the s_date parameter created in the "Preparing Parameters for Conditional Execution of Loop Container" section and the value of the times parameter created in the "Preparing Data-Fetching Parameters" section.
Drag a Parameter Assignment node into the Loop Container node, obtain the total number of records of the date that has been fetched from the target table, and output it as a parameter.
Prepare a target table to store the fetched data.
The SQL statement is as follows.
SELECT count (*) as num from Target table name where date ='${s_date}'
Fetching Unread Data of the Current Date with Up to Five Records per Fetch
1. Drag a Data Transformation node into the Loop Container node, connect it to the Parameter Assignment node as its downstream node, and click the Data Transformation node to enter the Data Transformation editing page.
Add a DB Table Input operator and enter the SQL statement to fetch unread data of the current date using the s_date parameter. (If there are over five unread records, five records will be fetched this time. Otherwise, all remaining unread records will be fetched).
SELECT * FROM Source table name where date ='${s_date}' and ID>= 1 +(${times}*${size}) and ID <=(${times}+1)*${size}
2. Add a DB Table Output operator to output the fetched data to the target table.
After the first fetch of data of the current date, the values of the s_date and times parameters should be updated.
If all data of the current date has been fetched, the times value should be reset to zero, and the s_date value be changed to another date.
You can reassign values to the s_date and times parameters using the New Calculation Column operator and output the two parameters using the Parameter Output operator.
The data type of the times parameter (set in the "Preparing Data-Fetching Parameters" section of this document) defaults to Value. The calculation in the first New Calculation Column operator is error-free. However, the times parameter is to be output as a parameter by the Parameter Output operator, after which its data type will be changed to String, which will cause errors in the calculation in the second New Calculation Column operator.
Therefore, you need to modify the data type of the times parameter before assigning values using the second New Calculation Column operator to ensure error-free calculation.
Determine whether to update the s_date and times values based on whether all data has been read.
The data type of the times parameter is double. An example value is 1.0. You need to modify the data type to int.
Output s_date and times parameters.
Obtaining the Number of Unread Records of the Current Date
1. Drag a Data Transformation node into the Loop Container node, connect it to the previous Data Transformation node as its downstream node, and enter the editing page.
2. Add a DB Table Input operator and use the following statement to obtain the number of unread records of the current date.
SELECT (COUNT(*) - ${Read}) AS diff FROM Source table name WHERE date = '${s_date}'
Modifying the Data Type of the times Field
1. Add a New Calculation Column operator, add a time field, and reference the times parameter in Parameter Value to pass times values.
2. Add a Field Setting operator and modify the data type of the time field to int to facilitate subsequent calculation in New Calculation Column.
Updating Parameter Values
1. Add a New Calculation Column operator to update the times and s_date values.
The value of the times field is as follows.
IF(diff=0,0,time+1)
The value of the s_date field is as follows.
IF (diff = 0, format (ADDTODATE (s_date,'d',1),' yyy-MM-dd'), s_date)
Click Data Preview, as shown in the following figure.
The data type of the times field is double, which needs changing to int.
3. Add a Field Setting operator, modify the data type of the times field to int, and delete the time field.
4. Add a Parameter Output operator to output the times and s_date as parameters.
Click Run. All data has been written into the target table.
You can then publish the scheduled task to Production Mode and set the execution frequency.
滑鼠選中內容,快速回饋問題
滑鼠選中存在疑惑的內容,即可快速回饋問題,我們將會跟進處理。
不再提示
10s後關閉
Submitted successfully
Network busy